Semantic Search


Modern Information Technology systems need to access the huge amount of information that is stored and constantly produced on the Web. Most human knowledge is represented and expressed in language, so the proper application of Natural Language Processing (NLP) techniques is crucial for exploiting such data.

Traditionally, Information Retrieval (IR) has dealt with the representation, storage, and organization of, and access to, information items such as documents, as described in [van Rijsbergen(1979), Baeza-Yates and Ribeiro-Neto(1999)]. According to this view, the most common approach in IR has been to model the meaning of documents using word occurrences alone. Even though these fully lexicalized models are well established, in recent years syntactic and semantic structures that express richer linguistic information have become essential for tackling complex IR tasks; these include Open Domain Question Classification, as shown in the IBM Watson System. Since the main objective of a Question Answering system is to automatically answer a question posed in natural language, or to retrieve that answer from a document collection, the key problem is that fine-grained linguistic phenomena are targeted, and lexical information alone is not sufficient.
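To make the limitation of purely lexical matching concrete, the following is a minimal sketch of a bag-of-words model with cosine similarity. It is not the group's system; the function names and the toy query and documents are assumptions chosen only for illustration. A document that answers the question but shares no words with it scores zero, while an unrelated document that happens to share a word scores higher.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count word occurrences."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): the relevant document shares no word with the query.
query = bag_of_words("who wrote hamlet")
doc_relevant = bag_of_words("shakespeare is the author of the tragedy")
doc_lexical = bag_of_words("hamlet is a small village")

score_relevant = cosine(query, doc_relevant)  # no shared words
score_lexical = cosine(query, doc_lexical)    # shares only "hamlet"
```

Here the lexical model ranks the village description above the sentence that actually answers the question, which is the kind of mismatch that syntactic and semantic information is meant to address.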

Purely lexicalized retrieval models do not always provide a robust solution to these real retrieval needs. In this group, several approaches and paradigms are studied to improve the quality of retrieval systems and to enable a more semantic search.


Roberto Basili, Danilo Croce, Diego De Cao, Valerio Storch

