Semantic Search

Introduction

Modern systems in Information Technology need to access the huge amount of information that is stored and constantly produced on the Web. Most human knowledge is represented and expressed in language, so the proper application of Natural Language Processing (NLP) techniques is crucial to exploiting such data.

Traditionally, Information Retrieval (IR) has dealt with the representation, storage, and organization of, and access to, information items such as documents, as described in [van Rijsbergen(1979), Baeza-Yates and Ribeiro-Neto(1999)]. In this view, the most common IR approach models the meaning of a document solely through its word occurrences. Although these fully lexicalized models are well established, in recent years syntactic and semantic structures that express richer linguistic information have become essential for tackling complex IR tasks, such as Open Domain Question Classification as shown in the IBM Watson system. The main objective of a Question Answering system is to automatically answer a question posed in natural language, or to retrieve that answer from a document collection. The key problem is that such tasks target fine-grained phenomena, for which lexical information alone is not sufficient.

Purely lexicalized retrieval models do not always provide a robust solution to these real retrieval needs. This group studies several approaches and paradigms for improving the quality of retrieval systems and enabling a more semantic search.


People

Roberto Basili, Danilo Croce, Diego De Cao, Valerio Storch


Related Projects

InSearch


References

[van Rijsbergen(1979)] van Rijsbergen, C. J. (1979). “Information Retrieval”. Butterworth.

[Baeza-Yates and Ribeiro-Neto(1999)] Baeza-Yates, R. A. and Ribeiro-Neto, B. (1999). “Modern Information Retrieval”. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.


SAG Publications

  • Diego De Cao, Valerio Storch, Danilo Croce, Roberto Basili (2013): INSEARCH: A Platform for Enterprise Semantic Search. In: IIR, pp. 104–115, 2013.
  • Roberto Basili, Armando Stellato, Daniele Previtali, Paolo Salvatore, Jorg Wurzer (2012): Innovation-Related Enterprise Semantic Search: The INSEARCH Experience. In: Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, pp. 194–201, IEEE 2012.
  • Diego De Cao, Roberto Basili, Matteo Luciani, Francesco Mesiano, Riccardo Rossi (2010): Robust and efficient page rank for word sense disambiguation. In: Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing, pp. 24–32, Association for Computational Linguistics 2010.
  • Silvia Quarteroni, Alessandro Moschitti, Suresh Manandhar, Roberto Basili (2007): Advanced structural representations for question classification and answer re-ranking. In: Advances in Information Retrieval, pp. 234–245, Springer Berlin Heidelberg, 2007.