Project Description
The system targeted in the PROGRESS-IT project, funded by Regione Lazio (FILAS-CR-2011-1089), builds on some of the results of the European INSEARCH project, whose focus is the design and development of a search platform for SMEs. The PROGRESS-IT platform automatically collects documents describing Project Ideas, Organizations, Grants, Patents, Scientific Papers and Work Programs. Three domains have been targeted: Aerospace, ICT and Security. The system gives access to this large body of information through standard or advanced Information Retrieval techniques.
On the one hand, the quality of keyword search has been improved by applying distributional models of lexical semantics, either to improve the ranking function or to provide effective query expansion. On the other hand, an advanced query set has been defined and several dashboards have been designed to enable richer queries while sparing users the complexity of defining them. These advanced query workflows rely on record-like document structures, which provide different textual fields written in natural language. For each document type, the most representative fields among those provided by its structure have been selected. As an example, consider a user searching for Project Ideas or Grants that are useful for a given Organization. We investigated the possibility of using the organization description as the most representative field for describing the organization itself.
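As an illustration of how a distributional model can drive query expansion, the following minimal Python sketch adds to a query the terms whose vectors are closest to each query term under cosine similarity. It is not the PROGRESS-IT implementation: the function name `expand_query`, the parameters `k` and `threshold`, and the three-dimensional toy vectors are all assumptions made for the example, and a real system would use vectors learned from a large corpus.

```python
import math

# Toy distributional vectors (made-up values, for illustration only).
TOY_VECTORS = {
    "satellite": [0.9, 0.1, 0.0],
    "spacecraft": [0.8, 0.2, 0.1],
    "orbit": [0.7, 0.0, 0.3],
    "encryption": [0.0, 0.9, 0.2],
    "firewall": [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(terms, vectors, k=2, threshold=0.8):
    """Return the query terms plus, for each term, up to k nearest
    neighbours whose cosine similarity is at least `threshold`.
    Both k and threshold are hypothetical tuning parameters."""
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = [
            (cosine(vectors[term], vec), other)
            for other, vec in vectors.items()
            if other != term
        ]
        scored.sort(reverse=True)  # most similar first
        for score, other in scored[:k]:
            if score >= threshold and other not in expanded:
                expanded.append(other)
    return expanded


print(expand_query(["satellite"], TOY_VECTORS))
# -> ['satellite', 'spacecraft', 'orbit']
```

With these toy vectors, the aerospace terms cluster together and the security terms stay out of the expansion, which is the behaviour query expansion is meant to provide.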
Advanced Information Retrieval is made available to analysts for exploratory search through a Structured Semantic Textual Similarity function, discussed in (Croce et al., 2013), which provides a fine-grained measure of the relatedness between two homogeneous or heterogeneous texts. The resulting system is based on Solr, an open source enterprise search platform from the Apache Lucene project. PROGRESS-IT inherits the advantages of Solr: it is highly reliable, scalable and fault tolerant, and it provides distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more.
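To give a concrete idea of what a query against a Solr-based system looks like, the sketch below builds the URL of a request to Solr's standard `/select` handler using its `q`, `fq`, `rows` and `wt` parameters. The host, the core name `progressit` and the field name `doc_type` are hypothetical: the actual PROGRESS-IT schema and deployment are not described here. The function only constructs the URL and performs no network call.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, text, doc_type=None, rows=10):
    """Build a Solr /select URL for a keyword query.

    `core` and the `doc_type` filter field are assumptions made for
    this example; they do not reflect the real PROGRESS-IT schema.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if doc_type is not None:
        # fq restricts the result set without affecting ranking scores
        params.append(("fq", f'doc_type:"{doc_type}"'))
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # hypothetical local Solr instance
    "progressit",                  # hypothetical core name
    "satellite navigation",
    doc_type="Grant",
    rows=5,
)
print(url)
```

In a workflow such as the Organization example above, the text passed as `q` could be the organization description, with the filter restricting results to Grants or Project Ideas.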
People
Roberto Basili, Danilo Croce, Valerio Storch
SAG Publications
