Distributional Semantics


The representation of word meaning in texts is a central problem in Computational Linguistics. When language learning technologies are applied to generalize linguistic observations of the target phenomena, information about the meaning of words plays a crucial role in the quality of the underlying statistical models. When training data are scarce, purely lexical information suffers from data sparseness, and some form of generalization is needed.

The distributional analysis of large-scale corpora is the instrument we apply to acquire and generalize lexical information. It is a general learning algorithm that provides effective lexical generalization without a strong dependency on hand-built resources. In this view, words are modeled from a geometrical perspective, i.e. as points in a high-dimensional space, so that similar or related concepts lie near each other in the space.
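To make the geometrical view concrete, here is a minimal sketch assuming a plain count-based model: each word becomes a sparse vector of counts of the words that occur within a fixed window around it, and nearness in the space is measured with cosine similarity. The window size, the toy corpus and the names `build_word_space` and `cosine` are illustrative assumptions, not the specific models or resources used in the works listed on this page.

```python
import math
from collections import Counter, defaultdict

def build_word_space(sentences, window=2):
    """Map each word to a sparse co-occurrence vector (context word -> count).

    Illustrative assumption: contexts are the words within `window`
    positions to the left and right of the target word.
    """
    space = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[target][tokens[j]] += 1
    return space

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    # Counter returns 0 for missing keys, so unshared contexts add nothing.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" does not.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "a car needs fuel",
    "a truck needs fuel",
]

space = build_word_space(CORPUS)
print("cat ~ dog:", cosine(space["cat"], space["dog"]))
print("cat ~ car:", cosine(space["cat"], space["car"]))
```

Raw counts are used only to keep the sketch short; realistic word spaces are built from large corpora and usually apply association weighting and dimensionality reduction before comparing vectors.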


Roberto Basili, Danilo Croce

Related Projects

Wordspace page


SAG Publications

Danilo Croce, Valerio Storch, Paolo Annesi, Roberto Basili (2012): Distributional Compositional Semantics and Text Similarity. In: ICSC, pp. 242-249, 2012.

Danilo Croce, Simone Filice, Roberto Basili (2012): Distributional Models and Lexical Semantics in Convolution Kernels. In: CICLing (1), pp. 336-348, 2012.

Paolo Annesi, Valerio Storch, Roberto Basili (2012): Space projections as distributional models for semantic composition. In: Computational Linguistics and Intelligent Text Processing, pp. 323-335, Springer Berlin Heidelberg, 2012.

Danilo Croce, Alessandro Moschitti, Roberto Basili, Martha Palmer (2012): Verb Classification using Distributional Similarity in Syntactic and Semantic Structures. In: ACL (1), pp. 263-272, 2012.

Roberto Basili, Danilo Croce, Cristina Giannone, Diego De Cao (2010): Acquiring IE Patterns through Distributional Lexical Semantic Models. In: CICLing, pp. 512-524, 2010.

Roberto Basili, Marco Pennacchiotti (2010): Distributional lexical semantics: Toward uniform representation paradigms for advanced acquisition and processing tasks. In: Natural Language Engineering, 16 (4), pp. 347-358, 2010.

Diego De Cao, Roberto Basili (2009): Combining distributional and paradigmatic information in a lexical substitution task. In: Proceedings of EVALITA workshop, 11th Congress of Italian Association for Artificial Intelligence, 2009.

Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, Michael Roth (2008): Automatic induction of FrameNet lexical units. In: EMNLP, pp. 457-465, 2008.

Marco Pennacchiotti, Diego De Cao, Paolo Marocco, Roberto Basili (2008): Towards a Vector Space Model for FrameNet-like Resources. In: LREC, 2008.