Distributional Semantics

Introduction

The representation of word meaning in text is a central problem in Computational Linguistics. When language learning technologies are applied to generalize linguistic observations of the target phenomena, information about the meaning of words plays a crucial role in the quality of the underlying statistical models. When training data is scarce, purely lexical information suffers from data sparseness, and some form of generalization is needed.

The distributional analysis of large-scale corpora is the instrument we apply to acquire and generalize lexical information. It provides a general learning method that achieves effective lexical generalization without a strong dependence on hand-built resources. In this view, words are modeled geometrically, i.e. as points in a high-dimensional space, so that similar or related concepts lie close to each other in that space.
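The geometric view can be illustrated with a minimal sketch. It assumes a simple window-based co-occurrence model and cosine similarity as the closeness measure. The toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are hypothetical choices made for illustration; they are not the models used in the publications listed on this page.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (an assumption for illustration only).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the car needs fuel",
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])
print(f"cat/dog:  {sim_cat_dog:.3f}")
print(f"cat/fuel: {sim_cat_fuel:.3f}")
```

In this toy setting, "cat" and "dog" share contexts such as "drinks" and "chases", so their vectors point in similar directions, while "cat" and "fuel" share no context words at all. Raw counts on a handful of sentences are only a didactic device; on real corpora the counts are typically reweighted and the space reduced in dimensionality.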


People

Roberto Basili, Danilo Croce


Related Projects

Wordspace page


References

SAG Publications

Danilo Croce, Valerio Storch, Paolo Annesi, Roberto Basili (2012): Distributional Compositional Semantics and Text Similarity. In: ICSC, pp. 242-249, 2012.

Danilo Croce, Simone Filice, Roberto Basili (2012): Distributional Models and Lexical Semantics in Convolution Kernels. In: CICLing (1), pp. 336-348, 2012.

Paolo Annesi, Valerio Storch, Roberto Basili (2012): Space projections as distributional models for semantic composition. In: Computational Linguistics and Intelligent Text Processing, pp. 323-335, Springer Berlin Heidelberg, 2012.

Danilo Croce, Alessandro Moschitti, Roberto Basili, Martha Palmer (2012): Verb Classification using Distributional Similarity in Syntactic and Semantic Structures. In: ACL (1), pp. 263-272, 2012.

Roberto Basili, Danilo Croce, Cristina Giannone, Diego De Cao (2010): Acquiring IE Patterns through Distributional Lexical Semantic Models. In: CICLing, pp. 512-524, 2010.

Roberto Basili, Marco Pennacchiotti (2010): Distributional lexical semantics: Toward uniform representation paradigms for advanced acquisition and processing tasks. In: Natural Language Engineering, 16 (4), pp. 347-358, 2010.

Diego De Cao, Roberto Basili (2009): Combining distributional and paradigmatic information in a lexical substitution task. In: Proceedings of EVALITA workshop, 11th Congress of Italian Association for Artificial Intelligence, 2009.

Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, Michael Roth (2008): Automatic induction of FrameNet lexical units. In: EMNLP, pp. 457-465, 2008.

Marco Pennacchiotti, Diego De Cao, Paolo Marocco, Roberto Basili (2008): Towards a Vector Space Model for FrameNet-like Resources. In: LREC, 2008.