Distributional Analysis of Unstructured Data

Introduction

The distributional analysis of large-scale corpora is the instrument used to acquire and generalize lexical information. Once lexical semantics is captured in a high-dimensional Word Space, the topology of that space can be exploited to derive more expressive representations.
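As an illustration of the kind of Word Space the analysis relies on, the following is a minimal sketch of a co-occurrence-based distributional space built from a toy corpus; the corpus, vocabulary handling, and `cosine` helper are illustrative assumptions, not the group's actual pipeline.

```python
import numpy as np
from itertools import combinations

# Toy corpus; in practice a large-scale corpus would be used.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate the cheese",
]

# Build a symmetric word-by-word co-occurrence matrix, counting
# co-occurrences within each sentence.
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for a, b in combinations(words, 2):
        M[idx[a], idx[b]] += 1
        M[idx[b], idx[a]] += 1

# Each row is a distributional vector; cosine similarity compares words
# by the contexts they share.
def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

print(cosine(M[idx["cat"]], M[idx["dog"]]))
```

Rows of `M` are the high-dimensional lexical vectors whose topology the manifold-learning techniques below operate on.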

Non-linear dimensionality reduction techniques have been defined in the literature to address this problem. The main idea is to employ effective and efficient algorithms for constructing nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces. Several such algorithms exist, including Isometric Feature Mapping (ISOMAP) (Tenenbaum et al., 2000), Locally Linear Embedding (LLE) (Roweis and Saul, 2000), Local Tangent Space Alignment (LTSA) (Zhang and Zha, 2004), and Locality Preserving Projection (LPP) (He and Niyogi, 2003), and they have been successfully applied to several computer vision and pattern recognition problems.
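The following sketch shows two of the cited techniques, ISOMAP and LLE, recovering a 2-D manifold from points embedded in 3-D. It uses scikit-learn's implementations and the synthetic "swiss roll" dataset as an assumed stand-in for real lexical vectors; the parameter choices (`n_neighbors`, `n_components`) are illustrative, not prescribed by the source.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# Sample points lying on a 2-D manifold (the "swiss roll") embedded in 3-D.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# ISOMAP: preserves geodesic distances estimated over a k-NN graph.
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

# LLE: reconstructs each point from its neighbours and preserves those
# local linear reconstruction weights in the low-dimensional embedding.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
X_lle = lle.fit_transform(X)

print(X_iso.shape, X_lle.shape)  # (500, 2) (500, 2)
```

Both methods return low-dimensional coordinates that respect the local geometry of the original space, which is the property exploited when the input points are distributional word vectors.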

People

Roberto Basili
Danilo Croce


References

S.T. Roweis and L.K. Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.

J.B. Tenenbaum, V. de Silva, and J.C. Langford. 2000. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323.

Xiaofei He and Partha Niyogi. 2003. Locality preserving projections. In Proceedings of NIPS 2003, Vancouver, Canada.

Zhenyue Zhang and Hongyuan Zha. 2004. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1):313–338.

SAG Publications

Danilo Croce and Daniele Previtali. 2010. Manifold learning for the semi-supervised induction of FrameNet predicates: An empirical investigation. In Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics, pages 7–16. Association for Computational Linguistics.