In machine learning, kernel methods are a class of pattern-analysis algorithms, such as Support Vector Machines or online learning algorithms. A kernel function expresses the similarity between two objects relevant to a target problem in a rich representation space. Kernel functions implicitly map each example into a new (richer) feature space, where examples may become separable, by exploiting the so-called "kernel trick": a kernel function computes the inner product in the richer (implicit) space without ever computing the coordinates of the data in that space. This operation is often computationally cheaper than the explicit computation of the coordinates. Kernel functions have been introduced for sequence data, graphs, text, and images, as well as for vectors.
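The kernel trick can be illustrated with the degree-2 polynomial kernel, a standard textbook example (the function names below are ours): the kernel value equals the inner product of the explicit feature maps, but is computed without ever building them.

```python
import numpy as np

def explicit_phi(x):
    # Explicit degree-2 polynomial feature map for a 2-D input:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    # Kernel trick: <phi(x), phi(z)> computed as (x . z)^2,
    # without constructing phi explicitly
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(explicit_phi(x), explicit_phi(z))
implicit = poly_kernel(x, z)
# both equal (1*3 + 2*4)^2 = 121
```

For a 2-D input the explicit map is cheap, but for degree-d polynomials over high-dimensional inputs the explicit feature space grows combinatorially, while the kernel evaluation stays a single inner product plus a power.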
In NLP, kernel functions such as sequence kernels or tree kernels have been employed to build statistical models that separate the problem representation from the learning algorithm. The main idea behind this kind of function is that the algorithm can effectively learn the target phenomenon by focusing only on a notion of similarity between observations, without requiring that notion to be fully expressed in an explicit representation. A linguistic phenomenon can thus be modeled at a more abstract level, making the modeling process easier.
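As a minimal sketch of this idea, consider a k-spectrum kernel over character sequences (one simple instance of a sequence kernel, chosen here for illustration; the function name and parameters are ours): two strings are compared by the k-grams they share, so the kernel alone defines the similarity and no explicit feature vector has to be designed by hand.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    # k-spectrum sequence kernel: similarity is the number of
    # k-grams shared by s and t, counted with multiplicity.
    # This is equivalent to an inner product in the space indexed
    # by all possible k-grams, without enumerating that space.
    grams_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    grams_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(grams_s[g] * grams_t[g] for g in grams_s)

# Two sentences are compared directly; the learning algorithm
# only ever sees kernel values, not explicit features
sim = spectrum_kernel("the cat sat", "the cat ran", k=3)
```

Tree kernels follow the same pattern at a more abstract level: the implicit feature space is indexed by syntactic tree fragments rather than k-grams, and the kernel counts shared fragments between two parse trees.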