Machine Learning

Statistical learning methods assume that lexical or grammatical observations are useful hints for modeling different semantic inferences. Linguistic observations provide the features that a learning method generalizes, from the training examples, into the predictive components of the final model. In (Mitchell, 1997), Tom Mitchell gave the following definition of a learning program:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

In Natural Language Processing (NLP), this formulation makes it possible to define learning systems that can be applied to Software Engineering. In particular:

  • T represents a linguistic task, usually an interpretation task such as semantic annotation or document classification. For example, in document classification, texts are mapped to a set of classes that characterize the document topics, e.g. whether a document is about sport, economics or politics. The objective is thus the acquisition (from data) of a function y = f(x) that associates each text x with its corresponding class y.
  • P represents the performance measure, which quantifies the quality of the resulting interpretation. It depends on the task objectives and the learning system requirements. For example, if one is interested in the quality of a document classification system, accuracy can be used, i.e. the percentage of correctly classified texts. However, if the learning algorithm is meant to improve other aspects, e.g. the time needed for classification or the resource requirements of the resulting system, other performance measures can be employed.
  • E is represented by data, i.e. the observations available about the target task. The idea is that a learning system exploits this information to acquire the competence needed to solve the target problem: the more information is observed, the higher the performance P on the task T. In the classification task, experience is provided by the documents themselves, which are the examples x that capture different aspects of the target problem in terms of linguistic observations, such as lexical, grammatical or syntactic information.
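The three ingredients T, P and E can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the toy documents, the function names (`train`, `classify`, `accuracy`) and the choice of a bag-of-words naive Bayes classifier are all hypothetical and are not taken from the cited works. The task T is topic classification, the measure P is accuracy, and the experience E is the list of labeled documents passed to `train`.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lexical observations: lower-cased, whitespace-separated words.
    return text.lower().split()


def train(examples):
    """Experience E: a list of (text, label) pairs. Returns a count-based model."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    class_counts = Counter()            # label -> number of documents
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "classes": class_counts, "vocab": vocab}


def classify(model, text):
    """Task T: the learned function y = f(x), here naive Bayes with add-one smoothing."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label in sorted(model["classes"]):
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(model["classes"][label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_set):
    """Performance measure P: fraction of correctly classified texts."""
    correct = sum(classify(model, x) == y for x, y in test_set)
    return correct / len(test_set)


# Hypothetical toy data, invented for this illustration.
TRAIN = [
    ("the team won the match", "sport"),
    ("the striker scored a goal", "sport"),
    ("the bank raised interest rates", "economics"),
    ("inflation hit the stock market", "economics"),
    ("the parliament passed the law", "politics"),
    ("the president won the election", "politics"),
]
TEST = [
    ("the goal decided the match", "sport"),
    ("interest rates and inflation", "economics"),
    ("the election law", "politics"),
]

small_model = train(TRAIN[:3])  # less experience: no politics example seen
full_model = train(TRAIN)       # more experience: all six documents
acc_small = accuracy(small_model, TEST)
acc_full = accuracy(full_model, TEST)
print("accuracy with 3 examples:", acc_small)
print("accuracy with 6 examples:", acc_full)
```

On this toy data the model trained on fewer examples cannot predict a class it has never observed, so its accuracy is lower than that of the model trained on all examples. This is the sense in which P improves with E.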

Different Machine Learning algorithms exist to exploit the evidence in the data and acquire a model of the target task, as discussed in (Bishop, 2006). In the following, the Support Vector Machine (SVM) learning algorithm, discussed in (Vapnik, 1998) and (Basili & Moschitti, 2005), will be employed, as it provides an effective learning paradigm to satisfy our objectives. SVMs can be thought of as methods for constructing classifiers with theoretical guarantees of good predictive performance, i.e. of the quality of classification on unseen data. The theoretical foundation of this method is given by statistical learning theory, discussed in (Vapnik, 1998).
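To give a feeling for what an SVM learns, the following sketch trains a linear SVM on toy two-dimensional data. It is a minimal illustration, not the procedure used in the cited works: the training routine is a swapped-in, Pegasos-style sub-gradient descent on the regularized hinge loss, the bias term is regularized together with the weights for simplicity, and the data, the function names and the parameters `lam` and `epochs` are all assumptions made for this example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Approximately minimize lam/2 * (|w|^2 + b^2) + average hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Examples are visited in a fixed order so the run is deterministic.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            # Hinge loss is active when the example is inside the margin.
            violated = yi * (dot(w, xi) + b) < 1.0
            # Gradient step on the regularization term.
            shrink = 1.0 - eta * lam
            w = [shrink * wj for wj in w]
            b = shrink * b
            if violated:
                # Sub-gradient step on the hinge loss of this example.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(w, b, x):
    # The classifier is the sign of the linear decision function.
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
X_train = [[2.0, 2.0], [3.0, 3.0], [3.0, 2.0],
           [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)

# Points that were not part of the training set.
for x in ([4.0, 4.0], [-4.0, -4.0]):
    print(x, "->", predict(w, b, x))
```

The learned hyperplane separates the two groups of training points, and the two held-out points fall on the expected sides of it. Practical SVM implementations typically solve the underlying optimization problem with dedicated solvers and support kernels, which this sketch omits.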


Mitchell, T. (1997). Machine Learning. McGraw-Hill.

Vapnik, V. (1998). Statistical learning theory. Wiley.

Basili, R., & Moschitti, A. (2005). Automatic Text Categorization: from Information Retrieval to Support Vector Learning. Aracne.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.