Let’s start with a very simple example: a binary classification task addressed with the linear version of the Passive Aggressive algorithm. The full code of this example can be found in the GitHub repository kelp-full, in particular in the source file HelloLearning.java.
The datasets used here are the example datasets distributed with svmlight, slightly modified so that they can be read by KeLP. In fact, in KeLP each row must indicate what kind of vectors it contains, Sparse or Dense. Since the svmlight datasets contain sparse vectors, if you open the train.dat and test.dat files you will notice that each vector is enclosed in BeginVector (|BV|) and EndVector (|EV|) tags.
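For instance, based on the description above, a row of train.dat might look like the following sketch (an illustrative line, not one copied from the actual files): the class label, then the sparse vector as svmlight-style feature:value pairs wrapped in the two tags.

```
+1 |BV| 1:0.43 12:0.87 1027:0.12 |EV|
```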
The following example requires adding the online-large-margin Maven dependency to your project.
This example considers a dataset composed of:
- Training set (2000 examples, 1000 of class “+1” (positive), and 1000 of class “-1” (negative))
- Test set (600 examples, 300 of class “+1” (positive), and 300 of class “-1” (negative))
Let’s start writing some Java code.
First of all, we need to load the datasets into memory and define the positive class of the classification problem.
```java
// Read a dataset into a trainingSet variable
SimpleDataset trainingSet = new SimpleDataset();
trainingSet.populate("train.dat");
// Read a dataset into a testSet variable
SimpleDataset testSet = new SimpleDataset();
testSet.populate("test.dat");
// Define the positive class
StringLabel positiveClass = new StringLabel("+1");
```
If you want, you can print some statistics about the datasets through some useful built-in methods.
```java
// Print some statistics
System.out.println("Training set statistics");
System.out.print("Examples number ");
System.out.println(trainingSet.getNumberOfExamples());
System.out.print("Positive examples ");
System.out.println(trainingSet.getNumberOfPositiveExamples(positiveClass));
System.out.print("Negative examples ");
System.out.println(trainingSet.getNumberOfNegativeExamples(positiveClass));

System.out.println("Test set statistics");
System.out.print("Examples number ");
System.out.println(testSet.getNumberOfExamples());
System.out.print("Positive examples ");
System.out.println(testSet.getNumberOfPositiveExamples(positiveClass));
System.out.print("Negative examples ");
System.out.println(testSet.getNumberOfNegativeExamples(positiveClass));
```
Then, instantiate a new Passive Aggressive algorithm and set some parameters on it.
```java
// Instantiate a Passive Aggressive algorithm
LinearPassiveAggressive passiveAggressiveAlgorithm = new LinearPassiveAggressive();
// Use the first (and, here, only) representation
passiveAggressiveAlgorithm.setRepresentation("0");
// Indicate to the learner what the positive class is
passiveAggressiveAlgorithm.setLabel(positiveClass);
// Set the aggressiveness parameter
passiveAggressiveAlgorithm.setAggressiveness(0.01f);
```
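To make the role of the aggressiveness parameter more concrete, here is a minimal, self-contained sketch (plain Java, independent of the KeLP API) of the PA-I update rule from Crammer et al.: on each training example the weight vector moves just enough to fix a margin violation, with the step size capped by the aggressiveness C. The names (`PaUpdateSketch`, `update`) are ours for illustration, not part of KeLP.

```java
public class PaUpdateSketch {

    // PA-I update: tau = min(C, hingeLoss / ||x||^2); w <- w + tau * y * x
    public static double[] update(double[] w, double[] x, int y, double c) {
        double score = 0.0, squaredNorm = 0.0;
        for (int i = 0; i < x.length; i++) {
            score += w[i] * x[i];
            squaredNorm += x[i] * x[i];
        }
        // Hinge loss: positive only when the margin y * (w . x) is below 1
        double loss = Math.max(0.0, 1.0 - y * score);
        // The aggressiveness C caps the step size
        double tau = squaredNorm > 0 ? Math.min(c, loss / squaredNorm) : 0.0;
        double[] updated = w.clone();
        for (int i = 0; i < x.length; i++)
            updated[i] += tau * y * x[i];
        return updated;
    }

    public static void main(String[] args) {
        // A misclassified positive example triggers an update bounded by C = 0.01:
        // tau = min(0.01, 1 / 2) = 0.01, so each component grows by 0.01
        double[] w = update(new double[]{0.0, 0.0}, new double[]{1.0, 1.0}, +1, 0.01);
        System.out.println(w[0] + " " + w[1]);
    }
}
```

With a small C, as in the example above, the learner is conservative and updates cautiously even on clear mistakes; a large C lets each example pull the model more aggressively.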
Learn a model on the trainingSet, obtaining a Classifier:
```java
// Learn and get the prediction function
Classifier f = passiveAggressiveAlgorithm.learn(trainingSet);
```
Finally, we classify each example in the test set and compute a simple performance measure, the accuracy.
```java
int correct = 0;
for (Example e : testSet.getExamples()) {
    // Classify the current test example
    ClassificationOutput p = f.predict(e);
    if (p.getScore(positiveClass) > 0 && e.isExampleOf(positiveClass))
        correct++;
    else if (p.getScore(positiveClass) < 0 && !e.isExampleOf(positiveClass))
        correct++;
}
System.out.println("Accuracy: "
        + ((float) correct / (float) testSet.getNumberOfExamples()));
```
Kernel based Learning
Using kernel functions within KeLP is very simple. It is sufficient to declare a kernel function, set the representation it will operate on, and tell the algorithm that it must use that kernel function to compute similarity scores.
Starting from the previous example, if we want to apply a polynomial kernel on top of a linear kernel, it is sufficient to do the following:
```java
// Instantiate a kernelized Passive Aggressive algorithm
KernelizedPassiveAggressive kPA = new KernelizedPassiveAggressive();
// Indicate to the learner what the positive class is
kPA.setLabel(positiveClass);
// Set the aggressiveness parameter
kPA.setAggressiveness(0.01f);
// Use the first (and, here, only) representation
Kernel linear = new LinearKernel("0");
// Normalize the linear kernel
NormalizationKernel normalizedKernel = new NormalizationKernel(linear);
// Apply a polynomial kernel on the (normalized) score computed by the linear kernel
Kernel polyKernel = new PolynomialKernel(2f, normalizedKernel);
// Tell the algorithm to use the polynomial kernel during learning
kPA.setKernel(polyKernel);
```
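To see what this composition computes, here is a small self-contained sketch (plain Java, independent of the KeLP API) of a degree-2 polynomial kernel applied on top of a normalized linear kernel. We use the common formulation poly(K) = K^d with no additive constant; KeLP's PolynomialKernel may expose further coefficients, so treat this as an illustration of kernel composition, not of KeLP's exact formula.

```java
public class KernelSketch {

    // Plain linear kernel: the dot product between the two vectors
    public static double linear(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Normalization: K(a,b) / sqrt(K(a,a) * K(b,b)); for the linear
    // kernel this is exactly the cosine similarity, so K(a,a) = 1
    public static double normalized(double[] a, double[] b) {
        return linear(a, b) / Math.sqrt(linear(a, a) * linear(b, b));
    }

    // Degree-2 polynomial applied on top of the normalized kernel
    public static double poly2(double[] a, double[] b) {
        double k = normalized(a, b);
        return k * k;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {1.0, 1.0};
        // Cosine similarity is 1/sqrt(2); squaring it gives a value of about 0.5
        System.out.println(poly2(a, b));
    }
}
```

Normalizing before applying the polynomial keeps every kernel score in [-1, 1] regardless of vector length, so the polynomial amplifies the angle between examples rather than their magnitudes.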
For a complete example with kernels, you can download the HelloKernelLearning.java file.