A new representation can be added by creating a class that implements Representation.
The programmer must implement two methods, getTextFromData and setDataFromText, which serialize the Representation to a string and deserialize it from one. The choice of textual format is left to the programmer; in particular, a JSON format is not imposed. For instance, sparse vectors use the standard libSVM format, while tree representations use the Penn Treebank notation, as shown in the page multiple representation formalism.
An empty constructor must be included. Optionally, the class can be annotated with @JsonTypeName to specify an alternative type name to be used during serialization and deserialization; otherwise, the class name is used automatically.
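As an illustration of the requirements above, here is a minimal sketch of a sparse vector representation that serializes itself in a libSVM-style "index:value" format. The Representation interface declared here is a hypothetical stand-in reduced to the two methods discussed above, so that the example compiles on its own; the library's actual interface may declare further methods or different signatures. The class name SparseVectorSketch and the helper get are likewise assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the library's Representation interface, reduced to
// the two methods discussed above. The real interface may differ.
interface Representation {
    String getTextFromData();
    void setDataFromText(String text);
}

// In a real project the class could also carry Jackson's @JsonTypeName annotation
// to choose a custom type name; it is omitted here to keep the sketch dependency-free.
class SparseVectorSketch implements Representation {
    // Sorted map from feature index to feature value.
    private final Map<Integer, Float> values = new TreeMap<>();

    // Empty constructor: the object is created first, then filled by setDataFromText.
    public SparseVectorSketch() {
    }

    // Serializes the vector as space-separated "index:value" pairs, sorted by index.
    @Override
    public String getTextFromData() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Float> entry : values.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(entry.getKey()).append(':').append(entry.getValue());
        }
        return sb.toString();
    }

    // Replaces the current content with the pairs parsed from the given text.
    @Override
    public void setDataFromText(String text) {
        values.clear();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        for (String pair : trimmed.split("\\s+")) {
            String[] parts = pair.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            values.put(Integer.parseInt(parts[0]), Float.parseFloat(parts[1]));
        }
    }

    // Illustrative accessor: returns 0 for indices that are not stored.
    public float get(int index) {
        return values.getOrDefault(index, 0f);
    }
}
```

The two methods are inverses of each other: text produced by getTextFromData can be passed back to setDataFromText to rebuild an equivalent object.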
If a norm can be defined on the new representation (i.e. the representation is a vector, a matrix or a higher-order tensor), the Normalizable interface can also be implemented. This enables useful preprocessing operations on datasets, such as feature scaling or data normalization. In this case the following methods must be implemented:
- getSquaredNorm: returns the squared norm of the representation
- normalize: scales the representation in order to have a unit norm in the explicit feature space
- scale: multiplies each element of the representation by a given coefficient
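The three methods listed above could be sketched for a dense vector as follows. The Normalizable interface below is a hypothetical stand-in that contains only these three methods, with assumed float-based signatures; the library's real interface may differ. DenseVectorSketch and its get accessor are names invented for this example.

```java
// Hypothetical stand-in for the library's Normalizable interface, limited to the
// three methods listed above. The signatures are assumptions.
interface Normalizable {
    float getSquaredNorm();
    void normalize();
    void scale(float coeff);
}

class DenseVectorSketch implements Normalizable {
    private final float[] values;

    // Empty constructor, creating a vector with no elements.
    public DenseVectorSketch() {
        this(new float[0]);
    }

    // Copies the input so later changes to the caller's array have no effect.
    public DenseVectorSketch(float[] values) {
        this.values = values.clone();
    }

    // Sum of the squared elements, i.e. the squared Euclidean norm.
    @Override
    public float getSquaredNorm() {
        float sum = 0f;
        for (float v : values) {
            sum += v * v;
        }
        return sum;
    }

    // Multiplies every element by the given coefficient.
    @Override
    public void scale(float coeff) {
        for (int i = 0; i < values.length; i++) {
            values[i] *= coeff;
        }
    }

    // Rescales the vector to unit norm; a zero vector is left unchanged.
    @Override
    public void normalize() {
        float squaredNorm = getSquaredNorm();
        if (squaredNorm > 0f) {
            scale((float) (1.0 / Math.sqrt(squaredNorm)));
        }
    }

    // Illustrative accessor for a single element.
    public float get(int index) {
        return values[index];
    }
}
```

In this sketch normalize is expressed through getSquaredNorm and scale, so only those two methods depend on how the elements are stored.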