Distributional Polarity Lexicon

The Distributional Polarity Lexicons (Castellucci et al., 2016) are automatically derived sentiment vector representations of words. Each lexicon consists of a list of words, each associated with three numerical scores reflecting positivity, negativity and neutrality, respectively.

On this page we release four lexicons: two versions for English and two for Italian. The first version contains a list of plain words with polarity scores, e.g. good, suffered, loved, smile. The second version contains a list of words that have been pre-processed to produce lemma::pos pairs, e.g. good::j indicates the adjective good, while pain::n indicates the noun pain. The plain-word lexicons are called DPL-EN and DPL-IT for English and Italian, respectively. The pre-processed versions are called DPLp-EN and DPLp-IT.

Example of English words and associated polarity scores.
term	positivity	negativity	neutrality
good::j	0.74	0.11	0.15
bad::j	0.12	0.80	0.08
pain::n	0.13	0.76	0.11
#apple::h	0.14	0.16	0.70
article::n	0.16	0.09	0.75
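
The lemma::pos keys shown above can be split back into their two parts when a lexicon is consumed programmatically. The following is a minimal sketch, not part of the released resource: the helper name split_term is hypothetical, and the only tag meanings taken from this page are j (adjective) and n (noun); the h tag appears here only on a hashtag entry.

```python
def split_term(term):
    """Split a DPLp-style key such as 'good::j' into (lemma, pos).

    Plain DPL entries carry no '::' separator, so pos is returned as None.
    """
    # rpartition splits on the LAST '::', so the lemma itself is left untouched.
    lemma, sep, pos = term.rpartition("::")
    if not sep:
        return term, None
    return lemma, pos


for term in ["good::j", "pain::n", "#apple::h", "smile"]:
    print(term, "->", split_term(term))
```

Splitting on the last separator is a design choice made here so that the part-of-speech tag is always the final field.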

The acquisition process relies on Distributional Models and on Social Media data to train a Support Vector Machine classifier, which is then used to derive the polarity scores, following the methodology described in (Castellucci et al., 2015). In particular, the representation is derived through a sentiment transfer from sentences to words. Thanks to the distributional representation of words, both sentences and words can be represented in the same vector space, for example by applying a simple sum operator to the vectors of the words composing a sentence. Since sentences can be associated with polarities, a classifier can be trained by observing how sentences relate to the different emotional categories. The classifier can then be applied to a single word to obtain its emotional signature.

As demonstrated in (Castellucci et al., 2015), labeled sentences can be acquired automatically from Twitter by adopting simple heuristics. For example, messages ending with a positive emoticon, e.g. :) or :D, a negative emoticon, e.g. :(, or a URL are selected for the positive, negative and neutral categories, respectively. These messages can then be used as the training set for the sentence-level classifier.
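
The sentence-to-word transfer described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three-dimensional WORD_VECTORS are invented, the example messages are made up, and a nearest-centroid scorer with a softmax over cosine similarities stands in for the Support Vector Machine used in the papers. It only shows the shape of the idea: label messages by their ending, sum word vectors into sentence vectors, fit a sentence-level model, then apply it to a single word vector.

```python
import math

# Hypothetical toy embeddings; real distributional vectors have many more dimensions.
WORD_VECTORS = {
    "love": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "hate": [0.1, 0.9, 0.0],
    "awful": [0.2, 0.8, 0.1],
    "report": [0.1, 0.1, 0.9],
    "news": [0.0, 0.2, 0.8],
}
DIM = 3


def heuristic_label(message):
    """Label a message from its last token: emoticon or URL, else None."""
    tokens = message.split()
    last = tokens[-1] if tokens else ""
    if last in (":)", ":D"):
        return "positive"
    if last == ":(":
        return "negative"
    if last.startswith(("http://", "https://")):
        return "neutral"
    return None


def sentence_vector(message):
    """Sum the vectors of the known words in a message (unknown tokens add zero)."""
    total = [0.0] * DIM
    for token in message.lower().split():
        for i, value in enumerate(WORD_VECTORS.get(token, [0.0] * DIM)):
            total[i] += value
    return total


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(messages):
    """Average the sentence vectors per heuristic label (stand-in for the SVM)."""
    sums, counts = {}, {}
    for message in messages:
        label = heuristic_label(message)
        if label is None:
            continue  # message matches no heuristic, so it is not used for training
        acc = sums.setdefault(label, [0.0] * DIM)
        for i, value in enumerate(sentence_vector(message)):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def word_scores(word, centroids):
    """Apply the sentence-level model to one word; softmax makes scores sum to 1."""
    vec = WORD_VECTORS[word]
    sims = {label: cosine(vec, c) for label, c in centroids.items()}
    z = sum(math.exp(s) for s in sims.values())
    return {label: math.exp(s) / z for label, s in sims.items()}


tweets = [
    "love this great day :)",
    "great great news :D",
    "hate this awful day :(",
    "awful awful report :(",
    "report news http://example.com",
]
centroids = train_centroids(tweets)
print(word_scores("love", centroids))
```

Because words and sentences live in the same vector space, the model fitted on summed sentence vectors can score a single word vector without any change, which is the point of the transfer.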

Several Sentiment Analysis tasks have been used to demonstrate the effectiveness of the acquired lexicons; the results are reported in (Castellucci et al., 2015) and (Castellucci et al., 2016).

Example of Italian words and associated polarity scores.
term	positivity	negativity	neutrality
buono::j	0.77	0.12	0.11
cattivo::j	0.23	0.63	0.14
sofferenza::n	0.17	0.48	0.35
#apple::h	0.17	0.12	0.71
articolo::n	0.19	0.05	0.76

Download

Here you can download the polarity lexicons.

Each file is formatted as:

word [tab] positiveScore,negativeScore,neutralScore
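
A file in this format could be read along the following lines. This is a minimal sketch that assumes exactly the layout stated above (one tab, then three comma-separated scores); the function names are hypothetical, and an in-memory sample built from the English example entries stands in for a downloaded file.

```python
import io


def parse_line(line):
    """Parse 'word<TAB>pos,neg,neu' into (word, (pos, neg, neu))."""
    word, scores = line.rstrip("\n").split("\t", 1)
    pos, neg, neu = (float(s) for s in scores.split(","))
    return word, (pos, neg, neu)


def load_lexicon(handle):
    """Read a lexicon from an open text handle into a dict, skipping blank lines."""
    lexicon = {}
    for line in handle:
        if line.strip():
            word, scores = parse_line(line)
            lexicon[word] = scores
    return lexicon


def dominant_polarity(scores):
    """Return the label of the highest of the three scores."""
    labels = ("positive", "negative", "neutral")
    return labels[max(range(3), key=lambda i: scores[i])]


# Inline sample standing in for a downloaded lexicon file.
sample = io.StringIO("good::j\t0.74,0.11,0.15\nbad::j\t0.12,0.80,0.08\n")
lexicon = load_lexicon(sample)
for word, scores in lexicon.items():
    print(word, scores, dominant_polarity(scores))
```

For a real file, the same load_lexicon function can be passed a handle from open(path, encoding="utf-8"); the encoding is an assumption, since this page does not state it.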

DPLp-EN (106,117 words)

DPL-EN (191,389 words)

DPLp-IT (75,021 words)

DPL-IT (143,764 words)

Information

For more information, please feel free to contact:

DISCLAIMER: The polarity lexicons are acquired automatically, with no human intervention or validation. We do not assume liability for any effect caused by the use of this resource.

References

Giuseppe Castellucci, Danilo Croce, Roberto Basili (2016): A Language Independent Method for Generating Large Scale Polarity Lexicons. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16), To Appear, European Language Resources Association (ELRA), Portoroz, Slovenia, 2016.

Giuseppe Castellucci, Danilo Croce, Roberto Basili (2015): Acquiring a Large Scale Polarity Lexicon through Unsupervised Distributional Methods. In: Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, Lecture Notes in Computer Science, Springer, 2015.

Semantic Analytics Group @ Uniroma2

SAG is the Semantic Analytics Group at the University of Rome, Tor Vergata