Distributional Polarity Lexicon
The Distributional Polarity Lexicons (Castellucci et al, 2016) are automatically derived sentiment vector representations for words. Each lexicon consists of a list of words, each associated with three numerical scores reflecting its positivity, negativity and neutrality, respectively.
On this page we release 4 lexicons, 2 for English and 2 for Italian. For each language, the first version contains a list of plain words with polarity scores, e.g. good, suffered, loved, smile. The second version contains a list of words that have been pre-processed to produce lemma::pos pairs, e.g. good::j indicates the adjective good, while pain::n indicates the noun pain. We call the plain-word lexicons DPL-EN and DPL-IT, for English and Italian respectively. The pre-processed versions are called DPLp-EN and DPLp-IT.
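As a small illustration of the lemma::pos convention, the sketch below splits a DPLp entry into its lemma and part-of-speech tag. The function name split_entry is hypothetical, and only the j (adjective) and n (noun) tags mentioned above are taken from this page; entries without the :: separator are treated as plain words.

```python
def split_entry(entry):
    """Split a key such as 'good::j' into (lemma, pos).

    Plain-word keys (no '::' separator) are returned with pos=None.
    This helper is illustrative and not part of the released resource.
    """
    lemma, sep, pos = entry.rpartition("::")
    if not sep:
        # No separator found: the entry is a plain word (DPL-EN / DPL-IT style).
        return entry, None
    return lemma, pos


print(split_entry("good::j"))
print(split_entry("pain::n"))
print(split_entry("smile"))
```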
The acquisition process uses Distributional Models and Social Media data to train a Support Vector Machine classifier, which is then used to derive the polarity scores, following the methodology described in (Castellucci et al, 2015). In particular, the representation is derived through a sentiment transfer from sentences to words. Distributional representations of words make it possible to represent both sentences and words in the same vector space, for example by applying a simple sum operator to the vectors of the words composing a sentence. Since sentences can be associated with polarities, a classifier can be acquired by observing how sentences relate to the different emotional categories. The classifier can then be applied to a single word to obtain its emotional signature.
As demonstrated in (Castellucci et al, 2015), training sentences can be acquired automatically from Twitter by adopting simple heuristics. For example, in (Castellucci et al, 2015) messages ending with a positive smiley, e.g. 🙂 or :D, with a negative smiley, e.g. :(, or with a url are selected for the positive, negative and neutral categories, respectively. These messages can thus be used as the training set to acquire the sentence-level classifier.
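The two ingredients described above, the ending-based labelling heuristic and the sum operator, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenization, the toy two-dimensional vectors, the url test (a last token starting with http:// or https://) and the inclusion of the text emoticon :) alongside the emoji are all assumptions. The Support Vector Machine training step is omitted.

```python
# Endings named in the text; ":)" is an assumed text form of the emoji.
POSITIVE_ENDINGS = ("🙂", ":)", ":D")
NEGATIVE_ENDINGS = (":(",)
URL_PREFIXES = ("http://", "https://")  # assumed definition of "a url"


def heuristic_label(message):
    """Return 'positive', 'negative', 'neutral', or None for a message."""
    text = message.rstrip()
    if text.endswith(POSITIVE_ENDINGS):
        return "positive"
    if text.endswith(NEGATIVE_ENDINGS):
        return "negative"
    tokens = text.split()
    if tokens and tokens[-1].startswith(URL_PREFIXES):
        return "neutral"
    return None  # the message matches no heuristic and is discarded


def sentence_vector(tokens, word_vectors, dim):
    """Represent a sentence as the sum of the vectors of its known words."""
    total = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # words without a distributional vector are skipped
        total = [a + b for a, b in zip(total, vec)]
    return total


# Toy vectors, invented for illustration only.
toy_vectors = {"good": [1.0, 0.0], "day": [0.5, 0.5]}
message = "good day :D"
print(heuristic_label(message))
print(sentence_vector(message.split(), toy_vectors, dim=2))
```

Because a sentence vector and a single word vector live in the same space, a classifier trained on labelled sentence vectors like these can be applied directly to an entry of the word-vector table, which is the transfer step the text describes.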
Different Sentiment Analysis tasks have been used to demonstrate the effectiveness of the acquired lexicons; the outcomes are reported in (Castellucci et al, 2015) and (Castellucci et al, 2016).
Here you can download the polarity lexicons.
Each file is formatted as:
word [tab] positiveScore,negativeScore,neutralScore
DPLp-EN (106,117 words)
DPL-EN (191,389 words)
DPLp-IT (75,021 words)
DPL-IT (143,764 words)
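Given the file format stated above (a word, a tab, then three comma-separated scores), a lexicon could be loaded along these lines. This is a sketch under assumptions: the names Polarity and parse_lexicon are hypothetical, scores are assumed to use a decimal point, malformed lines are simply skipped, and the sample scores are invented for illustration rather than taken from the released files.

```python
from typing import Dict, Iterable, NamedTuple


class Polarity(NamedTuple):
    positive: float
    negative: float
    neutral: float


def parse_lexicon(lines: Iterable[str]) -> Dict[str, Polarity]:
    """Parse 'word<TAB>positiveScore,negativeScore,neutralScore' lines."""
    lexicon = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        word, _, scores = line.partition("\t")
        parts = scores.split(",")
        if len(parts) != 3:
            continue  # skip lines that do not carry exactly three scores
        lexicon[word] = Polarity(*(float(p) for p in parts))
    return lexicon


# Invented sample lines in the documented format.
sample = ["good::j\t0.7,0.1,0.2\n", "pain::n\t0.1,0.8,0.1\n"]
lexicon = parse_lexicon(sample)
print(lexicon["good::j"].positive)

# With a downloaded file (hypothetical file name, UTF-8 encoding assumed):
# with open("DPLp-EN.txt", encoding="utf-8") as handle:
#     lexicon = parse_lexicon(handle)
```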
For more information, please feel free to contact:
DISCLAIMER: The polarity lexicon is acquired automatically, and no human intervention or validation has been performed. We do not assume liability for any effect caused by the usage of this resource.