Data

The data source chosen for creating the datasets is the popular website booking.com. The platform allows users to share their opinions about hotels through a positive/negative textual review and a fine-grained rating system that assigns a score to different aspects: cleanliness, comfort, amenities, staff, value for money, free/paid WiFi, and location. The website provides a large number of reviews in many languages, including Italian.

More than 10,000 sentences from hotel reviews in Italian have been extracted using Scrapy, an open-source Python framework for extracting data from websites. The pages of hotels located in Naples, Bologna, and Milan have been analyzed, and the reviews containing textual content have been extracted. Longer reviews have been split into single self-contained sentences using the Tint (The Italian NLP Tool) library.

In order to guarantee a balanced distribution of positive and negative sentences in the portion of the dataset reserved for each annotator, we randomly selected the sentences alternating between positively and negatively annotated ones, leveraging the review-level polarity information already provided by booking.com.
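The alternating selection described above can be sketched as follows. The function name and the exact procedure are illustrative assumptions, not the code actually used to build the dataset:

```python
import random

def balanced_selection(positives, negatives, seed=0):
    """Sketch of the balancing step: shuffle the review-level positive and
    negative sentences, then interleave them so each annotator's share
    alternates polarities. Illustrative only -- not the task's real code.
    Note that zip() truncates to the shorter of the two lists."""
    rng = random.Random(seed)
    pos = positives[:]
    neg = negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    out = []
    for p, n in zip(pos, neg):
        out.extend([p, n])  # alternate positive / negative
    return out
```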


Data Annotation Process

In order to obtain a complete dataset for ABSA, we annotated the sentences from the hotel reviews according to seven aspects: pulizia (cleanliness), comfort, servizi (amenities), staff, qualita-prezzo (value), wifi (wireless Internet connection), and posizione (location). For each aspect, the polarity (positive, negative) of its mention has been annotated. The positive and negative polarities are annotated independently; thus, for each aspect, four sentiment classes are possible: positive (positive=yes, negative=no), negative (positive=no, negative=yes), neutral (positive=no, negative=no), and mixed (positive=yes, negative=yes).
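The mapping from the two independent binary annotations to the four sentiment classes can be expressed as a small helper (illustrative, not part of the released tooling):

```python
def sentiment_class(positive: int, negative: int) -> str:
    """Map the two independent binary polarity annotations to one of the
    four sentiment classes described above."""
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"
```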

The annotation task involved the following classes:

  • cleanliness_positive
  • cleanliness_negative
  • comfort_positive
  • comfort_negative
  • amenities_positive
  • amenities_negative
  • staff_positive
  • staff_negative
  • value_positive
  • value_negative
  • wifi_positive
  • wifi_negative
  • location_positive
  • location_negative
  • other_positive
  • other_negative

Please note that the special topic “other” has been added for completeness, to annotate sentences expressing opinions on aspects other than the seven considered by the task. The aspect “other” is provided as additional information only and will not be part of the evaluation of the results submitted for the task.

Four different annotators have been involved in the task in parallel, i.e., dividing the dataset into four equal subsets of about 2,500 sentences each. Furthermore, a portion of the dataset consisting of 250 sentences was annotated by all four annotators. We computed the per-class inter-annotator agreement over this subset, obtaining values between 85% and 100% (the percentage of sentences for which all annotators agreed).
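Per-class agreement in this sense is simply the share of sentences on which all four annotators assigned the same label for that class. A minimal sketch (an illustrative reimplementation, not the organizers' script):

```python
def agreement(class_labels):
    """class_labels: for one annotation class, a list with one tuple per
    sentence holding the value each annotator assigned,
    e.g. [(1, 1, 1, 1), (0, 1, 0, 0), ...].
    Returns the fraction of sentences on which all annotators agreed."""
    unanimous = sum(1 for labels in class_labels if len(set(labels)) == 1)
    return unanimous / len(class_labels)
```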

Incomplete, irrelevant, and incomprehensible sentences have been discarded from the dataset during the annotation.

Finally, the *_presence field for the ACD task is computed as the logical (inclusive) OR of the corresponding *_positive and *_negative fields.
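As a minimal illustration of this rule:

```python
def presence(pos: int, neg: int) -> int:
    """*_presence is 1 whenever the aspect is mentioned with either
    polarity: the inclusive OR of the *_positive and *_negative fields."""
    return 1 if (pos or neg) else 0
```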

Data Format

The data format used is CSV with UTF-8 encoding and semicolon as separator. This annotation makes it possible to keep track of which sentences were observed within the same review. It is important to note that on booking.com the order of positive and negative sentences is strictly defined, and this could influence learning. To overcome this issue, we shuffled the sentences within each review. As a consequence, the sentence ids shown in the data file do not reflect the real order of the sentences in the review.
The text of the sentence is provided at the end of the row, delimited by double quotes. Moreover, three binary values are provided for each aspect, indicating, respectively: the presence of the aspect in the sentence (aspect_presence: 0/1), the positive polarity for that aspect (aspect_pos: 0/1), and finally the negative polarity (aspect_neg: 0/1). An example of the annotated dataset in the proposed format:

id_sentence; aspect1_presence; aspect1_pos; aspect1_neg; ... sentence;
20160624;0;0;0;0;0;0;0;0;0;0;0;0;1;1;0;0;0;0;1;1;0;"Considerato il prezzo e per una sola notte, va benissimo come punto di appoggio."
20160625;1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;"Almeno i servizi igienici andrebbero rivisti e migliorati nella pulizia."
20160626;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;"La struttura purtroppo è vecchia e ci vorrebbero dei lavori di ammodernamento."
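Rows in this format can be read with Python's standard csv module. A hedged sketch, assuming the presence/pos/neg triples follow the aspect order listed in the annotation section (an assumption the example rows are consistent with):

```python
import csv
import io

# Assumed column order of the seven aspect triples (English names).
ASPECTS = ["cleanliness", "comfort", "amenities", "staff",
           "value", "wifi", "location"]

def parse_rows(text):
    """Parse header-less semicolon-separated rows into dicts:
    id, then presence/pos/neg triples for the seven aspects in order,
    then the double-quoted sentence text."""
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        record = {"id": row[0], "sentence": row[-1]}
        for i, aspect in enumerate(ASPECTS):
            pres, pos, neg = row[1 + 3 * i : 4 + 3 * i]
            record[aspect] = {"presence": int(pres),
                              "pos": int(pos),
                              "neg": int(neg)}
        yield record
```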


Results


Task ACD: Aspect Category Detection

rank  Micro-Precision  Micro-Recall  Micro-F1-score

1 0.8397 0.7837 0.8108
2 0.8713 0.7504 0.8063
3 0.8697 0.7481 0.8043
4  0.8626 0.7519 0.8035
5 0.8819 0.7378 0.8035
6 0.898 0.6937 0.7827
7 0.8658 0.697 0.7723
8 0.7902 0.7181 0.7524
9 0.6232 0.6093 0.6162
10 0.6164 0.6134 0.6149
11 0.5443 0.5418 0.5431
12 0.6213 0.433 0.5104
baseline 0.4111 0.2866 0.3377
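Micro-averaged scores of the kind reported above pool true positives, false positives, and false negatives over all sentences and aspect categories before computing precision, recall, and F1. A sketch of such a scorer (illustrative, not the official evaluation script):

```python
def micro_prf(gold, pred):
    """Micro-averaged precision/recall/F1 for aspect category detection.
    gold and pred are parallel lists: one set of aspect categories per
    sentence. Counts are pooled over all sentences before averaging."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious predictions
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed gold aspects
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```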

Task ACP: Aspect Category Polarity

rank  Micro-Precision  Micro-Recall  Micro-F1-score

1 0.8264 0.7161 0.7673
2 0.8612 0.6562 0.7449
3 0.7472 0.7186 0.7326
4 0.7387 0.7206 0.7295
5 0.8735 0.5649 0.6861
6 0.6869 0.5409 0.6052
7 0.4123 0.3125 0.3555
8 0.5452 0.2511 0.3439
baseline 0.2451 0.1681 0.1994