Archive for Danilo Croce

New SAG paper at NAACL 2022!

The paper titled “Learning to Generate Examples for Semantic Processing Tasks” has been accepted at NAACL 2022! It is the result of a recent collaboration between our group and Amazon Seattle.

Here is the paper abstract:

Even if recent Transformer-based architectures, such as BERT, achieved impressive results in semantic processing tasks, their fine-tuning stage still requires large scale training resources. Usually, Data Augmentation (DA) techniques can help to deal with low resource settings. In Text Classification tasks, the objective of DA is the generation of well-formed sentences that i) represent the desired task category and ii) are novel with respect to existing sentences. In this paper, we propose a neural approach to automatically learn to generate new examples using a pre-trained sequence-to-sequence model. We first learn a task-oriented similarity function that we use to pair similar examples. Then, we use these example pairs to train a model to generate examples. Experiments in low resource settings show that augmenting the training material with the proposed strategy systematically improves the results on text classification and natural language inference tasks by up to 10% accuracy, outperforming existing DA approaches.
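
As a rough illustration of the pairing step mentioned in the abstract, here is a minimal Python sketch. It is not the implementation from the paper: the bag-of-words cosine is only a placeholder for the learned task-oriented similarity function, restricting pairs to examples with the same label is our assumption, and the names `bow_cosine` and `build_generation_pairs` are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity over word counts (placeholder for the learned similarity)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[w] * cb[w] for w in ca)
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return num / den if den else 0.0


def build_generation_pairs(examples, similarity=bow_cosine):
    """Pair each (text, label) example with its most similar same-label neighbour.

    Returns (source_text, target_text, label) triples; a sequence-to-sequence
    model could then be fine-tuned to produce target_text from source_text.
    Assumption: only examples sharing the same label are paired.
    """
    pairs = []
    for i, (text, label) in enumerate(examples):
        candidates = [
            other
            for j, (other, other_label) in enumerate(examples)
            if j != i and other_label == label
        ]
        if not candidates:
            continue  # no same-label neighbour to pair with
        target = max(candidates, key=lambda other: similarity(text, other))
        pairs.append((text, target, label))
    return pairs


# Toy data, invented for illustration only.
examples = [
    ("what city hosts the olympics", "LOC"),
    ("which city hosts the world cup", "LOC"),
    ("where is mount everest", "LOC"),
    ("who wrote hamlet", "HUM"),
    ("who wrote the odyssey", "HUM"),
    ("how many legs has a spider", "NUM"),
]
pairs = build_generation_pairs(examples)
```

In this sketch an example without any same-label neighbour (the single "NUM" question above) simply produces no pair.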

New SAG paper at ACL 2021!

The paper titled “Learning to Solve NLP Tasks in an Incremental Number of Languages” has been accepted at ACL 2021! It is the result of a recent collaboration between our group and Amazon Seattle.

Here is the paper abstract:

In real scenarios, a multilingual model trained to solve NLP tasks on a set of languages can be required to support new languages over time. Unfortunately, the straightforward retraining on a dataset containing annotated examples for all the languages is both expensive and time-consuming, especially when the number of target languages grows. Moreover, the original annotated material may no longer be available due to storage or business constraints. Re-training only with the new language data will inevitably result in Catastrophic Forgetting of previously acquired knowledge. We propose a Continual Learning strategy that updates a model to support new languages over time, while maintaining consistent results on previously learned languages. We define a Teacher-Student framework where the existing model “teaches” to a student model its knowledge about the languages it supports, while the student is also trained on a new language. We report an experimental evaluation in several tasks including Sentence Classification, Relational Learning and Sequence Labeling.
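
To give an idea of what a Teacher-Student objective can look like, here is a minimal Python sketch of a generic knowledge-distillation loss. The abstract does not specify the loss, the data used for distillation, or the weighting, so everything below is an assumption: the supervised term is a cross-entropy on labelled examples of the new language, the distillation term is a KL divergence between teacher and student outputs, and `alpha` and `continual_loss` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, gold_index):
    """Negative log-likelihood of the gold class under the student."""
    return -math.log(softmax(student_logits)[gold_index])


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) between the two output distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def continual_loss(new_lang_batch, distill_batch, alpha=0.5):
    """Weighted sum of a supervised term and a distillation term.

    new_lang_batch: list of (student_logits, gold_index) for labelled
        examples in the new language.
    distill_batch: list of (teacher_logits, student_logits) for inputs on
        which the student should imitate the teacher (assumed setup).
    alpha: weight of the distillation term (hypothetical hyperparameter).
    """
    supervised = sum(cross_entropy(s, y) for s, y in new_lang_batch) / len(new_lang_batch)
    distill = sum(kl_divergence(t, s) for t, s in distill_batch) / len(distill_batch)
    return (1.0 - alpha) * supervised + alpha * distill
```

The distillation term is zero when the student reproduces the teacher's output distribution exactly, so minimising it discourages drift on the inputs it is computed over, while the supervised term drives learning of the new language.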

EVALITA2020: SAG ranked first in two tasks

Two UNITOR groups (composed of members of the SAG and students from the Computer Science courses) participated in two distinct EVALITA2020 tasks:

  • Sardistance: Stance Detection in Italian tweets
  • DankMeme: multimoDal Artefacts recogNition Knowledge for MEMES

and each group ranked first in its task! You can find more details in the proceedings of the conference and the video presentations:

The new KeLP site is Online!

KeLP has grown significantly over the past two years, from a laboratory project to a Maven project containing more than 20,000 lines of code.
We (the KeLP team) believe that the old site is “too small” for KeLP and that a better description of the available kernel methods and learning algorithms is required.

This new site provides more extensive documentation, new tutorials and examples, in a more structured form.

A new site is available at

Please let us know your opinion about the site, or send us any questions about KeLP!

LU4R has been released!

LU4R is the adaptive spoken Language Understanding chain For(4) Robots tool, the result of a collaboration between the Semantic Analytics Group at the University of Rome, Tor Vergata, and the Laboratory of Cognitive Cooperating Robots (Lab.Ro.Co.Co.) at Sapienza, University of Rome.

LU4R receives as input one or more transcriptions of a spoken command for a robot and produces one or more linguistic predicates reflecting the actions intended by the user. Predicates, as well as their arguments, are consistent with a linguistically-motivated representation and coherent with the environment perceived by the robot. The interpretation process is sensitive to different configurations of the environment (possibly synthesized through a Semantic Map or different approaches) that collect all the information about the entities populating the operating world. The tool is fully implemented in Java and is released with a Client/Server architecture, in order to decouple the chain from the specific robotic platform that will use it.

You can find more information about LU4R and download it at:

CLIC-it 2016: best young paper!

We are proud to announce that the paper “Context–aware Spoken Language Understanding for Human Robot Interaction” by Andrea Vanzo, Danilo Croce, Roberto Basili and Daniele Nardi has received the Best Young Paper Award at the CLIC-it 2016 conference!

The paper presented LU4R, the adaptive spoken Language Understanding chain For(4) Robots tool. More details about LU4R can be found here.

A new KeLP tutorial has been released.

A tutorial showing how to implement a kernel-based classifier for the Question Classification task with KeLP has been presented at the Web Mining and Retrieval course. The tutorial can be downloaded from the WmIR course webpage.

You can find the slides at this link and the tutorial material at this link.

SAG’s KeLP team ranked first at the SemEval 2016 Community Question Answering Task

The Kelp group, mainly composed of people from the SAG laboratories (Simone Filice, Danilo Croce and Roberto Basili), ranked first in Subtask A of SemEval 2016 Task 3: Community Question Answering!

Moreover, the Kelp group ranked third in Subtask B and second in Subtask C (among up to 12 groups).

The list of participants and the detailed results can be downloaded at this link.


The ECIR 2016 paper has been accepted!

The paper “Large-scale Kernel-based Language Learning through the Ensemble Nystrom methods” by Danilo Croce and Roberto Basili has been accepted at the 38th European Conference on Information Retrieval (ECIR 2016), which will be held on 20-23 March 2016 in Padua, Italy (acceptance rate: 21%).

The list of accepted papers can be browsed at this link.

Abstract: Kernel methods have been used by many Machine Learning paradigms, achieving state-of-the-art performances in many Language Learning tasks. One drawback of expressive kernel functions, such as Sequence or Tree kernels, is the time and space complexity required both in learning and classification. In this paper, the Nystrom methodology is studied as a viable solution to face these scalability issues.
By mapping data in low-dimensional spaces as kernel space approximations, the proposed methodology positively impacts on scalability through compact linear representation of highly structured data. Computation can be also distributed on several machines by adopting the so-called Ensemble Nystrom Method.
Experimental results show that an accuracy comparable with state-of-the-art kernel-based methods can be obtained by reducing of orders of magnitude the required operations and enabling the adoption of datasets containing more than one million examples.
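
For readers unfamiliar with the Nystrom idea, the following minimal Python sketch shows how a kernel can be approximated by a dot product in a low-dimensional space built from a few landmark examples. It is an illustration under stated assumptions, not the code used in the paper: an RBF kernel over numeric vectors stands in for the Sequence or Tree kernels, a Cholesky factorization of the landmark kernel matrix is used where an eigendecomposition would be the more common choice, the ensemble variant is not shown, and the names `NystromMap`, `rbf_kernel` and `jitter` are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, a stand-in for a structured kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def cholesky(matrix, jitter=1e-10):
    """Lower-triangular L such that L L^T equals matrix plus jitter on the diagonal."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_substitution(lower, b):
    """Solve lower * z = b for z, with lower a lower-triangular matrix."""
    z = []
    for i, row in enumerate(lower):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class NystromMap:
    """Maps an example x to a vector z(x) with dot(z(x), z(y)) ~ kernel(x, y)."""

    def __init__(self, landmarks, kernel=rbf_kernel):
        self.landmarks = landmarks
        self.kernel = kernel
        # W is the kernel matrix among landmarks; factorize it as L L^T.
        w = [[kernel(a, b) for b in landmarks] for a in landmarks]
        self.lower = cholesky(w)

    def transform(self, x):
        # c holds the kernel values between x and every landmark.
        c = [self.kernel(x, lm) for lm in self.landmarks]
        # z = L^{-1} c, hence dot(z(x), z(y)) = c(x)^T W^{-1} c(y).
        return forward_substitution(self.lower, c)


landmarks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
nystrom = NystromMap(landmarks)
x, y = [0.5, 0.5], [1.5, 1.0]
approx = dot(nystrom.transform(x), nystrom.transform(y))
exact = rbf_kernel(x, y)
```

Once examples are mapped this way, a linear learning algorithm can be trained on the resulting vectors, and each new example only requires as many kernel computations as there are landmarks.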

The Context-based Corpus for Sentiment Analysis in Twitter has been released

The Context-based Corpus for Sentiment Analysis in Twitter used in the COLING 2014 best paper

Andrea Vanzo, Danilo Croce, Roberto Basili (2014): A context based model for Sentiment Analysis in Twitter. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics, pp. 2345–2354, Dublin City University and Association for Computational Linguistics, Dublin, Ireland, 2014.

has been released.

Please find further details at: