Università di Roma Tor Vergata, Università di Pisa,
Istituto di Linguistica Computazionale,
Fondazione Bruno Kessler, Università di Trento
In the "Frame Labeling over Italian Texts" (FLaIT) evaluation exercise, systems have to detect the semantic frame "evoked" by a predicate and the major semantic roles explicitly mentioned in an Italian sentence, according to the frame semantics paradigm of Fillmore (1985). In particular, the task consists in recognizing words and phrases that evoke semantic frames of the sort defined in the FrameNet project (Baker et al., 1998, http://framenet.icsi.berkeley.edu), and their semantic dependents, which are usually, but not always, their syntactic dependents.
We will refer to this problem as Semantic Role Labeling (SRL). As in previous SRL shared tasks (e.g. CoNLL-2004 and CoNLL-2005), the general goal is to come forward with representation models, inductive algorithms and inference methods which address the proposed SRL problem.
Previous campaigns (such as CoNLL-2004/2005 or SemEval-2007 (Baker et al., 2007)) focused on developing SRL systems based on partial parsing information and/or on increasing the amount of syntactic and semantic input information, aiming to boost the performance of machine learning systems on the SRL task. Accordingly, the Evalita 2011 FLaIT challenge will concentrate on the definition of different tasks, focusing on different aspects of the SRL problem.
We encourage the adoption of basic resources for Italian that are under development in the iFrame project. These resources will be made available to all groups participating in the FLaIT task.
Interested groups that cannot rely on proprietary parsing technologies will be supported in their participation: annotations of the training and test data at the morphological and syntactic level (at least lemmas, POS tags and Named Entities are expected) will be made available. The quality of this auxiliary information may not be homogeneous, as no full manual validation of the released material is expected for the 2011 EvalIta edition.
The reference semantic system is FrameNet, presented and widely discussed at the FrameNet Project home page.
Please visit the Evalita 2011 homepage, where information about the general organization and all tasks is available.
Frame Prediction
In the first subtask, we measure the accuracy in detecting the correct frame of a sentence, given the presence of a possibly ambiguous lexical unit. In this case we are interested in verifying whether the system is able to recognize the occurrence of a predicate word as the lexical unit of its corresponding frame, and whether ambiguous lexical units are properly disambiguated within sentences.
Semantic Role Labeling: Argument Detection and Classification.
In the Semantic Role Labeling task, participants will be asked to locate and annotate all the semantic arguments of a frame that are explicitly realized in a sentence, given the corresponding lexical unit. This task corresponds to the traditional Semantic Role Labeling challenge as defined by the CoNLL 2005 task.
Evaluation Metrics
The evaluation metrics will be:
The evaluation will be carried out by the organizers. Participants are required to submit the annotations for the test data and to provide a brief description of their system and a full notebook paper describing their experiments, in particular the techniques and the resources used. An analysis of the results will be possible, as the test reference data will be released after the publication of the official results.
The training corpus is distributed in two separate sets. The first set, hereafter the FBK set, has been developed at the Fondazione Bruno Kessler (Tonelli and Pianta, 2008). It includes annotation at the syntactic and semantic level under the XML Tiger format also used by the Salsa project. The reference syntactic formalism of the FBK set is a constituency-based one, obtained as output of the constituency-based parser by Corazza et al. (2007). The second set, hereafter the ILC set, has been developed at the ILC in Pisa by Alessandro Lenci and his colleagues [6]. It also includes annotation at the syntactic and semantic level under the XML Tiger format also used by the Salsa project. The grammatical formalism adopted in the development of the ILC set is a dependency-based one, based on the TANL parser.
The creation of the data was initiated by different groups, which made them available to the iFrame (Italian FrameNet) project, a collaboration between the University of Pisa, the University of Roma "Tor Vergata", the University of Trento, the Fondazione Bruno Kessler and CELI.
The description of the files <FBK>_XXX.src, produced in Trento, and <ILC>_XXX.src, annotated in Pisa, will soon be documented here.
Both resources are based on the XML format used by different FrameNet projects worldwide, including the SALSA project in Saarbrücken, where the SALTO annotation tool based on the Tiger format is adopted.
The so-called semantic data file includes a CoNLL-like tabular format that expresses all the annotations of the semantic layer, i.e. frames and frame elements for every predicate and role in the sentence. Multiple frames are possible for one sentence; one single tabular description per sentence is made available. Semantic annotations corresponding to the XXX.src file are delivered in a single XXX.sem file. The semantic file is also the format in which participants are expected to deliver their output, i.e. labeled sentences.
The following table describes the columns as they are found in the semantic annotation file (.sem):

Field Name | Description |
---|---|
Tok Counter | Token counter, incremental within each sentence |
Form | Word form or punctuation symbol, w |
PoS | Part-of-speech tag, with morphological features, based on the TANL tagset |
Frame tag | The frame F of the word w, if w is a lexical unit for F |
Frame Element tags for the first (top-down) entry F1 in the column "Frame tag" | The frame element FE of the word w, if w belongs to a semantic argument of the first frame in the column "Frame tag", whose type is FE |
... | ... |
Frame Element tags for the k-th (top-down) entry Fk in the column "Frame tag" | The frame element FE of the word w, if w belongs to a semantic argument of the frame Fk in the column "Frame tag", whose type is FE |
1 Rilevata V - - - -
2 la RD - - - -
3 presenza S Presence Target - -
4 di E - Entity - -
5 gas S - Entity - -
6 in E - Location - -
7 uno PI - Location - -
8 dei EA - Location - -
9 tubi S - Location - -
10 trasparenti A - Location - -
11 che PR - Location - -
12 compongono V - Location - -
13 l' RD - Location - -
14 opera S - Location - -
15 , FF - - - -
16 i RD - - - -
17 guardiani S - - - -
18 hanno VA - - - -
19 fatto V - - - -
20 scattare V Process_start - Target -
21 uno RI - - Event -
22 speciale A - - Event -
23 piano S - - Event -
24 d' E - - Event -
25 emergenza S - - Event -
26 e CC - - - -
27 per E - - - Duration
28 45 N - - - Duration
29 minuti S - - - Duration
30 i RD - - - Agent
31 pompieri S - - - Agent
32 hanno VA - - - -
33 isolato V Closure - - Target
34 la RD - - - Containing_object
35 sala S - - - Containing_object
36 . FS - - - -
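As an illustration, the tabular layout above can be read with a few lines of Python. The sketch below is our own, not part of the task kit; it assumes the column layout of the table above, with '-' marking empty cells and one frame-element column per frame in top-down order:

```python
def parse_sem(lines):
    """Parse one sentence in the .sem tabular format:
    Tok Counter, Form, PoS, Frame tag, then one frame-element
    column per frame in the sentence ('-' marks an empty cell).
    Returns the frame labels in top-down order and, for each
    frame column, a mapping from token id to frame-element label."""
    rows = [ln.split() for ln in lines if ln.strip()]
    n_fe_cols = len(rows[0]) - 4          # columns after Tok/Form/PoS/Frame
    frames = [None] * n_fe_cols           # i-th frame pairs with i-th FE column
    roles = [dict() for _ in range(n_fe_cols)]
    frame_idx = 0
    for row in rows:
        tok, frame = int(row[0]), row[3]
        if frame != "-":                  # this token is a lexical unit
            frames[frame_idx] = frame
            frame_idx += 1
        for col, cell in enumerate(row[4:]):
            if cell != "-":
                roles[col][tok] = cell
    return frames, roles

# A few rows of the example sentence above:
demo = [
    "3 presenza S Presence Target - -",
    "4 di E - Entity - -",
    "20 scattare V Process_start - Target -",
]
frames, roles = parse_sem(demo)
```

On this fragment, `frames` starts with Presence and Process_start, and `roles[0]` maps tokens 3 and 4 to Target and Entity respectively.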
The so-called syntactic data file includes a CoNLL-like tabular format that expresses all the annotations of the syntactic layer, i.e. lemmas, POS tags and dependencies, as output by the TANL parser. Syntactic annotations allow teams that do not use their own parser to access meaningful syntactic information for each sentence. Syntactic annotations corresponding to the sentences in the XXX.src file are delivered in a single XXX.synt file. While the original annotations are hand-validated for the entire training set, it is worth noticing that the grammatical information in the XXX.synt file is NOT validated and may be noisy.
The following table describes the columns as they are found in the syntactic annotation file (.synt):

Field Name | Description |
---|---|
Tok Counter | Token counter, incremental within each sentence |
Form | Word form or punctuation symbol, w |
Lemma | Word lemma or punctuation symbol |
PoS | Part-of-speech tag, with morphological features, based on the TANL tagset |
Head | Head of the current token w, which is either a valid token counter or zero ('0'). Note that, depending on the original treebank annotation, there may be multiple tokens with a head of zero. |
Dependency label | The dependency relation to the head triggered by w in the sentence |
1 Rilevata rilevare V 0 ROOT
2 la il RD 3 det
3 presenza presenza S 20 subj
4 di di E 3 comp
5 gas gas S 4 prep
6 in in E 20 comp
7 uno uno PI 6 prep
8 dei di EA 7 comp
9 tubi tubo S 8 prep
10 trasparenti trasparente A 9 mod
11 che che PR 12 subj
12 compongono comporre V 9 mod_rel
13 l' il RD 14 det
14 opera opera S 12 obj
15 , , FF 14 con
16 i il RD 17 det
17 guardiani guardiano S 20 subj
18 hanno avere VA 19 aux
19 fatto fare VM 20 modal
20 scattare scattare V 1 arg
21 uno uno RI 23 det
22 speciale speciale A 23 mod
23 piano piano S 20 obj
24 d' di E 23 comp
25 emergenza emergenza S 24 prep
26 e e CC 20 con
27 per per E 33 comp_temp
28 45 @card@ N 29 mod
29 minuti minuto S 27 prep
30 i il RD 31 det
31 pompieri pompiere S 33 subj
32 hanno avere VA 33 aux
33 isolato isolare V 20 conj
34 la il RD 35 det
35 sala sala S 33 obj
36 . . FS 1 punc
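As a sketch (again our own code, not provided with the data), the dependency edges encoded in a .synt sentence can be recovered by resolving each Head field against the token counters, with 0 standing for an artificial root:

```python
def parse_synt(lines):
    """Parse one sentence in the .synt tabular format:
    Tok Counter, Form, Lemma, PoS, Head, Dependency label.
    Returns (dependent_form, relation, head_form) triples;
    tokens whose Head field is 0 attach to the artificial ROOT."""
    rows = [ln.split() for ln in lines if ln.strip()]
    forms = {int(r[0]): r[1] for r in rows}
    forms[0] = "ROOT"                     # head id 0 = artificial root
    return [(r[1], r[5], forms[int(r[4])]) for r in rows]

# A toy three-token fragment (heads adjusted to stay inside the fragment):
demo = [
    "1 Rilevata rilevare V 0 ROOT",
    "2 la il RD 3 det",
    "3 presenza presenza S 1 subj",
]
edges = parse_synt(demo)
```

Each resulting triple reads "dependent, relation, head", e.g. the determiner "la" attaching to "presenza" with label det.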
The description of the TANL tagset used for the morpho-syntactic annotation of the sentences (.synt files) is available HERE. All tags of the semantic layer refer to FrameNet.
Early Training (Dry-Run) Corpus:
Download (1st version)
Training Corpus (Second Version):
Download
First Run Test Corpus:
Download
Second Run Test Corpus:
The Second Run data will be automatically sent to participants after the submission of the First Run data.
Participants that have not performed the First Run should send an e-mail to Diego De Cao.
Tasks
The tasks in which all systems will participate are Frame Prediction and Argument Detection and Classification, as described above.
Frame Detection. In the Test Corpus a significant number of sentences include a predicate word that is ambiguous across two or more frames. Note that all these ambiguous frames are represented in the training data sets made available to participants: no unseen frame is allowed for any predicate word targeted in the test corpus. The Frame Detection task thus makes a sort of closed-world assumption, whereby the dictionary of lexical units for each frame is confined to the ones already exemplified in the training data sets.
The Argument Detection (also known as Boundary Recognition) task will be measured in two different conditions, i.e. with and without information about the correct frame of the targeted sentence. This will make it possible to factor out errors propagated from earlier stages of the individual systems.
Analogously, the Argument Classification task will be evaluated in three different conditions, i.e. with no information, with information about the exact frame per predicate word, and with information about both the exact frame and the argument boundaries. For this purpose, test data will be delivered in three different stages, corresponding to three different runs, as discussed hereafter.
Releases of Data for Testing
A corpus of test sentences (hereafter the Test Data set) will be released in three different stages, or runs. Every test run aims at evaluating the ability of the participating systems on all three tasks, i.e. Frame Prediction, Argument Detection and Classification. As a consequence, the output of every Test Run follows the same format, i.e. the Semantic Annotation format (described in the SemA section), with information about frames and roles described in consecutive and inter-dependent columns.
Input syntactic information is provided in order to facilitate participants that cannot apply their own parsing technology. However, syntactic information as provided in the test data files may include parsing errors.
First Test Run. In the first run the test sentences will be released without any semantic information except the marking of the triggering predicate word. Given the presence of possibly ambiguous lexical units (as already observed in the training set), the Frame Prediction stage is a necessary step in this first run. In this stage, the sentences will be made available with an explicit marking of the predicate word (i.e. the lexical unit). The file will be released as a Semantic Annotation File (see the SemAnn section above) where the fourth column, concerning the frame label, is left blank and the other columns explicitly mark the predicate word (with the fixed, i.e. frame-independent, label Target) without any annotations for roles.
Second Test Run. In the second run the test sentences will be released with explicit information about the correct frame corresponding to the marked predicate word. Although the Frame Prediction stage is no longer a necessary step in this second run, the output format must not be changed with respect to the first test run; this will facilitate the automatic evaluation process. In this stage, the sentences will be made available with an explicit marking of the frame corresponding to a predicate word (i.e. the lexical unit). The file will thus be released as a Semantic Annotation File (see the SemAnn section above) where the fourth column explicitly marks the frame of the predicate word (i.e. the frame label itself, e.g. Judgment_communication), and the columns used for semantic annotations (i.e. predicates and roles) explicitly mark the predicate word (with the fixed label Target) without any annotations for roles.
Third Test Run. In the third run the test sentences will be released with explicit information about (1) the correct frame corresponding to the marked predicate word, as well as (2) the marked boundaries of the individual arguments. The output format of the third run is the same as in the previous two test runs; this will facilitate the automatic evaluation process. In this stage, the file will thus be released as a Semantic Annotation File (see the SemAnn section above). The frame column will explicitly mark the predicate word(s) (i.e. frame labels, e.g. Process_start), and the following columns will describe arguments in the BIO notation; different columns correspond to possibly multiple predicate words per sentence, as in the example below:
1 Rilevata V - - - -
2 la RD - - - -
3 presenza S Presence Target - -
4 di E - B - -
5 gas S - O - -
6 in E - B - -
7 uno PI - I - -
8 dei EA - I - -
9 tubi S - I - -
10 trasparenti A - I - -
11 che PR - I - -
12 compongono V - I - -
13 l' RD - I - -
14 opera S - O - -
15 , FF - - - -
16 i RD - - - -
17 guardiani S - - - -
18 hanno VA - - - -
19 fatto V - - - -
20 scattare V Process_start - Target -
21 uno RI - - B -
22 speciale A - - I -
23 piano S - - I -
24 d' E - - I -
25 emergenza S - - O -
26 e CC - - - -
27 per E - - - B
28 45 N - - - I
29 minuti S - - - O
30 i RD - - - B
31 pompieri S - - - O
32 hanno VA - - - -
33 isolato V Closure - - Target
34 la RD - - - B
35 sala S - - - O
36 . FS - - - -
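Note that in the example above a span is opened by B, continued by I, and closed by O on its last token (differing from the standard BIO reading, where O marks tokens outside any span). Under that reading, argument spans could be recovered with a small helper like the following; this is our own sketch, derived only from the example data:

```python
def bio_spans(tags):
    """Recover (start, end) token spans from one argument column of the
    third-run files, assuming 'B' opens a span, 'I' continues it and 'O'
    marks its last token (as the example above suggests; note this
    differs from standard BIO). tags: list of (token_id, tag) pairs,
    with '-' cells already filtered out by the caller."""
    spans, start = [], None
    for tok, tag in tags:
        if tag == "B":
            start = tok
        elif tag == "O" and start is not None:
            spans.append((start, tok))
            start = None
    return spans

# The Duration and Agent cells of the last column above:
spans = bio_spans([(27, "B"), (28, "I"), (29, "O"), (30, "B"), (31, "O")])
```

On this column the helper yields the two spans covering tokens 27-29 and 30-31.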
Submission Procedures
Test Runs are indexed from 1 (the first) to 3 (the third). Each team is allowed to submit a maximum of two systems, possibly deriving from different configurations, parametrizations or resources. Results for each run must be sent to the organizers' address, EvalIta-FLaIT, as files in the same format as the Training Corpus (.sem files). The files must be named as follows:
<Team>_FLaIT_<Run>_<System>.sem
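As a quick sanity check of the naming convention, the pattern can be tested with a small helper of our own (it assumes runs and systems are identified by their index digits, i.e. runs 1-3 and systems 1-2, and alphanumeric team names):

```python
import re

# <Team>_FLaIT_<Run>_<System>.sem, with Run in 1..3 and System in 1..2
# (team names are assumed to be plain alphanumeric strings here).
NAME_RE = re.compile(r"^[A-Za-z0-9]+_FLaIT_[123]_[12]\.sem$")

def is_valid_submission(filename):
    """Return True if filename follows the submission naming scheme."""
    return NAME_RE.fullmatch(filename) is not None
```

For instance, a file named `TeamX_FLaIT_1_2.sem` would be accepted, while a run index outside 1-3 would not.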
Organizers
Roberto Basili (University of Roma, Tor Vergata),
Alessandro Lenci (University of Pisa)
Steering Committee:
Alessandro Moschitti (University of Trento),
Sara Tonelli (Fondazione Bruno Kessler),
Diego De Cao (University of Roma, Tor Vergata),
Giulia Venturi (ILC-CNR & Scuola Superiore "S. Anna", Pisa),
Giampaolo Mazzini (CELI, Torino)
Address: