{"id":114,"date":"2018-06-21T12:02:59","date_gmt":"2018-06-21T12:02:59","guid":{"rendered":"http:\/\/sag.art.uniroma2.it\/absita\/?page_id=114"},"modified":"2018-09-12T09:40:16","modified_gmt":"2018-09-12T09:40:16","slug":"evaluation","status":"publish","type":"page","link":"http:\/\/sag.art.uniroma2.it\/absita\/evaluation\/","title":{"rendered":"Evaluation"},"content":{"rendered":"<p>We evaluate the ACD and ACP subtasks separately by comparing the classifications provided by the participant systems to the gold standard annotations of the test set.<\/p>\n<p>For the ACD task, we compute Precision, Recall and F<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?_1\" \/>-score defined as:<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?F1_{a} =  \\frac{2 P_a R_a}{P_a + R_a}\" \/>,<br \/>\nwhere Precision (<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_a\" \/>) and Recall (<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?R_a\" \/>) are defined as:<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_a = \\frac{|S_{a} \\cap G_a|}{|S_a|}; R_a = \\frac{|S_a \\cap G_a|}{|G_a|}\" \/>.<br \/>\nHere <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?S_a\" \/> is the set of aspect category annotations that a system returned for all the test sentences, and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?G_a\" \/> is the set of the gold (correct) aspect category annotations.<br \/>\nFor instance, if a review is labeled in the gold standard with the two aspects<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?G_a=\\{\\textsc{cleanliness}, \\textsc{staff}\\}\" \/>,<br \/>\nand the system predicts the two aspects<br \/>\n<img style=\"display: inline;\" 
src=\"https:\/\/latex.codecogs.com\/gif.latex?S_a=\\{\\textsc{cleanliness},\\textsc{comfort}\\}\" \/>,<br \/>\nwe have that <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|S_{a} \\cap G_a|=1\" \/>, <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|G_{a}|=2\" \/> and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|S_{a}|=2\" \/> so that <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_a=\\frac{1}{2}\" \/>, <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?R_a=\\frac{1}{2}\" \/> and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?F1_a=\\frac{1}{2}\" \/>.<br \/>\nFor the ACD task, the baseline will be computed by considering a system which assigns the most frequent aspect category (estimated over the training set) to each sentence.<\/p>\n<p>For the ACP task, we will evaluate the entire chain, thus considering the aspect categories detected in the sentences together with their corresponding polarity, in the form of <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?(aspect, polarity)\" \/> pairs.<br \/>\nWe again compute Precision, Recall and F<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?_1\" \/>-score, now defined as<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?F1_{p} = \\frac{2 P_p R_p}{P_p + R_p}\" \/>.<br \/>\nPrecision (<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_p\" \/>) and Recall (<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?R_p\" \/>) are defined as<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_p = \\frac{|S_{p} \\cap G_p|}{|S_p|}; R_p = \\frac{|S_p \\cap G_p|}{|G_p|}\" \/>,<br \/>\nwhere <img style=\"display: inline;\" 
src=\"https:\/\/latex.codecogs.com\/gif.latex?S_p\" \/> is the set of <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?(aspect, polarity)\" \/> pairs that a system returned for all the test sentences, and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?G_p\" \/> is the set of the gold (correct) pair annotations.<\/p>\n<p>For instance, if a review is labeled in the gold standard with the pairs<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?G_p=\\{(\\textsc{cleanliness}, POS), (\\textsc{staff}, POS)\\}\" \/>,<br \/>\nand the system predicts the three pairs<br \/>\n<img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?S_p=\\{(\\textsc{cleanliness}, POS), (\\textsc{cleanliness}, NEG), (\\textsc{comfort}, POS)\\}\" \/>,<br \/>\nwe have that <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|S_{p} \\cap G_p|=1\" \/>, <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|G_{p}|=2\" \/> and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?|S_{p}|=3\" \/> so that <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?P_p=\\frac{1}{3}\" \/>, <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?R_p=\\frac{1}{2}\" \/> and <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?F1_p=0.4\" \/>.<\/p>\n<p>For the ACP task, the baseline will be computed by considering a system which assigns the most frequent <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?(aspect, polarity)\" \/> pair (estimated over the training set) to each sentence.<\/p>\n<p>We will produce separate rankings for the tasks, based on the <img style=\"display: inline;\" src=\"https:\/\/latex.codecogs.com\/gif.latex?F_1\" \/> scores. 
Participants who submit only the result of the ACD task will appear in the first ranking only.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We evaluate the ACD and ACP subtasks separately by comparing the classifications provided by the participant systems to the gold standard annotations of the test set. For the ACD task, we compute Precision, Recall and F-score defined as: , where Precision () and Recall () are defined as: . Here is the set of aspect &hellip; <a href=\"http:\/\/sag.art.uniroma2.it\/absita\/evaluation\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Evaluation<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/pages\/114"}],"collection":[{"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/comments?post=114"}],"version-history":[{"count":19,"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/pages\/114\/revisions"}],"predecessor-version":[{"id":161,"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/pages\/114\/revisions\/161"}],"wp:attachment":[{"href":"http:\/\/sag.art.uniroma2.it\/absita\/wp-json\/wp\/v2\/media?parent=114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}