
  Web Mining and Retrieval (academic year 2018/19)
Second Semester
Lecturer: Roberto Basili, Email: basili@info.uniroma2.it
    List of Files in the Repository

Table of Contents

1. News

2. Course Syllabus

3. Reference Texts

4. Useful Links

5. Lecture Slides

6. Proposed Projects and Exercises


News

  • Room for the second exam session: The Second Final Test will be held on February 26, 2020 in Room 6 Sogene, starting at 10:00 a.m.
    Students are requested to register on the Delphi portal.


  • Results of the written test of January 31.

  • Exams: Autumn session

    The exams of the Autumn session will be held according to the following agenda:
    • Monday, September 9 at 15:00 in Room B14
    • Friday, September 20 at 15:00 in Room C3
  • Results of the Second Final Test (July 19):

    • Results
      Grades can be registered in the week between 5 and 9 August 2019, once all exercises and the (individual or team) projects have been completed.
      Students who still have to start their final projects are invited to contact Prof. Basili or Prof. Croce to communicate their choice. If needed, you can request (via e-mail) a dedicated meeting to discuss the project topics before deciding.
      Exam registration is done on demand: please contact the professors via e-mail.

  • Results of the Second Mid Term (June 20):

    • Results
      Grades can be registered once all exercises and the (individual or team) projects have been completed.
      Students who still have to start their final projects are invited to contact Prof. Basili or Prof. Croce to communicate their choice. If needed, you can request (via e-mail) a dedicated meeting to discuss the project topics before deciding.
      Exam registration is done on demand: please contact the professors via e-mail.

  • Projects for the academic year 2018-19:


  • Introduction to the Final Test

  • Exams: Summer session

    The exams of the summer session will be held according to the following agenda:
    • Thursday, June 20 at 11:00 in Room C11
    • Friday, July 19 at 11:00 in Room C6




  • Supplementary lesson: On Monday, June 17 at 14:00, a lesson on the topics of the Second Mid Term test will be held in Room B7, not in B8 as previously announced.
  • SEMINAR ANNOUNCEMENT: On Monday, June 10, 2019, as part of the course lessons, the seminar "Applying AI to the real world" by D. Saracino (PriceWaterhouse & Cooper - New Venture) will be held at 14:00 in Room B8.
  • URGENT: Due to the European elections, all teaching activities of the entire university are suspended for the whole day. The lesson of Monday, May 27 at 14:00 is therefore cancelled. Lessons will resume regularly on Wednesday, May 29, 2019 at 9:30 in Room B8, Aule Nuove building, Engineering Macroarea.
  • Neural Network LAB 3: To use the online Python notebook (Named Entity Recognition with LSTMs), please refer to this LINK.

  • Neural Network LAB 2: To use the online Python notebook (MNIST handwritten digits classification with CNNs), please refer to this LINK.

  • Room for Thursday, May 16, 14:00-18:00: The lesson of the Mini Course on Neural Networks scheduled for Thursday, May 16 at 14:00 will be held in Room B7, not in B8 as previously announced.
  • Neural Network LAB 1: To use the online Python notebook (MNIST handwritten digits classification with MLPs), please refer to this LINK.

  • Results of the April MidTerm test. Please access the following FILE for the details.

  • Mini Course on Neural Networks: in the week between May 13 and 16, a course dedicated to Neural Networks will be held, merging lessons from the Web Mining & Retrieval and Machine Learning courses. The lesson agenda is the following:

  • The exercises on Question Classification in the Weka and KeLP environments have been published; they allow students to earn about 8 points toward the homework assigned during the course.
  • MIDTERM test: Students who still have to take the MidTerm test are requested to meet the professor TODAY (May 3) at 15:00 in his office.


  • Course registration is open (again): please access the Delphi page.

  • Material for the lesson of April 18, 2019: Introduction to the MidTerm test and example questions.

  • URGENT: The lesson of Thursday, April 11 at 11:30 is cancelled because the lecturer is unavailable due to health problems.
    As a consequence, the MidTerm Test scheduled for Wednesday, April 17 is also cancelled. The MidTerm Test will be held on Wednesday, April 24 at 9:30 in Room B8.
    Lessons will resume regularly on Monday, April 15, 2019 at 14:00 in Room B8, according to the academic calendar.
  • URGENT: The lesson of Monday, March 18 at 14:00 is cancelled. Lessons will resume regularly on Wednesday, March 20, 2019 at 9:30 in Room B8, Aule Nuove building, Engineering Macroarea.
  • Course lessons will resume regularly on Monday, March 11, 2019 at 14:00 in Room B8, Aule Nuove building, Engineering Macroarea. The lessons of Wednesday, March 6 and Thursday, March 7 are cancelled.
  • Course lessons will start on Monday, March 4, 2019 at 14:00 in Room C2, Aule Nuove building, Engineering Macroarea.
    Students who intend to attend the course are asked to register on the Delphi website.

  • Course lesson schedule:
    • MONDAY, 14:00-15:45 (Room B8, Aule Nuove building, Engineering Macroarea)
    • WEDNESDAY, 9:30-11:15 (Room B8, Aule Nuove building, Engineering Macroarea)
    • THURSDAY, 11:30-13:15 (Room B8, Aule Nuove building, Engineering Macroarea)
  • The lecture slides will be published on these pages during the course.
  • The Course draws on the research and innovative projects of the Semantics Analytics Group (SAG), which works on Machine Learning and Natural Language Processing for the design and engineering of advanced Artificial Intelligence software systems, and on their predictive applications to document interpretation and retrieval, network security, Social Network analysis, and Digital Transformation processes.
    Several experimental activities and projects are under way at the SAG Laboratory for Semantics Analytics, which every year offers scholarships and graduation awards.
    The available thesis topics can be discussed in detail with the SAG coordinator, Prof. Roberto BASILI, or with the Laboratory's technical manager, Prof. Danilo CROCE.
    Office hours, which differ from those of the courses, must be arranged with the lecturers via e-mail.


    RECENT THESIS PROPOSALS.

Syllabus


The following is the preliminary syllabus of the Course, which will be refined and finalized at the end of the lessons.

Section I: Machine Learning and Kernel-based Learning.
Machine Learning and Artificial Intelligence. Supervised methods. Probabilistic and Generative Methods. Unsupervised Learning. Clustering. Semantic Similarity metrics. Agglomerative clustering methods. K-means. Hidden Markov Models. Statistical Learning Theory: PAC learnability. Kernel-based Learning. Polynomial and Radial Basis Function Kernels. String and Tree kernels. Semantic kernels. Neural Modeling: Perceptron, Multilayer Perceptrons, Deep Neural Networks. Language Models and Recurrent Networks. Introduction to the main platforms for the development of ML software: TensorFlow, Weka, SciKit, KeLP.

Section II: Statistical Language Processing.
Supervised Language Processing tools. HMM-based POS tagging. Named Entity Recognition. Statistical parsing. PCFGs: Charniak parser. Lexicalized Parsing Methods. Shallow Semantic Parsing: kernel based semantic role labelling. Information Extraction. Introduction to IBM's Watson.

Section III: Web Mining & Retrieval.
Ranking Models for the Web. Introduction to Social Network Analysis: rank, centrality. Random walk models: PageRank. Web Search Engines. SEO. Google. Preference Learning for IR. Question Answering Systems. Wikipedia-based Knowledge Acquisition. Social Web. Graph-based algorithms for community detection. Opinion Mining and Sentiment Analysis.


Reference Texts

  • IR - Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008. The book's home page is available HERE.
  • ML - Pattern Recognition and Machine learning, C. Bishop. Springer. 2006.
  • ML and IR - Automatic Text Categorization: from Information Retrieval to Support Vector Learning, Roberto Basili, Alessandro Moschitti, ARACNE Editore, 2005.
  • Web IR - Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. 2nd Edition, July 2011, Springer.
  • Lecture notes provided by the lecturer

Lectures (Slides)


Useful Links


LABS: Projects and Exercises

  • Neural Network LAB 1: To use the online Python notebook (MNIST handwritten digits classification with MLPs), please refer to this LINK.
  • Introduction to the Final Test