LLMs, Computer Vision and Multimodality 

Intelligence is inherently tied to the integrated capability of observing, modeling and internalizing the external environment in which a software agent acts, learns and decides or suggests. This research focus centers on the design of intelligent models able to understand, communicate and act, or make suggestions and recommendations, by using linguistic, visual and multimodal representations in an integrated manner. In the SAG Lab, we study how advanced generative AI models, such as Large Language Models and vision-language neural architectures, can be used to process and reason over heterogeneous data, bridging perception and semantics. 

Our work spans from generative modeling and image understanding for reasoning in complex domains (such as medicine or tourism) to grounded language understanding in human-robot interaction, with a strong emphasis on engineering principles such as robustness, adaptability and multilingual inclusivity for ethical AI. By combining deep learning with structured representations, the Lab has proposed and experimentally validated several systems that operate in dynamic, human-centric environments.


Generative AI

Generative machine learning models are nowadays a core paradigm for modeling intelligence in conversational systems, such as ChatGPT-like agents, as well as a basis for the design of AI agents that integrate complex functionalities such as search, question answering and reasoning. The problems related to adapting and extending generative language models, especially Large Language Models (LLMs), concern the accuracy, adequacy and coverage of large-scale models, and the assessment of their effectiveness across wide ranges of tasks and specialized domains. Our research aims to move beyond generic language generation by enabling these models to understand task-specific constraints and context, through training techniques that integrate prompt engineering, instruction tuning and domain adaptation. Robust and aligned generative AI systems are expected to be particularly accurate when specialized knowledge is involved and complex decision making is necessary. We are particularly interested in assessing the robustness of generative models when applied to low-resource scenarios, multilingual settings, or newly emerging domains. This work lies at the intersection of generative system design, learning task modeling and scalable AI architectures, and it enables us to build systems that are both flexible and task-aware.
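To make the prompt-engineering idea concrete, the following is a minimal sketch of few-shot prompt construction for a domain-specific task; the clinical-style examples, labels and template are illustrative placeholders, not the Lab's actual prompting setup.

```python
# Minimal sketch: assembling an instruction-style prompt with in-context
# (few-shot) examples for a hypothetical clinical message-classification task.

FEW_SHOT_EXAMPLES = [
    ("Il paziente lamenta cefalea persistente.", "symptom_report"),
    ("Prenota una visita cardiologica per martedì.", "appointment_request"),
]

def build_prompt(query: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble an instruction plus in-context examples, ending at the
    slot the generative model is expected to complete."""
    lines = ["Classify each clinical message into a task label.", ""]
    for text, label in examples:
        lines.append(f"Message: {text}\nLabel: {label}\n")
    lines.append(f"Message: {query}\nLabel:")
    return "\n".join(lines)

prompt = build_prompt("Dove posso ritirare il referto?")
```

The resulting string would be passed to an LLM, whose continuation after the final "Label:" is read back as the predicted class.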


Image and Document Processing

The work of SAG in this area focuses on enabling systems to interpret and reason over visual content, both in isolation and when combined with textual information. We develop and evaluate models for tasks such as Visual Question Answering (VQA), visual entailment, and image-based document understanding. These systems are capable of handling multimodal inputs and producing grounded, context-sensitive responses. The goal is to bridge visual understanding and language comprehension in unified architectures. This research plays a key role in applications like interactive assistants, document intelligence, and multimodal retrieval.
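As a toy illustration of how visual and textual inputs can be combined in such unified architectures, the sketch below shows a late-fusion scoring step of the kind used in VQA-style models; all dimensions and weights are random placeholders, not a trained system.

```python
import numpy as np

# Illustrative sketch of late multimodal fusion for a VQA-style model:
# image and question embeddings are projected into a joint space,
# combined, and scored against a small set of candidate answers.

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_JOINT, N_ANSWERS = 512, 300, 128, 4

W_img = rng.normal(size=(D_IMG, D_JOINT))   # image projection
W_txt = rng.normal(size=(D_TXT, D_JOINT))   # text projection
W_out = rng.normal(size=(D_JOINT, N_ANSWERS))  # answer classifier

def answer_scores(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """Fuse modalities by element-wise product of the projected
    embeddings, then emit one logit per candidate answer."""
    fused = np.tanh(img_emb @ W_img) * np.tanh(txt_emb @ W_txt)
    return fused @ W_out

scores = answer_scores(rng.normal(size=D_IMG), rng.normal(size=D_TXT))
```

In a real system the projections are learned end to end and the embeddings come from vision and language encoders; the multiplicative fusion shown here is one common choice among several.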


Grounded HRI

How linguistic knowledge can be grounded in perception and action, so that agent systems can communicate intuitively, is a particularly interesting research challenge in social robotics and physical agents. Our research studies methods that combine linguistic modeling, visual learning, and symbolic reasoning to support human-robot interaction in complex environments. A central theme is multimodal grounding: integrating visual, spatial, and linguistic cues to resolve interpretive ambiguity, ground understanding and enhance task execution in robotic systems. We investigate how robots can learn to interpret instructions in context and adapt their behavior accordingly. This line of work connects language understanding with situated learning and real-world interaction.
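The grounding problem can be sketched in miniature as matching a referring expression against a symbolic scene representation; the scene, attributes and scoring rule below are illustrative toys, far simpler than learned grounding models.

```python
# Toy sketch of multimodal grounding: resolving "the red cup on the left"
# against perceived objects by matching nouns, attributes and spatial cues.

SCENE = [
    {"id": "obj1", "type": "cup",  "color": "red",  "x": 0.2},
    {"id": "obj2", "type": "cup",  "color": "blue", "x": 0.8},
    {"id": "obj3", "type": "book", "color": "red",  "x": 0.5},
]

def ground(expression: str, scene=SCENE) -> str:
    """Return the id of the object best matching the expression."""
    tokens = expression.lower().split()

    def score(obj):
        s = 0
        s += obj["type"] in tokens                  # noun match
        s += obj["color"] in tokens                 # attribute match
        s += "left" in tokens and obj["x"] < 0.5    # spatial cue
        s += "right" in tokens and obj["x"] >= 0.5
        return s

    return max(scene, key=score)["id"]
```

For example, `ground("the red cup on the left")` resolves to `obj1`, since both `obj2` (wrong color and side) and `obj3` (wrong type) match fewer cues; real systems replace the hand-written scoring with learned visual-linguistic alignment.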


Adaptation of Foundational LLMs

Large Language Models offer impressive capabilities, but require careful adaptation to meet the needs of real-world applications. Our research addresses how these models can be tuned, instructed, or extended to handle specialized tasks, user-defined goals, and diverse linguistic domains. We investigate strategies for efficient fine-tuning, zero- and few-shot generalization, and the injection of external knowledge into foundational models. We also focus on adapting LLMs to underrepresented languages and domain-specific settings, ensuring inclusivity and task precision. This enables LLMs to become more effective tools across industrial, societal, and research contexts.
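One widely used family of efficient fine-tuning strategies is low-rank adaptation (LoRA), whose core arithmetic can be sketched in a few lines; the shapes, initialization and scaling below are illustrative, not a particular system's configuration.

```python
import numpy as np

# Sketch of the low-rank adaptation idea behind parameter-efficient
# fine-tuning: a frozen weight matrix W is augmented with a trainable
# low-rank update B @ A, so only r * (d_in + d_out) new parameters are
# learned instead of d_out * d_in.

d_out, d_in, r = 64, 64, 4
alpha = 8.0  # scaling hyperparameter

rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero init

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass with the low-rank update folded in:
    (W + (alpha / r) * B @ A) @ x, computed without forming the sum."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = adapted_forward(x)
```

Because B starts at zero, the adapted model initially reproduces the frozen model exactly; training then moves only A and B, which is what makes the approach cheap to apply per task or per domain.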


Brand Reputation and Sentiment Analysis in Social Media

Sentiment analysis in social networks and enterprise brand reputation strongly depend on the ability of large-scale intelligent systems to recognize the subjective positions of the authors of Web texts and to capitalize on them, both to better understand social trends and dominant community preferences and to react appropriately for marketing or rebranding purposes. The SAG group studies how AI and NLP methods can help monitor and recognize user sentiment, brand perception and social dynamics across distributed digital platforms. Our models combine textual and multimodal analysis to extract signals from complex, noisy, and fast-evolving data such as chat messages, tweets, comments or visual posts. This work includes developing fine-grained sentiment classification models, emotion detection systems and tools for real-time opinion mining. By understanding how people behave and express ideas and opinions about people, products, services, or public organisations, our research supports more informed decision-making in marketing, communication, and policy. It also reflects our broader interest in applying NLP in dynamic and open-ended environments.
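The simplest baseline for this kind of opinion mining is lexicon-based scoring, sketched below; the word lists and one-token negation rule are toy placeholders, whereas the fine-grained models mentioned above are learned classifiers.

```python
# Minimal lexicon-based sentiment sketch for short social-media posts.

LEXICON = {"great": 1, "love": 1, "good": 1,
           "terrible": -1, "hate": -1, "bad": -1}
NEGATORS = {"not", "never", "no"}

def sentiment(post: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral)."""
    score, flip = 0, 1
    for tok in post.lower().split():
        if tok in NEGATORS:
            flip = -1          # flip the polarity of the next token
            continue
        score += flip * LEXICON.get(tok, 0)
        flip = 1               # negation scopes over a single token here
    return (score > 0) - (score < 0)
```

So `sentiment("I love this brand")` yields +1 and `sentiment("not good at all")` yields -1; the point of the learned models is precisely to replace such brittle rules with classifiers robust to noisy, fast-evolving language.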


Data and Resources for Processing the Italian Language

The SAG laboratory actively contributes to the development of high-quality linguistic resources and benchmarks for the Italian language, a historically underrepresented language in the AI landscape. Our work includes curating datasets for various NLP tasks—such as question answering, classification, and semantic parsing—and ensuring linguistic diversity in multilingual and multimodal systems. We also explore the design of evaluation protocols and tools to assess the performance of models on Italian content. These efforts aim to foster accessible, inclusive AI that works across languages and cultural contexts. Supporting Italian is part of our broader mission toward equitable language technology.
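As a small illustration of what such an evaluation protocol looks like, the sketch below computes exact-match accuracy for a question-answering benchmark; the example items and the normalization rules are illustrative placeholders, not one of the Lab's released benchmarks.

```python
import string

# Toy sketch of an exact-match evaluation protocol for an Italian QA
# benchmark: answers are normalized before comparison, as is common
# in QA evaluation scripts.

def normalize(ans: str) -> str:
    """Lowercase, drop punctuation and trim whitespace."""
    return ans.lower().translate(
        str.maketrans("", "", string.punctuation)).strip()

def exact_match(predictions: dict, gold: dict) -> float:
    """Fraction of questions whose prediction matches the gold answer."""
    hits = sum(normalize(predictions[q]) == normalize(gold[q]) for q in gold)
    return hits / len(gold)

gold = {"q1": "Roma", "q2": "Dante Alighieri"}
preds = {"q1": "roma.", "q2": "Petrarca"}
score = exact_match(preds, gold)  # 0.5: q1 matches after normalization
```

Real protocols typically add language-specific normalization (articles, accents, clitics), which is exactly where Italian-specific resources matter.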