Investigation is an interesting business intelligence application domain for relational mining: the textual data consists of reports on investigations into criminal organizations, based on police interrogations, electronic eavesdropping and wiretaps, whereas the relations typically hold among subjects mentioned in these sources, e.g. person x belongs to criminal enterprise y, or person x knows person y.
The high variability of natural language and the rather high level of noise affecting such documents make the linguistic processors that provide the necessary representation levels unreliable. These problems prevent unsupervised approaches from being effective in discovering relevant patterns. The automatic relation miner therefore requires training data to select target patterns effectively.
During 2008 and 2009, the SAG designed robust models for linguistic relation mining from highly noisy data in the investigative domain. The proposed approach, based on state-of-the-art statistical algorithms such as Support Vector Machines, along with effective and versatile pattern mining methods, e.g. word sequence kernels, achieved impressive results on data sets made available by the DNA (Direzione Nazionale Antimafia).
Advanced kernels have been designed to exploit specific features and to help manage the difficult conditions met in the targeted texts, e.g. long-distance dependencies. Experiments confirm accurate results under different working conditions (i.e. different levels of noise and different amounts of training data).
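To make the idea of a word sequence kernel more concrete, the following is a minimal sketch of a gap-weighted subsequence kernel computed over word tokens rather than characters. It is not the SAG implementation: the function names, the decay parameter lam, the subsequence length n and the example sentences are all assumptions chosen for illustration. The kernel counts word subsequences of length n shared by two sentences and penalizes each occurrence by lam raised to the span it covers, so that matches separated by intervening words still contribute, only less.

```python
import math


def subsequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n over two token lists.

    Illustrative sketch only: `n` (subsequence length) and `lam`
    (gap decay, 0 < lam <= 1) are assumed hyperparameters.
    """
    S, T = len(s), len(t)
    if min(S, T) < n:
        return 0.0
    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (T + 1) for _ in range(S + 1)] for _ in range(n)]
    for a in range(S + 1):
        for b in range(T + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, S + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, T + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Close each length-n subsequence on a matching final token.
    k = 0.0
    for a in range(n, S + 1):
        for b in range(n, T + 1):
            if s[a - 1] == t[b - 1]:
                k += lam * lam * kp[n - 1][a - 1][b - 1]
    return k


def normalized_kernel(s, t, n, lam):
    """Cosine-style normalization so sentence length does not dominate."""
    denom = math.sqrt(subsequence_kernel(s, s, n, lam) *
                      subsequence_kernel(t, t, n, lam))
    return subsequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


def gram_matrix(sentences, n=2, lam=0.5):
    """Pairwise normalized kernel values for a list of token lists."""
    return [[normalized_kernel(a, b, n, lam) for b in sentences]
            for a in sentences]


if __name__ == "__main__":
    # Hypothetical, anonymized sentences; not taken from the DNA data.
    corpus = [
        "PERSON_X belongs to the ORG_Y clan".split(),
        "PERSON_X they say belongs to ORG_Y".split(),
        "PERSON_X knows PERSON_Z".split(),
    ]
    for row in gram_matrix(corpus):
        print(["%.3f" % v for v in row])
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for instance scikit-learn's SVC with kernel="precomputed". Because the kernel works directly on word sequences, it does not depend on a parser or other linguistic processor, which is one plausible reason such kernels suit noisy text.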
The results show that our relation miner is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in the target intelligence texts.
The outcomes of this line of research have been applied to judicial texts in the realm of trial-related investigation, giving rise to a relation mining search engine able to support natural language queries and conceptual navigation across investigative texts, as the following screenshots suggest.
Navigational Functions: from Entities back to Texts
Supervised semantic relation mining from linguistically noisy text documents. IJDAR 14(2): 213-228 (2011)