Classifying Narcotrafficking Spatial Event Documents using Transformers

Date of Publication

2021

Security Theme

Transnational Organized Crime

Keywords

TOC

Description

The low signal-to-noise ratio of information in unstructured textual documents remains a challenge in geo-text analyses. While information extraction approaches can identify the places, times and people mentioned in text, many of the identified entities may be unrelated to the analytical scenario and, without pre-filtering, may mislead spatial models. At the same time, detecting relevant spatiotemporal events and their attributes is of high interest to geovisual and computational solutions that leverage textual data. In this paper, we present a classification approach for identifying Spanish-language documents that describe the time, locations and attributes of such spatiotemporal events, namely drug trafficking activities in Honduras. We fine-tune a Spanish-specific and a Multilingual BERT (Bidirectional Encoder Representations from Transformers) model for this task with a limited amount of training data. Our results indicate high performance of this approach, and the ability of the models to identify and filter documents that include spatiotemporal events. The results are noteworthy because all documents in the dataset relate to narcotrafficking events, regardless of class membership, but not all of them describe a spatiotemporal event; the models nonetheless perform well at detecting the documents that do.
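The pre-filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. The label convention (1 for a document that describes a spatiotemporal event, 0 otherwise), the function names, and the `model_dir` path are all assumptions made for illustration. The sketch assumes a binary sequence-classification checkpoint has already been fine-tuned and saved locally, and that the `transformers` and `torch` packages are installed if `make_bert_classifier` is used.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed label convention (not from the source):
# 1 = document describes a spatiotemporal event, 0 = it does not.
Classifier = Callable[[str], int]


def make_bert_classifier(model_dir: str) -> Classifier:
    """Wrap a fine-tuned sequence-classification checkpoint as a Classifier.

    `model_dir` is a hypothetical path to a locally saved fine-tuned model.
    `torch` and `transformers` are imported lazily so the rest of this
    sketch works without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    def classify(text: str) -> int:
        # Truncate long documents to BERT's 512-token input limit.
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return int(logits.argmax(dim=-1).item())

    return classify


def filter_event_documents(docs: Iterable[str],
                           classify: Classifier) -> List[str]:
    """Keep only the documents the classifier labels as event documents."""
    return [doc for doc in docs if classify(doc) == 1]


def precision_recall_f1(gold: Sequence[int],
                        predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (event) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In a pipeline of this shape, `filter_event_documents` would run before any entity extraction, so that only documents predicted to describe an event reach the downstream spatial model. Any callable that maps text to 0 or 1 can stand in for the BERT wrapper, which makes it possible to compare a Spanish-specific and a multilingual model on the same held-out labels with `precision_recall_f1`.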
