Presenter Information

Armando J. Ochoa

Faculty Advisor

Mark A. Finlayson

Location

FIU Wellness & Recreation Center

Start Date

8-4-2019 2:00 PM

End Date

8-4-2019 4:00 PM

Session

Poster Session 3

Abstract

Internally, threat intelligence platforms use structured protocols (such as STIX2 or VERIS) to share and analyze cyber-security data but cyber-security-related events are usually re-ported and talked about using free-form texts such as blog posts, social media activity, and news articles. Due to the unstructured nature of these natural language texts, machines cannot easily consume and process them, which reduces how much information analysts and threat intelligence platforms have access to. To solve this problem, we propose implementing an Information Extraction System that takes unstructured texts within the cyber-security domain and processes and parses them into a structured format. We will create a pipeline which consumes free-from text articles taken from the VERIS Community Database and process them to create VERIS-sytle JSON reports. The Stanford CoreNLP toolkit will provide parsing, tokenization, and part-of-speech analysis to prepare the text for more complex information extraction techniques. Named Entity Recognition and Relationship Extraction of different levels, from regex to statistical models, together with rule-based analysis, are used to parse out the actors and events which contain cyber-related information and are relevant to the VERIS model. In order to complement the pipeline, we will also create a working set of annotated free-form texts out of a subset of the VERIS Community Database. To our knowledge, this Information Extraction system is the first one to be directly designed for the Cyber-Security Domain and the first one to leverage the VERIS format and the VERIS Community Database. This system would help bridge the gap between structured and unstructured data within the cybersecurity domain, allowing security specialist to easily consume the plethora of cyber-security-related data that is freely accessible over the internet.

Comments

**Abstract Only**

File Type

Poster

Share

COinS
 
Apr 8th, 2:00 PM Apr 8th, 4:00 PM

Analysis and Parsing of Unstructured Cyber-Security Data

FIU Wellness & Recreation Center

Internally, threat intelligence platforms use structured protocols (such as STIX2 or VERIS) to share and analyze cyber-security data but cyber-security-related events are usually re-ported and talked about using free-form texts such as blog posts, social media activity, and news articles. Due to the unstructured nature of these natural language texts, machines cannot easily consume and process them, which reduces how much information analysts and threat intelligence platforms have access to. To solve this problem, we propose implementing an Information Extraction System that takes unstructured texts within the cyber-security domain and processes and parses them into a structured format. We will create a pipeline which consumes free-from text articles taken from the VERIS Community Database and process them to create VERIS-sytle JSON reports. The Stanford CoreNLP toolkit will provide parsing, tokenization, and part-of-speech analysis to prepare the text for more complex information extraction techniques. Named Entity Recognition and Relationship Extraction of different levels, from regex to statistical models, together with rule-based analysis, are used to parse out the actors and events which contain cyber-related information and are relevant to the VERIS model. In order to complement the pipeline, we will also create a working set of annotated free-form texts out of a subset of the VERIS Community Database. To our knowledge, this Information Extraction system is the first one to be directly designed for the Cyber-Security Domain and the first one to leverage the VERIS format and the VERIS Community Database. This system would help bridge the gap between structured and unstructured data within the cybersecurity domain, allowing security specialist to easily consume the plethora of cyber-security-related data that is freely accessible over the internet.

Rights Statement

Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).