Chae Clark

Two Six Technologies

Automated and Explained Prioritization of Incident Reports from Multiple Sources

Abstract

Introduction and Background
Not all network intrusions are equally malicious. Security analysts are in limited supply, and given the volume of security alerts reported daily by various security tools (Wireshark, Nessus, Splunk, etc.), prioritizing which alerts to triage first is a necessity. Current protocols generally have an analyst use their experience to assign a severity only after investigating an alert; there is no automated system that prioritizes or ranks findings from multiple alerting systems before they are presented to an analyst. This is challenging because an alert gives only a snapshot of the activity, whereas a full investigation could correlate activity across multiple log types. In this report, we develop a Neural Network Regressor that predicts a severity score for a recorded incident.

It is worth noting that common network alerting/security software already has a severity rating system in place. In contrast, we propose a system that not only prioritizes alerts across multiple rule-based security systems, but also prioritizes alerts created by machine-learning-based security tools.

Model Overview
At a high level, the model takes multi-modal features as inputs and embeds them individually into a numerical feature space. A self-attention layer is added to increase explainability before connecting to an output layer for severity scoring.
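The paper does not include an implementation, but the pipeline above can be made concrete with a minimal PyTorch sketch. The class name, the single attention head, the shared 512-dimensional space (matching the sentence embeddings described below), and the hidden size of the head are all our assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class SeverityRegressor(nn.Module):
    """Sketch: per-feature embeddings -> self-attention -> regression head."""

    def __init__(self, num_features: int = 8, embed_dim: int = 512):
        super().__init__()
        # One learned projection per input feature, mapping each
        # embedded feature into a shared model space.
        self.projections = nn.ModuleList(
            [nn.Linear(embed_dim, embed_dim) for _ in range(num_features)]
        )
        # Self-attention over the sequence of feature tokens; its attention
        # weights double as per-feature importance scores (see below).
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        # Regression head: pool the attended features and emit one severity score.
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feature_embeddings: torch.Tensor):
        # feature_embeddings: (batch, num_features, embed_dim)
        tokens = torch.stack(
            [proj(feature_embeddings[:, i]) for i, proj in enumerate(self.projections)],
            dim=1,
        )
        attended, attn_weights = self.attention(tokens, tokens, tokens)
        score = self.head(attended.mean(dim=1)).squeeze(-1)   # (batch,)
        return score, attn_weights  # weights: (batch, num_features, num_features)
```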

Input Features and Embedding
To allow broad use across different alerting and report-generating systems, the model accepts eight distinct features:

- TTP: the categorized attack, as detailed in the MITRE framework [1].
- Attack Success: whether the intrusion succeeded. Large networks are attacked fairly regularly by external sources, and it should be uncontroversial that successful intrusions deserve higher severity than blocked ones.
- Duration: the amount of time the attack was active on the system, as recorded by the network sensors.
- Src./Dst. Role: the role within the enterprise (e.g., admin, contractor, external unknown) for the source and destination of the network traffic. Who is performing the action matters: a remote contractor and an internal admin SSH-ing to a restricted file server have different implications.
- Service Exploited: the resources used during the communication. Did the user connect to the main Domain Controller or a random workstation?
- Location: which of the physical or virtual locations was targeted in the intrusion. This feature allows for differentiation when sensors monitor multiple enterprises.
- Description: the full textual description of the event, which should contain most other important context about the alert.
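As a concrete illustration, one incident record might be shaped as follows. The field names and example comments are ours, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class IncidentFeatures:
    """Hypothetical container for the eight model inputs described above."""
    ttp: str                # MITRE technique, e.g. "T1110 Brute Force"
    attack_success: str     # whether the intrusion succeeded, e.g. "blocked"
    duration: str           # time the attack was active, per network sensors
    src_role: str           # enterprise role of the traffic source
    dst_role: str           # enterprise role of the traffic destination
    service_exploited: str  # resource used, e.g. "Domain Controller"
    location: str           # physical/virtual site targeted
    description: str        # free-text narrative of the event
```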

The textual features are embedded using a sentence transformer [2], placing them in a 512-dimensional space. This component is especially important for the description input (as it is unstructured text), but it is applied to the other non-numerical features as well. This allows flexibility in the reporting styles of different alert/incident types.
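The paper does not name the encoder; as one possibility, distiluse-base-multilingual-cased-v1 from the sentence-transformers library produces vectors of the stated 512-dimensional size. A minimal sketch, with that model choice as our assumption:

```python
from sentence_transformers import SentenceTransformer

# Assumption: the paper does not name its encoder; this public model
# happens to emit 512-dimensional vectors, matching the stated size.
encoder = SentenceTransformer("distiluse-base-multilingual-cased-v1")

fields = {
    "ttp": "T1110 Brute Force",
    "src_role": "external unknown",
    "description": "Repeated failed SSH logins followed by a success.",
}

# Every non-numerical feature is embedded the same way as the description.
embeddings = {name: encoder.encode(text) for name, text in fields.items()}
print(embeddings["description"].shape)  # (512,)
```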

Attention and Output Head
Self-attention applies a weight to each embedded feature. This adds an importance weighting that can be used to determine which features were most relevant to the severity score, aiding model explainability. A Regression Head is used to predict a single positive value from 1 to 4; rounding is used to produce an integer for evaluation purposes. A Classification Head that predicts a priority label directly was also considered, but it lacks the ordinal structure of the Regression Head's output.
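A small, self-contained sketch of both ideas, using stand-in tensors in place of real model outputs (the feature names and shapes mirror the hypothetical model sketched earlier):

```python
import torch

FEATURES = ["ttp", "attack_success", "duration", "src_role",
            "dst_role", "service_exploited", "location", "description"]

# Stand-ins for one incident's model outputs: a raw regression score
# and an (num_features x num_features) self-attention map.
raw_score = torch.tensor(3.37)
attn_weights = torch.softmax(torch.randn(8, 8), dim=-1)

# Severity on the 1-4 scale: clamp, then round to an integer for evaluation.
severity = int(raw_score.clamp(1.0, 4.0).round())

# Per-feature importance: average the attention each feature receives.
importance = attn_weights.mean(dim=0)
ranking = sorted(zip(FEATURES, importance.tolist()), key=lambda kv: -kv[1])
print(severity, ranking[:3])  # e.g. 3 and the three most influential features
```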

Experimental Data
To train and evaluate our model, we use a set of human-investigated incidents covering several years of attempted intrusions across several Department of Defense enterprises. These reports contain all of the features needed to train our model. Because the dataset spans multiple locations and time ranges, the incidents were created by multiple alerting systems, investigated by numerous analysts, and cover a wide range of attack types. For qualitative analysis, we use a dataset of incident reports created by a variety of machine learning tools developed to detect network intrusions by nation-state actors.

Results
We train our model to minimize mean-squared error using the Adam optimizer, with hyperparameters tuned on a small holdout set. On an unseen evaluation set, the model achieves high precision and recall. Qualitatively, when analyzing the prioritization of machine-learning-generated reports, we see that the higher-priority reports relate to suspicious authentication and external data transfers, while lower-priority reports relate to more nebulous, general anomaly detections.
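A minimal training-and-evaluation sketch under the stated choices (Adam, mean-squared error, rounding before scoring). The model here is a simplified stand-in for the attention model, the data is synthetic, and for brevity we score on the same batch rather than a true holdout set:

```python
import torch
import torch.nn as nn
from sklearn.metrics import precision_score, recall_score

# Simplified stand-in model over flattened embeddings (8 features x 512 dims).
model = nn.Sequential(nn.Linear(8 * 512, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

x = torch.randn(256, 8 * 512)            # stand-in embedded incidents
y = torch.randint(1, 5, (256,)).float()  # analyst severities on the 1-4 scale

for epoch in range(10):
    optimizer.zero_grad()
    pred = model(x).squeeze(-1)
    loss = loss_fn(pred, y)   # mean-squared error, per the text
    loss.backward()
    optimizer.step()

# Evaluation: round the regression output and score it as a discrete label.
with torch.no_grad():
    labels = model(x).squeeze(-1).clamp(1, 4).round()
precision = precision_score(y.long().numpy(), labels.long().numpy(),
                            average="macro", zero_division=0)
recall = recall_score(y.long().numpy(), labels.long().numpy(),
                      average="macro", zero_division=0)
print(f"precision={precision:.2f} recall={recall:.2f}")
```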