Brian Genz

Northwestern Mutual

Labeling Red: Harvesting Labeled Data from Adversary Simulations (pdf, video)

Attackers have a seemingly endless arsenal of tools and techniques at their disposal, while defenders must continuously strive to improve detection capabilities across the full spectrum of possible attack vectors. The MITRE ATT&CK Framework provides a useful collection of attacker tactics and techniques that enables a threat-focused approach to detection.

This talk will highlight methodologies and key lessons learned from an internal adversary simulation at a Fortune 100 company that evolved into a series of data science experiments designed to improve threat detection.

In 2017, we performed basic Exploratory Data Analysis (EDA) while working to improve detection engineering activities around post-exploitation attack techniques during adversary simulation exercises. We paused to ask the question, “Isn’t this labeled data we’re generating? The red team just performed this attack, and we can positively identify the observations that resulted from that attack technique.”

Could we move beyond clustering, we wondered, and into the realm of supervised learning? We had to consider whether we were introducing any biases based on the methodology used in selecting and executing the attack techniques. We were also curious as to whether the inherent attacker tradecraft principle of stealth might translate into imbalanced classes in the data, and to what extent.
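To make the class-imbalance question concrete, here is a minimal sketch of how the skew between classes could be measured and offset with inverse-frequency class weights. The label counts are invented for illustration and do not come from the exercise described here, and `class_weights` is a hypothetical helper name.

```python
from collections import Counter


def class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: a stealthy attacker produces very few positive observations.
labels = ["benign"] * 990 + ["malicious"] * 10

counts = Counter(labels)
imbalance_ratio = counts["benign"] / counts["malicious"]
weights = class_weights(labels)

print(f"benign:malicious ratio = {imbalance_ratio:.0f}:1")
print(weights)
```

Weights of this form make each rare "malicious" observation count for more during training, which is one common way to keep a classifier from simply predicting "benign" for everything.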

We defined what we wanted to model: “Post-compromise attacker activity.” We focused on an initial technique: “DNS Exfiltration.” We defined the goal as “Incorporate labeled attack data in training a model to classify DNS requests as ‘malicious’ or ‘benign.’”
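As a rough illustration of that goal, the sketch below turns a DNS query name into a few numeric features and attaches a label based on whether the red team generated the query. The feature set, the function names, and the example domains are all assumptions made for illustration, not the features or data used in the talk. The registered domain is naively taken to be the last two labels, which is wrong for suffixes such as `co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dns_features(qname):
    """Extract simple lexical features from a DNS query name (illustrative only)."""
    name = qname.rstrip(".").lower()
    parts = name.split(".")
    # Naive assumption: everything left of the last two labels is the subdomain.
    subdomain = "".join(parts[:-2]) if len(parts) > 2 else ""
    return {
        "query_length": len(name),
        "label_count": len(parts),
        "longest_label": max(len(p) for p in parts),
        "subdomain_entropy": shannon_entropy(subdomain),
        "digit_ratio": sum(ch.isdigit() for ch in name) / len(name) if name else 0.0,
    }


def label_rows(qnames, red_team_qnames):
    """Pair each query's features with a label derived from known red-team activity."""
    return [
        (dns_features(q), "malicious" if q in red_team_qnames else "benign")
        for q in qnames
    ]


# Hypothetical observations; the second name mimics data encoded in a subdomain.
observed = ["www.example.com", "a9f3c2e17b604d5588aa01.exfil.example.net"]
red_team = {"a9f3c2e17b604d5588aa01.exfil.example.net"}

for features, label in label_rows(observed, red_team):
    print(label, features)
```

Rows of this shape could then be fed to any supervised classifier. Long, high-entropy subdomains are a commonly cited signal for DNS tunneling, which is why those features appear in the sketch.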

What started as a few questions and resulting brainstorming sessions eventually grew into a security data science practice supporting detection engineering, Digital Forensics and Incident Response (DFIR), Threat Hunting, and Threat Intelligence at the Fortune 100 company. This talk will step through the key aspects of the problem-solving approach used, with an emphasis on model selection and feature engineering.