CAMLIS 2022

DAY ONE

Keynote: PE Binary Classification Pitfalls

Amanda Rousseau absolutely loves taking apart malware. She currently works on the Microsoft Offensive Research & Security Engineering (MORSE) team to help find vulnerabilities in the Windows OS. Previously, she worked as an Offensive Security Engineer on the Red Team at Facebook and as a Malware Researcher at Endgame, FireEye, and the U.S. Department of Defense Cyber Crime Center. Amanda received an MS in Information Systems Engineering from Johns Hopkins University.

Her research interests include malware evasion techniques, rootkits, dynamic behavior classification, and developing runtime kernel detections. You can find Amanda on Twitter at @malwareunicorn.

Amanda Rousseau

Malware Researcher, Microsoft MORSE Team

Speakers/Contributors: Ethan Rudd, David Krisiloff, Daniel Olszewski, Ed Raff, and James Holt
Machine learning-based malware classification has become a key component of modern defense-in-depth strategies, with focus placed on the binary classification task of malware detection. These detection models are typically combined with other toolchains, which provide additional context necessary for triage and remediation, including detection names, capability, and type information. The resulting systems are often complex and interconnected, incurring significant technical debt, infrastructure costs, and inevitable errors.
In this paper, we examine the feasibility of using machine learning to streamline malware analysis pipelines in a manner which minimizes potential risks and costs while preserving flexibility and functionality. To this end, we explore the use of metric learning to embed malicious and benign samples in a low-dimensional vector space with enriched capability information for downstream use in a variety of applications, including detection, family classification, and malware attribute classification.
Specifically, we enrich labeling on malicious and benign PE files from the EMBER dataset using Mandiant’s CAPA tool, an open-source toolchain which uses disassembly and subject matter expert (SME) derived rules and heuristics to determine malicious capabilities. Using these CAPA labels, we derive several different types of metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation on malware similarity, and combinations thereof.
We then examine performance on a variety of transfer tasks performed on the EMBER and SOREL datasets. We show that for a variety of transfer tasks, we are able to utilize relatively low-dimensional metric embeddings with little decay in performance, which offers the potential to quickly retrain for a variety of transfer tasks. The low-dimensional representations offer added potential to significantly reduce training and storage overhead when performing retrains or transferring to additional downstream tasks.
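The contrastive objective mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard pairwise contrastive loss and assumes, purely for illustration, that two samples count as "similar" when their CAPA label sets overlap. The function names, the margin value, and the similarity rule are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def share_capability(labels_a, labels_b):
    """Hypothetical similarity rule: the samples share at least one label."""
    return bool(set(labels_a) & set(labels_b))


def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Standard pairwise contrastive loss.

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only when they fall closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

In an actual training loop this loss would be averaged over many pairs and minimized with respect to the embedding network's weights; the sketch only shows the per-pair quantity.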
Presentation > | Video >
Speakers/Contributors: Andre Nguyen, Richard Zak, Luke Edward Richards, Maya Fuchs, Fred Lu, Robert Brandon, Garay David Lopez Munoz, Ed Raff, Charles Nicholas, and James Holt
As organizations in government and industry increasingly rely on digitized data and networked computer systems, they face a growing risk of exposure to cyber attacks. Automated methods such as machine learning-based malware detection algorithms have helped analysts sift through large amounts of data. However, it is still too expensive to always run the best algorithms when massive amounts of new data are generated every day.

In this work, we demonstrate the benefits of leveraging uncertainty estimation when multiple algorithms with different strengths and costs are used as a part of a larger machine learning malware detection system. In particular, we introduce a novel method in which cheaper machine learning algorithms can choose to defer to costlier models when their own predictions are uncertain and the more expensive model is expected to do well.

We first use this method to detect specific capabilities in executable files, then extend it to general malware detection. In both cases, we are able to maintain high accuracy while minimizing the use of the more costly algorithms. With capability detection, we correctly label an average of 99.9% of capabilities at half the computational cost of using the expensive model throughout. For general malware detection, using this method to strategically balance the use of static and dynamic analysis saves a year's worth of compute time.
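The deferral idea can be sketched as follows. This is a simplified illustration, not the authors' method: it defers whenever the cheap model's predictive entropy exceeds a threshold, and it omits the second condition from the abstract (that the expensive model is expected to do well). The model callables, which return a probability of maliciousness, and the threshold value are assumptions.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def predict_with_deferral(sample, cheap_model, costly_model, max_entropy=0.5):
    """Return (is_malicious, model_used).

    The cheap model answers on its own when its prediction is confident
    (low entropy); otherwise the sample is deferred to the costly model.
    """
    p = cheap_model(sample)
    if binary_entropy(p) <= max_entropy:
        return p >= 0.5, "cheap"
    return costly_model(sample) >= 0.5, "costly"
```

Lowering max_entropy sends more samples to the costly model, so the threshold directly trades accuracy against compute.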
Presentation > | Video >
Contributors/Speakers: Lindsey Lack and John Conwell
Presentation > | Video >
Contributors/Speakers: Tadesse Zemichael and Rachel Allen
Azure Active Directory (Azure-AD) is an identity and access management service that helps users access external and internal resources such as Office365 and SaaS applications. The sign-in logs in Azure-AD identify who the user is, how the application is used for the access, and the target accessed by the identity [1]. At a given time t, a service s is requested by user u from device d using authentication mechanism a, and the request is either allowed or blocked. Previous work on anomalous authentication detection includes applying black-box ML models to handcrafted features extracted from authentication logs, or rule-based models [8]. The closest work on using graphs for malicious authentication detection is [9], where a graph is built for each user's login log and graph features are then extracted for use in similarity metrics.
Our work closely follows the success of heterogeneous GNN embedding in cyber applications such as fraud detection [2,7] and cyber-attack detection on prevalence datasets. Unlike earlier models, this work uses heterogeneous graphs for authentication graph modeling and relational GNN embedding to capture relations among different entities. This allows us to take advantage of relations among users and services while avoiding the feature extraction phase [8]. In the end, the model learns from both the structural identity and the unique feature identity of individual users. The drawback of a rule-based or feature-based system is that it fails to generalize to new attacks, and its rules need frequent maintenance. Evolving attacks and connected malicious users across the network are hard to detect with feature-based or rule-based methods.
This work presents a heterogeneous relational graph convolutional embedding approach for malicious Azure-AD sign-in detection. First, to overcome node feature sparsity and capture activity, aggregation is done based on time windows t and node tuples (User, Device, Service). The nodes are separated into the target node type “authentication”, which captures dynamic sign-in behavior, and the static node types (user, device, and service). This allows us to associate all time-changing features with authentication nodes and eliminates the need to model the dynamically evolving nature of the graph, as every authentication is distinct in the time domain. Finally, a heterogeneous relational graph convolutional network (R-GCN) [5] is trained to output the embedding of “authentication”, which is fed into a binary classifier or anomaly detection algorithm for scoring. We report a comparison of the model's performance on data extracted from real-world Azure authentication logs.
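The aggregation step can be illustrated with a minimal sketch that groups sign-in events by time window and (User, Device, Service) tuple, creating one "authentication" node per group and linking it to the static nodes. The event field names, the aggregated features (count and failures), and the edge relation names are hypothetical; they are not the real Azure-AD log schema or the authors' graph definition, and the R-GCN itself is not shown.

```python
from collections import defaultdict


def build_authentication_graph(events, window_seconds=3600):
    """Aggregate sign-in events into authentication nodes plus typed edges.

    Each event is a dict with hypothetical keys: user, device, service,
    timestamp (seconds), and success (bool).
    """
    buckets = defaultdict(lambda: {"count": 0, "failures": 0})
    for event in events:
        window = event["timestamp"] // window_seconds
        key = (event["user"], event["device"], event["service"], window)
        buckets[key]["count"] += 1
        if not event["success"]:
            buckets[key]["failures"] += 1

    auth_nodes, edges = [], []
    for auth_id, key in enumerate(sorted(buckets)):
        user, device, service, window = key
        # Time-varying features live only on the authentication node.
        auth_nodes.append({"id": auth_id, "window": window, **buckets[key]})
        auth = ("authentication", auth_id)
        edges.append((("user", user), "requests", auth))
        edges.append((("device", device), "originates", auth))
        edges.append((auth, "targets", ("service", service)))
    return auth_nodes, edges
```

Because every authentication node belongs to exactly one window, the static user, device, and service nodes never need time-dependent features, which mirrors the motivation given in the abstract.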
Presentation > | Video >
Speaker: Doug Sibley
We propose a self-supervised approach to generating features for arbitrary byte sequences by training a convolutional autoencoder directly on raw bytes. The low vocabulary of this task (256) makes it viable to train on sequences at least 1MB in size. We evaluate this approach to byte-level feature engineering by first examining how accurately the autoencoder reconstructs a variety of datasets, then testing this approach specifically on SOREL malware samples, extracting the learned features and comparing them against the EMBER V2 features for the task of malware tagging. Our results suggest that the learned features from the convolutional autoencoder rival those of the human-engineered set without requiring domain-specific preprocessing of the portable executable file.
Presentation > | Video >
Speaker: Rob Brandon
Presentation > | Video >
Speaker: Matthew Berninger
Video >
Contributors/Speakers: Subhabrata Majumdar and Ganesh Subramanium
We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data, the industry standard for monitoring IP traffic, and ML models built on two sets of features: conventional NetFlow variables and distributional features based on NetFlow variables. In addition to static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in models that predict whether an IP belongs to known botnet families. These models were used to develop intrusion detection systems to predict traffic traces identified with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and deep packet inspection. The use of our proposed distributional features, combined with techniques that enable modelling complex input feature spaces, results in highly accurate predictions from our trained models.
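The distributional features described here can be sketched as per-IP quantiles of a flow variable, computed alongside a static summary such as the mean. The flow record layout (src_ip, bytes), the choice of quartiles, and the handling of IPs with a single flow are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles


def distributional_features(flows, field="bytes", n=4):
    """Per-IP mean and quantile cut points of one NetFlow variable.

    `flows` is a list of dicts with hypothetical keys `src_ip` and `field`.
    With n=4 the cut points are the three quartiles.
    """
    per_ip = defaultdict(list)
    for flow in flows:
        per_ip[flow["src_ip"]].append(flow[field])

    features = {}
    for ip, values in per_ip.items():
        if len(values) < 2:
            # Too few flows to estimate a distribution: repeat the value.
            cuts = [float(values[0])] * (n - 1)
        else:
            cuts = quantiles(values, n=n, method="inclusive")
        features[ip] = {"mean": mean(values), "quantiles": cuts}
    return features
```

The resulting per-IP vectors could then be fed to any classifier; the abstract does not specify which quantile levels or models were used.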
Presentation > | Video >
Contributors/Speakers: Rumman Chowdhury, Ben Colman, Jutta Williams, and Subho Majumdar
Session Unavailable

DAY TWO

Keynote: Lessons Learned in Red Teaming AI Systems in High-Stakes Environments

Video >

Mikel D. Rodriguez is the director of the Artificial Intelligence and Autonomy Innovation Center at MITRE Labs and leads the AI Red Team for the Department of Defense. As part of a not-for-profit operating in the public interest, Dr. Rodriguez works with a team that can look beyond the bottom line of any particular product or organization and focus on harnessing AI to help address national and global challenges. For the past twenty years, his research has focused on exploring how artificial intelligence, and in particular computer vision, can be used to help solve problems for a safer world.

He obtained his PhD at UCF's Center for Research in Computer Vision. He was a visiting researcher at the Robotics Institute at Carnegie Mellon and a post-doctoral fellow at INRIA at the Département d'Informatique of Ecole Normale Supérieure in Paris, France. Dr. Rodriguez is the editor of the ACM Journal for Responsible Computing. Dr. Rodriguez was the chair of the ODNI Video Analytics Research Working Group and is a senior technical advisor for the Pentagon's Project MAVEN.

He has served on the program committees for IEEE Computer Vision and Pattern Recognition, IEEE International Conference on Computer Vision, and IEEE Transactions on Pattern Analysis and Machine Intelligence.

Mikel D. Rodriguez

Director of AI & Autonomy Innovation, MITRE | Lead of DoD AI Red Team

Contributors/Speakers: Francois Labreche and Serge-Oliver Paquette
Every day, a growing number of software products are found to be vulnerable to exploitation. Such vulnerabilities are disclosed through publicly available databases, such as the National Vulnerability Database (NVD). However, the rate of disclosures now far outpaces the ability of any single research team or remediation team to handle them all. In this paper, we present a framework that predicts not only which vulnerabilities will actually be exploited by malicious actors or malware, but also which vulnerabilities can go under the radar, escaping the trending discussions of online cybersecurity communities. This is achieved by leveraging topic modeling in a novel way, combining a threat score and a trend score. The interpretable nature of such topic models enables security teams to dig deeper into the predictions of our model, making it a valuable tool for their remediation and investigative work.
Presentation > | Video >
Contributors/Speakers: Kobra Khanmohammadi and Raphael Khoury
The National Vulnerability Database (NVD) is an invaluable source of information for security professionals and researchers. However, in some cases, a vulnerability report is initially published with incomplete information, a situation that complicates incident response and mitigation. In this paper, we perform an empirical study of vulnerabilities that are initially submitted with an incomplete report, and present key findings related to their frequency, nature, and the time needed to update them. We further present a novel ticketing process that is tailored to addressing the problems related to such vulnerabilities and demonstrate the use of this system with a real-life use case.
Presentation > | Video >
Contributors/Speakers: Maeve Mulholland, Tim Nary, and Fred Frey
In this work, we demonstrate a method for mining registry data for signals associated with a target behavior. This methodology allows threat researchers to identify immutable signatures of a behavior without intensive processing of registry logs. We present a strategy for normalizing registry keys and then clustering them in order to make a registry log amenable to frequent item set mining. We show that by recording scripted instances of a behavior of interest, one can generate a set of time-bounded registry logs that can be mined for keys that are linked to the behavior of interest. Application of this methodology in a threat persistence scenario shows that the keys associated with four different attack techniques can be easily extracted from a raw registry log with only an example script of the techniques and no prior knowledge of what the techniques entail.
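The normalize-then-mine workflow can be sketched in a heavily simplified form. The sketch replaces machine-specific GUIDs and user SIDs with placeholders, then keeps keys that recur across the recorded runs of a behavior. It mines single keys with a support threshold rather than full item sets, skips the clustering step, and subtracts a baseline log; the baseline and the placeholder tokens are assumptions, not details from the talk.

```python
import re
from collections import Counter

GUID = re.compile(r"\{[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\}")
SID = re.compile(r"s-1-5-21(?:-\d+)+")


def normalize_key(key):
    """Lowercase a registry key and mask GUIDs and user SIDs."""
    key = key.lower()
    key = GUID.sub("{guid}", key)
    return SID.sub("{sid}", key)


def keys_linked_to_behavior(behavior_logs, baseline_logs, min_support=1.0):
    """Return normalized keys seen in at least `min_support` of the
    behavior logs and never seen in the (hypothetical) baseline logs."""
    runs = [{normalize_key(k) for k in log} for log in behavior_logs]
    baseline = {normalize_key(k) for log in baseline_logs for k in log}
    counts = Counter(k for run in runs for k in run)
    needed = min_support * len(runs)
    return sorted(k for k, c in counts.items()
                  if c >= needed and k not in baseline)
```

Normalization matters because the same behavior run under two user accounts writes to keys that differ only in the SID; without masking, neither key would reach the support threshold.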
Presentation > | Video >
Contributors/Speakers: Bhavna Soman, MohamadAli Torkamani, Michal Morais, Jeffery Bickford, and Baris Coskun
Data labels in the security field are frequently noisy, limited, or biased towards a subset of the population. As a result, commonplace evaluation methods such as accuracy, precision and recall metrics, or analysis of performance curves computed from labeled datasets do not provide sufficient confidence in the real-world performance of a machine learning (ML) model. This has slowed the adoption of machine learning in the field. In the industry today, we rely on domain expertise and lengthy manual evaluation to build this confidence before shipping a new model for security applications. In this paper, we introduce Firenze, a novel framework for comparative evaluation of ML models' performance using domain expertise, encoded into scalable functions called markers. We show that markers computed and combined over select subsets of samples called regions of interest can provide a robust estimate of the models' real-world performance. Critically, we use statistical hypothesis testing to ensure that observed differences, and therefore the conclusions emerging from our framework, are more prominent than those observable from noise alone. Using simulations and two real-world datasets for malware and domain-name-service reputation detection, we illustrate our approach's effectiveness, limitations, and insights. Taken together, we propose Firenze as a resource for fast, interpretable, and collaborative model development and evaluation by mixed teams of researchers, domain experts, and business owners.
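The combination of markers and hypothesis testing can be illustrated with a generic sketch. Here a marker is any function that flags a sample using domain knowledge, and two groups of samples (for example, those each model flags inside a region of interest) are compared by marker rate using a permutation test. The permutation test is a stand-in chosen for illustration; the abstract does not say which statistical test Firenze uses, and all names below are hypothetical.

```python
import random


def marker_rate(samples, marker):
    """Fraction of samples for which the marker function fires."""
    return sum(1 for s in samples if marker(s)) / len(samples)


def permutation_p_value(group_a, group_b, marker, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in marker rates.

    A small p-value suggests the difference between the two groups is
    larger than what random relabeling (noise alone) would produce.
    """
    rng = random.Random(seed)
    hits = [1 if marker(s) else 0 for s in list(group_a) + list(group_b)]
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(hits[:n_a]) / n_a - sum(hits[n_a:]) / n_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(hits)
        diff = abs(sum(hits[:n_a]) / n_a - sum(hits[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)
```

A marker might, for instance, check whether a flagged file is unsigned; the test then asks whether one model's detections are enriched for that marker beyond chance.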
Presentation > | Video >
Contributors/Speakers: Myles Foley, Mia Wang, Zoe M., Chris Hicks, and Vasilios Mavroudis
Computer network defence is a complicated task that has necessitated a high degree of human involvement. However, with recent advancements in machine learning, fully autonomous network defence is becoming increasingly plausible. This paper introduces an end-to-end methodology for studying attack strategies, designing defence agents and explaining their operation. First, using state diagrams, we visualise adversarial behaviour to gain insight about potential points of intervention and inform the design of our defensive models. We opt to use a set of deep reinforcement learning agents trained on different parts of the task and organised in a shallow hierarchy. Our evaluation shows that the resulting design achieves a substantial performance improvement compared to prior work. Finally, to better investigate the decision-making process of our agents, we complete our analysis with a feature ablation and importance study.
Presentation > | Video >
Contributors/Speakers: Andrew Hong, Peter Malinovsky, and Suresh Damodaran
Modern and legacy cyber-physical systems produce logs of operational behavior from sensors to network traffic; analyzing these heterogeneous logs to consistently identify attack signals is a difficult problem. In this work, we propose a flexible temporal non-parametric Bayesian framework for identifying these attacks based on the sticky Hierarchical Dirichlet Process Hidden Markov Model (sHDP-HMM). The advantage of this approach is that it does not require detailed information on the system architecture, and it works for systems with unknown multimodal behavior, yielding interpretable inference. We demonstrate the efficacy of this framework for accurate identification of attacks from cyber and physical attack vectors on two different CPS: an avionics testbed and a consumer robot.
Presentation > | Video >
Contributors/Speakers: Becca Lynch and Richard Harang
Video >
Speaker: Michael Moran
Video >
Speaker: Paolo Di Prodi
The global cybersecurity market is rapidly growing, driving collaboration across organizations like NIST, MITRE, and OASIS to create standards such as CVE, CWE, STIX, and ATT&CK. However, this has resulted in fragmented knowledge silos and low adoption of unifying ontologies due to their complexity. Most organizations instead rely on custom database schemas, creating challenges when sharing threat intelligence—typically done through STIX, despite its limitations.
The OmnibusCyber project addresses this by providing a standardized internal data model built on TypeDB, a strongly typed, expressive database that avoids the complexity of traditional semantic web frameworks like OWL/RDF. It enables direct mapping of complex relationships without heavy normalization, making it easier to manage, query, and analyze cybersecurity data. The platform includes a unified schema, data integration tools, and advanced analytics capabilities, helping teams centralize their data and uncover insights with less technical overhead.
Presentation > | Video >
Contributors/Speakers: Kristian Robert Langholm and Ankur Mohan
Video >
