CAMLIS 2021
DAY ONE
Bio: Katie Nickels
Katie Nickels is the Director of Intelligence for Red Canary as well as a SANS Certified Instructor for FOR578: Cyber Threat Intelligence and a non-resident Senior Fellow for the Atlantic Council’s Cyber Statecraft Initiative. She has worked on cyber threat intelligence (CTI), network defense, and incident response for over a decade for the U.S. DoD, MITRE, Raytheon, and ManTech.
Katie hails from a liberal arts background with degrees from Smith College and Georgetown University, embracing the power of applying liberal arts prowess to cybersecurity. Katie has shared her expertise with presentations, webcasts, podcasts, and blog posts, including a presentation at Black Hat as well as her personal blog, “Katie’s Five Cents.” Katie has also served as a co-chair of the SANS CTI Summit and FIRST CTI Symposium. She was a 2020 recipient of the SANS Difference Maker Award and the 2018 recipient of the President's Award from the Women's Society of Cyberjutsu. She also serves as the Program Manager for the Cyberjutsu Girls Academy, which seeks to inspire young women to learn more about STEM. You can find Katie on Twitter @LiketheCoins.
Katie Nickels
Director of Intelligence at Red Canary | SANS Certified Instructor | Atlantic Council Fellow
-
Speaker: Dmitrijs Trizna
In this article, we present a Shell Language Preprocessing (SLP) library, which implements tokenization and encoding aimed at parsing Unix and Linux shell commands. We describe the rationale for a new approach, with specific examples of where conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against widely accepted information and communications technology (ICT) tokenization techniques and improve the F1-score from 0.392 to 0.874.
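As a rough illustration of why generic tokenization struggles with shell commands, the sketch below compares a plain whitespace split with a shell-aware split. It uses Python's standard `shlex` module as a stand-in; it is not the SLP library's API, and the sample command and function names are assumptions made for this example.

```python
import shlex


def naive_tokens(command: str) -> list:
    """Whitespace split, as a generic NLP pipeline might do."""
    return command.split()


def shell_aware_tokens(command: str) -> list:
    """Split with shell quoting rules and treat operators such as | and > as tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


# Hypothetical command: a quoted file name, a pipe, and an unspaced redirection.
command = "cat 'my file.txt' | grep -i error>out.log"

if __name__ == "__main__":
    print(naive_tokens(command))
    print(shell_aware_tokens(command))
```

The whitespace split breaks the quoted file name in two and fuses the redirection into a single token, while the shell-aware split keeps the file name whole and separates the `>` operator from its operands.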
-
Speaker: Awalin Sopan
Security Operations Centers (SOCs) process thousands of daily alerts, most of which are false positives. While alerts are sometimes deduplicated, valuable context from similar past alerts is rarely used, making triage inefficient. By identifying patterns across alerts—within and across organizations—there’s a major opportunity to reduce noise and help analysts prioritize what truly matters.
This work introduces a prototype system that clusters similar alerts using machine learning and nearest-neighbor methods, leveraging both past and real-time data. It assigns priority based on similarity to previously resolved alerts and provides cluster-level insights like maliciousness scores and trends. Through a web-based UI, analysts can quickly assess, group, and resolve alerts at scale—significantly improving speed, consistency, and overall workload efficiency.
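The idea of assigning priority from similar, previously resolved alerts can be sketched as follows. This is a minimal illustration only: the set-of-strings alert representation, the Jaccard similarity, the choice of `k`, and all names are assumptions, not details of the prototype described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedAlert:
    features: frozenset  # hypothetical categorical features of a past alert
    malicious: bool      # analyst verdict recorded when the alert was resolved


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def priority(new_features: frozenset, history: list, k: int = 3) -> float:
    """Similarity-weighted share of malicious verdicts among the k nearest resolved alerts."""
    ranked = sorted(history, key=lambda r: jaccard(new_features, r.features), reverse=True)[:k]
    weights = [jaccard(new_features, r.features) for r in ranked]
    total = sum(weights)
    if total == 0:
        return 0.0  # nothing similar on record, so no evidence either way
    return sum(w for w, r in zip(weights, ranked) if r.malicious) / total


history = [
    ResolvedAlert(frozenset({"powershell", "encoded", "outbound"}), True),
    ResolvedAlert(frozenset({"powershell", "encoded"}), True),
    ResolvedAlert(frozenset({"chrome", "update"}), False),
    ResolvedAlert(frozenset({"powershell", "admin_script"}), False),
]

if __name__ == "__main__":
    print(priority(frozenset({"powershell", "encoded", "outbound"}), history))
    print(priority(frozenset({"chrome", "update"}), history))
```

A new alert that resembles past malicious alerts scores close to 1, while one that resembles only benign alerts scores 0, which gives analysts a simple ordering for triage.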
-
Speaker: Emily Gray, Chae Clark, and Robert Gove
Threat detectors, Zeek/Bro logs, incident reports, and the like identify and describe single events. Cyber attacks as a whole comprise many such events, and a fuller and more detailed understanding of an attack can be achieved when looking at multiple relevant, but not necessarily obviously connected, pieces of data at the same time. The motivation for this project is to model and detect these related pieces of data.
This work attempts campaign detection by determining whether pairs of logs are from the same attack. The primary mechanism is pair-wise comparison, but in aggregate this can be used to identify multiple data points as being from the same cyber event. Since cyber log data can come in many different formats, we employ a vectorization procedure to enable the use of multiple heterogeneous log types in the same dataset. Detecting campaigns, and presenting the findings to cyber analysts, can improve the quality and speed of their analysis.
-
Speaker: Kate Highnam, Kai Arulkumaran, Zachary Hanif, Nicholas R. Jennings
We present the BETH cybersecurity dataset for anomaly detection and out-of-distribution analysis. With real "anomalies" collected using a novel tracking system, our dataset contains over eight million data points tracking 23 hosts. Each host has captured benign activity and, at most, a single attack, enabling cleaner behavioural analysis. In addition to being one of the most modern and extensive cybersecurity datasets available, BETH enables the development of anomaly detection algorithms on heterogeneously-structured real-world data, with clear downstream applications. We give details on the data collection, suggestions on pre-processing, and analysis with initial anomaly detection benchmarks on a subset of the data.
-
Speaker: Richard Harang and Ethan M. Rudd
In this paper we describe the SOREL-20M (Sophos/ReversingLabs-20 Million) dataset: a large-scale dataset consisting of nearly 20 million files with pre-extracted features and metadata, high-quality labels derived from multiple sources, information about vendor detections of the malware samples at the time of collection, and additional "tags" related to each malware sample to serve as additional targets. In addition to features and metadata, we also provide approximately 10 million "disarmed" malware samples -- samples with both the optional_headers.subsystem and file_header.machine flags set to zero -- that may be used for further exploration of features and detection strategies. We also provide Python code to interact with the data and features, as well as baseline neural network and gradient boosted decision tree models and their results, with full training and evaluation code, to serve as a starting point for further experimentation.
-
Contributors/Speakers: Stephen Moskal and Shanchieh Jay Yang
With the growing sophistication and volume of cyber attacks, combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidence to identify campaigns and threats on their network. So much so that organizations employ teams of security professionals just to keep up with the vast amount of data presented to analysts each day. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC) such as a severe IDS alert, HeAT produces a HeATed Attack Campaign depicting the actions that led up to the critical event, including reconnaissance and initial exploitation stages. We define the concept of "Alert Episode Heat" to represent the analyst's opinion of how much an event contributes to the attack campaign of the critical IoC, given their own knowledge of their network context and security expertise. Leveraging a network-agnostic feature set and a short but targeted training process, HeAT is able to realize insightful and concise attack campaigns for IoCs not observed before, compare attack strategies of different attackers with the same IoC, and be applied across networks with the same degree of fidelity. HeAT maintains the analyst's original assessment of the specified "HeAT" regardless of the critical event being assessed or the network topology. We demonstrate the capabilities of HeAT with case studies using cyber-competition datasets to mimic how HeAT would be deployed in practice and assess the HeATed attack campaign from the analyst's perspective. With the goal of aiding the analyst in quickly finding further evidence of an attack, we show that HeAT immediately reveals each attack stage of an attack campaign embedded deeply within millions of alerts that may have needed a whole team of analysts to achieve otherwise.
-
Contributors/Speaker: Robert Gove and Nathan Danneman
Cyber defenders rely on incident reports to understand attacks, but traditional table formats make it hard to see the narrative across hundreds of log entries. To address this, we introduce a summarization algorithm and visualization tool that extract key events, entities, and relationships from log data. The system builds a dynamic graph of activity and presents it in a compact, Gantt-inspired view, making patterns and timelines easier to interpret. Analysts can adjust the level of summarization, and early feedback shows the visualization is faster and more intuitive than scanning tables.
Our algorithm identifies core event sequences and filters less relevant data using either simple thresholds or a weighted model trained on real incidents. Evaluations on red team data show the approach reduces report size by up to 61% while improving precision by 22%, helping analysts focus on high-value signals. Overall, this method streamlines incident analysis by making reports smaller, clearer, and more accurate.
-
Contributors/Speakers: Daniel Grahn, Junjie Zhang
As machine learning-assisted vulnerability detection research matures, it is critical to understand the datasets being used by existing papers. In this paper, we explore 7 C/C++ datasets and evaluate their suitability for machine learning-assisted vulnerability detection. We also present a new dataset, named Wild C, containing over 10.3 million individual open-source C/C++ files -- a sufficiently large sample to be reasonably considered representative of typical C/C++ code. To facilitate comparison, we tokenize all of the datasets and perform the analysis at this level.
We make three primary contributions. First, while all the datasets differ from our Wild C dataset, some do so to a greater degree. This includes divergence in file lengths and token usage frequency. Additionally, none of the datasets contain the entirety of the C/C++ vocabulary. These missing tokens account for up to 11% of all token usage. Second, we find that all the datasets contain duplication, with some containing a significant amount. In the Juliet dataset, we describe augmentations of test cases that make the dataset susceptible to data leakage. This augmentation occurs with such frequency that a random 80/20 split has roughly 58% overlap of the test data with the training data. Finally, we collect and process a large dataset of C code named Wild C. This dataset is designed to serve as a representative sample of all C/C++ code and is the basis for our analyses.
-
Speaker: Chae Clark
Abstract Introduction and Background
Not all network intrusions are equally malicious. There are a limited number of security analysts, and due to the volume of security alerts reported daily, from various security tools (Wireshark, Nessus, Splunk, etc.), prioritizing which alerts to triage first is a necessity. Current protocols generally have an analyst use their experience to supply a severity only after investigating the alert. There is no automated system to prioritize or rank findings from multiple alerting systems before being presented to an analyst. This is challenging because alerts only give a snapshot of the activity, versus a full investigation that could correlate activity across multiple log types. In this report, we develop a Neural Network Regressor that predicts a severity score given a recorded incident.
It’s worth noting that common network alerting/security software has a severity rating system in place. In contrast, what we are proposing is a system for not only prioritizing alerts across multiple rule-based security systems, but our system also prioritizes alerts created by machine-learning based security tools.
Model Overview
At a high level, the model takes multi-modal features as inputs and embeds them individually into a numerical feature space. A self-attention layer is added to increase explainability before connecting to an output layer for severity scoring.
Input Features and Embedding
To allow broad use across different alerting and report-generating systems, the model accepts 8 unique features. TTP is the categorized attack as detailed in the MITRE framework[1]. The Attack Success feature details whether the intrusion was successful. Large networks can be attacked fairly regularly by external sources. It should be non-controversial to say that successful intrusions should be given higher severity than blocked intrusions. Duration details the amount of time the attack was active on the system (as recorded by the network sensors). Src./Dst. Role details the role within the enterprise (e.g. admin, contractor, external unknown, etc.) for the source and destination of the network traffic. Who is performing the action matters. A remote contractor and an internal admin SSH-ing to a restricted file-server have different implications. The Service Exploited gives details about the resources used during the communication. Did the user connect to the main Domain Controller or a random workstation? Location denotes which of the physical or virtual locations were targeted in the intrusion. This feature allows for differentiation when sensors monitor multiple enterprises. Finally, Description is the full textual description of the event. This feature should contain most other important context about the alert.
The textual features are embedded using a sentence transformer[2], placing them into a 512-dimensional space. This component is especially important for the description input (as it’s unstructured text), but is also used for the other non-numerical features. This allows flexibility in the reporting style of different alert/incident types.
Attention and Output Head
Self-attention applies a weight to each embedded feature. This adds an importance weighting that can be used to determine the most relevant features in the severity score to aid model explainability. A Regression Head is used to predict a single positive value from 1 to 4. Rounding is used to produce an integer for evaluation purposes. A Classification Head was considered to predict a specific priority label directly, but lacked the ordinal output of the Regression Head.
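The weighting and rounding steps described above can be sketched in a few lines. This is a simplified attention-pooling stand-in, not the authors' architecture: the per-feature relevance scores, the two-dimensional toy embeddings, and the round-half-up rule are all assumptions made for illustration.

```python
import math


def attention_weights(scores):
    """Softmax over per-feature relevance scores, giving weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooled(embeddings, weights):
    """Weighted sum of the feature embeddings (one embedding per input feature)."""
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]


def to_severity_label(raw_score, low=1, high=4):
    """Round the regression output half-up, then clip it to the 1 to 4 severity range."""
    return int(min(high, max(low, math.floor(raw_score + 0.5))))


if __name__ == "__main__":
    weights = attention_weights([2.0, 0.5, 0.1])  # hypothetical scores for three features
    print(weights)
    print(to_severity_label(2.6))
```

Inspecting the weights shows which input feature contributed most to a given score, which is the explainability benefit the abstract refers to. Half-up rounding is used here because Python's built-in `round` rounds ties to the nearest even integer.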
Experimental Data
To train and evaluate our model we will use a set of human investigated incidents covering several years of attempted intrusions spanning several Department of Defence enterprises. These reports contain all of the necessary features to train our model. A positive of this dataset spanning multiple locations and time ranges is that the incidents were created by multiple alerting systems, investigated by numerous analysts, and cover a wide range of attack types. For qualitative analysis, we use a dataset of incident reports created by a variety of machine learning tools developed to detect network intrusions from nation-state actors.
Results
Using hyperparameter tuning with a small holdout set and the Adam optimizer, we train our model to optimize mean-squared-error. The results show that on an unseen evaluation set, we see high Precision and Recall. Qualitatively, when analyzing prioritization of machine-learning generated reports, we see that the higher priority reports are related to suspicious authentication and external data transfers. Reports given less priority were related to more nebulous/general anomaly detections.
-
Contributors/Speakers: Robert Joyce, Edward Raff, Charles Nicholas
Although groups of strongly correlated antivirus engines are known to exist, at present there is limited understanding of how or why these correlations came to be. Using a corpus of 25 million VirusTotal reports representing over a decade of antivirus scan data, we challenge prevailing wisdom that these correlations primarily originate from "first-order" interactions such as antivirus vendors copying the labels of leading vendors. We introduce the Temporal Rank-1 Similarity Matrix decomposition (R1SM-T) in order to investigate the origins of these correlations and to model how consensus amongst antivirus engines changes over time. We reveal that first-order interactions do not explain as much behavior in antivirus correlation as previously thought, and that the relationships between antivirus engines are highly volatile. We make recommendations on items in need of future study and consideration based on our findings.
DAY TWO
Bio: Nicolas Papernot
Nicolas Papernot is an Assistant Professor in the Department of Electrical and Computer Engineering and the Department of Computer Science at the University of Toronto, a faculty member at the Vector Institute, where he holds a Canada CIFAR AI Chair, and a faculty affiliate at the Schwartz Reisman Institute.
His research interests are at the intersection of security, privacy, and machine learning. A sample of his research includes cleverhans.io which he co-authored, and research in proof-of-learning, collaborative learning beyond federation, dataset inference, machine unlearning, differentially private ML, and adversarial examples.
Prof. Papernot earned a Ph.D. in Computer Science and Engineering at the Pennsylvania State University, working with Prof. Patrick McDaniel and was supported by a Google PhD Fellowship. Upon graduating, he spent a year at Google Brain in Úlfar Erlingsson's group.
Nicolas Papernot
Assistant Professor, Dept. of Electrical & Computer Engineering and Computer Science, University of Toronto
-
Contributors/Speakers: Katie Paxton-Fear, Duncan Hodges, and Oliver Buckley
Insider threats—attacks by employees or trusted individuals—are often more damaging and harder to detect than external attacks. Insiders have legitimate access, knowledge of systems, and can blend malicious actions with normal behavior, allowing attacks to go unnoticed for long periods. Detecting them is challenging because suspicious actions (like accessing sensitive files) may also be part of routine work, and insiders may bypass controls or exploit organizational gaps.
This work introduces a dynamic insider threat model built from large collections of real incident narratives using NLP and topic modeling. It identifies key themes like motivation, methods, and organizational weaknesses, then visualizes incidents as structured graphs to help investigators explore key questions and decision points. The approach supports better incident analysis and prevention by turning unstructured reports into actionable insights, offering a scalable, data-driven tool for understanding insider threats.
-
Contributors/Speakers: Ethan Rudd and David Krisiloff
Malware classification in the wild remains a difficult problem due in part to concept drift and out-of-distribution data. Concept drift occurs when the statistical properties of target classes, e.g., malware or goodware, change over time, and practical application of machine learning (ML) for information security can be framed as an Open Set Recognition problem. Under an Open Set paradigm, samples that are ill-supported by data in the training set occur at deployment and one must be able to flag these unsupported samples as “unknowns” to differentiate them from properly classified samples. Open Set Recognition was formalized in Scheirer et al. [1] as a risk minimization problem.
ML deployments for malware detection in the industry typically address concept drift through periodic model retrains on novel data at some specified cadence and do not address the open set problem at all. In practice, a specified cadence for model updates could be replaced by a measure of concept drift, and rather than accepting potential false positives from ‘unknown’ samples and dealing with them as they occur, some measure of support could be used instead to flag these samples and pre-emptively route them to auxiliary detection technologies, least expensive to most expensive (e.g., when static detection is ill-supported route to dynamic detection; when dynamic detection is ill-supported, route to an analyst). Thus, there is motivation for a malware classification model whose representation can be used to provide measurements of statistical support and concept drift for each sample.
While discriminative models are effective at encouraging class separation in a latent space, they are susceptible to concept drift and are not guaranteed to work well in an Open Set Recognition regime, particularly for losses which aim to force separation at the margin but do little to bound the span of class predictions. Moreover, losses which rely on an associated sample label can only be evaluated during training and validation stages; not on new samples encountered after deployment.
By contrast, generative models aim to characterize data distributions and can specifically shape the distribution of sample points in the latent space. For example, Variational Auto-Encoders (VAEs) aim to enforce specific Gaussian distributional constraints which can be used to bound the spread of samples in latent space. Moreover, VAE loss functions can often be computed irrespective of class label, as loss terms are typically evaluated with respect to either data reconstruction, divergence from a known distribution, or the veracity of a sample (real/fake) as is commonly devised in adversarial learning paradigms.
In this presentation, we explore methods to combine loss functions from generative models with standard discriminative losses into multi-objective hybrid discriminative-generative models. We then discuss the impacts on classification performance and training of these auxiliary loss terms on malware detection through examples on open-source malware and goodware datasets (e.g., EMBER 2018, SOREL 20M), applying open set evaluation protocols [1]. We then investigate the characteristics of the associated latent spaces, motivate measurements of concept drift between source and target distributions, and implement classification confidence measures. Additionally, we compare how thresholding generative losses during deployment might be used to enhance classification confidence and reduce open space risk.
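The routing idea discussed above (flag ill-supported samples and escalate them from the least to the most expensive detection technology) can be sketched with a simple loss threshold. This is an assumption-laden illustration: using a training-set percentile of a generative loss as the support measure, the percentile value, and the function names are choices made for this example, not the presenters' method.

```python
def loss_threshold(train_losses, percent=90):
    """Pick the loss value at the given percentile of the training-set losses."""
    ordered = sorted(train_losses)
    index = min(len(ordered) - 1, (percent * len(ordered)) // 100)
    return ordered[index]


def route(static_loss, dynamic_loss, static_threshold, dynamic_threshold):
    """Escalate from cheapest to most expensive stage while the sample is ill-supported."""
    if static_loss <= static_threshold:
        return "static"    # well supported: trust the static classifier
    if dynamic_loss <= dynamic_threshold:
        return "dynamic"   # static model unsure: trust dynamic detection
    return "analyst"       # neither model has support: send to a human


if __name__ == "__main__":
    # Hypothetical generative-loss values observed on training data.
    threshold = loss_threshold([float(i) for i in range(1, 21)])
    print(threshold)
    print(route(25.0, 3.0, threshold, 10.0))
```

A sample whose generative loss exceeds the threshold is treated as an "unknown" at that stage, so it is escalated instead of receiving a potentially wrong label.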
[1] W. J. Scheirer, A. Rocha, A. Sapkota, and T. E. Boult, “Towards open set recognition,” IEEE T-PAMI, vol. 36, July 2013.
-
Contributors/Speakers: Nancirose Piazza, Yaser Faghan, Vahid Behzadan and Ali Fathi
Deep Reinforcement Learning (DRL) has become an appealing solution to algorithmic trading such as high frequency trading of stocks and cryptocurrencies. However, DRL policies are shown to be susceptible to adversarial attacks. It follows that algorithmic trading DRL agents may also be compromised by such adversarial techniques, leading to policy manipulation. In this paper, we develop a threat model for deep trading policies, and propose two active attack techniques for manipulating the performance of such policies at test-time. Additionally, we explore the exploitation of a passive attack based on adversarial policy imitation. Furthermore, we demonstrate the effectiveness of the proposed attacks against benchmark and real-world DQN trading agents.
-
Speaker: Aditya Kuppa
Machine Learning methods are playing a vital role in combating ever-evolving threats in the Cybersecurity domain. Explanation methods that shed light on the decision process of black-box classifiers are one of the biggest drivers in the successful adoption of these models. Explaining predictions that address ‘Why?/Why Not?’ questions helps users/stakeholders/analysts understand and accept the predicted outputs with confidence and build trust. Counterfactual explanations are gaining popularity as an alternative method to help users not only understand the decisions of black-box models (why?) but also provide a mechanism to highlight mutually exclusive data instances that would change the outcomes (why not?).
Recent Explainable Artificial Intelligence literature has focused on three main areas: (a) creating and improving explainability methods that help users better understand how the internals of ML models work as well as their outputs; (b) attacks on interpreters in a white-box setting; (c) defining the relevant properties and metrics of explanations generated by models. Nevertheless, there is no thorough study of how model explanations can introduce a new attack surface to the underlying systems. A motivated adversary can leverage the information provided by explanations to launch membership inference and model extraction attacks that compromise the overall privacy of the system. Similarly, explanations can also facilitate powerful evasion attacks such as poisoning and back door attacks.
In this paper, we address this gap by tackling various cyber security properties and threat models related to counterfactual explanations. We study black-box attacks that leverage Explainable Artificial Intelligence (XAI) methods to compromise confidentiality and privacy properties of underlying classifiers. We validate our approach with datasets and models used in the cyber security domain to demonstrate that our method achieves the attacker's goal under threat models which reflect real-world settings.
-
Speaker: Tamás Vörös, Rich Harang, Josh Saxe, and Konstantin Berlin
Most modern malware like Remote Administration Tools, ransomware, coin miners and espionage tools require communication with the internet, as they need to accept commands, transmit payloads, or exfiltrate sensitive information. Identifying such malicious communication potentially requires firewalls to decrypt encrypted traffic, make expensive queries to cloud infrastructure, or otherwise perform resource intensive computations, making such data collection impractical for all passing traffic.
IP allow/block lists can potentially be used as a computationally cheap pre-filter for these expensive operations but cannot be applied to unlisted IPs. Here we demonstrate that we can effectively expand upon the coverage of an allow or blocklist by building a machine learning (ML) model that is able to accurately predict if a previously unseen IP address is likely to be involved in known malicious behavior. While predicting malicious traffic based only on the IP address is difficult, we greatly improve on existing baselines with two different deep learning architectures and the additional use of pretraining.
We test our approaches on two distinct datasets and show that combining our deep learning architectures and pretraining improves the area under the curve from .89 and .992 to .93 and .995 respectively. Our results show the viability of building an ML model as a replacement or augmentation to traditional allow and blocklists, and importantly should generalize to IPv6 data, where maintaining such lists manually might become intractable.
-
Contributors/Speakers: Xigao Li, David Krisiloff, and Scott Coull
Traditional antivirus systems rely on static analysis for fast, on-device malware detection, but these methods are easily evaded through techniques like packing. Dynamic sandbox analysis is more robust but slow and resource-intensive. This work explores binary emulation as a middle ground, using tools like SpeakEasy to simulate program execution on-device and capture rich behavioral data such as API calls, memory usage, and network activity for machine learning models.
Using emulation data, we built models for malware detection and family classification, combining API call sequences and memory features with techniques like gradient boosting and neural networks. Results show emulation-based models can correct up to 50% of errors from static models, and combining both approaches improves performance further. This method balances accuracy, speed, and resource use, offering a scalable alternative for detecting evasive malware.
-
Contributors/Speakers: Sunil Vasisht, Philip Tully, Jay Gibble
Malware analysts often rely on static and dynamic analysis to understand a binary’s behavior, but deeper insights typically require reverse engineering using tools like IDA Pro or Ghidra. These disassemblers convert binaries into low-level assembly and higher-level pseudocode while performing operations like function recognition and auto-naming. Function names are augmented using signatures and prior human annotations, but coverage is limited: most functions in a fresh malware sample remain unnamed, slowing triage and analysis. Improving function name coverage is therefore crucial for accelerating malware investigation workflows.
We frame function name prediction as a neural machine translation problem, using structured disassembly representations such as ASTs and CFGs. AST paths and leaf tokens are embedded and encoded using BiLSTMs, while CFGs capture control flow and call relationships for graph-based modeling. Our dataset includes 360k functions from 4.3k malicious PE files, annotated with IDA-generated names, expert-labeled metadata, and capabilities from tools like capa. Evaluations using F1 scores and expert feedback show that leveraging syntactic structure enables accurate natural language function annotations, reducing reverse engineering effort and offering potential for integration into IDA plug-ins or scalable malware analysis pipelines.
-
Contributors/Speakers: Gordon Werner and S. Jay Yang
Intrusion detection systems generate a large number of streaming alerts. It can be overwhelming for analysts to quickly and effectively understand behavior within a network. Critical alerts occur so infrequently that it can be difficult to determine what surrounding alerts are actually related to them, providing a deep challenge to analysts. What if an analyst could provide a collection of known critical alerts and quickly receive a summary detailing their temporal behaviors within a network as well as consistently co-occurring signatures that precede or follow the critical action? What if this information could be provided in near real time, with no training data, and with the capability to adapt to changing temporal patterns and relationships across signatures? The Concept Learning for Intrusion Event Aggregation in Realtime with Rare co-Occurring Alert signature Discovery (CLEAR-ROAD) answers that question, revealing consistent co-occurrences derived from alerts with similar temporal arrival patterns. Alerts are aggregated, or sequenced, based on their unique and invariant arrival patterns, not external training data. The signature patterns expressed by such temporal activity are then discovered through pattern mining techniques. A constrained databasing approach is used to reduce the number of sequences processed by an average of 90% for individual streams. Case studies are conducted to analyze the co-occurring signatures found across two real world datasets, one from a SOC operation and another from a penetration testing competition. CLEAR-ROAD is able to find consistently co-occurring signatures across streams and datasets quickly and effectively. Differences in temporal behavior are also found to lead to unique co-occurring signatures for some critical alerts. Case studies show the clear and near-immediate benefits provided to analysts by the system.
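To make the notion of "consistently co-occurring signatures" concrete, the sketch below counts how often each signature appears within a fixed time window around a critical alert. It is a deliberately simplified, fixed-window stand-in and not the CLEAR-ROAD algorithm, which sequences alerts by their arrival patterns; the alert tuples, signature names, and window size are hypothetical.

```python
from collections import Counter


def cooccurring_signatures(alerts, critical_signature, window_seconds):
    """Return, per signature, the fraction of critical alerts it appears near.

    `alerts` is a list of (timestamp_seconds, signature) tuples.
    """
    critical_times = [t for t, sig in alerts if sig == critical_signature]
    if not critical_times:
        return {}
    counts = Counter()
    for critical_time in critical_times:
        # Count each signature at most once per critical alert.
        nearby = {
            sig
            for t, sig in alerts
            if sig != critical_signature and abs(t - critical_time) <= window_seconds
        }
        counts.update(nearby)
    return {sig: n / len(critical_times) for sig, n in counts.items()}


# Hypothetical alert stream with two occurrences of the critical signature "CRIT".
alerts = [
    (0, "scan"), (10, "exploit"), (12, "CRIT"),
    (100, "scan"), (205, "exploit"), (210, "CRIT"),
    (500, "dns"),
]

if __name__ == "__main__":
    print(cooccurring_signatures(alerts, "CRIT", window_seconds=30))
```

Signatures with a fraction near 1 accompany almost every critical alert and are candidates for the surrounding context an analyst would want summarized.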
-
Contributors/Speakers: Nick Gregory and Harini Kannan
In recent years, exploits like Spectre, Meltdown, Rowhammer, and Return Oriented Programming (ROP) have been detected using Hardware Performance Counters. But to date, only relatively simple and well-understood counters have been used, representing just a tiny fraction of the information we can glean from the system. What's worse, using only well-known counters as detectors for these attacks has a huge disadvantage - an attacker can easily bypass known counter-based detection techniques with minimal changes to existing sample exploit code. Uncovering the treasure trove of overlooked and undocumented counters is necessary if we are to both build defenses against these attacks and anticipate how an adversary could bypass our defenses.
In this paper, we’ll first introduce our version of Spectre variant 4 with evasive changes that can bypass any detections using conventional cache miss, branch miss, and branch misprediction counters. We’ll then show how our model using select undocumented counters is able to detect this new edited variant, and how it is also able to detect a novel Spectre implementation submitted to VirusTotal.
-
Speaker: Andy Applebaum
The past few decades have shown that machine learning (ML) can be a powerful tool for static malware detection, with papers today still purporting to eke out slight accuracy improvements. At the same time, researchers have noted that ML-based classifiers are susceptible to adversarial ML, whereby attackers can exploit underlying weaknesses in ML techniques to specifically tailor their malware to evade these classifiers. Defending against these kinds of attacks has proven challenging, particularly for those not steeped in the field.
To help tighten this gap, we have developed Kampff, a Windows PE malware classifier designed to detect attempts at evasion. Kampff uses a portfolio of classifiers, building on a primary classifier designed to detect "normal" malware and attaching to it classifiers designed to detect specific types of adversarial malware. While simplistic, this approach is able to make it significantly harder -- though not impossible -- to bypass the primary classifier.
-
Speaker: Sven Cattell
Adversarial attacks against AI products are more than just static events that the model gets wrong. In much of the literature we generate N points using the attack and report the accuracy of the model against those N points. When these are cheap to produce, like in the whitebox case, this is reasonable. In the blackbox case there may be thousands of queries that may take days or weeks if the model is behind a rate-limited API. If the attack is successful it will probably get reused. We've previously shown that we can monitor the overall adversarial drift using a Bayesian approach with a cover tree. In this paper we show evidence that black box adversarial attacks induce a high measured drift, even when attackers are attempting to hide in benign traffic.
-
Contributors/Speaker: Shubham Jain, Ana-Maria Cretu and Yves-Alexandre de Montjoye
End-to-end encryption (E2EE) by messaging platforms enables people to securely and privately communicate with one another. Its widespread adoption, however, has raised concerns that illegal content might now be shared undetected. Following the global pushback against key escrow systems, client-side scanning based on perceptual hashing has recently been proposed by governments and researchers to detect illegal content in E2EE communications. Here we propose the first framework to evaluate the robustness of perceptual hashing-based client-side scanning to detection avoidance attacks and show that current systems are not robust. More specifically, we propose three adversarial attacks against perceptual hashing algorithms (a general black-box attack and two white-box attacks for discrete cosine transform-based algorithms). In a large-scale evaluation, we show perceptual hashing-based client-side scanning mechanisms to be highly vulnerable to detection avoidance attacks in a black-box setting, with more than 99.9% of images successfully attacked while preserving the content of the image. We furthermore show that our attack generates diverse perturbations, strongly suggesting that straightforward mitigation strategies would be ineffective. Finally, we show that the larger thresholds necessary to make the attack harder would probably require more than one billion images to be flagged and decrypted daily, raising strong privacy concerns. Taken together, our results cast serious doubt on the robustness of perceptual hashing-based client-side scanning mechanisms currently proposed by governments, organizations, and researchers around the world.
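The threshold-based matching that client-side scanning relies on can be sketched with a toy perceptual hash. The example uses a simple average hash on an 8x8 grayscale grid instead of the discrete cosine transform-based algorithms studied in the paper, and the images, threshold, and function names are assumptions made for illustration.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_flagged(image_hash, database, threshold):
    """Flag the image if its hash is within `threshold` bits of any known hash."""
    return any(hamming(image_hash, known) <= threshold for known in database)


# Hypothetical 8x8 grayscale images: a gradient, a slightly brightened copy,
# and an inverted copy standing in for a heavily altered image.
original = [[row * 8 + col for col in range(8)] for row in range(8)]
brightened = [[p + 3 for p in row] for row in original]
inverted = [[63 - p for p in row] for row in original]

database = [average_hash(original)]

if __name__ == "__main__":
    print(is_flagged(average_hash(brightened), database, threshold=5))
    print(is_flagged(average_hash(inverted), database, threshold=5))
```

A detection avoidance attack searches for a small perturbation that pushes the hash just beyond the threshold. Raising the threshold makes that harder but also brings more unrelated images within matching distance, which is the trade-off the abstract quantifies.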