Privacy-preserving Surveillance Methods using Homomorphic Encryption

William Bowditch, Bill Buchanan, Will Abramson, Nikolaos Pitropakis, Adam Hall

Data analysis and machine learning methods often involve the processing of cleartext data, which can breach the right to privacy. Increasingly, we must use encryption to protect data in all of its states: in-transit, at-rest, and in-memory. While tunnelling and symmetric-key encryption are often used to protect data in-transit and at-rest, the major challenge is to protect data within memory while still retaining its value. This challenge is addressed by homomorphic encryption, which enables mathematical operations to be performed on data without revealing its original contents. Within surveillance, too, we must fundamentally respect the right to privacy of those subjects who are not actually involved in an investigation. Homomorphic encryption could thus play a major role in protecting the right to privacy while providing ways to learn from captured data. Our work presents a novel use case and evaluation of homomorphic encryption combined with machine learning. It uses scikit-learn and Python implementations of the Paillier and FV schemes to create a homomorphic machine learning classification technique that allows model owners to classify data without jeopardizing user privacy. While the state-of-the-art homomorphic methods proposed today are impractical for computationally complex tasks like machine learning without incurring substantial delay, the schemes we review are capable of handling machine learning inference. We construct a hypothetical scenario, solved with homomorphic encryption, in which a government agency wishes to use machine learning to identify pro-ISIS messages without (a) collecting the messages of citizens or (b) allowing users to reverse engineer the model. This scenario can be demonstrated in real time during a presentation using a simple Raspberry Pi setup. Regarding our developed system, the input used to train the surveillance model is a synthetic ISIS-related dataset.
The poster session aims to provide a review of modern HE schemes for non-cryptography specialists, and gives simple examples of the use of homomorphic encryption, with benchmarking across the different proposed schemes.

Supervised/unsupervised cross-over method for autonomous anomaly classification

Neil Caithness

Classical threat detection relies on rule-based systems that are often too rigid for rapid changes in the adversary landscape. Senseon has developed a method that addresses this problem by modelling typical user/device behaviour and identifying instances that do not conform to established baselines. Here, we present a method of anomaly detection and classification that starts with unsupervised statistical learning, performs autonomous class labelling, and finally builds a supervised classification engine. One key element that facilitates the method is the calculation of an anomaly score and its probability density function (PDF) from the residuals of a low-rank approximation of the input data stream. After reviewing background theory of low-rank approximation, we then present elements of the method.

Anomaly detection is performed using the residual sum of squares of a low-rank approximation of the input data known as the truncated singular value decomposition (SVD). We show that the resulting anomaly scores are distributed as chi-square with k degrees of freedom, which allows consistent comparison across data sets. The scores allow us to perform a weighted cluster analysis in the low-dimensional space, which in turn is used to assign class labels to clusters. Clusters can be interpreted with respect to their driving features as seen on a biplot of the factors of the SVD, U and V. This provides interpretable justifications during inference on new data.
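The scoring step can be sketched as follows. A synthetic baseline stands in for the input data stream, and the rank k is an illustrative assumption; the score is the residual sum of squares after projecting an observation onto the rank-k subspace learned from the baseline.

```python
# Sketch of anomaly scoring via residuals of a truncated SVD.
# The baseline data and rank k are synthetic/illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Baseline traffic lying (up to noise) in a 3-dimensional subspace.
M = rng.normal(size=(3, 10))                       # hidden mixing matrix
B = rng.normal(size=(500, 3)) @ M + 0.05 * rng.normal(size=(500, 10))

k = 3
_, _, Vt = np.linalg.svd(B, full_matrices=False)
V_k = Vt[:k].T                                     # top-k right singular vectors

def anomaly_score(x):
    # Residual sum of squares after projection onto the rank-k subspace.
    residual = x - (x @ V_k) @ V_k.T
    return float((residual ** 2).sum())

normal_point = rng.normal(size=3) @ M              # on the baseline subspace
anomalous_point = rng.normal(size=10) * 5.0        # generic off-subspace point

assert anomaly_score(anomalous_point) > anomaly_score(normal_point)
```

Conforming observations have small residuals because they lie near the learned subspace; anomalies retain a large component orthogonal to it.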

The advantage of this method over other contemporary methods lies in the interpretability of results. With this method, we can explain why an observation was determined to be anomalous. Contrast this with methods that are essentially black boxes producing accurate results, but without the prospect of interpretation. We review how our method can be applied to cyber security data, and why the interpretability of results constitutes a significant innovation. Finally, we show how this anomaly detection and inference is integrated into our broader cyber defence reasoning framework.

Detecting Unexpected Network Flows with Streaming Graph Clustering

Andrew Fast

When assessing questionable network traffic, network security practitioners focus on answering the classic investigative questions such as What? When? and Where?. Traditional IDS rules and malware file analysis (both static and dynamic) address the question of “What is in the traffic (and is it malicious)?”, with file analysis being a prominent target of machine learning approaches. Network traffic analysis is a complementary approach to threat detection which answers the question: “Where did this traffic come from (and is it malicious)?”.

One natural approach to answering the question of "Where?" is viewing network traffic as a mathematical graph. There is a vast literature on graph processing that is valuable for network security including algorithms for finding connected components and community detection. In a network security context, these approaches can add evidence to a hypothesis of malicious traffic by first grouping individual nodes based on shared traffic patterns and then identifying unexpected connections between hosts in different sub-regions of the network.

Traditionally, graph processing has been performed using batch algorithms, which allow the data to pool up before processing the entire, finite sample. Batch processing of graphs can require significant engineering effort, including the addition of a graph database, Spark's GraphX, or another specialized data store to the standard processing pipeline. Regardless of the chosen platform, the required engineering complexity often grows as the graphs grow in size.

Streaming processing is an alternative computational paradigm with reduced storage requirements compared to a batch approach. In a streaming model, processing occurs one data point at a time, limiting the amount of data that needs to be stored. Unfortunately, this reduction in storage space means a trade-off of needing new algorithms and approaches specifically designed for streaming data. In this talk, we describe a streaming approach to graph clustering. Motivated by the challenge of using machine learning to selectively record full network packets, we show how streaming graph clustering can be applied to network traffic to identify unexpected and potentially malicious flows in near-real time.
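One of the simplest streaming-graph primitives behind this style of analysis is maintaining connected components incrementally with union-find: each flow record is processed once, and only per-host state is kept. A minimal sketch, with hypothetical flow records:

```python
# Streaming connected components over a flow stream via union-find.
# Each edge is seen once; state is O(number of hosts), not O(edges).
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

# Hypothetical stream of (source IP, destination IP) flow records.
flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("192.168.1.5", "192.168.1.6"),
]

uf = UnionFind()
for src, dst in flows:
    uf.union(src, dst)

# Hosts in one component share traffic patterns; a new edge that joins
# two previously separate components is a candidate "unexpected flow".
assert uf.find("10.0.0.1") == uf.find("10.0.0.3")
assert uf.find("10.0.0.1") != uf.find("192.168.1.5")
```

Full streaming graph clustering requires more machinery than this (edge weights, decay, cluster splitting), but the same one-pass, bounded-state discipline applies.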

On the OTHER Application of Graph Analytics for Insider Threat Detection

Nahid Farhady Ghalaty, Ana Cruz

Insider threat detection is a growing challenge for organizations. Insider threat is defined as “the potential for an individual who has or had authorized access to an organization's assets to use their access, either maliciously or unintentionally, to act in a way that could negatively affect the organization.” In this presentation, we propose a detection method using graph analytics. Graphs have been used for insider threat detection in terms of detecting and visualizing anomalies, i.e., finding whether an employee behaved in a way that is considered abnormal compared to his/her peers or group-mates. In this presentation, we leverage graphs to detect employee behaviors that lead to the act of data exfiltration. Based on our discussions with security analysts, data exfiltration is often not the result of a single action; rather, a chain of events leads to the final act of moving critical data outside of the firm. As a result, mechanisms that detect the chain of events yield a lower false positive rate than finding anomalies alone.

In this presentation, we create an insider threat graph for the detection of known malicious chains of behavioral events. In the insider threat graph database, we have several types of nodes that represent the behavior and actions of employees. The nodes in the graph database include the employees’ information as well as their digital footprint from all the organization’s assets they interact with. The edges for this graph are the actions that are taken between two nodes. For example, two employee nodes can send emails to each other. Or an employee node can log out of a system node at a specific time. After building the graph, we then identify the patterns that lead to data exfiltration. Using the analysts’ expert knowledge, we have built customized queries that can detect such patterns and create customized alerts.

For this research work, we have created a framework using synthetic data and implemented the graph using AWS Neptune. The queries have been implemented using the Gremlin query language. We present the trade-offs of using traditional relational databases versus graph databases for insider threat detection. Based on our experiments, graph databases provide a single framework for both anomaly detection as well as behavioral chain detection. We also compare the setup complexity, data ingestion time, query and searching capabilities as well as the cost for both solutions.
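The chain-detection idea can be illustrated outside a graph database as well. The following is not the authors' Gremlin queries but a hedged, stand-alone Python sketch: an invented event log is scanned for a hypothetical exfiltration chain appearing, in order, in one employee's actions.

```python
# Illustrative chain-of-events detection on a hypothetical event log.
# The events and the EXFIL_CHAIN pattern are invented examples, not the
# authors' production queries.
from collections import defaultdict

# (employee, action) events in time order.
events = [
    ("alice", "badge_in"),
    ("bob", "access_sensitive_share"),
    ("alice", "access_sensitive_share"),
    ("alice", "compress_files"),
    ("bob", "send_internal_email"),
    ("alice", "upload_external"),
]

EXFIL_CHAIN = ["access_sensitive_share", "compress_files", "upload_external"]

def matches_chain(actions, chain):
    # True if `chain` appears as an ordered (not necessarily contiguous)
    # subsequence of `actions`. `step in it` consumes the iterator, so
    # each step must occur after the previous match.
    it = iter(actions)
    return all(step in it for step in chain)

logs = defaultdict(list)
for employee, action in events:
    logs[employee].append(action)

alerts = sorted(e for e, acts in logs.items() if matches_chain(acts, EXFIL_CHAIN))
print(alerts)  # → ['alice']
```

In the graph-database version, the same pattern becomes a path query over employee and asset nodes with time-ordered action edges.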

Cyber-Adversary Behavior Extraction and Comparisons Using IDS Alert Logs

Stephen Moskal

Computer networks are under constant threat from cyber attackers, as adversaries from anywhere in the world can potentially probe, access, and exploit the network at any time. From the perspective of the defense (system administrators or IT staff), the intentions and motivations of the attacker are unknown, and the defense can only observe the adversarial traffic captured by some form of Intrusion Detection System (IDS), responding accordingly by patching vulnerabilities and applying strict security policies. However, the complexity of network structures and the sophistication of attackers' skills and behavior mean that there are numerous ways attackers could penetrate the network, making it extremely difficult to defend against all types of attackers. We propose that contained within IDS alerts is an attack scenario describing a process of adversarial actions leading to an overarching goal, which can be used to profile the adversary's behavior, compare it to other observations of attackers, and generate examples of similar attack scenarios. To extract attack actions from IDS alerts, we hypothesize that a decline in alert volume for an observable cyber-attack kill-chain stage signifies the beginning of a new action, so the associated alerts can be aggregated to represent one action. We then evaluate how the attributes of the actions, such as source IP, target IP, attack type, and service type, transpire over the course of an attack scenario by defining a set of network-agnostic labels called Attacker Movements and a corresponding feature set describing the history of attacker actions, allowing attack attributes to be compared between attackers. As IDS alerts are not typically reflective of individual attacker actions, we use data collected from the Collegiate Penetration Testing Competition (CPTC), including IDS logs from multiple teams of contestants and detailed in-person team observations, to capture the adversary's perspective and thought process.
We recover 63% of all observable actions performed by the adversaries and capture 4 out of the 5 critical actions leading to exploitation of an asset. Lastly, we report the similarity between adversary behaviors using the Jensen-Shannon divergence of Attacker Movements, comparing behaviors from the same team attacking different targets, teams within the same competition, and teams of a similar skill level from a previous competition. This similarity metric gives the capability of determining interesting and unique behavior profiles, which can be used to assess and prevent future attacks based on previously observed behavioral patterns.
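The similarity comparison can be sketched with SciPy. The Attacker Movement label set and the counts below are hypothetical; note that SciPy's `jensenshannon` returns the JS *distance* (the square root of the divergence), so it is squared here.

```python
# Jensen-Shannon divergence between attackers' distributions over
# movement labels. Label set and counts are invented for illustration.
import numpy as np
from scipy.spatial.distance import jensenshannon

# Counts of hypothetical Attacker Movement labels per team, e.g.
# [same-host, new-host, new-subnet, external].
team_a = np.array([12, 5, 3, 0], dtype=float)
team_b = np.array([10, 6, 2, 1], dtype=float)   # behaves like team_a
team_c = np.array([0, 1, 2, 17], dtype=float)   # very different behavior

def js_divergence(p, q):
    # scipy returns the JS distance; square it to get the divergence.
    return jensenshannon(p / p.sum(), q / q.sum(), base=2) ** 2

assert js_divergence(team_a, team_b) < js_divergence(team_a, team_c)
```

With base 2 the divergence is bounded in [0, 1], which makes cross-competition comparisons easy to read.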

Canopy: A Learning-Based Approach for Automatic Low-Volume DDoS Mitigation

Banjo Obayomi, Chris Todd, Lucas Cadalzo, Brad Moore, Tony Wong

In a low-volume distributed denial-of-service (LVDDoS) attack, an adversary attempts to overwhelm a server by making requests specially crafted to consume an inordinate amount of the server’s resources. The imbalance between the resources used by the server and the attacker during an LVDDoS attack allows otherwise resource-constrained adversaries to mount effective attacks on large systems. Standard defense tools focus on coarse metrics such as the number of requests and neglect more nuanced signals such as user experience.

We propose Canopy, a novel approach for detecting LVDDoS attacks by applying machine learning techniques to extract meaning from observed patterns of TCP state transitions. We differentiate between malicious and benign traffic by employing a supervised learning approach, using features extracted from the temporal patterns of TCP state transitions. We employ three different algorithms of varying complexity for our classification model: decision trees, ensemble methods, and temporal convolutional networks.
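For the simplest of the three model families, the supervised step can be sketched as below. The two features and the synthetic traffic are illustrative assumptions, not Canopy's actual feature set.

```python
# Sketch of the decision-tree variant of the classifier, on invented
# features derived from TCP state transitions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Hypothetical per-connection features:
#   [mean seconds in ESTABLISHED, fraction of transitions into FIN_WAIT]
benign = np.column_stack([rng.normal(0.2, 0.05, 200), rng.normal(0.8, 0.1, 200)])
attack = np.column_stack([rng.normal(5.0, 1.0, 200), rng.normal(0.1, 0.05, 200)])

X = np.vstack([benign, attack])
y = np.array([0] * 200 + [1] * 200)   # 0 = benign, 1 = LVDDoS

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
preds = clf.predict([[0.21, 0.79], [4.8, 0.12]])
print(preds)  # → [0 1]
```

Low-volume attacks hold connections in unusual states for long periods, which is why temporal patterns of state transitions separate the classes even at low request rates.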

Canopy is able to detect and mitigate these low-volume attacks accurately and quickly: our tests find that attacks are identified during 100% of test runs within 650 milliseconds. Server performance is restored quickly: in our experimental testbed, we find that clients’ experience is restored to normal within 7.5 seconds.

During active attack mitigation, which only occurs during server performance degradation indicative of an attack, Canopy exhibits minimal erroneous mitigative action applied to benign clients: under 5% of benign clients are incorrectly blocked. These clients are blocked for an average of 4 seconds.

Canopy is able to identify various types of attacks regardless of the protocols they exploit. We tested attacks that exploit HTTP features such as SlowRead and ApacheKiller and TCP protocol attacks such as Sockstress. The robust attack suite used to train Canopy allows for its capabilities to generalize well to LVDDoS attacks not included in its training dataset. In our evaluation runs Canopy was able to identify never-before-seen attacks within 750 milliseconds.

Disclaimer: This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Predicting Exploitability: Forecasts for Vulnerability Management

Michael Roytman

Security is all about reacting. It’s time to make some predictions. We explain how Kenna Security used the AWS Machine Learning platform to train a binary classifier for vulnerabilities, allowing the company to predict whether or not a vulnerability will become exploitable.

We offer an overview of the process. Kenna enriches the data with more specific, nondefinitional-level data. 500 million live vulnerabilities and their associated close rates inform the epidemiological data, as well as “in the wild” threat data from AlienVault’s OTX, SecureWorks’s CTU, Reversing Labs, and ISC SANS. Kenna uses 70% of the National Vulnerability Database as its training dataset and generates over 20,000 predictions on the remaining vulnerabilities. It then measures sensitivity and specificity, positive predictive value, and false positive and false negative rates before arriving at an optimal decision cutoff for the problem.
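The cutoff-selection step can be sketched as follows. The scores and labels are synthetic (not Kenna's data), and Youden's J is used here as one common way to trade off sensitivity against specificity; the talk's actual cutoff criterion may differ.

```python
# Sweep decision thresholds for a binary exploitability classifier and
# pick the one maximizing Youden's J = sensitivity + specificity - 1.
# Scores/labels are synthetic.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = np.array([0] * 500 + [1] * 500)
scores = np.concatenate([rng.beta(2, 5, 500),    # never exploited: low scores
                         rng.beta(5, 2, 500)])   # exploited: high scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
j = tpr - fpr                         # Youden's J at each threshold
best = thresholds[j.argmax()]

sensitivity = tpr[j.argmax()]
specificity = 1 - fpr[j.argmax()]
print(f"cutoff={best:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

In practice the cutoff would be tuned against the asymmetric costs of false negatives (missed exploitable vulnerabilities) versus false positives (wasted remediation effort).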

The Secret Life of Pwns: Characterizing and Predicting Exploit Weaponization

Octavian Suciu, Erin Avllazagaj, Tudor Dumitras

In recent years it has become challenging to weaponize the exploits of software vulnerabilities so that they are effective in real-world attacks. However, the functionality added during the weaponization process has not been studied systematically, as it is difficult to infer, automatically, how close an exploit is to becoming weaponized. While the CVSS v3 standard specifies an Exploit Code Maturity metric---which captures the development status of exploits for each vulnerability, to assess the associated risks more accurately---this metric must be updated manually and is currently not published in the National Vulnerability Database.

In this work, we combine program analysis and data mining techniques to decompose exploit code into constituent micro-functionalities, and we conduct a quantitative and qualitative study of functionality reuse among 38,000 public PoC exploits. Our analysis aims to uncover statistical associations between reused components and existence of functional variants of exploits. From these observations, we propose a method to automatically predict whether vulnerabilities get weaponized, based on the micro-functionalities present in the exploit code.
Finally, we apply our prediction method to a corpus of 32M samples collected from Pastebin, a code-sharing platform popular among hackers, discovering that the existence of these variants is highly indicative of weaponization.
These results suggest that functionality-reuse patterns among exploits provide useful signals for assessing the maturity of exploit code and they open new avenues for reasoning about the risk of weaponized exploits.

Serverless Machine Learning for Phishing

Scott Rodgers

Phishing emails are one of the largest issues cybersecurity professionals face today. An errant user clicking a malicious link can be all that is required for an attacker to gain a foothold inside a corporate network. As such, many cybersecurity departments review emails reported by employees to help determine whether they are legitimate. While a great service, this can be extremely time consuming when employees submit large numbers of emails. To help minimize the load on our Detection team, we have developed a machine learning email classification tool. Currently, our classifier extracts over 400 features from each individual email to identify emails that may require follow-up from an analyst. It produces a likelihood score and a recommended classification (Phishing, Spam, Legitimate, etc.) to inform analysts and automatically disposition low-risk emails. To implement the model in production, we will discuss serverless deployment with REST API triggers.
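The serverless shape of such a deployment can be sketched as an AWS Lambda-style handler behind a REST API. The feature extractor and model below are stubs standing in for the real 400-feature pipeline, and the event shape assumes an API Gateway proxy integration.

```python
# Hypothetical serverless handler for email classification. The
# extractor and scorer are stubs; only the handler shape is the point.
import json

def extract_features(raw_email: str) -> list:
    # Stub for the ~400-feature extractor described above.
    return [len(raw_email), raw_email.lower().count("http")]

def score(features: list) -> float:
    # Stub model: flag emails with many links (illustrative only).
    return min(1.0, features[1] / 5.0)

def lambda_handler(event, context):
    body = json.loads(event["body"])
    features = extract_features(body["email"])
    likelihood = score(features)
    label = "Phishing" if likelihood > 0.5 else "Legitimate"
    return {
        "statusCode": 200,
        "body": json.dumps({"likelihood": likelihood, "classification": label}),
    }

# Local invocation with a fake API Gateway event.
resp = lambda_handler(
    {"body": json.dumps({"email": "click http://a http://b http://c http://d"})},
    None,
)
```

In production the stubs would be replaced by the trained model artifact loaded at cold start, keeping per-request latency low.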

Adversarial Attacks against Malware N-gram Machine Learning Classifiers

Emily Rogers

Despite astonishing success in complex pattern recognition, machine learning classifiers have yet to be fully embraced in critical areas of security due to their known weakness against adversarial attacks. These attacks are specially crafted inputs designed to fool trained classifiers into misclassification, and can be mounted in either white-box or black-box settings. Although the ease of creating such inputs is well-documented for images, malware is another matter. Unlike image features, malware features carry functional significance, are often discrete, and are usually correlated; unconstrained manipulation of the features, as is standard for images, most often results in nonsensical feature sets, dysfunctional code, or both. An example is a feature set of n-grams of Windows API calls, which cannot be perturbed in the standard fashion without potentially destroying both internal coherence and malware functionality. We present here the first algorithm for crafting adversarial examples on n-gram features that preserves functionality, satisfies internal correlations, and does not require knowledge of the original generating malware code. These examples can be used in adversarial training to enable a more robust defense against such attacks, and ultimately to support the secure use of machine learning classifiers in malware classification.
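The constraint the abstract describes can be made concrete. The sketch below (with a hypothetical API call trace, not taken from the talk) extracts 2-gram features and applies one *necessary* condition for a set of 2-grams to come from any real trace: consecutive calls must chain end-to-start, i.e. the 2-grams must admit an Eulerian path. A naive single-feature flip violates it.

```python
# n-gram features over a hypothetical Windows API call trace, plus a
# necessary (not sufficient: connectivity is ignored) feasibility check
# showing why independent feature flips break internal correlations.
from collections import Counter

def ngrams(calls, n=2):
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

trace = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]
features = ngrams(trace)

def could_be_trace(grams):
    # Eulerian-path degree condition: at most one call with
    # out-degree - in-degree = +1, at most one with -1, rest balanced.
    out_deg, in_deg = Counter(), Counter()
    for (a, b), c in grams.items():
        out_deg[a] += c
        in_deg[b] += c
    diffs = [out_deg[n] - in_deg[n] for n in set(out_deg) | set(in_deg)]
    return (sum(d == 1 for d in diffs) <= 1
            and sum(d == -1 for d in diffs) <= 1
            and all(abs(d) <= 1 for d in diffs))

perturbed = features.copy()
perturbed[("RegOpenKeyExW", "RegSetValueExW")] = 1   # naive single-feature flip

assert could_be_trace(features)
assert not could_be_trace(perturbed)
```

A functionality-preserving attack must therefore move through feature space only along perturbations that keep such structural constraints satisfiable.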

Towards A Public Dataset/Benchmark for ML-Sec

Ethan Rudd

While machine learning for information security (ML-Sec) has taken off in recent years, as a field it is still arguably in its infancy compared with other areas of applied machine learning like computer vision and natural language processing (NLP). One reason for the size, traction, and research progress of these other fields has been the presence of large-scale benchmark datasets and common evaluation protocols, e.g., ImageNet and the associated ILSVRC competitions/benchmarks. While many private threat intelligence/vendor response aggregation feeds are available, there are at present few public data sources in the industry that reflect realistic commercial ML-Sec use cases. We argue that this is detrimental to the progress of the industry, as it offers no common benchmark against which to assess trained classifiers, and especially detrimental to academic researchers, for whom the resources/infrastructure required to leverage commercial threat feeds and obtain realistic datasets is often a barrier to entry into ML-Sec compared with other areas of applied machine learning.

To this end, we have created an ML-Sec dataset for public release. The first part of this talk will announce the dataset and give an overview of its design, design rationale, and benchmark classifier performance. With respect to design, first and foremost we release (nearly) raw binary files of both mal- and benign-ware rather than samples already pre-processed into features, allowing researchers to experiment realistically with both feature extraction and model construction problems. Next, we ensure that the dataset is sufficiently difficult to leave room for performance improvements, even with realistic baseline classifiers – i.e., performance is not saturated at release. Third, we ensure that the dataset is large / representative enough to ensure that classifiers' performance will (with high-probability) retain relative rank order, even in the presence of much larger training sets. Finally, we provide a variety of metadata for heterogeneous applications, including but not limited to both malware detection and malware type tagging/classification, enabling richer applications beyond simple binary classification.
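To illustrate the kind of experiment that releasing (nearly) raw binaries enables, here is one classic baseline feature extracted directly from bytes: a normalized byte-value histogram. The sample below is a stand-in byte string, not a file from the dataset.

```python
# Baseline feature extraction from a raw binary: a normalized
# 256-bin byte histogram. The "sample" is an invented byte string.
import numpy as np

def byte_histogram(data: bytes) -> np.ndarray:
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return counts / max(len(data), 1)

# Toy stand-in: an MZ header, padding, a PE signature, then all byte values.
sample = bytes([0x4D, 0x5A]) + bytes(62) + b"PE\x00\x00" + bytes(range(256))
hist = byte_histogram(sample)
```

Releasing raw files rather than pre-computed vectors means researchers can compare extractors like this against richer parsed-PE features, instead of being locked into one feature design.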

The second part of this talk will discuss challenges faced during the dataset release including hosting, legal/licensing considerations, and security challenges encountered during this release, and provide paths forward and suggestions for groups wishing to release similar types of datasets. The purpose of this part of the presentation is to provide paths forward for other ML-Sec research groups to create similar industry/academic benchmarks and data sets; we hope to expedite this process by flagging core challenges.

Phish Language Processing (PhishLP)

Santhosh Kumar Ramachandran

As attacks get more sophisticated, detecting these threats poses innumerable challenges. Spear phishing is one such attack, targeting a specific organisation or individual. Spear phishing attacks are launched by highly trained individuals with good knowledge of the target. Most spear phishing attacks are multi-staged: a reconnaissance email without a malicious link or attachment is sent as the first step, followed by the actual malicious email. Conventional signature-based engines fail to detect spear phishing emails due to frequently changing patterns.

PhishLP is a Natural Language Processing (NLP) based, signature-less engine that can understand the context of an email and categorize its risk accordingly.

The traditional text classification approach tokenizes words and vectorizes them to obtain a feature vector. But these feature vectors do not capture positional information about each word, and the classical approach also fails to place the vectors of sentences with similar meanings close to each other in vector space. Our solution is to encode whole sentences as fixed-length vectors using an open-source sentence encoder algorithm and classify each sentence in the email body into one of four categories:

• Info
• Threat
• Action
• Spam

Emails containing sentences with a high probability of Threat or Action and received from an external/less-credible domain are marked as phishing/spam.
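The per-sentence classification step can be sketched as below. A TF-IDF vectorizer stands in for the sentence encoder (the described system uses a fixed-length sentence embedding), and the sentences and labels are invented for illustration.

```python
# Toy sketch of sentence-level classification into the four categories.
# TF-IDF is a stand-in for the sentence encoder; data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "Your quarterly report is attached for review",       # Info
    "Your account will be suspended unless you act now",  # Threat
    "Click the link below to verify your password",       # Action
    "Congratulations you won a free cruise",              # Spam
    "Please find the meeting notes attached",             # Info
    "We detected unauthorized access act immediately",    # Threat
    "Confirm your credentials using this link",           # Action
    "Claim your free prize today",                        # Spam
]
labels = ["Info", "Threat", "Action", "Spam"] * 2

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)

pred = clf.predict(["Verify your password with the link below"])[0]
print(pred)
```

The email-level decision then combines the per-sentence predictions with sender-domain credibility, as described above.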

Using historical spam and phishing emails, we have trained a DNN-based model. The model was able to classify the sentences with an accuracy of 99.5%. Our experiment helps us develop a context-based phishing classification engine that can adapt itself to future threats.

Linking Exploits from the Dark Web to Known Vulnerabilities for Proactive Cyber Threat Intelligence: An Attention-Based Deep Structured Semantic Model Approach

Sagar Samtani

The Dark Web has emerged as a valuable source for proactively developing cyber threat intelligence (CTI) capabilities. Despite its value, Dark Web data contains tens of thousands of unstructured, un-sanitized text records containing significant non-natural language. This prevents the direct application of standard CTI analytics (e.g., malware analysis, IP reputation services) and text mining methodologies to perform critical tasks. One such challenge pertains to systematically linking Dark Web exploits to known vulnerabilities present within modern organizations. In this talk, I will present my recent work extending a deep learning technique known as the Deep Structured Semantic Model (DSSM), drawn from the neural information retrieval literature, to incorporate emerging attention mechanisms from the interpretability literature for deep learning. The resultant Exploit Vulnerability Attention DSSM (EVA-DSSM) automatically links hacker forum exploits with vulnerabilities reported by enterprise vulnerability assessment tools based on their names, outputs interpretable and explainable text features that are critical for creating links, and provides prioritized links for subsequent remediation and mitigation efforts. Rigorous evaluation indicates that EVA-DSSM outperforms baseline methods drawn from distributional semantics, probabilistic matching, and deep learning-based short text matching algorithms in matching relevant vulnerabilities from major vulnerability assessment tools to 0-day exploits, web application exploits, remote exploits, local exploits, and denial-of-service exploits. We demonstrate the framework’s utility in two contexts: the systems of selected major US hospitals and Supervisory Control and Data Acquisition (SCADA) systems worldwide.
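To make the linking task concrete, here is one of the simple baselines EVA-DSSM is compared against in spirit: cosine similarity over character n-gram TF-IDF vectors of the two names. The exploit and vulnerability titles are invented examples, not records from the study.

```python
# Simple short-text matching baseline for exploit-to-vulnerability
# linking (NOT the EVA-DSSM model itself). Titles are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

exploits = [
    "apache struts2 rest plugin xstream rce",
    "openssl heartbeat information leak",
]
vulns = [
    "Apache Struts 2 REST Plugin XStream Remote Code Execution",
    "OpenSSL TLS Heartbeat Extension Memory Disclosure (Heartbleed)",
    "Microsoft SMBv1 Remote Code Execution",
]

# Character n-grams tolerate the misspellings and slang common in
# hacker forum titles better than whole-word tokens.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit(exploits + vulns)
sims = cosine_similarity(vec.transform(exploits), vec.transform(vulns))
best = sims.argmax(axis=1)          # best-matching vulnerability per exploit
print(best)  # → [0 1]
```

Unlike this baseline, the attention-based model can also surface *which* name fragments drove a link, giving analysts an explanation alongside the match.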

Evaluating the Potential Threat of Generative Adversarial Models to Intrusion Detection Systems

Conrad Tucker

Signature-based Intrusion Detection Systems (IDS) use pre-defined signatures of malware activity to identify malware, and are therefore limited to detecting known malware. To overcome this limitation, anomaly detection-based IDS characterizes the behavior of network traffic, and monitors the computer network for activities that exceed a pre-defined range of normal behaviors. To characterize the network traffic behavior, a set of features is defined by an anomaly detection model. Commonly used features for network traffic at the packet level include header length, packet size, source and destination ports, source and destination IP addresses, etc. At the flow level, average packet length and number of packets in a flow can be used as features. Although the exact anomaly detection model of an IDS is usually kept confidential in order to minimize the vulnerability to potential intrusions, it is possible for a malware developer to use machine learning approaches to characterize a black-box anomaly detection model, such that an attacking surface can be revealed. Considering the advantage of deep neural networks in approximating complicated abstract models, deep learning approaches could potentially be used by malware developers to attack anomaly detection-based IDS.

Among the various deep learning models that could be used to hide malware from anomaly detection-based IDS, Generative Adversarial Networks (GANs) are getting increasing attention from network security researchers. A GAN model consists of a generator neural network and a discriminator neural network. The generator is trained to generate synthesized data that resembles the training data, while the discriminator is trained to distinguish the synthesized data from the training data. The introduction of a discriminator creates an adversarial learning process which helps to increase the generator’s performance in generating data similar to the training data. In the application of hiding malware from anomaly detection-based IDS, a set of feature values of benign network traffic or undetected malicious network traffic can be collected by the malware developer as the initial training data. Then, a generator is trained to generate feature values similar to the training data, and the discriminator which simulates an IDS is trained to approximate the anomaly detection model. When the training is complete, the generated feature values are used to modify the malware behavior. The malware with modified behavior is evaluated by a testing IDS, where the generated features resulting in successful malware hiding are collected to update the initial training dataset. Thus, an iterative process of GAN training to hide malware from IDS is formed.

For intrusion efficiency, we study how different choices of initial training data and GAN models affect the success rate of malware hiding, and whether (and how fast) the rate increases as the iterative GAN training process continues. For knowledge transferability, we evaluate how the success rate of malware hiding changes when malware trained against one testing IDS is used to attack a new IDS.