Andy Applebaum

MITRE

Kipple: Towards accessible, robust malware classification

The past few decades have shown that machine learning (ML) can be a powerful tool for static malware detection, with papers today still purporting to eke out slight accuracy improvements. At the same time, researchers have noted that ML-based classifiers are susceptible to adversarial ML, whereby attackers exploit underlying weaknesses in ML techniques to tailor their malware specifically to evade these classifiers. Defending against these kinds of attacks has proven challenging, particularly for those not steeped in the field.

To help close this gap, we have developed Kipple, a Windows PE malware classifier designed to detect attempts at evasion. Kipple uses a portfolio of classifiers: a primary classifier designed to detect "normal" malware, augmented with additional classifiers designed to detect specific types of adversarial malware. While simplistic, this approach makes it significantly harder -- though not impossible -- to bypass the primary classifier.
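The portfolio idea above can be sketched in a few lines. Note that the abstract does not specify how the classifiers' outputs are combined, so the OR-style voting rule below (flag a sample if the primary model or any adversarial-malware detector fires) and all class and method names are illustrative assumptions, not Kipple's actual implementation:

```python
import numpy as np

class PortfolioClassifier:
    """Combine a primary malware classifier with detectors that are each
    specialized for a particular type of adversarial malware.

    Each model is any object with a predict(X) -> array-of-0/1 method.
    The combination rule here -- malicious if ANY member flags the
    sample -- is an assumption for illustration only.
    """

    def __init__(self, primary, adversarial_detectors):
        self.primary = primary
        self.adversarial_detectors = list(adversarial_detectors)

    def predict(self, X):
        # Start from the primary classifier's verdicts...
        verdict = np.asarray(self.primary.predict(X), dtype=bool)
        # ...and OR in each specialized detector's verdicts.
        for detector in self.adversarial_detectors:
            verdict |= np.asarray(detector.predict(X), dtype=bool)
        return verdict.astype(int)
```

An attacker must now evade every member of the portfolio simultaneously, which is what makes bypassing the primary classifier harder in this scheme.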

This paper reports on our process developing Kipple, highlighting our methodology and several notable conclusions, including how our ensemble approach outperforms one based on simple adversarial retraining, along with other performance notes. Our hope in publishing this paper is to provide an example defense against adversarial malware and, more broadly, to make the field more accessible to newcomers; towards this larger goal, we include a set of "lessons learned" for newcomers to the field, and we intend to release Kipple's models, the data they were built from, and the various scripts used to help generate them as open-source software.
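For contrast, the "simple adversarial retraining" baseline mentioned above can be sketched as folding adversarial samples back into the training set and refitting a single model, rather than training per-technique detectors. The function name and the convention of labeling all adversarial samples as malicious (1) are assumptions for illustration; the abstract does not describe the baseline's exact setup:

```python
import numpy as np

def adversarial_retrain(model, X_train, y_train, X_adv):
    """Refit a single model on the original training data augmented
    with adversarial samples, all labeled malicious (1).

    `model` is any object with a fit(X, y) method (e.g. a
    scikit-learn estimator); this is a hypothetical helper, not
    code from the paper.
    """
    # Append the adversarial feature vectors to the training set.
    X_aug = np.vstack([X_train, X_adv])
    # Adversarial samples are malware by construction, so label them 1.
    y_aug = np.concatenate([y_train, np.ones(len(X_adv))])
    return model.fit(X_aug, y_aug)
```

The design difference is that retraining asks one model to absorb every evasion technique at once, whereas the portfolio approach dedicates a separate classifier to each, which is the comparison the paper reports on.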