Adversarial Attacks against Malware N-gram Machine Learning Classifiers

Emily Rogers

Despite astonishing success in complex pattern recognition, machine learning classifiers have yet to be fully embraced in critical areas of security because of their known weakness against adversarial attacks. These attacks are specially crafted inputs designed to fool trained classifiers into misclassification, and can be mounted in either white-box or black-box settings. Although the ease of creating such inputs is well documented for images, malware is another matter. Unlike image pixels, malware features carry functional significance, are often discrete, and are usually correlated; unconstrained manipulation of the features, as is standard for images, most often results in nonsensical feature sets, dysfunctional code, or both. An example is a feature set of n-grams of Windows API calls, which cannot be perturbed in the standard fashion without potentially destroying both internal coherence and malware functionality. We present here the first algorithm for crafting adversarial examples on n-gram features that preserves functionality, satisfies internal correlations, and does not require knowledge of the original malware code. These examples can be used in adversarial training to build a more robust defense against such attacks, and ultimately to enable the secure use of machine learning classifiers in malware classification.
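
To make the constraint concrete, the following minimal sketch (not the paper's algorithm) shows how n-gram features might be extracted from a sequence of Windows API calls; the call names and the choice of n = 3 are illustrative assumptions. Because consecutive n-grams share their overlapping (n-1) calls, the resulting counts are correlated, so perturbing individual feature values independently, as image attacks do, generally produces a vector that no real call sequence could generate.

```python
from collections import Counter

def api_ngrams(call_sequence, n=3):
    """Extract n-gram counts from an ordered sequence of API call names."""
    return Counter(tuple(call_sequence[i:i + n])
                   for i in range(len(call_sequence) - n + 1))

# Hypothetical trace of Windows API calls from a single sample (illustrative only).
trace = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc",
         "WriteProcessMemory", "CreateRemoteThread"]

features = api_ngrams(trace, n=3)
# Consecutive n-grams overlap in (n-1) calls, so the feature values are
# correlated: flipping one count in isolation typically yields a feature
# vector that corresponds to no executable call sequence.
for gram, count in features.items():
    print(gram, count)
```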