Trying to Make Meterpreter into an Adversarial Example
While machine learning has put previously hard-to-solve problems within reach, recent research has shown that many of the associated methods are susceptible to misclassification via the explicit construction of adversarial examples. These cleverly crafted inputs are designed to toe the line of classifier decision boundaries, and are typically constructed by slightly perturbing correctly classified instances until the classifier misclassifies it, even though the instance is largely the same. Researchers have published ways to construct these examples with full, some, or no knowledge of the target classifier, and have furthermore shown their applicability to a variety of domains, including in security.
In this talk, we’ll discuss several experiments where we attempted to make Meterpreter – a well-known and well-signatured RAT – into an adversarial example. To do this, we leveraged the open-source gym-malware package, which treats the target classifier as a black-box and uses reinforcement learning to train an agent on how to apply perturbations to input PE files in a way that results in evasive malware. Deviating from existing work, our approach trained and tested only on different versions of Meterpreter, which were compiled by using msfvenom with different compilation options, such as templates, encoders, added code, and others. Our goal was in part to test if the reinforcement learning approach is more effective when focused on one malware family, as well as to see if we can make something well-known (and widely-used) evasive.
Unfortunately, our results were underwhelming: we found little difference between using a fully black-box, gray box, or random agent to apply perturbations, and we also did not see significant changes between varying the game length between 10 or 20 perturbations per instance. However, on analyzing the samples generated by msfvenom, we saw that many of the instances we created were naturally evasive due to their compilation parameters, and did not benefit from applied perturbations; applying an encoder, for example, increased the confidence of the classifier, whereas using a template – even of a malicious executable – decreased it. Taken as a whole, our results lay out interesting areas for future work, both in the realm of pre- and post-compilation adversarial example construction.