Sven Cattell

DEFCON AIV

Bayesian Covertrees Can Monitor Attacks Too (pdf)

Adversarial attacks against AI products are more than just static events that the model gets wrong. In much of the literature, we generate N points using the attack and report the model's accuracy against those N points. When these points are cheap to produce, as in the white-box case, this is reasonable. In the black-box case, however, an attack can require thousands of queries and take days or weeks if the model sits behind a rate-limited API. If the attack is successful, it will probably be reused. We have previously shown that overall adversarial drift can be monitored using a Bayesian approach with a cover tree. In this paper we show evidence that black-box adversarial attacks induce a high measured drift, even when attackers attempt to hide in benign traffic.
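To make the idea concrete, here is a minimal sketch of drift monitoring in the spirit described above. It is not the paper's implementation: the fixed set of anchor points stands in for the leaves of a cover tree, and the Dirichlet-smoothed KL score is an illustrative stand-in for whatever Bayesian test the paper actually uses. All names (assign_to_anchors, dirichlet_drift_score) are hypothetical.

```python
import numpy as np

def assign_to_anchors(points, anchors):
    """Assign each query embedding to its nearest anchor
    (a stand-in for routing a point to a cover tree leaf)."""
    dists = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def dirichlet_drift_score(baseline_counts, window_counts, prior=1.0):
    """KL divergence between posterior-mean cell distributions under a
    symmetric Dirichlet prior -- an illustrative drift score, not the
    paper's test statistic."""
    k = len(baseline_counts)
    p = (baseline_counts + prior) / (baseline_counts.sum() + prior * k)
    q = (window_counts + prior) / (window_counts.sum() + prior * k)
    return float(np.sum(q * np.log(q / p)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(32, 8))              # stand-in for cover tree leaves
benign = rng.normal(size=(5000, 8))             # baseline traffic embeddings
baseline_counts = np.bincount(assign_to_anchors(benign, anchors), minlength=32)

# A black-box attack hammers a small region of input space, so even when the
# attack queries are mixed into benign traffic, the per-leaf counts shift.
attack = rng.normal(loc=2.5, size=(300, 8))
window = np.vstack([rng.normal(size=(700, 8)), attack])
window_counts = np.bincount(assign_to_anchors(window, anchors), minlength=32)

benign_window = rng.normal(size=(1000, 8))
benign_counts = np.bincount(assign_to_anchors(benign_window, anchors), minlength=32)

print("benign-only drift:", dirichlet_drift_score(baseline_counts, benign_counts))
print("attack window drift:", dirichlet_drift_score(baseline_counts, window_counts))
```

In this toy setup the attack window scores markedly higher than the benign-only window, which is the qualitative behavior the abstract claims for real black-box attacks hiding in benign traffic.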