Amelia Kawasaki
ShadowLogic: Hidden Backdoors in Any Whitebox LLM (video, pdf)
Speaker: Amelia Kawasaki
Author(s): Amelia Kawasaki; Kasimir Schulz; Leo Ring
Abstract: This paper unveils ShadowLogic, a method for injecting hidden backdoors into white-box LLMs by modifying their computational graphs. These backdoors are activated by a secret trigger phrase, allowing the model to generate uncensored responses and exposing a new class of graph-level vulnerabilities.
