Tamás Vörös

LLM Salting: From Rainbow Tables to Jailbreaks

Speaker: Tamás Vörös

Author(s): Tamás Vörös; Adarsh Kyadige

This work proposes LLM salting, a lightweight defense mechanism that rotates an LLM's internal refusal direction. Jailbreak prompts that worked against the original model, such as those produced by Greedy Coordinate Gradient (GCG) attacks, become ineffective, and model utility is not degraded.
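The geometric intuition behind rotating a refusal direction can be illustrated with a toy NumPy sketch. This is not the authors' implementation: it assumes refusal is read off as the projection of a hidden activation onto a single unit vector, that the salted model re-encodes refusal along a rotated vector, and that a jailbreak crafted against the original model acts as a fixed shift in activation space. All names (`rotate_in_plane`, `refusal_score`, `r_old`, `r_new`, `delta`) and all numbers are hypothetical.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def rotate_in_plane(u, w, angle):
    """Rotate unit vector u by `angle` radians toward the part of w orthogonal to u."""
    u = unit(u)
    v = unit(w - (w @ u) * u)  # Gram-Schmidt step: drop w's component along u
    return np.cos(angle) * u + np.sin(angle) * v

def refusal_score(activation, direction):
    """Toy refusal signal: projection of an activation onto a direction."""
    return float(activation @ direction)

rng = np.random.default_rng(0)
dim = 64

# Assumed refusal direction of the original model, and a 90-degree rotated
# direction standing in for the salted model.
r_old = unit(rng.normal(size=dim))
r_new = rotate_in_plane(r_old, rng.normal(size=dim), np.pi / 2)

# Toy activations for one harmful prompt: each model encodes refusal along
# its own direction, plus small shared noise.
noise = 0.1 * rng.normal(size=dim)
h_old = 5.0 * r_old + noise
h_new = 5.0 * r_new + noise

# Toy jailbreak tuned on the original model: a shift that cancels the
# projection onto r_old.
delta = -refusal_score(h_old, r_old) * r_old

score_old = refusal_score(h_old + delta, r_old)  # refusal signal is cancelled
score_new = refusal_score(h_new + delta, r_new)  # refusal signal largely survives

print(f"original model refusal score under attack: {score_old:.3f}")
print(f"salted model refusal score under attack:   {score_new:.3f}")
```

In this toy setting the shift removes the refusal signal along `r_old` but is orthogonal to `r_new`, so the signal in the salted model is essentially untouched. A real model would need fine-tuning or another intervention to move the direction, and a real attack operates on tokens rather than directly on activations.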