Tamás Vörös
LLM Salting: From Rainbow Tables to Jailbreaks (video, pdf)
Speaker: Tamás Vörös
Author(s): Tamás Vörös; Adarsh Kyadige
This work proposes LLM salting, a lightweight defense that rotates the internal refusal direction of an LLM so that previously effective jailbreak prompts (such as those generated by GCG) no longer work, without degrading model utility.
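As a rough illustration of what "rotating the refusal direction" could mean geometrically, the toy sketch below estimates a refusal direction as the difference between mean activations on harmful and harmless prompts, then rotates it and measures how much the two directions still overlap. Everything here is an assumption made for illustration: the abstract does not say how the direction is computed or rotated, the activation vectors are made-up numbers, and the helper names (`refusal_direction`, `rotate_in_plane`) are hypothetical. The explicit plane rotation is a stand-in, not the authors' procedure.

```python
import math

def mean_vector(rows):
    # Column-wise mean of a list of equal-length vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def normalize(v):
    # Scale a vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def refusal_direction(harmful_acts, harmless_acts):
    # Assumed estimator: unit-length difference of mean activations.
    diff = [h - l for h, l in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    return normalize(diff)

def rotate_in_plane(v, i, j, angle):
    # Rotate v by `angle` radians within the plane of coordinates i and j.
    out = list(v)
    c, s = math.cos(angle), math.sin(angle)
    out[i] = c * v[i] - s * v[j]
    out[j] = s * v[i] + c * v[j]
    return out

# Made-up 3-dimensional "activations" for two harmful and two harmless prompts.
harmful = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
harmless = [[0.0, 0.1, 0.0], [0.2, -0.1, 0.2]]

original = refusal_direction(harmful, harmless)
salted = rotate_in_plane(original, 0, 1, math.pi / 2)

# Overlap between the original and the rotated ("salted") direction.
overlap = cosine(original, salted)
```

In this toy setting, a quarter-turn rotation leaves `overlap` at essentially zero, which is the intuition behind the defense: an adversarial suffix tuned to suppress the original direction would have little leverage on the rotated one.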
