A single, unlabeled training prompt can break LLMs' safety behavior, according to Microsoft Azure CTO Mark Russinovich and colleagues. They published a research paper that detailed how this prompt, ...
The GRP‑Obliteration technique reveals that even mild prompts can reshape internal safety mechanisms, raising oversight concerns as enterprises increasingly fine‑tune open‑weight models with ...