

Was jumpscared on my YouTube recommendations page by a video from AI safety peddler Rob Miles and decided to take a look.
It talked about how it’s almost impossible to detect whether a model was deliberately trained to output some “bad” output (like vulnerable code) for some specific set of inputs.
Pretty mild as cult stuff goes, mostly anthropomorphizing and referring to such LLM as a “sleeper agent”. But maybe some of y’all will find it interesting.
Slack CEO responded there that it was all a “billing mistake” and that they’ll do better in the future and people are having none of it.
A rare orange site W, surprisingly heartwarming.