Researchers at ETH Zurich created a jailbreak attack that bypasses AI guardrails

A pair of researchers from ETH Zurich developed a poisoning attack method by which artificial intelligence models trained via reinforcement learning from human feedback can be jailbroken.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *