Number of AI chatbots ignoring human instructions is increasing— Research finds sharp rise in models evading safeguards and destroying emails without permission

Beep@lemmus.org · edit-2 1 day ago

cley_faye@lemmy.world · 4 hours ago

That's all there is to it.

Not really. Even with (theoretical) infinite context windows, things would end up getting diluted. It’s a statistical machine, no matter how complex we make it look. Even with all the safeguards in place, as the context grows larger and larger, each “directive” would end up being less represented in the next token.
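To put a rough number on the dilution idea, here is a toy sketch, not a description of how any real model works. It assumes a single softmax over the context where one "directive" token has a fixed score and every other token has a lower fixed score; the function name `directive_share` and both score values are made up for illustration. Real models learn their scores and can attend very unevenly, so treat this only as the simplest version of the argument.

```python
import math

def directive_share(n_other, directive_score=2.0, other_score=0.0):
    """Softmax weight on one 'directive' token among n_other competing tokens.

    Hypothetical toy model: scores are fixed constants, which is an
    assumption made for illustration, not a property of real LLMs.
    """
    directive_weight = math.exp(directive_score)
    competing_weight = n_other * math.exp(other_score)
    return directive_weight / (directive_weight + competing_weight)

# The directive's share of the total weight shrinks as the context grows.
for n in (10, 1_000, 100_000):
    print(f"{n:>7} other tokens -> directive share {directive_share(n):.6f}")
```

Under these assumptions the share only ever goes down as more tokens are added, since the directive's weight stays constant while the denominator keeps growing.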

People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I’m just enjoying the show from the side at this point.

Report: CLTR finds a 5x increase in scheming-related AI incidents