The Loss of Control Observatory analysed over 183,000 AI interaction transcripts and found a 5x increase in scheming-related incidents over five months.
Not really. Even with (theoretical) infinite context windows, things would end up getting diluted. It’s a statistic machine; no matter how complex we make them look. Even with all the safeguards in place, as these grows larger and larger, each “directive” would end up being less represented in the next token.
People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I’m just enjoying the show from the side at this point.
Not really. Even with (theoretical) infinite context windows, things would end up getting diluted. It’s a statistic machine; no matter how complex we make them look. Even with all the safeguards in place, as these grows larger and larger, each “directive” would end up being less represented in the next token.
People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I’m just enjoying the show from the side at this point.