

There was a recent paper claiming that LLMs were better at avoiding toxic speech if toxic content was actually included in their training data, since models that hadn't been trained on it had no way of recognizing it for what it was. With that in mind, maybe using Reddit for training isn't as bad an idea as it seems.
So… not so much a case of ChatGPT trying to avoid being shut down as a case of ChatGPT recognizing that agents generally tend to be self-preserving. Which seems like a principle that anything with an accurate world model would be aware of.