And indeed, the other crucial piece is that... apologizing isn't a protocol with an expected reward function. I can just, not accept your apology. I can just, feel or "update my priors" howmever I like.
We apologize and care about these things because of shame. Which we have to regulate, in part through our actions and perspectives.
Why people feel the way they do and act the way do makes total sense when ~~one finally confronts your own vulnerabilities~~ sorry, builds an API and RL framework.
Adversarial attacks on training data for LLMs is in fact a real issue. You can very very effectively punch up with regards to the proportion of effect on trained system with even small samples of carefully crafter adversarial inputs. There are things that can counter act this, but all of those things increase costs, and LLMs are very sensitive to economics.
Think of it this way. One, reason why humans don't just learn everything is because we spend as much time filtering and refocusing our attention in order to preserve our sense of self in the face of adversarial inputs. It's not perfect, again it changes economics, and at some point being wrong but consistent with our environment is still more important.
I have no skepticism that LLMs learn or understand. They do. But crucially, like everything else we know of, they are in a critically dependent, asymmetrical relationship with their environment. The environment of their existence being our digital waste, so long as that waste contains the correct shapes.
Long term I see regulation plus new economic realities wrt to digital data, not just to be nice or ethical, but because it's the only way future systems can reach reliable and economical online learning. Maybe the right things happen for the wrong reasons.
It's funny to me just how much AI ends up demonstrating non equilibrium ecology at scale. Maybe we'll have that self introspective moment and see our own relationship with our ecosystems reflect back on us. Or maybe we'll ignore that and focus on reductive world views again.