> mathematically "correct" sounding output
It's hard to say because that's a rather ambiguous description ("correct" could mean anything), but it is a valid way of describing the mechanism.
"Correct" in the context of LLMs would mean a token that is likely to follow the preceding sequence of tokens. The model computes a probability for every possible token, takes a random sample according to that distribution* to choose the next token, and repeats until some termination condition. The training side of this is what we call maximum likelihood estimation (MLE) in machine learning (ML): we learn a distribution that makes the training data as likely as possible, and sampling at generation time just draws from that learned distribution. MLE is indeed the basis of a lot of ML, but not all of it.
*An oversimplification.
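The generation loop described above can be sketched in a few lines of Python. The vocabulary and the fixed logits here are made up for illustration; a real model would compute context-dependent scores over tens of thousands of tokens:

```python
import math
import random

def softmax(logits):
    # Turn raw scores into a probability distribution over tokens.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary; "<eos>" marks end of sequence.
vocab = ["the", "cat", "sat", "<eos>"]

def next_token_logits(context):
    # Stand-in for a real model: here the scores ignore the context entirely.
    return [2.0, 1.0, 0.5, 0.1]

def generate(max_len=10):
    tokens = []
    while len(tokens) < max_len:
        probs = softmax(next_token_logits(tokens))
        # Sample the next token according to the distribution.
        token = random.choices(vocab, weights=probs)[0]
        if token == "<eos>":  # termination condition
            break
        tokens.append(token)
    return tokens

print(generate())
```

Each loop iteration is one "compute a distribution, sample, append" step; temperature, top-k, and similar tricks modify `probs` before sampling, which is part of why the distribution claim above is an oversimplification.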