I'm highly skeptical of this at first glance. Replacing self-attention with gated recurrent units seems like a decisive step backward in natural language processing capability. The advance that gave rise to LLMs in the first place was the realization that building networks out of stacked self-attention blocks, instead of recurrent units like GRUs or LSTMs, was extremely effective.
In short, they are proposing an older type of model that is generally outclassed by the attention-based transformers powering all the LLMs we see today. I doubt it will come anywhere close to the results of existing LLMs. I foresee this type of research being used to deflect criticism of the ungodly amounts of energy LLMs consume: "See, people are working on making them way more efficient! Any day now..." Meanwhile, those efficient models will never come to fruition.
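To make the contrast between recurrent units and self-attention concrete, here is a rough NumPy sketch. It is only an illustration under stated assumptions: toy sizes, random weights, no biases, no causal mask, and function names that are illustrative rather than taken from the paper. The GRU squeezes the whole sequence into one fixed-size hidden state that must be updated token by token (cheap memory, but strictly sequential). Self-attention builds an n×n score matrix so every token can look at every other token directly, at quadratic cost in sequence length.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, W_z, U_z, W_r, U_r, W_h, U_h):
    """One standard GRU step (biases omitted): update hidden state h from input x."""
    z = sigmoid(W_z @ x + U_z @ h)               # update gate
    r = sigmoid(W_r @ x + U_r @ h)               # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde           # blend old state and candidate

def run_gru(X, d_h, rng):
    """Run a randomly initialised GRU over the rows of X; return the final state."""
    d_in = X.shape[1]
    # Order: W_z, U_z, W_r, U_r, W_h, U_h
    params = [rng.standard_normal(shape) * 0.1
              for shape in [(d_h, d_in), (d_h, d_h)] * 3]
    h = np.zeros(d_h)
    for x in X:  # sequential: step t needs the state from step t-1
        h = gru_step(x, h, *params)
    return h

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no causal mask)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (n, n): every token vs every token
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.standard_normal((n, d))
h_final = run_gru(X, d_h=16, rng=rng)
out, weights = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
print(h_final.shape, out.shape, weights.shape)  # (16,) (8, 4) (8, 8)
```

The GRU's memory stays at 16 numbers no matter how long the sequence gets, which is where the efficiency pitch comes from. The attention weight matrix grows as n², which is expensive but gives each token direct access to the full context.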