Found myself a copy of the paper for a read-through and it's immediately obvious to me why they couldn't get above 90% accuracy.
The word "Gender" occurs exactly zero times in the text and the datasets they worked with were divided into a strict sex binary. As a result, the accuracy of their models' predictions could not significantly improve upon prior work in the field.
The only new info here is that their XAN is able to point out the specific brain features that influenced its predictions. That's potentially useful with regard to developing treatments for gendered brain issues in neurotypical people, but anyone who falls outside the 90th percentile of sexually dimorphic normativity won't see any benefit here.