I'm reading these, and I admit I'm a total layperson, so I don't understand a lot of it, but this is very odd for several reasons:
Usenet
We collected data for the Usenet discussion system by querying the Usenet Archive (https://archive.org/details/usenet?tab=about). We selected a list of topics considered adequate to contain a large, broad and heterogeneous number of discussions involving active and populated newsgroups. As a result of this selection, we selected conspiracy, politics, news and talk as topic candidates for our analysis. For the conspiracy topic, we collected around 280,000 comments between 1 September 1994 and 30 December 2005 from the alt.conspiracy newsgroup. For the politics topic, we collected around 2.6 million comments between 29 June 1992 and 31 December 2005 from the alt.politics newsgroup. For the news topic, we collected about 620,000 comments between 5 December 1992 and 31 December 2005 from the alt.news newsgroup. Finally, for the talk topic, we collected all of the conversations from the homonym newsgroup over a period ranging from 13 February 1989 to 31 December 2005, for around 2.1 million contents.
First, the other discussion forums they sampled were contemporary and covered a different date range, while the Usenet data runs from 1989 to 2005.
Second, for the other discussion forums they said specifically how they selected the topics, but they didn't discuss that for Usenet.
Third, why include Usenet at all?
I'm also not sure that a measure of 'toxicity' should come from the most popular groups and forums, especially ones built around divisive topics. Shouldn't you measure it in groups about things like cars, Star Wars, or astronomy, i.e. discussion forums that aren't historically known for toxicity? Or at the very least sample both types of groups?
I'm not even saying there isn't historical toxicity in those groups. I'm saying that if you're going to measure the level of toxicity on the internet, topics that always invite toxicity don't seem to me like a good metric.