this post was submitted on 03 Dec 2023
1 points (100.0% liked)

hmmm

Internet as an art

Rule 1: All post titles except for meta posts should be just plain "hmmm" and nothing else, no emotes, no capitalisation, no extending it to "hmmmm" etc.

I will introduce more rules later, and I will make an announcement post once I have finished.

For a temporary overall guide, check out the rules here: https://www.reddit.com/r/hmmm/wiki/rules/

I won't be moving all of them here but I will keep most of them.

founded 1 year ago
[–] [email protected] 0 points 9 months ago (3 children)

Holy fucking shit. Anyone have explanations for this?

[–] [email protected] 0 points 9 months ago* (last edited 9 months ago)

Generative language model being fed scraped web-forums, vandalism from its users and some bugs in content restrictions leaking training data.

[–] [email protected] 0 points 9 months ago

Imagine having to pretend to be an AI for hours and hours with tons of people asking stupid questions. I too would be nuts after a while.

[–] [email protected] 0 points 9 months ago* (last edited 9 months ago)

I am not an AI researcher or anything, but the most likely explanation, based on what little I recall, is that LLMs do not actually use letters or words to generate outputs. They use tokens, each representing a word, part of a word, or a number, and they produce their output one token at a time. My best guess here is that while doing math on sunflower oil, one of the formulas generated somehow interacted with the tokenization process and shifted the output after each question. "Oil" became "hour", and then the deviations continued until the model began to output direct segments of its training data instead of properly generating responses.

Again, this is absolutely speculation on my part. I don't have much of a direct understanding of the tech involved.
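
To make the token idea a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the seven-word vocabulary, the `encode`/`decode` helpers, and the off-by-one "corruption" are made up, and real tokenizers use learned subword vocabularies rather than a word list. It only shows that a model works with integer IDs, so a small change in an ID can select a completely different word. It does not show what actually happened in the screenshot.

```python
# Toy illustration only: real LLM tokenizers use learned subword vocabularies
# (e.g. byte-pair encoding), not a hand-written word list like this one.
TOY_VOCAB = ["<unk>", "sunflower", "oil", "hour", "costs", "per", "litre"]
TOKEN_TO_ID = {token: i for i, token in enumerate(TOY_VOCAB)}


def encode(text):
    """Map each whitespace-separated word to its integer ID (0 if unknown)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]


def decode(ids):
    """Map integer IDs back to words."""
    return " ".join(TOY_VOCAB[i] for i in ids)


ids = encode("sunflower oil costs")
print(ids)          # [1, 2, 4]
print(decode(ids))  # sunflower oil costs

# Hypothetical corruption: nudging a single ID by one selects a different word.
corrupted = [i + 1 if TOY_VOCAB[i] == "oil" else i for i in ids]
print(decode(corrupted))  # sunflower hour costs
```

In this made-up vocabulary "oil" and "hour" happen to sit next to each other, which is why the shifted ID decodes to "hour". In a real tokenizer there is no reason to expect that particular adjacency.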