this post was submitted on 27 Feb 2024
107 points (100.0% liked)
Technology
A token is not a concept. A token is a word or word fragment that occurred often in free text and was assigned a number. Common words, prefixes, and suffixes are the vast majority of tokens, and the rest are uncommon pairs of letters.
The algorithm that generates tokens is essentially compression; there is no semantic meaning embedded in them.
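To make the "it's essentially compression" point concrete, here is a toy sketch of byte-pair encoding (BPE), the merge-the-most-frequent-pair procedure that many LLM tokenizers are built on. It is an illustration under simplifying assumptions, not any real tokenizer: it starts from characters rather than bytes, pre-splits the text on whitespace so merges never cross word boundaries, and has no handling for unseen characters. The function names (`most_frequent_pair`, `merge_pair`, `train_bpe`, `encode`) are hypothetical. Note that merges are chosen purely by pair counts; nothing about meaning enters the process, and each resulting token just gets the next free integer ID.

```python
from collections import Counter


def most_frequent_pair(seqs):
    """Count adjacent symbol pairs across all sequences and return the most common one."""
    pairs = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return None
    # On ties, max() returns the first pair encountered (insertion order).
    return max(pairs, key=pairs.get)


def merge_pair(seq, pair, new_symbol):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges; return (merge list, token -> integer ID map)."""
    # Assumption: split on whitespace, so no token can span two words.
    words = [list(w) for w in text.split()]
    vocab = {}
    for w in words:
        for ch in w:
            vocab.setdefault(ch, len(vocab))  # each character gets the next free ID
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        new_symbol = pair[0] + pair[1]
        merges.append(pair)
        vocab.setdefault(new_symbol, len(vocab))  # the new token is just another number
        words = [merge_pair(w, pair, new_symbol) for w in words]
    return merges, vocab


def encode(word, merges, vocab):
    """Apply learned merges in order, then map symbols to IDs (KeyError on unseen characters)."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair, pair[0] + pair[1])
    return [vocab[s] for s in symbols]


if __name__ == "__main__":
    merges, vocab = train_bpe("low low low lower lowest", num_merges=2)
    print(merges)                           # [('l', 'o'), ('lo', 'w')]
    print(encode("lowest", merges, vocab))  # [8, 3, 5, 6], i.e. "low" + "e" + "s" + "t"
```

In this tiny corpus the frequent string "low" becomes one token while the rarer suffix "est" stays as separate characters, which is the frequency-driven behavior described above. Production tokenizers differ in the details (they typically operate on bytes and treat spaces specially), so treat this only as a sketch of the idea.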
Yeah, that was a bad way to phrase it. I just meant that, from what I've heard, tokens are very much not word-by-word, and sometimes a token is a couple of words, but maybe that was misinformation. I was also trying (and failing) to make an analogy for a human: a concept is a compression of what would otherwise be a bunch of words, though I guess I meant it more like a reference.