this post was submitted on 07 Jul 2024
385 points (96.8% liked)

Sardonic Grin (mander.xyz)
submitted 4 months ago* (last edited 4 months ago) by [email protected] to c/[email protected]
 
[–] [email protected] 2 points 4 months ago* (last edited 4 months ago)

I'm not describing binary classification, I'm describing multiclass. "Group classification" isn't really a thing. Yes, your ML system probably guesses what kind of plant it is and then looks up the edibility of its components.

The problem with this is how it handles rare plants that aren't in the dataset, or that are in the dataset but with too little data to be recognised.

Because a multiclass classifier assumes it has seen representative data for every possible output (e.g. plant type), it will tend to be dangerously confident on plant types it hasn't seen before.

This is because softmax scores are relative: the model only has to rule out the other classes it knows. E.g. if you're trying to classify as rose, tulip, or daisy and you get a bramble, your classifier is likely to be very certain it's a rose, because tulips and daisies don't have thorns. So your softmax score is likely to show heavy confidence in rose even though it's actually none of them.
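To make the rose/tulip/daisy example concrete, here's a rough sketch of how softmax turns "least unlike a rose" into near-certainty. The logit values are made up for illustration (they don't come from any real model); the only assumption is that thorns push the "rose" score well above the other two.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["rose", "tulip", "daisy"]

# Hypothetical logits for a bramble: it has thorns, so "rose" scores
# highest, but there is no class that lets the model say "none of these".
bramble_logits = [4.0, -1.0, -2.0]

probs = softmax(bramble_logits)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

Because the probabilities have to sum to 1 across the known classes, ruling out tulip and daisy hands nearly all the mass to rose.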

This is exactly what can go wrong when you use the softmax/standard multiclass approach and come across an interesting rare mushroom or wild carrot. You don't want it to guess which plant in the database it's most like, even if that guess comes with scores. You want it to say that it genuinely doesn't know and that you shouldn't eat it.
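One way to get an "I don't know" answer, sketched below, is to reject anything that sits too far from every known class in feature space instead of always returning the nearest one. This is just one possible approach, not a claim about how any real plant app works. The 2D feature vectors, the centroids, and the `MAX_DISTANCE` threshold are all hypothetical; a real system would need to learn features and tune the threshold on held-out data.

```python
import math

# Hypothetical class centroids in a made-up 2D feature space.
CENTROIDS = {
    "rose": (1.0, 5.0),
    "tulip": (6.0, 1.0),
    "daisy": (5.0, 6.0),
}

# Hypothetical rejection radius: anything farther than this from its
# nearest centroid is treated as unknown.
MAX_DISTANCE = 1.5

def classify_or_reject(features):
    """Return the nearest class name, or "unknown" if nothing is close."""
    name, dist = min(
        ((n, math.dist(features, c)) for n, c in CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= MAX_DISTANCE else "unknown"

print(classify_or_reject((1.2, 4.8)))   # close to the rose centroid
print(classify_or_reject((-3.0, 9.0)))  # nearest is still rose, but far away
```

The second point is still closer to rose than to anything else, which is exactly the case where a plain multiclass model would say "rose". The distance check is what lets it say "unknown" instead.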