Technology

A tech news sub for communists

Researchers upend AI status quo by eliminating matrix multiplication in LLMs (arstechnica.com)

submitted 4 months ago by [email protected] to c/[email protected]

[–] [email protected] 1 points 4 months ago (1 children)

Thing is that each approach has its own advantages. It could be that a simpler approach in certain cases makes more sense. At the end of the day, people will benchmark this and we'll see how it compares. Seems like initial benchmarks show that the approach works.

[–] [email protected] 5 points 4 months ago* (last edited 4 months ago) (1 children)

AI/ML research has long been notorious for choosing bullshit benchmarks that make your approach look good, and then nobody ever uses it because it's not actually that good in practice.

It's totally possible that there will be legitimate NLP use-cases where this approach makes sense, but that is almost entirely separate from the current LLM craze. Also, transformer-based LLMs pretty much entirely supplanted recurrent networks as early as like 2018 in basically every NLP task. So even if the semiconductor industry massively reoriented toward producing chips that support "MatMul-free" models like this one just to get the energy reduction, the model outputs would still be even more garbage than they already are.

[–] [email protected] 2 points 4 months ago

Sure, that's why I said other people will benchmark it as well at some point and we'll know definitively. Based on my reading, the idea here is to combine both approaches as an optimization technique. Using GPT as a hammer for every problem has been the hype phase. Now, people are starting to realize that other approaches have value too, and it's likely that combining different approaches will in fact produce interesting results.