this post was submitted on 27 Jan 2025
881 points (98.0% liked)

cross-posted from: https://lemm.ee/post/53805638

[–] [email protected] 151 points 5 days ago (11 children)

Good. That shit is way overvalued.

There is no way that Nvidia is worth three times as much as TSMC, the company that makes all their shit and more besides.

I'm sure some of my market tracker funds will lose value, and they should, because they should never have been worth this much to start with.

[–] [email protected] 56 points 5 days ago (11 children)

My understanding is that DeepSeek still used Nvidia GPUs, just older models used far more efficiently, which is remarkable. I hope to tinker with the open-source stuff, at least for a little Twitch chat bot for my streams that I was already planning to build with OpenAI. It will be even more remarkable if I can run this locally.

However, this is embarrassing for the Western companies working on AI, especially in light of the $500B Stargate announcement, as it shows we don't need such high-end infrastructure to achieve the same results.
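For anyone else planning the same kind of local tinkering, here's a minimal sketch of a chat-bot reply function, assuming a DeepSeek-R1 distill served locally through an OpenAI-compatible endpoint (Ollama's default port is used here; the model tag and the Twitch wiring are assumptions, not a tested setup):

```python
from openai import OpenAI

# Ollama (and several other local runners) expose an OpenAI-compatible API,
# so the same client code works against a hosted service or a local model.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

def bot_reply(chat_message: str) -> str:
    # One chat completion per incoming Twitch message
    resp = client.chat.completions.create(
        model="deepseek-r1:7b",  # assumed local model tag (a distilled variant)
        messages=[
            {"role": "system", "content": "You are a concise, friendly Twitch chat bot."},
            {"role": "user", "content": chat_message},
        ],
    )
    return resp.choices[0].message.content

print(bot_reply("What game are we playing tonight?"))
```

Swapping back to the hosted OpenAI API is just a matter of changing the base_url, api_key, and model name.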

[–] [email protected] 33 points 5 days ago

$500B of "trust me, bro"... to shake down the US taxpayer for subsidies.

Read between the lines, folks.

[–] [email protected] 9 points 5 days ago* (last edited 4 days ago)

Some things to learn in here?
https://github.com/deepseek-ai
Large-scale reinforcement learning (RL)?

Chat (requires login via email or Google): chat with DeepSeek-R1 on DeepSeek's official website, chat.deepseek.com, and switch on the "DeepThink" button.


Aha moments (in the white paper), from page 8 of 22 in:
https://raw.githubusercontent.com/deepseek-ai/DeepSeek-R1/refs/heads/main/DeepSeek_R1.pdf

One of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors as the test-time computation increases. Behaviors such as reflection—where the model revisits and reevaluates its previous steps—and the exploration of alternative approaches to problem-solving arise spontaneously. These behaviors are not explicitly programmed but instead emerge as a result of the model’s interaction with the reinforcement learning environment. This spontaneous development significantly enhances DeepSeek-R1-Zero’s reasoning capabilities, enabling it to tackle more challenging tasks with greater efficiency and accuracy.

Aha Moment of DeepSeek-R1-Zero
A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment”. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.
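For anyone wondering what "the right incentives" look like concretely: the paper describes rule-based rewards, roughly an accuracy reward on the final answer plus a format reward for keeping the reasoning inside think/answer tags. A toy sketch of that idea (the tag conventions and scoring here are illustrative, not DeepSeek's actual code):

```python
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Toy reward: format compliance plus exact-match answer accuracy."""
    # Format reward: reasoning must sit in <think>...</think>, answer in <answer>...</answer>
    format_ok = bool(re.search(r"<think>.*?</think>\s*<answer>.*?</answer>",
                               completion, flags=re.DOTALL))
    format_reward = 1.0 if format_ok else 0.0

    # Accuracy reward: compare the extracted answer against the known solution
    m = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    answer = m.group(1).strip() if m else ""
    accuracy_reward = 1.0 if answer == ground_truth.strip() else 0.0

    return format_reward + accuracy_reward

# The RL loop (GRPO in the paper) samples many completions per prompt and
# pushes the policy toward the higher-scoring ones; no step-by-step supervision.
print(rule_based_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # 2.0
```

The "aha moment" behaviours then emerge from optimizing against signals like these, not from hand-written reasoning examples.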



https://github.com/huggingface/open-r1
Fully open reproduction of DeepSeek-R1

https://en.m.wikipedia.org/wiki/DeepSeek
DeepSeek_R1 was released 2025-01-20

[–] [email protected] 24 points 5 days ago (3 children)

Well, you still need the right kind of hardware to run it, and my money has been on AMD to deliver the solutions for that. Nvidia has gone full-blown stupid on the shit they are selling, while AMD is all about cost and power efficiency. Plus, they saw the writing on the wall for Nvidia a long time ago and started down the FPGA path, which I think will ultimately be the right choice for running this stuff.

[–] [email protected] 5 points 5 days ago (1 children)

From a "compute" perspective (so not consumer graphics), power... doesn't really matter. There have been decades of research on the topic and it almost always boils down to "Run it at full bore for a shorter period of time" being better (outside of the kinds of corner cases that make for "top tier" thesis work).

AMD (and Intel) are very popular for their cost-to-performance ratios. Jensen is the big dog and he prices accordingly. But while there is a lot of money in adapting models and middleware to AMD, the problem is that not ALL models and middleware are ported. So it becomes a question of whether it is worth buying AMD when you'll still want or need Nvidia for the latest and greatest, which is why those orgs tend to be closer to an Azure or AWS, selling tiered hardware.
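To make the porting point concrete: plain framework-level code is largely vendor-agnostic already (PyTorch's ROCm builds expose AMD GPUs through the same torch.cuda API), so a sketch like this runs unchanged on either vendor; it's the projects shipping hand-written CUDA kernels that still need real porting work:

```python
import torch

# On Nvidia this is a CUDA device; on a ROCm build of PyTorch the same
# torch.cuda namespace maps to an AMD GPU, so no code changes are needed.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
with torch.no_grad():
    y = model(x)

name = torch.cuda.get_device_name(0) if device == "cuda" else "CPU"
print(f"ran on {name}, output shape {tuple(y.shape)}")
```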

Which... is the same issue for FPGAs. There is a reason EVERYBODY did their best to vilify and kill OpenCL, and it is not just because most code was thousands of lines of boilerplate around tens of lines of kernels. Which gets back to "Well, I can run this older model cheap, but I still want Nvidia for the new stuff..."
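For reference, the boilerplate-to-kernel ratio that OpenCL complaint is about looks roughly like this even in Python (pyopencl trims the worst of it; the raw C host API is far longer):

```python
import numpy as np
import pyopencl as cl

a = np.random.rand(1 << 20).astype(np.float32)
b = np.random.rand(1 << 20).astype(np.float32)

# Host-side setup: context, queue, buffers (the "boilerplate" part)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The actual compute kernel is a handful of lines
program = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

program.add(queue, a.shape, None, a_buf, b_buf, out_buf)

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```

It runs on any vendor's OpenCL driver, which was exactly the appeal, and exactly why it threatened the CUDA moat.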

Which is why I think Nvidia's stock dropping is likely more about traders gaming the system than anything else. Because the work to use older models more efficiently and cheaply has already been a thing. And for the new stuff? You still want all the chooch.

[–] [email protected] 3 points 5 days ago (7 children)

Your assessment is missing the simple fact that an FPGA can do some things faster, and more cost-efficiently, than a GPU, though. Nvidia is the Ford F-150 of the data center world, sure. It's stupidly huge, ridiculously expensive, and generally not needed unless it's being run at full utilization all the time. That's about the only time it makes sense.

If you want to run your own models that have a specific purpose, say, protein folding for scientific work, and you have several custom extensible layers that do different things, Nvidia's hardware and software don't even support this because of the nature of TensorRT. They JUST announced future support for such things, and it will take quite some time, and some vendor lock-in, for models to properly support it... OR

Just use FPGAs to do the same work faster, now, for most of those things. The GenAI bullshit bandwagon finally has a wheel off, and it's obvious people don't care about the OpenAI approach of having one model do everything. Compute work on this is already transitioning to single-purpose workloads, which AMD saw coming and is prepared for. Nvidia is still out there selling these F-150s to idiots who just want to piss away money.

[–] [email protected] 2 points 5 days ago (2 children)

I'm way behind on the hardware at this point.

Are you saying that AMD is moving toward an FPGA chip on GPU products?

While I see the appeal, that's going to dramatically increase the cost to the end user.

[–] [email protected] 6 points 5 days ago (8 children)

No.

A GPU is good for graphics; that's what it is designed and built for. It just so happens to also be good at programmatic neural-network tasks because of its parallelism.

An FPGA is fully programmable to do whatever you want, and can be reprogrammed on the fly. That's pretty much perfect for reducing costs if you have a platform that does things like audio processing, then video processing, then deep learning, especially in cloud environments. Instead of spinning up a bunch of expensive single-purpose instances, you can spin up one FPGA instance type and reprogram it on the fly to best fit the work at hand when the code starts up. Simple.

AMD bought Xilinx (the deal was announced in 2020 and closed in 2022) because they realized the benefit of this. They are now selling mass amounts of these chips to data centers everywhere. It's also the lineage behind the XDNA coprocessors on the newer Ryzen chips (built on Xilinx's AI Engine technology), so home users have access to some of that hardware right there. It's efficient, cheaper to make than a GPU, and can perform better on lots of non-graphics tasks than GPUs without all the massive power and cooling needs. Nvidia has nothing on the roadmap to even compete, and they're about to find out what a stupid mistake that is.
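A minimal sketch of that "reprogram on the fly" flow, using AMD's PYNQ runtime; the bitstream files and the workload-to-bitstream mapping here are hypothetical placeholders, not shipped designs:

```python
from pynq import Overlay  # AMD/Xilinx PYNQ runtime for FPGA boards

# Hypothetical prebuilt bitstreams, one per workload type
BITSTREAMS = {
    "audio": "audio_pipeline.bit",
    "video": "video_pipeline.bit",
    "inference": "int8_matmul.bit",
}

def reconfigure(workload: str) -> Overlay:
    """Reprogram the FPGA fabric for the requested workload at runtime."""
    return Overlay(BITSTREAMS[workload])

# The same physical instance switches roles in seconds instead of
# needing a separate single-purpose machine for each job.
ol = reconfigure("inference")
# ol.<ip_block_name> now exposes the accelerators built into that bitstream
```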

[–] [email protected] 1 points 5 days ago

I think the idea is that you can optimise it for the model, maybe? (Guessing, mostly.)

[–] [email protected] 11 points 5 days ago (1 children)

Built a new PC for the first time in a decade last spring. Went full team red for the first time ever. Very happy with that choice so far.

[–] [email protected] 135 points 5 days ago* (last edited 5 days ago) (21 children)

Shovel vendors scrambling for solid ground as prospectors start to understand geology.

...that is, this isn't yet the end of the AI bubble. It's just the end of overvaluing hardware now that efficiency has increased on the software side; there's still a whole software-side bubble to contend with.

[–] [email protected] 25 points 5 days ago (1 children)

there's still a whole software-side bubble to contend with

They're ultimately linked together in some ways (not all). OpenAI has already been losing money on every GPT subscription, which they charge a premium for because they had the best product. Now that premium has to evaporate, because there are equivalent AI products on the market that are much cheaper. This will shake things up on the software side too. They probably need more hype to stay afloat.

[–] [email protected] 18 points 5 days ago (1 children)

Quick, wedge crypto in there somehow! That should buy us at least two more rounds of investment.

[–] [email protected] 14 points 5 days ago

Great analogy

[–] [email protected] 14 points 5 days ago (1 children)

With the degree to which governments seem to be on the AI train, I'm becoming more and more worried about the fallout when the hype bubble does burst. I'm really hoping it comes sooner rather than later.

[–] [email protected] 0 points 5 days ago

Giving these parasites money now is a bailout of their bad decisions...

Let them compete; they should pay for their own capex.

[–] [email protected] 46 points 5 days ago (2 children)

Bizarre story. China building better LLMs, and LLMs getting cheaper to train, does not mean that Nvidia will sell fewer GPUs when people like Elon Musk and Donald Trump can't shut up about how important "AI" is.

I'm all for the collapse of the AI bubble, though. It's cool and all that all the bankers know IT terms now, but the massive influx of money towards LLMs and the datacenters that run them has not been healthy to the industry or the broader economy.

[–] [email protected] 1 points 5 days ago* (last edited 5 days ago)

The US economy has been running on bubbles for decades, and using bubbles to fuel innovation and growth. It has survived the telecom bubble, the housing bubble, and multiple bubbles in the oil sector (how do you think fracking came to be?), etc. This is just the start of the AI bubble, because its innovations have yet to have a broad-based impact on the economy. Once AI becomes commonplace in aiding everything we do, that's when valuations will look "normal".

[–] [email protected] 21 points 5 days ago* (last edited 5 days ago) (1 children)

It literally defeats Nvidia's entire business model of "I shit golden eggs, I'm the only one that does, and I can charge any price I want for them because you need my golden eggs."

Turns out no one actually even needs a golden egg anyway.

And... the same goes for OpenAI, who were already losing money on every subscription. Now they've lost the ability to charge a premium for their service (anyone can train a GPT-4-equivalent model cheaply, or use DeepSeek's existing open models), and subscription prices will need to come down, so they'll be losing money even faster.

[–] [email protected] 11 points 5 days ago* (last edited 5 days ago) (1 children)

Nvidia cards were the only GPUs used to train DeepSeek v3 and R1. So, that narrative still superficially holds. Other stocks like TSMC, ASML, and AMD are also down in pre-market.

[–] [email protected] 15 points 5 days ago (1 children)

Yes, but old and "cheap" ones that were not part of the sanctions.

[–] [email protected] 9 points 5 days ago (1 children)

Ah, fair. I guess it makes sense that Wall Street is questioning the need for these expensive Blackwell GPUs when the Hopper GPUs are already so good?

[–] [email protected] 7 points 5 days ago (1 children)

It's more that the newer models are going to need less compute to train and run them.

[–] [email protected] 10 points 5 days ago (1 children)

Right. There are indications of 10x to 100x less compute needed to train the models to an equivalent level. Not a small thing at all.

[–] [email protected] 5 points 5 days ago* (last edited 5 days ago)

Not small but... smaller than you would expect.

Most companies aren't, and shouldn't be, training their own models, especially with stuff like RAG, where you can use a highly trained off-the-shelf model with your proprietary offline data at only a minimal performance hit.

What matters is inference and accuracy/validity. Inference is ridiculously cheap (the reason AI/ML got so popular in the first place), and the latter is a whole different can of worms that industry and researchers don't want you to think about (in part because "correct" might still be blatant lies, since it's based on human data which is often blatant lies, but...).

And for the companies that ARE going to train their own models? They make enough bank that ordering the latest Box from Jensen is a drop in the bucket.


That said, this DOES open the door back up for tiered training and the like where someone might use a cheaper commodity GPU to enhance an off the shelf model with local data or preferences. But it is unclear how much industry cares about that.
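On the RAG point above, a minimal sketch of the pattern: embed your proprietary documents once, retrieve whatever is closest to the user's question, and stuff it into the prompt of a model you didn't train. The documents, model names, and top-k choice are placeholders:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Proprietary/offline documents (placeholders)
docs = [
    "Q3 incident postmortem: the cache layer failed under sustained load.",
    "Internal style guide: all public APIs must be versioned.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What caused the Q3 outage?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` then goes to whatever pretrained model you already run
# (local or hosted); no training of your own model is required.
```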
