this post was submitted on 09 Jan 2025
478 points (99.2% liked)

[–] [email protected] 116 points 1 week ago* (last edited 1 week ago) (1 children)

The most important part is that it’s a local ~~LLM~~ model running on your machine. The problem with AI is less about LLMs themselves, and more about their control and application by unethical companies and governments in a world driven by profit and power. And it’s none of those things, it’s just some open source code running on your device. So that’s cool and good.

[–] [email protected] 39 points 1 week ago (2 children)

Also the excessive amounts of power/energy that they consume.

[–] [email protected] 1 points 6 days ago (1 children)

Curious how resource intensive AI subtitle generation will be. Probably fine on some setups.

Trying to use madVR (tweaker's video postprocessing) in the summer in my small office with an RTX 3090 was turning my office into a sauna. Next time I buy a video card it'll be a lower tier deliberately to avoid the higher power draw lol.

[–] [email protected] 2 points 5 days ago

I think it really depends on how accurate you want it to be and what language you are interpreting. https://github.com/openai/whisper has multiple variations of their model, but they all pretty much require VRAM/graphics capability (or likely NPUs as they become more commonplace).

[–] [email protected] 21 points 6 days ago (3 children)

Running an LLM locally takes less power than playing a video game.

[–] [email protected] 1 points 1 day ago

They aren't using an LLM; this is a speech-to-text model like Whisper.

[–] [email protected] 13 points 6 days ago (1 children)

Training the models themselves also takes a lot of power.

[–] [email protected] 2 points 1 day ago

They are using open source models that have already been trained. So no extra energy is going into the models.

[–] [email protected] 3 points 6 days ago (2 children)
[–] [email protected] 2 points 6 days ago (1 children)

Any paper about any neural network.

Using a model to get one output is just a series of multiplications (not even that, we use vector multiplication, but yeah); it's less than or equal to rendering ONE frame in a 4K game.

[–] [email protected] 1 points 1 day ago

I know you are agreeing with me, but it being a "series of multiplications" is not terribly informative; that's basically a given. The question is how many FLOPs it takes, and how efficiently the hardware executes those FLOPs.
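
To make the "how many FLOPs" question concrete, here is a back-of-the-envelope sketch. It assumes a dense transformer and the common rule of thumb of roughly 2 FLOPs per parameter per generated token, and it ignores attention over the context. The function names, model size, and hardware figures are all hypothetical placeholders, not measurements of any real model or GPU.

```python
def forward_flops(n_params: float, n_tokens: int) -> float:
    """Approximate FLOPs for generating n_tokens with a dense transformer.

    Uses the ~2 FLOPs per parameter per token rule of thumb and ignores
    attention over the context (both are simplifying assumptions).
    """
    return 2.0 * n_params * n_tokens


def compute_seconds(flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """Time to execute `flops` on hardware that sustains only a fraction of peak."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return flops / (peak_flops_per_s * utilization)


# Hypothetical model and hardware figures, for illustration only.
flops = forward_flops(n_params=1e9, n_tokens=200)
seconds = compute_seconds(flops, peak_flops_per_s=10e12, utilization=0.3)
print(f"{flops:.1e} FLOPs, about {seconds:.2f} s of compute")
```

The `utilization` parameter is where "how efficient are the FLOPs" shows up: token-by-token generation is often limited by memory bandwidth rather than arithmetic, so real utilization can be well below peak.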

[–] [email protected] 12 points 6 days ago* (last edited 6 days ago) (2 children)

I don't have a source for that, but the most that any locally-run program can cost in terms of power is basically the sum of a few things: maxed-out GPU usage, maxed-out CPU usage, maxed-out disk access. The GPU is by far the most power-hungry of these, and modern video games make essentially the most possible use of the GPU that they can get away with.

Running an LLM locally can at most max out usage of the GPU, putting it in the same ballpark as a video game. Typical usage of an LLM is to run it for a few seconds and then submit another query, so it's not running 100% of the time during typical usage, unlike a video game (which remains open and active the whole time; GPU usage dips only when you're in a menu, for instance).

Data centers drain lots of power by running a very large number of machines at the same time.
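
As a rough sketch of the argument above: energy is power times time, scaled by the fraction of the session during which the GPU is actually busy. The wattage, session length, and duty-cycle figures below are made-up placeholders rather than measurements, and idle power draw is ignored.

```python
def session_energy_wh(gpu_watts: float, hours: float, duty_cycle: float) -> float:
    """Energy in watt-hours for a GPU that is busy `duty_cycle` of the time.

    Idle draw is ignored for simplicity (an assumption).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return gpu_watts * hours * duty_cycle


# Hypothetical numbers: same GPU, same session length, different duty cycles.
game_wh = session_energy_wh(gpu_watts=300, hours=2, duty_cycle=0.95)
llm_wh = session_energy_wh(gpu_watts=300, hours=2, duty_cycle=0.10)
print(f"game: {game_wh:.0f} Wh, LLM: {llm_wh:.0f} Wh")
```

With the same peak draw, the only thing separating the two cases in this model is how much of the session the GPU spends working.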

[–] [email protected] 2 points 1 day ago (1 children)

Training the model yourself would take years on a single machine. If you factor that into your cost per query, it blows up.

The data centers are (currently) mainly used for training new models.

[–] [email protected] 1 points 1 day ago

But if you divide the cost of training by the number of people using the model, it should be pretty low.
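
A minimal sketch of that division, with entirely hypothetical figures for the training energy and the user count (neither comes from a real model):

```python
def training_energy_per_user_kwh(training_kwh: float, users: int) -> float:
    """Spread a one-time training energy cost evenly across all users."""
    if users <= 0:
        raise ValueError("users must be positive")
    return training_kwh / users


# Hypothetical: 1,000,000 kWh of training shared by 10,000,000 users.
share = training_energy_per_user_kwh(training_kwh=1_000_000, users=10_000_000)
print(f"{share:.3f} kWh per user")
```

The per-user share shrinks linearly as the user count grows, which is why the amortized figure ends up small for widely used pretrained models.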

[–] [email protected] 2 points 6 days ago (1 children)

From what I know, local LLMs take minutes to process a single prompt, not seconds, but I guess that depends on the use case.

As for games, I dunno about them maxing out the GPU in most cases. I maxed mine out for crypto mining, and that was power hungry. So I would put LLMs closer to crypto than games.

Not to mention games will entertain you way more for the same time.

[–] [email protected] 2 points 6 days ago* (last edited 6 days ago)

Obviously it depends on your GPU. A crypto miner you'll leave running 24/7. On a recent MacBook, an LLM will run at several tokens per second, so yeah, for long responses it could take more than a minute. But most people aren't going to be running such an LLM for hours on end. Even if they do -- big deal, it's a single GPU, that's negligible compared to running your dishwasher, using your oven, or heating your house.
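
For a sense of scale, here is a small sketch that turns "several tokens per second" into a response time and an energy figure per response. The token count, generation speed, and power draw are assumed placeholder values, not benchmarks of any particular laptop.

```python
def response_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response of `tokens` tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def energy_wh(watts: float, seconds: float) -> float:
    """Convert a steady power draw over a duration into watt-hours."""
    return watts * seconds / 3600.0


# Hypothetical: a 500-token answer at 5 tokens/s on a machine drawing 50 W.
secs = response_seconds(tokens=500, tokens_per_second=5)
print(f"{secs:.0f} s, {energy_wh(watts=50, seconds=secs):.2f} Wh per response")
```

Plugging in your own numbers for a household appliance's cycle gives a direct way to check the comparison for your setup.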