The Extreme Cost of Training AI Models. : technology

[–] [email protected] 34 points 2 months ago (19 children)

I don't care how they estimate their cost in dollars. I think the cost to all of us in environmental impact would be more interesting.

load more comments (19 replies)

[–] [email protected] 16 points 2 months ago

It's obvious that Google didn't pay the crazy AWS prices to train Gemini, seeing how many servers they have in gcp.

They mean that they used creative accounting to pay themselves crazy gcp usage bills to deduct from taxes?

[–] [email protected] 6 points 2 months ago* (last edited 2 months ago) (1 children)

How is Inflection-2 cheaper to train in the cloud than own hardware?

[–] [email protected] 1 points 2 months ago

That probably indicates a problem with the estimates.

[+] [email protected] -9 points 2 months ago (6 children)

All a huge waste of money.

This isn't ai.

It's a "Smarter Search"

[–] [email protected] 8 points 2 months ago (2 children)

I hope you complained all these years when games used "AI" for computer controlled enemies, because otherwise your post would be super awkward

load more comments (2 replies)

[–] [email protected] 8 points 2 months ago

It is AI though. AGI, which is a subcategory of AI and what many people seemingly imagine AI to mean, it's not—but AI, yes.

[–] [email protected] 28 points 2 months ago* (last edited 2 months ago) (2 children)

AI is a broader term than you might realize. Historically even mundane algorithms like A* pathfinding were considered AI.

Turns out people like to constantly redefine artificial intelligence to “whatever a computer can’t quite do yet.”

load more comments (2 replies)

load more comments (3 replies)

[–] [email protected] 53 points 3 months ago (3 children)

The source didn’t have this detail - google training gemini “cloud” vs “own hardware”. Does Google Cloud not count as “own hardware” for google?

[–] [email protected] 4 points 2 months ago

From the source:

Our primary approach calculates training costs based on hardware depreciation and energy consumption over the duration of model training. Hardware costs include AI accelerator chips (GPUs or TPUs), servers, and interconnection hardware. We use either disclosures from the developer or credible third-party reporting to identify or estimate the hardware type and quantity and training run duration for a given model. We also estimate the energy consumption of the hardware during the final training run of each model.

As an alternative approach, we also calculate the cost to train these models in the cloud using rented hardware. This method is very simple to calculate because cloud providers charge a flat rate per chip-hour, and energy and interconnection costs are factored into the prices. However, it overestimates the cost of many frontier models, which are often trained on hardware owned by the developer rather than on rented cloud hardware.

https://epochai.org/blog/how-much-does-it-cost-to-train-frontier-ai-models

[–] [email protected] 10 points 2 months ago* (last edited 2 months ago)

This is an accounting trick as well, a way to shed profit, and maximize deductions, by having different units within a parent company purchase services from each other.

I realize that my sentence long explainer doesn't shed any light on how it gets done, but funnily enough, you can ask an LLM for an explainer and I bet it'd give a mostly accurate response.

Edit: Fuck it, I asked an LLM myself and just converted my first sentence into a prompt, by asking what that was called, and how it's done. Here's the reply:

This practice is commonly referred to as "transfer pricing." Transfer pricing involves the pricing of goods, services, and intangible assets that are transferred between related parties, such as a parent company and its subsidiaries.

Transfer pricing can be used to shift profits from one subsidiary to another, often to minimize taxes or maximize deductions. This can be done by setting prices for goods and services that are not at arm's length, meaning they are not the same prices that would be charged to unrelated parties.

For example, a parent company might have a subsidiary in a low-tax country purchase goods from another subsidiary in a high-tax country at an artificially low price. This would reduce the profits of the high-tax subsidiary and increase the profits of the low-tax subsidiary, resulting in lower overall taxes.

However, it's worth noting that transfer pricing must be done in accordance with the arm's length principle, which requires that the prices charged between related parties be the same as those that would be charged to unrelated parties. Many countries have laws and regulations in place to prevent abusive transfer pricing practices and ensure that companies pay their fair share of taxes.

[–] [email protected] 20 points 2 months ago

Does Google Cloud not count as “own hardware” for google?

That's why the bars are so different. The "cloud" price is MSRP

[–] [email protected] -4 points 3 months ago (1 children)

Humanity: develops nuclear fusion

AI:

load more comments (1 replies)

[–] [email protected] 28 points 3 months ago (3 children)

Considering the hype and publicity GPT-4 produced, I don't think this is actually a crazy amount of money to spend.

[–] [email protected] 10 points 2 months ago

Comparitively speaking, a lot less hype than their earlier models produced. Hardcore techies care about incremental improvements, but the average user does not. If you try to describe to the average user what is "new" about GPT-4, other than "It fucks up less", you've basically got nothing.

And it's going to carry on like this. New models are going to get exponentially more expensive to train, while producing less and less consumer interest each time, because "Holy crap look at this brand new technology" will always be more exciting than "In our comparitive testing version 7 is 9.6% more accurate than version 6."

And for all the hype, the actual revenue just isn't there. OpenAI are bleeding around $5-10bn (yes, with a b) per year. They're currently trying to raise around $11bn in new funding just to keep the lights on. It costs far more to operate these models (even at the steeply discounted compute costs Microsoft are giving them) than anyone is actually willing to pay to use them. Corporate clients don't find them reliable or adaptable enough to actually replace human employees, and regular consumers think they're cool, but in a "nice to have" kind of way. They're not essential enough a product to pay big money for, but they can only be run profitably by charging big money.

[–] [email protected] 17 points 3 months ago* (last edited 3 months ago) (2 children)

Yeah, I'm surprised at how low that is, a software engineer in a developed country is about 100k USD per year.
So 40M USD for training ChatGPT 4 is the cost of 400 engineers for one year.
They say cost of salaries could make up to 50% of the total, so the total cost is 800 engineers for one year.
That doesn't seem extreme.

[–] [email protected] 9 points 2 months ago

This is just the estimates to train the model, so it's not accounting for the cost to develop the system for training, collecting the data, etc. This is just pure processing cost, which is staggeringly large numbers.

[–] [email protected] 11 points 2 months ago (2 children)

100k USD per engineer assumes they're exclusively hiring from US and Switzerland, that's not a general "developed country" thing. US is an outlier.

[–] [email protected] 6 points 2 months ago (1 children)

I'm talking about the cost of the engineer for the company, not the salary, which is less relevant here. In some EU countries, the salaries may be lower, but the taxes are higher to pay for the social system, so the cost for the company is similar.

[–] [email protected] 1 points 2 months ago

Yes. Also, Europeans work fewer hours per year. There are big differences between EU countries, though. https://en.wikipedia.org/wiki/List_of_countries_by_average_annual_labor_hours

[–] [email protected] 8 points 2 months ago (1 children)

US and Switzerland are way over 100k. For Netherlands and Germany 100k is a good approximation for the company costs for a senior SWE.

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago) (1 children)

I did already back up the claim with a source, but okay:

US: Senior 128k USD, mid-level 94k USD
CH: Senior 118k CHF (~139k USD), mid-level 95k CHF (~112k USD)
DE: Senior 72k EUR (~80k USD), mid-level 58k EUR (~65k USD)
NL: Senior 69k EUR (~77k USD), mid-level 52k EUR (~58k USD)

Yes, US and Switzerland are outliers.

[–] [email protected] 0 points 2 months ago

Yeah, 80k gross for the worker creates close to 100k costs for the employer.

[–] [email protected] 7 points 3 months ago (2 children)

The latest releases ChatGPT 4o costs $600/hr per instance to run based on the discussion I could find about it.

If OpenAI is running 1k of those models to service the demand (they're certainly running more since queries can take 30+ seconds) then that's 200M/yr just keeping the lights on.

load more comments (2 replies)

Technology

Our Rules

Approved Bots