this post was submitted on 19 Jun 2025
Technology
Sure, you can do that at an aggregate level, but then how do you divide it by customer? And even then, some setups will be more efficient than others, so you'd only get that setup's usage.
And even if you do that and can narrow it down to a single user and a single prompt, you can still only roughly predict how long it will think and how long the response will be.
By customer is easy: they're each renting specific resources. A fractional cloud instance (except the small burstable ones) is tied to specific CPUs and GPUs. And records of who rented which one, and when, are already being kept.
You might not be able to break out specific individual queries, but computing averages is completely straightforward.
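A minimal sketch of that per-customer averaging, assuming the provider has per-instance energy readings over time windows plus the rental records mentioned above. All names (`Rental`, `Reading`, `energy_by_customer`, `average_per_query`) are hypothetical, and the energy in each reading window is assumed to be spread evenly over that window, so it's pro-rated by how long each customer held the instance:

```python
from dataclasses import dataclass


@dataclass
class Rental:
    # Hypothetical rental record: who held which instance, and when (hours).
    customer: str
    instance: str
    start: float
    end: float


@dataclass
class Reading:
    # Hypothetical metered energy for one instance over one time window.
    instance: str
    start: float
    end: float
    kwh: float


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1) and [b0, b1)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def energy_by_customer(rentals: list[Rental], readings: list[Reading]) -> dict[str, float]:
    """Pro-rate each reading's energy to whoever rented that instance during it.

    Assumes energy is uniform within a reading window; energy used while an
    instance was unrented is simply left unattributed.
    """
    totals: dict[str, float] = {}
    for r in readings:
        span = r.end - r.start
        if span <= 0:
            continue
        for rent in rentals:
            if rent.instance != r.instance:
                continue
            frac = overlap(r.start, r.end, rent.start, rent.end) / span
            if frac > 0:
                totals[rent.customer] = totals.get(rent.customer, 0.0) + r.kwh * frac
    return totals


def average_per_query(totals: dict[str, float], query_counts: dict[str, int]) -> dict[str, float]:
    """Average energy per query for each customer with a nonzero query count."""
    return {c: kwh / query_counts[c] for c, kwh in totals.items() if query_counts.get(c)}
```

With two customers sharing `gpu-0` for two hours each and one of them also holding `gpu-1`, the attribution falls out of the rental overlap alone; no per-query measurement is needed, only the query counts to divide by.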