Only tangentially related, but today I found a repo that appears to be developed using AI assistance, and the costs for running the agents are reported in the PRs. For example, 50 USD to remove some code: https://github.com/coder/mux/pull/1658
I'm not sure I like this method of accounting for it. The critics of LLMs tend to conflate the costs of training LLMs with the cost of generation. But this makes the opposite error: it pretends that training isn't happening as a consequence of consumer demand. There are enormous resources poured into it on an ongoing basis, so it feels like it needs to be amortized on top of the per-token generation costs.
At some point, we might end up in a steady state where the models are as good as they can be and the training arms race is over, but we're not there yet.
It would be really hard to properly account for the training, since that won't scale with more generation.
The training is already done when you make a generative query. No matter how many consumers there are, the cost for training is fixed.
My point is that it isn't, not really. Usage begets more training, and this will likely continue for many years. So it's not a vanishing fixed cost, but pretty much just an ongoing expenditure associated with LLMs.
The challenge with no longer developing new models is making sure your model is up to date, which as of today requires an entire training run. Maybe they can do that less often, or they’ll come up with a way to update a model after it’s trained. Maybe we’ll move on to something other than LLMs.
At first glance this looks like a credible set of calculations to me. Here's the conclusion:
> So, if I wanted to analogize the energy usage of my use of coding agents, it’s something like running the dishwasher an extra time each day, keeping an extra refrigerator, or skipping one drive to the grocery store in favor of biking there.
That's for someone spending about $15-$20 in a day on Claude Code, estimated at the equivalent of 4,400 "typical queries" to an LLM.
Had a small discussion about this on an OP on bsky. A somewhat interesting discussion over there.
https://bsky.app/profile/simonpcouch.com/post/3mcuf3eazzs2c
As long as it's unaccounted for by users it's at best an externality. I think it may demand regulation to force this cost to the surface.
Electricity and cooling incur wider costs and consequences.
> As long as it's unaccounted for by users it's at best an externality.
Why is it an externality? Anthropic (or other model provider) pays the electricity cost, then it's passed along in the subscription or API bill. The direct cost of the energy is fully internalized in the price.
That's hardly unique to data centers.
I'm all for regulation that makes businesses pay for their externalities - I'd argue that's a key economic role that a government should play.
I mean if we really cared about this one bit we'd stop making a car based society in the US and save far more energy and pollution. That's not politically expedient and there are powerful vested interests in ensuring it doesn't happen.
That's why I think most of the worry about data center energy use, especially over the longer term, is a joke. Data centers can pretty easily run on solar and wind energy if we spend even a small amount of political capital to make it happen.
I don't see how this follows. Data center operators buy energy and this is almost their only operating expense. Their products are priced to reflect this. The fact that basic AI features are free reflects the fact that they use almost no energy.
I would be surprised if AI prices reflect their current cost to provide the service, even inference costs. With so much money flowing into AI the goal isn't to make money, it's to grow faster than the competition.
From this article:
> For the purposes of this post, I’ll use the figures from the 100,000 “maximum”–Claude Sonnet and Opus 4.5 both have context windows of 200,000 tokens, and I run up against them regularly–to generate pessimistic estimates. So, ~390 Wh/MTok input, ~1950 Wh/MTok output.
Expensive commercial energy would be 30¢ per kWh in the US, so the energy cost implied by these figures would be about 12¢/MTok input and 60¢/MTok output. Anthropic's API cost for Opus 4.5 is $5/MTok input and $25/MTok output, roughly 40 times higher than these figures.
The direct energy cost of inference is still covered even if you assume that Claude Max/etc plans are offering a tenfold subsidy over the API cost.
I remain confident that most AI labs are not selling API access for less than it costs to serve the models.
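For anyone who wants to check the arithmetic above, here is a minimal sketch. It only uses the numbers quoted in this comment (the article's pessimistic Wh/MTok figures, 30¢/kWh, and the quoted Opus 4.5 API prices); the function and variable names are made up for illustration.

```python
# Figures as quoted in the comment above; nothing here is measured independently.
WH_PER_MTOK = {"input": 390, "output": 1950}         # pessimistic Wh per million tokens
API_USD_PER_MTOK = {"input": 5.00, "output": 25.00}  # Opus 4.5 API price as quoted
USD_PER_KWH = 0.30                                   # "expensive" US commercial electricity


def energy_cost_usd(wh_per_mtok, usd_per_kwh=USD_PER_KWH):
    """Electricity cost in USD for one million tokens."""
    return wh_per_mtok / 1000 * usd_per_kwh


def price_to_energy_ratio(kind):
    """How many times larger the quoted API price is than the energy cost."""
    return API_USD_PER_MTOK[kind] / energy_cost_usd(WH_PER_MTOK[kind])


for kind in ("input", "output"):
    cost = energy_cost_usd(WH_PER_MTOK[kind])
    print(f"{kind}: energy ~${cost:.3f}/MTok, API price ~{price_to_energy_ratio(kind):.0f}x that")
```

Under these assumptions the API price comes out a bit over 40 times the electricity cost for both input and output, so a tenfold subsidy on subscription plans would still leave the direct energy cost covered.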
If that's so common then what's your theory as to why Anthropic aren't price competitive with GPT-5.2?
A US person does not consume 1,600 liters a day.
I have kids and a dishwasher (which, with kids, runs quite often), but I’m not convinced I’m doing worse at energy consumption.
That is a pretty good article, although the one factor it doesn't mention that we see having a huge impact on energy is batch size. That would be hard to estimate with the data he has, though.
We've only launched to friends and family, but I'll share this here since it's relevant: we have a service which actually optimizes and measures the energy of your AI use: https://portal.neuralwatt.com if you want to check it out. We also have a tools repo we put together that shows some demonstrations of surfacing energy metadata into your tools: https://github.com/neuralwatt/neuralwatt-tools/
Our underlying technology is really about OS-level energy optimization and datacenter grid flexibility, so if you are on the pay-by-kWh plan you get additional value as we continue to roll new optimizations out.
DM me with your email and I'd be happy to add some additional credits to you.
To add a bit more to what @scottcha is saying: overall GPU load has a fairly significant impact on the energy per result. Energy per result is inversely related to load: since the idle power draw of these servers is significant, the more results that energy gets spread across, the more efficient the system becomes. I imagine Anthropic is able to harness that efficiency, since I imagine their servers are far from idle :)
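A toy model of that effect, as a rough sketch: assume server power is a fixed idle draw plus a part that grows linearly with utilization, and that throughput also scales linearly with utilization. All the numbers (300 W idle, 1000 W peak, 10,000 results per hour) are hypothetical placeholders, not measurements of any real server.

```python
def energy_per_result_wh(utilization, idle_w=300.0, peak_w=1000.0,
                         max_results_per_hour=10_000.0):
    """Wh per result under a linear power model (all defaults are hypothetical)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Idle draw is paid regardless of load; only the remainder scales with work.
    power_w = idle_w + (peak_w - idle_w) * utilization
    results_per_hour = max_results_per_hour * utilization
    # Watts divided by results per hour gives watt-hours per result.
    return power_w / results_per_hour


for u in (0.1, 0.5, 1.0):
    print(f"utilization {u:.0%}: {energy_per_result_wh(u):.3f} Wh/result")
```

With these placeholder numbers a server at 10% load spends several times more energy per result than one at full load, purely because the idle draw is spread over fewer results.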
You can infer the discount from the pricing of the batch API, which is presumably arranged for minimum inference costs. Anthropic offers a 50% discount there, which is consistent with other model providers.
LLMs don't use much energy at all to run, they use it all at the beginning for training, which is happening constantly right now.
TLDR this is, intentionally or not, an industry puff piece that completely misunderstands the problem.
Also, even if everyone is effectively running a dishwasher cycle every day, this is still a problem that we can't just ignore; that's still a massive increase in ecological impact.
The training cost for a model is constant. The more individual use that model gets the lower the training-cost-per-inference-query gets, since that one-time training cost is shared across every inference prompt.
It is true that there are always more training runs going, and I don't think we'll ever find out how much energy was spent on experimental or failed training runs.
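To make the amortization point concrete, here is a small sketch. The training energy figure (1 GWh) and the query counts are purely illustrative placeholders, not estimates for any real model; the only thing the example is meant to show is that a fixed cost divided by a growing number of queries shrinks per query.

```python
def training_energy_per_query_wh(training_energy_wh, total_queries):
    """Share of a one-time training energy cost attributed to each query."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return training_energy_wh / total_queries


# Hypothetical placeholder: 1 GWh for one training run, expressed in Wh.
TRAINING_ENERGY_WH = 1e9

for queries in (1e9, 1e10, 1e11):
    share = training_energy_per_query_wh(TRAINING_ENERGY_WH, queries)
    print(f"{queries:.0e} queries -> {share:.3f} Wh of training energy per query")
```

This deliberately ignores experimental or failed runs and any follow-up training, since, as noted, those figures are not public.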
> The training cost for a model is constant
Constant until the next release? The battle for the benchmark-winning model is driving cadence up, and this competition probably puts a higher cost on training and evaluation too.
Sure. By "constant" there I meant it doesn't change depending on the number of people who use the model.
You underestimate the amount of inference and very much overestimate what training is.
Training is more or less the same as doing inference on an input token twice (forward and backward pass). But because it's offline and predictable it can be done fully batched with very high utilization (efficiently).
Training is, at a guesstimate, maybe 100 trillion total tokens, but these guys apparently do inference at quadrillion-token monthly scales.
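Taking those guesstimates at face value, the comparison can be sketched like this. The inputs are this comment's own rough numbers (100 trillion training tokens, training costing about twice an inference pass per token, one quadrillion inference tokens per month), and the calculation additionally assumes the same per-token cost for training and serving, which is a simplification.

```python
TRAINING_TOKENS = 100e12              # guesstimate from the comment: 100 trillion
TRAINING_COST_FACTOR = 2              # comment's assumption: ~2x an inference pass per token
INFERENCE_TOKENS_PER_MONTH = 1e15     # "quadrillion token monthly scales"


def training_in_months_of_inference(training_tokens=TRAINING_TOKENS,
                                    cost_factor=TRAINING_COST_FACTOR,
                                    inference_per_month=INFERENCE_TOKENS_PER_MONTH):
    """Training compute expressed as months of inference at the given rate."""
    inference_equivalent_tokens = training_tokens * cost_factor
    return inference_equivalent_tokens / inference_per_month


months = training_in_months_of_inference()
print(f"Training is roughly {months:.2f} months (~{months * 30:.0f} days) of inference")
```

On these numbers a full training run is equivalent to about a fifth of a month of inference, which is the sense in which inference dominates.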
Training is pretty much irrelevant in the scheme of global energy use. The global airline industry uses the energy needed to train a frontier model every three minutes, and unlike AI training the energy for air travel is 100% straight-into-your-lungs fossil carbon.
Not to mention doesn't aviation fuel still make heavy (heh) use of lead?
I think that's only true for propeller planes, which use leaded gasoline. Jet fuel is just kerosene.
I'm not convinced that LLM training is at such a high energy use that it really matters in the big picture. You can train a (terrible) LLM on a laptop[1], and frankly that's less energy efficient than just training it on a rented cloud GPU.
Most of the innovation happening today is in post-training rather than pre-training, which is good for people concerned with energy use because post-training is relatively cheap (I was able to post-train a ~2b model in less than 6 hours on a rented cluster[2]).
[1]: https://github.com/lino-levan/wubus-1
[2]: https://huggingface.co/lino-levan/qwen3-1.7b-smoltalk