The conversation we keep having is the wrong one.
The standard discourse about AI's environmental footprint runs like this: someone publishes a number — a query uses this many watt-hours, a model trained on this many gigawatt-hours, a data center evaporates this many liters of water — and the number sits there, unmoored, as either evidence that AI is uniquely destructive or evidence that the panic is overblown. Both readings miss the same thing.
The interesting question isn't how much. It's compared to what.
A 0.34 watt-hour ChatGPT query [5] means almost nothing on its own. It means something only when held against the alternative — the email you would have written instead, the Google search you would have run, the librarian you would have called, the analyst-hour you would have billed. The same is true at the project level. A ten-second AI-generated video clip [9] consuming roughly a kilowatt-hour and four liters of water sounds shocking, until you ask: against what shoot? On what scale of production?
So we picked a deliberately concrete brief and answered it from end to end. The published evidence — six peer-reviewed papers, two industry-audited datasets covering 1,400+ ad productions, and disclosures from OpenAI [5] and Google [6] — converges on a result that is neither comforting nor damning. The live-action shoot emits 50 to 500 times more CO₂-equivalent than the AI version. It uses 100 to 200 times more electricity, end-to-end. It consumes 25 to 40 times more water. The gap survives every reasonable sensitivity test we threw at it. It does not, however, settle the larger debate, and we will get to that.
The brief, in detail.
The premise: a 3-minute commercial, a police car chase opening on the I-405, running through Downtown Los Angeles, and ending in a residential suburb. Five principal actors. Five hero cars. Stunt extras. The kind of thing a brand like Dodge, Ford, or a streaming service might commission for a tentpole campaign. We costed it, in carbon and water and electricity, two ways.
Path A — Live action.
3 shoot days plus 1 prep day. A 40-to-60-person crew (line producer, director, AD, DPs, gaffers, grips, sound, art, stunt coordinator, hair/makeup, wardrobe, transport captains). Multi-location moves between the freeway, downtown, and the suburb. Camera cars, tow rigs, and 5 picture cars actively driven for chase choreography. Two 40–125 kW diesel generators powering lighting and base camp. Catering for 60 people for 3 days. We modeled it as LA-local — no air travel — which is the optimistic case. Add a director flown in from London and the carbon number jumps another 3 to 4 tonnes per round-trip ticket alone.
Path B — AI generation.
An end-to-end synthetic pipeline. Claude, ChatGPT, and Gemini handling scripting, beat development, and revisions. Kive for moodboarding and concept references. Higgsfield and Seedance for video generation, run hot — we assumed a generous 3:1 generation-to-keep ratio in the central case and 5:1 in the high case, reflecting the reality that current text-to-video models still produce a lot of unusable takes per keeper. Suno for music and sound design. Forty hours of human workstation time on top to stitch, color, and finish the spot in DaVinci Resolve.
The high-end workload we modeled, in concrete terms:
- 3,000 LLM messages across the three text models
- 300 still images (moodboards, concept frames, reference plates)
- 600 video clips at 720p–1080p, averaging 6 seconds each
- 30 Suno tracks for score and SFX exploration
- 40 hours of human workstation post-production
That is intentionally generous. A real production might use a third of those generations. We wanted the AI-side numbers to be conservative — meaning, weighted toward the heavy end — so the comparison would not be vulnerable to "but in practice you'd use way more video generations" objections. Even at 5:1 re-roll ratios with a Sora-2-class premium model in the high case, the totals stay in three-digit kWh territory.
What the AI side actually uses.
The dirty secret of LLM energy reporting is that the LLM part barely matters. Sam Altman disclosed in June 2025 that an average ChatGPT query consumes 0.34 watt-hours and roughly 0.32 milliliters of water [5]. Google disclosed in August 2025 that Gemini's median text query uses 0.24 Wh and emits just 0.03 grams of CO₂e [6]. Even the very long-context queries on advanced reasoning models that Jegham et al. measured in their 2025 paper top out around 30 Wh [3] — for a 7,000-word input. Three thousand mixed messages across Claude, ChatGPT, and Gemini work out to between 1.5 and 4 kilowatt-hours total: less than running a hair dryer for an afternoon.
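That band is easy to reproduce. The per-query figures below are the disclosed ones; the split of the 3,000 messages across query types is our own illustrative assumption, not something the sources specify:

```python
# Back-of-envelope check on the LLM line item.
WH_PER_QUERY = {
    "short_text":   0.24,  # Gemini median text query (Google, Aug 2025)
    "typical":      0.34,  # average ChatGPT query (OpenAI, Jun 2025)
    "long_context": 30.0,  # 7,000-word reasoning query (Jegham et al., 2025)
}

# Assumed mix of the 3,000 modeled messages (our assumption, skewed to
# include a meaningful share of heavy reasoning queries).
mix = {"short_text": 1500, "typical": 1400, "long_context": 100}

total_wh = sum(n * WH_PER_QUERY[kind] for kind, n in mix.items())
total_kwh = total_wh / 1000
print(f"{total_kwh:.2f} kWh for {sum(mix.values())} messages")  # ~3.84 kWh
```

Even with a hundred long-context reasoning queries in the mix, the total stays under 4 kWh; with no long-context queries at all, it falls below 1 kWh.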
Video is the entire game.
Li, Jiang, and Tiwari's 2024 HotCarbon paper [1] on Open-Sora — still the most rigorous public benchmark for text-to-video carbon — established three findings that hold up under every replication attempt since:
- A single video frame at 240p costs about 78 times the carbon of a single LLM token at comparable model size.
- Resolution scales near-quadratically: 720p generation produces roughly 10× the carbon of 240p at the same length.
- Duration scales steeper than linearly in the full pipeline: doubling clip length quadruples energy.
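The second and third findings can be captured in a toy scaling model. The quadratic exponents are the paper's rough characterization, not fitted constants, so treat the outputs as order-of-magnitude only:

```python
# Toy model of the Open-Sora scaling findings: carbon grows roughly with
# the square of resolution height and the square of clip duration.
def relative_carbon(height_px: int, seconds: float,
                    base_height: int = 240, base_seconds: float = 1.0) -> float:
    """Carbon relative to a clip at base_height / base_seconds."""
    return (height_px / base_height) ** 2 * (seconds / base_seconds) ** 2

# 720p vs 240p at equal length: (720/240)^2 = 9, i.e. "roughly 10x".
print(relative_carbon(720, 1))  # 9.0
# Doubling duration at fixed resolution quadruples energy.
print(relative_carbon(240, 2))  # 4.0
```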
That makes video diffusion — not LLM scripting, not music gen, not moodboarding — the cost driver in any AI-first production workflow. Sora-2-class commercial models, where energy disclosures exist, cluster around 0.94 kWh of compute and 4 liters of water cooling per 10-second clip [9], with roughly 466 grams of CO₂e on a U.S.-mixed grid.
So the math works out cleanly. The 600 modeled video generations (keepers plus discarded re-rolls) at 6 seconds average scale Sora-2's 10-second figure by 0.6: 600 × 0.94 kWh × 0.6 ≈ 340 kWh in the extreme case where every clip runs at the premium 1080p rate, and around a quarter of that, roughly 85 kWh, if everything stays at 720p; the modeled mixed-resolution band falls in between. The Suno tracks add maybe 5 kWh. The image generations add another 2–3 kWh. The 40 hours of workstation post adds 8–15 kWh. Video dominates everything else by an order of magnitude or more.
Then we apply California's grid carbon intensity. CARB's 2026 Lookup Table puts California at 65.07 gCO₂e/MJ [8], which converts to roughly 234 gCO₂e/kWh — among the cleanest grids in the U.S. because of the state's heavy renewables share. The whole AI workflow ends up in a tight band: 75 to 290 kilowatt-hours of electricity, 18 to 122 kilograms of CO₂e. Even if we punt the data center to a U.S. mixed grid (~400 gCO₂e/kWh) and use the high case throughout, the carbon number tops out around 120 kg.
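Both conversions are one-liners to verify. The CARB figure and the ~400 gCO₂e/kWh U.S. mix are the values cited above; nothing else is assumed:

```python
# Grid-intensity conversion and the high-case carbon bound.
G_PER_MJ = 65.07      # CARB 2026 Lookup Table, California grid
MJ_PER_KWH = 3.6      # exact unit conversion

ca_g_per_kwh = G_PER_MJ * MJ_PER_KWH            # ~234 gCO2e/kWh
us_g_per_kwh = 400                              # U.S. mixed grid, from text

high_case_kwh = 290                             # top of the modeled band
high_case_kg_us = high_case_kwh * us_g_per_kwh / 1000  # ~116 kg

print(f"CA grid: {ca_g_per_kwh:.0f} gCO2e/kWh")
print(f"High case on U.S. mix: {high_case_kg_us:.0f} kg CO2e")
```

The 116 kg result is what the text rounds to "around 120 kg."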
What the LA shoot actually uses.
This is where the comparison gets uncomfortable for the AI-skeptic position. AdGreen's 2023 Annual Review [4] — drawn from 1,424 audited ad-production projects, methodology reviewed by PwC and MediaSense — found that productions above £50,000 per shoot day average 13.9 tonnes of CO₂e. Across their full dataset, travel and transport account for 60 percent of all emissions; on the largest projects, air travel alone is 49 percent. Energy and on-set fuel are the next-biggest category at 24 percent.
A 3-day LA car-chase shoot with five hero cars sits comfortably above their average. Picture cars burn fuel as a feature of the action, not just as transportation. Lighting a freeway sequence and a night-suburb sequence both demand significant generator power. Equipment trucks make repeated location moves between the 405, DTLA, and the suburbs. A bottom-up reconstruction brackets the shoot between two bounds.
The lower bound assumes battery-electric generators (which are penetrating LA shoots fast but are not yet default), an LED-only lighting plot, optimal logistics, and crew from a tight 25-mile radius. The upper bound assumes diesel generators, conventional HMI lighting on the freeway plate, and a slightly longer 4-day shoot with a half-day for stunt rehearsal. Neither bound includes flying anyone in. AdGreen's data [4] would put a London-to-LA round-trip at roughly 3 to 4 tonnes per economy ticket — and several more for crew shipping equipment cases.
Where the bottlenecks actually are.
If you are trying to lower the footprint of either workflow, the lever is not where most people look.
For live action, replacing diesel generators with battery banks gets you maybe 8 percent. Switching to LED lighting gets you another 4. Both are worth doing. Neither moves the needle compared to not flying people across an ocean. The single most leveraged decision in any commercial production is whether you cast and crew locally. AdGreen's auditors found that on £50K+/day projects, air travel alone often dwarfs every other category combined [4].
For AI, the lever is resolution and re-roll discipline. Generating at 1080p when 720p would suffice quadruples your carbon. Letting a director ask for "ten more variations" of a clip that was already fine is the AI-workflow equivalent of leaving the generator on overnight. The post-production LLM use, the music gen, the moodboarding — none of it matters at this scale. Cut your video re-roll ratio from 5:1 to 2:1 and you cut your video carbon by 60 percent.
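A rough sensitivity sketch makes both levers visible at once. The keeper count here is a hypothetical assumption, and the 4× discount for 720p over 1080p follows from the resolution-scaling discussion above:

```python
# Sensitivity of video-generation energy to re-roll ratio and resolution.
KWH_PER_10S_CLIP = 0.94  # Sora-2-class 1080p figure cited above

def video_kwh(keepers: int, rerolls_per_keeper: int,
              seconds: float = 6.0, at_720p: bool = False) -> float:
    per_clip = KWH_PER_10S_CLIP * (seconds / 10.0)  # linear duration scale-down
    if at_720p:
        per_clip /= 4  # ~quarter the carbon of 1080p (near-quadratic scaling)
    return keepers * rerolls_per_keeper * per_clip

heavy = video_kwh(120, 5)  # undisciplined: 5 generations per kept clip
lean  = video_kwh(120, 2)  # disciplined:   2 generations per kept clip
print(f"{heavy:.0f} kWh vs {lean:.0f} kWh ({1 - lean/heavy:.0%} saved)")
```

Dropping from 5:1 to 2:1 removes 60 percent of the video energy, and adding the 720p flag removes three-quarters of what remains — the two levers compound.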
The aggregate caveat matters too.
Per-job AI footprints are small. Aggregate AI footprints are not. The same peer-reviewed sources used here are explicit about this distinction, and any honest framing of the result has to preserve it.
ChatGPT alone serves roughly 2.5 billion queries per day, which works out to about 310 GWh per year just for text inference [5]. The water footprint research from Li, Yang, Islam, and Ren [2] projects that U.S. AI data centers will consume between 731 and 1,125 million cubic meters of water annually by 2030 — equivalent to the daily water needs of 6 to 10 million households. Cornell researchers [10] have flagged similar trajectories for grid load and siting.
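The annualized figure follows directly from the two disclosed numbers, with no other inputs:

```python
# Aggregate arithmetic behind the ~310 GWh/year text-inference figure.
queries_per_day = 2.5e9   # disclosed ChatGPT query volume
wh_per_query = 0.34       # disclosed average energy per query

gwh_per_year = queries_per_day * wh_per_query * 365 / 1e9
print(f"{gwh_per_year:.0f} GWh/year")  # ~310
```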
That is the half of the conversation that gets ignored when individual users feel reassured by per-query disclosures. A single ChatGPT message is genuinely tiny. Seventy-five billion of them a month, multiplied by an industry where every product is racing to add an inference layer, is genuinely a lot.
So both things are true. Per-asset, AI is dramatically more efficient than the analog production it displaces. In aggregate, AI compute is on a steep growth curve that real grid and water systems will have to absorb. A reader walking away with "AI is essentially free" or "AI is uniquely destructive" has the wrong answer on both counts.
What to do with this.
If you are a brand commissioning a commercial in 2026, the comparative footprint is no longer a tiebreaker — it is one of the most lopsided sustainability decisions you can make. Producing a single 3-minute spot live emits roughly the same CO₂ as 12 to 18 average American cars driven for an entire year [7]. Producing it with AI emits the same as one car driven for one to ten days. If you make four commercials a year and shift two to AI-first production, you are doing more for your CSR report than most ESG initiatives.
If you are an agency, this is also a margin argument. AI production at this quality level, today, runs roughly 5 to 15 percent of a comparable live-action budget. That is the part the public conversation is least equipped to discuss honestly.
If you are a policymaker, the per-job efficiency story is real but it does not tell you what to regulate. The aggregate trajectory is what matters at the national-grid level — and the right regulations are about siting, water rights, and grid carbon intensity at the data center level, not about whether individual users should feel guilty.
And if you are a reader who has spent the last two years feeling vaguely bad about ChatGPT use, the math on that is plain: a year of personal queries, even heavy use, is the carbon equivalent of less than a single tank of gas. Save your worry for the flight you are about to book.