v0.4

Token Economics in the AI Era

From physical compute to the end user: a 4-layer value chain stratified by "Unit of Trade"

Research: Jimmy · 2026-05-10

The through-line of this research is the production and consumption of tokens. The ultimate goal: trace, from underlying compute to end-user price, how value is distributed at every link in the chain and who takes the lion's share.

The key change in v0.2 was establishing an organizing principle: each layer uses its "output unit of trade" as the stratification marker. This turns "layer" from a role label into a rigid criterion: same unit of trade, same layer.

§ 1 Organizing Principle: Unit of Trade Defines the Layer

What defines each layer is not "what it does" but "what unit of trade it outputs." The unit of trade converts between layers; the conversion point is the layer boundary.

Layer | Input | Key Conversion | Output (Unit of Trade)
L1 Physical Compute | Infrastructure + Chips | Deployment + Integration | GPU + Rack (physical hardware)
L2 Infra | GPU + Rack | Deployment + Operations | GPU Hours (machine-hours)
L3 Model & API | GPU Hours | Training (→ Model Weights) + Inference | Token API (per M tokens)
L4 Consumption | Token API | Wholesale / Direct Sale / Application Integration | Token API (Paths 1·2 → developers) · Subscription + Credit (Path 3 → end users)
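The rule is mechanical enough to state as a lookup. A minimal sketch in Python (the dict and function names are illustrative, not from the source), showing that classification keys on the output unit of trade, not on the seller's role:

```python
# Illustrative encoding of the stratification rule, derived from the table above.
LAYER_BY_OUTPUT_UNIT = {
    "GPU + Rack":            "L1 Physical Compute",
    "GPU Hours":             "L2 Infra",
    "Token API":             "L3 Model & API (also resold in L4 Paths 1-2)",
    "Subscription + Credit": "L4 Consumption (Path 3)",
}

def classify(unit_of_trade: str) -> str:
    """Same unit of trade -> same layer; the conversion point is the boundary."""
    return LAYER_BY_OUTPUT_UNIT.get(unit_of_trade, "unknown unit of trade")

print(classify("GPU Hours"))  # -> L2 Infra
```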

§ 2 Value Chain Framework

Top to bottom, arrows label the unit of trade passed between each layer. L3 and L4 each have an internal sub-structure (1P/3P, wholesale/retail) — expanded below.

L1 Physical Compute · Physical Infrastructure

The bottom layer of hardware reality — infrastructure (data center / power / networking) + the AI chips themselves. Outputs "physical carriers of compute."

A · Infrastructure

Physical space, stable power, high-speed interconnect, and electrical & cooling equipment — the AI factory's "land + nervous system."

A.c · Power Distribution & Cooling
▸ Electrical (UPS / PDU / switchgear)
▸ Liquid cooling · DLC and cold plates (independent players almost entirely acquired by industrial conglomerates)
▸ Liquid cooling · Immersion (independent specialist players)
▸ More long-tail players (Legrand · Generac · Kohler · JetCool · Chilldyne, etc.); for the full M&A landscape and bottleneck-cascade analysis, see the L1 Physical Compute Deep-Dive Module
Baseline · A · Total construction cost of a 1 MW AI DC: ~$20–30M (high-density, AI-optimized, including full-stack electrical + cooling + networking equipment) · PUE 1.1–1.3 · Electricity $0.05–0.12/kWh · Depreciation 20–30 years · AI rack density 30–100+ kW (traditional ~10 kW)
Sub-item | Share | Key Suppliers
Electrical (UPS / PDU / switchgear / distribution) | 40–50% | Vertiv · Schneider · Eaton · ABB
Cooling (mostly liquid) | 15–20% | Vertiv · CoolIT · nVent · Boyd
Networking (switches + optical modules + NIC) | 12–18% | Arista · Broadcom · NVIDIA · Coherent · Innolight
Building shell | 15–20% | Colocation providers + general contractors
Design / management / permits | 5–10% |
Sources: JLL / CBRE / Synergy Research / Wood Mackenzie / Dell'Oro · Vertiv · Schneider · Arista · Broadcom · Coherent · Innolight Q1 2026 earnings (composite estimate)
B · Chips (AI Accelerators)

Turn silicon into AI compute units, sold or self-used in the form of "cards" or "systems."

B.b · Google TPU (in-house, self-use + external via GCP)
B.c · Cloud-provider custom chips (Amazon · Microsoft, in-house + partial external)
Baseline · B · Representative single-card pricing (2026 estimate): NVIDIA H100 ~$25–40k · B200 ~$30–50k · GB200 NVL72 rack ~$3M (72 cards) · AMD MI300X ~$20–30k
Per-card power: H100 700 W · B200 1,000 W · GB200 superchip ~2,700 W · HBM capacity: H100 80 GB · H200 141 GB · B200 192 GB
Depreciation period: 5–6 years (per hyperscaler 10-K filings) — vs infrastructure 20–30 years. Over a single DC's lifetime, you replace GPUs 4–5 times. Sources: MSFT / AMZN 10-K (depreciation) · SemiAnalysis · secondary-market reporting
Upstream ▲ The physical manufacturing of every B-class chip depends on wafer foundries — the "L0" layer beneath L1. TSMC · Samsung Foundry
Synthesis · 1 MW Capacity · Combining A (infrastructure) + B (chips) + power: the full-stack annualized cost of 1 MW of AI capacity.
Item | Capital Investment | Depreciation Period | Annualized Cost
Infrastructure (DC building) | ~$15M | 25 yr | ~$0.6M
GPUs (~750 H100 @ $35k) | ~$26M | 5 yr | ~$5.2M
Power (1 MW × $0.08/kWh × 8760 h) | | | ~$0.7M
Total | ~$41M | | ~$6.5M / year
Implication: the physical cost floor of a single H100·hour ≈ $1.0/hr ($6.5M ÷ 750 cards ÷ 8760 h). Within that, GPU depreciation accounts for ~80%, infrastructure + power together ~20% → L1's total cost is dominated by the chip.
vs L2 New Clouds (CoreWeave / Lambda): H100 reserved ~$2–3/hr → markup 2–3×, gross margin ~50% (consistent with public earnings disclosures); the arithmetic is sketched below.
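A minimal sketch reproducing the synthesis, using only the estimates above; it implicitly assumes 100% utilization, so real-world floors rise in proportion to idle time:

```python
# All inputs are the document's estimates for 1 MW of AI capacity.
INFRA_CAPEX, INFRA_LIFE_YRS = 15e6, 25                   # DC building, ~$15M over 25 yr
GPU_COUNT, GPU_UNIT_COST, GPU_LIFE_YRS = 750, 35e3, 5    # ~750 H100 @ $35k over 5 yr
POWER_MW, PRICE_PER_KWH, HOURS_PER_YR = 1, 0.08, 8760

infra_annual = INFRA_CAPEX / INFRA_LIFE_YRS                    # ~$0.6M
gpu_annual = GPU_COUNT * GPU_UNIT_COST / GPU_LIFE_YRS          # ~$5.25M
power_annual = POWER_MW * 1000 * PRICE_PER_KWH * HOURS_PER_YR  # ~$0.7M
total_annual = infra_annual + gpu_annual + power_annual        # ~$6.55M

floor = total_annual / (GPU_COUNT * HOURS_PER_YR)  # $/H100-hour cost floor
print(f"H100-hour floor ~${floor:.2f}/hr, GPU share {gpu_annual / total_annual:.0%}")

for l2_price in (2.0, 3.0):  # L2 New Cloud reserved range quoted above
    margin = (l2_price - floor) / l2_price
    print(f"L2 @ ${l2_price:.0f}/hr -> markup {l2_price / floor:.1f}x, gross margin {margin:.0%}")
```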
Unit of Trade
GPU + Rack
Sale / Lease · one-time capital
L2 Infra Layer · Providers / Operators

Directly owns and operates L1 assets, converting hardware into "machine-hours" sold externally.

Unit of Trade
GPU Hours
$/GPU·hour · billed by machine-hour
L3 Model & API Layer

Converts GPU Hours into a callable Token API. Internally formed by the convergence of two streams: training and inference.

Internal conversion flow
Input: GPU Hours
Training
Burns corpora into model weights. A one-off sunk cost, often tens to hundreds of millions of dollars.
→ Model weights (open / closed source)
Inference
Turns weights × GPUs into real-time token capacity. Every call incurs marginal cost.
→ Inference capacity
▼ Merge ▼
Model weights + inference capacity = Token API
Output: Token API (per M tokens)
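The L2 → L3 unit conversion can be made concrete in one formula: cost per M tokens = GPU-hour price ÷ (tokens/sec × 3600) × 10⁶. A minimal sketch; the throughput value is an assumed placeholder (real figures depend on model size, batching, and serving stack), not a measurement:

```python
# GPU Hours (L2's unit of trade) -> $/M tokens (L3's unit of trade).
GPU_HOUR_PRICE = 2.5       # $/GPU-hr, mid of the L2 reserved range above
THROUGHPUT_TOK_S = 3_000   # output tokens/sec/GPU -- ASSUMED placeholder

tokens_per_gpu_hour = THROUGHPUT_TOK_S * 3600
marginal_cost_per_m = GPU_HOUR_PRICE / tokens_per_gpu_hour * 1e6
print(f"inference marginal cost: ${marginal_cost_per_m:.2f} / M tokens")
# ~$0.23/M at these assumptions. Training cost sits outside this marginal
# figure; a 1P must amortize it on top, a 3P does not.
```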
Player taxonomy: 1P vs 3P have radically different cost structures

Both kinds ultimately "output a Token API," but the costs they bear are entirely different — a critical cut that any subsequent "value-chain verification" must make cleanly.

L3a · 1P (self-trained + self-operated)

Bears training sunk cost + in-house inference marginal cost. Strong pricing power, but must amortize the training investment.

L3b · 3P (deploying others' weights)

Bears only inference cost (open weights are free; licensed weights are paid). Essentially Inference-as-a-Service.

Engines powering 3P: Inferact (vLLM, $800M) · RadixArk (SGLang, $400M) · NVIDIA TensorRT-LLM

Baseline · DeepSeek · Use DeepSeek as the L3 break-even calibration point. Open weights + extreme compression (MoE architecture + in-house inference optimization) let it push API pricing down to ~$0.14 / $0.28 per M tokens (input / output) while consuming comparable underlying GPU resources.
Usage: treat DeepSeek as the break-even floor under "zero brand premium + zero training amortization + inference-efficiency ceiling." The spread between any other 1P / 3P pricing and DeepSeek = the sum of that provider's brand premium + training amortization + inference-efficiency loss. Subsequent L3 markup calculations use DeepSeek as the calibration anchor.
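A minimal sketch of the calibration; the non-anchor price point below is hypothetical, inserted only to show the arithmetic, since the decomposition's three components are not separable from price alone:

```python
# DeepSeek's published prices serve as the break-even anchor ($/M tokens).
ANCHOR_IN, ANCHOR_OUT = 0.14, 0.28

def markup_vs_anchor(price_in: float, price_out: float) -> tuple[float, float]:
    """Markup multiples over the anchor; each multiple is the unseparated sum
    of brand premium + training amortization + inference-efficiency loss."""
    return price_in / ANCHOR_IN, price_out / ANCHOR_OUT

# Hypothetical 1P price point, NOT a quoted price:
m_in, m_out = markup_vs_anchor(3.00, 15.00)
print(f"input {m_in:.0f}x, output {m_out:.0f}x over the DeepSeek floor")
```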
Unit of Trade
Token API
$/M tokens · in / out billed separately
L4 Consumption Layer · Token Consumption

The Token API is consumed at this layer. After entering from L3, it splits into three parallel paths flowing to different endpoints: part continues to circulate as a Token API among developers, and part is packaged by application-layer companies into "Subscription + Credit" and sold to end users.

Input: Token API (from L3)
Path 1 · Wholesale

Via an Aggregator or Token Brokerage, N 1P / 3P APIs are aggregated behind a single interface and then resold at a markup.

Output: Token API (with markup) → other developers
Path 2 · Direct Sale

An AI Lab (1P) or a 3P provider sells the Token API directly to developers, with no middleman markup.

Output: Token API (direct) → developers
Path 3 · Application Integration

The Token API enters application-layer companies, where it is wrapped into vertical products by industry. These companies ultimately sell to end users as "Subscription + Credit"; the highest-markup segment of the entire value chain lives here (a worked example follows the category list below).

General Chatbot: ChatGPT, Claude.ai, Gemini, Poe
Coding: Cursor, Windsurf, GitHub Copilot, Cognition · Devin, Replit
Legal: Harvey, Robin AI, Spellbook
Presentation / Slides: Gamma, Tome, Beautiful.ai, Prezi, Minds
Customer Service: Crescendo, Decagon, Sierra
Search: Tavily, Perplexity, You.com
Sales / Marketing: Attentive, Postscript, Jasper, Copy.ai
Finance: Surf
AI + Payment: Coinbase AgentKit, Stripe · Tempo, MPP, Skyfire, Kite AI, x402 Protocol
Multi-channel IM: Tinyfish, Locus
Personal Agent: Magic, Character.ai, Pi
Notes / Writing: Notion AI, Granola
Output: Subscription + Credit → end users
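To see why this segment carries the chain's highest markup, compare a subscription price with the token cost of serving the subscriber. Every number below is a hypothetical assumption chosen for the arithmetic, not any vendor's actual figure:

```python
# Path 3 economics: Subscription + Credit out, Token API in. ALL INPUTS ASSUMED.
SUB_PRICE = 20.0              # $/month subscription (assumed)
TOKENS_PER_MONTH = 2_000_000  # tokens consumed by the subscriber (assumed)
API_COST_PER_M = 3.0          # blended $/M tokens paid down to L3 (assumed)

token_cost = TOKENS_PER_MONTH / 1e6 * API_COST_PER_M  # $6.00
margin = (SUB_PRICE - token_cost) / SUB_PRICE
print(f"token cost ${token_cost:.2f} -> gross margin {margin:.0%}, "
      f"markup {SUB_PRICE / token_cost:.1f}x over the Token API input")
# ~70% / 3.3x here; light users in the pool push the blended markup higher.
```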

§ 3 Vertical Integration

The unit of trade defines crisp layer boundaries, but the leading companies frequently swallow adjacent layers, folding upstream and downstream into a single house. Any "value-chain verification" must first separate these companies' financials from single-layer markup calculations; otherwise gross margins get blended into a single pot (a minimal un-blending sketch follows the table).

Direction | Representatives | Economic Implication
L3 → L1 · AI Lab builds own compute | OpenAI Stargate · Google TPU+DC · Meta 350k H100 · xAI Colossus | Big labs skip L2 and procure bare metal / build their own data centers directly; L2 New Clouds' bargaining ceiling is pushed down.
L2 → L3b · Cloud providers move up to the API | AWS Bedrock · Azure OpenAI · GCP Vertex | L2 uses its distribution to package closed-source models as 3P APIs and resell them, blurring the 1P / 3P boundary and absorbing part of L3's profit.
L2 + L3b integrated · Independent 3P with in-house compute | Together AI · Fireworks · Groq | A single company straddles L2 / L3b; token cost cannot be cleanly separated in the books, and external pricing does not expose per-layer markups.
L1 → L3 / L4 · NVIDIA software tax | NVIDIA NIM · DGX Cloud | The hardware vendor jumps over L2 straight to inference services and API platforms; CUDA lock-in extends upward.
L3a → L4 Path 3 · 1P self-operated end-product | OpenAI ChatGPT · Anthropic Claude.ai · Google Gemini app | An AI Lab runs its own end-product, bundling 1P API + application under one roof; gross margin is not observable from outside.
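The un-blending problem in miniature: a reported blended gross margin is a revenue-weighted mix of per-layer margins, so fixing an assumed revenue split and one layer's benchmark margin backs out the other layer's implied margin. A minimal sketch under loudly assumed inputs:

```python
def unblend(blended: float, weight_a: float, margin_a: float) -> float:
    """Solve blended = w_a*m_a + (1 - w_a)*m_b for the unobserved m_b."""
    return (blended - weight_a * margin_a) / (1 - weight_a)

# Hypothetical L2 + L3b integrated shop: 60% reported blended margin (assumed),
# 50/50 revenue split (assumed), L2 half pegged at the ~50% New Cloud benchmark.
implied_l3b = unblend(blended=0.60, weight_a=0.50, margin_a=0.50)
print(f"implied L3b gross margin: {implied_l3b:.0%}")  # -> 70% under these assumptions
```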