v0.4

Token Economics in the AI Era

From physical compute to the end user: a 4-layer value chain stratified by "Unit of Trade"

Research: Jimmy · 2026-05-10

The through-line of this research is the production and consumption of tokens. The ultimate goal: trace, from underlying compute to end-user price, how value is distributed at every link in the chain and who takes the lion's share.

The key change in v0.2 was establishing an organizing principle: each layer uses its "output unit of trade" as the stratification marker. This turns "layer" from a role label into a rigid criterion: same unit of trade, same layer.

§ 1 Organizing Principle: Unit of Trade Defines the Layer

What defines each layer is not "what it does" but "what unit of trade it outputs." The unit of trade converts between layers; the conversion point is the layer boundary.

Layer | Input | Key Conversion | Output (Unit of Trade)
L1 Physical Compute | Infrastructure + Chips | Deployment + Integration | GPU + Rack (physical hardware)
L2 Infra | GPU + Rack | Deployment + Operations | GPU Hours (machine-hours)
L3 Model & API | GPU Hours | Training (→ Model Weights) + Inference | Token API (per M tokens)
L4 Consumption | Token API | Wholesale / Direct Sale / Application Integration | Token API (Paths 1·2 → developers) · Subscription + Credit (Path 3 → end users)
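The rule is mechanical enough to state as a lookup. A minimal sketch in Python (the dict and function names are illustrative, not from the source), showing that classification keys on the output unit of trade, not on the seller's role:

```python
# Illustrative encoding of the stratification rule, derived from the table above.
LAYER_BY_OUTPUT_UNIT = {
    "GPU + Rack":            "L1 Physical Compute",
    "GPU Hours":             "L2 Infra",
    "Token API":             "L3 Model & API (also resold in L4 Paths 1-2)",
    "Subscription + Credit": "L4 Consumption (Path 3)",
}

def classify(unit_of_trade: str) -> str:
    """Same unit of trade -> same layer; the conversion point is the boundary."""
    return LAYER_BY_OUTPUT_UNIT.get(unit_of_trade, "unknown unit of trade")

print(classify("GPU Hours"))  # -> L2 Infra
```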

§ 2 Value Chain Framework

Top to bottom, arrows label the unit of trade passed between each layer. L3 and L4 each have an internal sub-structure (1P/3P, wholesale/retail) — expanded below.

L1 Physical Compute · Physical Infrastructure

The bottom layer of hardware reality — infrastructure (data center / power / networking) + the AI chips themselves. Outputs "physical carriers of compute."

A · Infrastructure

Physical space, stable power, high-speed interconnect, and electrical & cooling equipment — the AI factory's "land + nervous system."

A.c · Power Distribution & Cooling
▸ Electrical (UPS / PDU / switchgear)
▸ Liquid cooling · DLC and cold plates (independent players almost entirely acquired by industrial conglomerates)
▸ Liquid cooling · Immersion (independent specialist players)
▸ More long-tail players (Legrand · Generac · Kohler · JetCool · Chilldyne, etc.); for the full M&A landscape and bottleneck-cascade analysis, see the L1 Physical Compute Deep-Dive Module
Baseline · A · Total construction cost of a 1 MW AI DC: ~$20–30M (high-density, AI-optimized, including full-stack electrical + cooling + networking equipment) · PUE 1.1–1.3 · Electricity $0.05–0.12/kWh · Depreciation 20–30 years · AI rack density 30–100+ kW (traditional ~10 kW)
Sub-item | Share | Key Suppliers
Electrical (UPS / PDU / switchgear / distribution) | 40–50% | Vertiv · Schneider · Eaton · ABB
Cooling (mostly liquid) | 15–20% | Vertiv · CoolIT · nVent · Boyd
Networking (switches + optical modules + NIC) | 12–18% | Arista · Broadcom · NVIDIA · Coherent · Innolight
Building shell | 15–20% | Colocation providers + general contractors
Design / management / permits | 5–10% |
Sources: JLL / CBRE / Synergy Research / Wood Mackenzie / Dell'Oro · Vertiv · Schneider · Arista · Broadcom · Coherent · Innolight Q1 2026 earnings (composite estimate)
B · Chips (AI Accelerators)

Turn silicon into AI compute units, sold or self-used in the form of "cards" or "systems."

B.b · Google TPU (in-house, self-use + external via GCP)
B.c · Cloud-provider custom chips (Amazon · Microsoft, in-house + partial external)
Baseline · B · Representative single-card pricing (2026 estimate): NVIDIA H100 ~$25–40k · B200 ~$30–50k · GB200 NVL72 rack ~$3M (72 cards) · AMD MI300X ~$20–30k
Per-card power: H100 700 W · B200 1,000 W · GB200 superchip ~2,700 W · HBM capacity: H100 80 GB · H200 141 GB · B200 192 GB
Depreciation period: 5–6 years (per hyperscaler 10-K filings) — vs infrastructure 20–30 years. Over a single DC's lifetime, you replace GPUs 4–5 times. Sources: MSFT / AMZN 10-K (depreciation) · SemiAnalysis · secondary-market reporting
Upstream ▲ The physical manufacturing of every B-class chip depends on wafer foundries — the "L0" layer beneath L1. TSMC · Samsung Foundry
Synthesis · 1 MW Capacity · Combining A (infrastructure) + B (chips) + power: the full-stack annualized cost of 1 MW of AI capacity.
Item | Capital Investment | Depreciation Period | Annualized Cost
Infrastructure (DC building) | ~$15M | 25 yr | ~$0.6M
GPUs (~750 H100 @ $35k) | ~$26M | 5 yr | ~$5.2M
Power (1 MW × $0.08/kWh × 8760 h) | | | ~$0.7M
Total | ~$41M | | ~$6.5M / year
Implication: the physical cost floor of a single H100·hour ≈ $1.0/hr ($6.5M ÷ 750 cards ÷ 8760 h). Within that, GPU depreciation accounts for ~80%, infrastructure + power together ~20% → L1's total cost is dominated by the chip.
vs L2 New Clouds (CoreWeave / Lambda): H100 reserved ~$2–3/hr → markup 2–3×, gross margin ~50% (consistent with public earnings disclosures); the arithmetic is sketched below.
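A minimal sketch reproducing the synthesis, using only the estimates above; it implicitly assumes 100% utilization, so real-world floors rise in proportion to idle time:

```python
# All inputs are the document's estimates for 1 MW of AI capacity.
INFRA_CAPEX, INFRA_LIFE_YRS = 15e6, 25                   # DC building, ~$15M over 25 yr
GPU_COUNT, GPU_UNIT_COST, GPU_LIFE_YRS = 750, 35e3, 5    # ~750 H100 @ $35k over 5 yr
POWER_MW, PRICE_PER_KWH, HOURS_PER_YR = 1, 0.08, 8760

infra_annual = INFRA_CAPEX / INFRA_LIFE_YRS                    # ~$0.6M
gpu_annual = GPU_COUNT * GPU_UNIT_COST / GPU_LIFE_YRS          # ~$5.25M
power_annual = POWER_MW * 1000 * PRICE_PER_KWH * HOURS_PER_YR  # ~$0.7M
total_annual = infra_annual + gpu_annual + power_annual        # ~$6.55M

floor = total_annual / (GPU_COUNT * HOURS_PER_YR)  # $/H100-hour cost floor
print(f"H100-hour floor ~${floor:.2f}/hr, GPU share {gpu_annual / total_annual:.0%}")

for l2_price in (2.0, 3.0):  # L2 New Cloud reserved range quoted above
    margin = (l2_price - floor) / l2_price
    print(f"L2 @ ${l2_price:.0f}/hr -> markup {l2_price / floor:.1f}x, gross margin {margin:.0%}")
```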
Unit of Trade
GPU + Rack
Sale / Lease · one-time capital
L2 Infra Layer · Providers / Operators

Directly owns and operates L1 assets, converting hardware into "machine-hours" sold externally.

Unit of Trade
GPU Hours
$/GPU·hour · billed by machine-hour
L3 Model & API Layer

Converts GPU Hours into a callable Token API. Internally formed by the convergence of two streams: training and inference.

Internal conversion flow
Input: GPU Hours
Training
Burns corpora into model weights. A one-off sunk cost, often tens to hundreds of millions of dollars.
→ Model weights (open / closed source)
Inference
Turns weights × GPUs into real-time token capacity. Every call incurs marginal cost.
→ Inference capacity
▼ Merge ▼
Model weights + inference capacity = Token API
Output: Token API (per M tokens)
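The L2 → L3 unit conversion can be made concrete in one formula: cost per M tokens = GPU-hour price ÷ (tokens/sec × 3600) × 10⁶. A minimal sketch; the throughput value is an assumed placeholder (real figures depend on model size, batching, and serving stack), not a measurement:

```python
# GPU Hours (L2's unit of trade) -> $/M tokens (L3's unit of trade).
GPU_HOUR_PRICE = 2.5       # $/GPU-hr, mid of the L2 reserved range above
THROUGHPUT_TOK_S = 3_000   # output tokens/sec/GPU -- ASSUMED placeholder

tokens_per_gpu_hour = THROUGHPUT_TOK_S * 3600
marginal_cost_per_m = GPU_HOUR_PRICE / tokens_per_gpu_hour * 1e6
print(f"inference marginal cost: ${marginal_cost_per_m:.2f} / M tokens")
# ~$0.23/M at these assumptions. Training cost sits outside this marginal
# figure; a 1P must amortize it on top, a 3P does not.
```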
Player taxonomy: 1P vs 3P have radically different cost structures

Both kinds ultimately "output a Token API," but the costs they bear are entirely different — a critical cut that any subsequent "value-chain verification" must make cleanly.

L3a · 1P (self-trained + self-operated)

Bears training sunk cost + in-house inference marginal cost. Strong pricing power, but must amortize the training investment.

L3b · 3P (deploying others' weights)

Bears only inference cost (open weights are free; licensed weights are paid). Essentially Inference-as-a-Service.

Engines powering 3P: Inferact (vLLM, $800M) · RadixArk (SGLang, $400M) · NVIDIA TensorRT-LLM

Baseline · DeepSeek · Use DeepSeek as the L3 break-even calibration point. Open weights + extreme compression (MoE architecture + in-house inference optimization) let it push API pricing down to ~$0.14 / $0.28 per M tokens (input / output) while consuming comparable underlying GPU resources.
Usage: treat DeepSeek as the break-even floor under "zero brand premium + zero training amortization + inference-efficiency ceiling." The spread between any other 1P / 3P pricing and DeepSeek = the sum of that provider's brand premium + training amortization + inference-efficiency loss. Subsequent L3 markup calculations use DeepSeek as the calibration anchor.
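A minimal sketch of the calibration; the non-anchor price point below is hypothetical, inserted only to show the arithmetic, since the decomposition's three components are not separable from price alone:

```python
# DeepSeek's published prices serve as the break-even anchor ($/M tokens).
ANCHOR_IN, ANCHOR_OUT = 0.14, 0.28

def markup_vs_anchor(price_in: float, price_out: float) -> tuple[float, float]:
    """Markup multiples over the anchor; each multiple is the unseparated sum
    of brand premium + training amortization + inference-efficiency loss."""
    return price_in / ANCHOR_IN, price_out / ANCHOR_OUT

# Hypothetical 1P price point, NOT a quoted price:
m_in, m_out = markup_vs_anchor(3.00, 15.00)
print(f"input {m_in:.0f}x, output {m_out:.0f}x over the DeepSeek floor")
```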
Unit of Trade
Token API
$/M tokens · in / out billed separately
L4 Consumption Layer · Token Consumption

The Token API is consumed at this layer. After entering from L3, it splits into three parallel paths flowing to different endpoints: part continues to circulate as a Token API among developers, and part is packaged by application-layer companies into "Subscription + Credit" and sold to end users.

Input: Token API (from L3)
Path 1 · Wholesale

Via an Aggregator or Token Brokerage, N 1P / 3P APIs are aggregated behind a single interface and then resold at a markup.

Output: Token API (with markup) → other developers
Path 2 · Direct Sale

An AI Lab (1P) or a 3P provider sells the Token API directly to developers, with no middleman markup.

Output: Token API (direct) → developers
Path 3 · Application Integration

The Token API enters application-layer companies, where it is wrapped into vertical products by industry. These companies ultimately sell to end users as "Subscription + Credit"; the highest-markup segment of the entire value chain lives here (a worked example follows the category list below).

General Chatbot: ChatGPT, Claude.ai, Gemini, Poe
Coding: Cursor, Windsurf, GitHub Copilot, Cognition · Devin, Replit
Legal: Harvey, Robin AI, Spellbook
Presentation / Slides: Gamma, Tome, Beautiful.ai, Prezi, Minds
Customer Service: Crescendo, Decagon, Sierra
Search: Tavily, Perplexity, You.com
Sales / Marketing: Attentive, Postscript, Jasper, Copy.ai
Finance: Surf
AI + Payment: Coinbase AgentKit, Stripe · Tempo, MPP, Skyfire, Kite AI, x402 Protocol
Multi-channel IM: Tinyfish, Locus
Personal Agent: Magic, Character.ai, Pi
Notes / Writing: Notion AI, Granola
Output: Subscription + Credit → end users
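To see why this segment carries the chain's highest markup, compare a subscription price with the token cost of serving the subscriber. Every number below is a hypothetical assumption chosen for the arithmetic, not any vendor's actual figure:

```python
# Path 3 economics: Subscription + Credit out, Token API in. ALL INPUTS ASSUMED.
SUB_PRICE = 20.0              # $/month subscription (assumed)
TOKENS_PER_MONTH = 2_000_000  # tokens consumed by the subscriber (assumed)
API_COST_PER_M = 3.0          # blended $/M tokens paid down to L3 (assumed)

token_cost = TOKENS_PER_MONTH / 1e6 * API_COST_PER_M  # $6.00
margin = (SUB_PRICE - token_cost) / SUB_PRICE
print(f"token cost ${token_cost:.2f} -> gross margin {margin:.0%}, "
      f"markup {SUB_PRICE / token_cost:.1f}x over the Token API input")
# ~70% / 3.3x here; light users in the pool push the blended markup higher.
```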

§ 3 Vertical Integration

The unit of trade defines crisp layer boundaries, but the leading companies frequently swallow adjacent layers, folding upstream and downstream into a single house. Any "value-chain verification" must first separate these companies' financials from single-layer markup calculations; otherwise gross margins get blended into a single pot (a minimal un-blending sketch follows the table).

Direction | Representatives | Economic Implication
L3 → L1 · AI Lab builds own compute | OpenAI Stargate · Google TPU+DC · Meta 350k H100 · xAI Colossus | Big labs skip L2 and procure bare metal / build their own data centers directly; L2 New Clouds' bargaining ceiling is pushed down.
L2 → L3b · Cloud providers move up to the API | AWS Bedrock · Azure OpenAI · GCP Vertex | L2 uses its distribution to package closed-source models as 3P APIs and resell them, blurring the 1P / 3P boundary and absorbing part of L3's profit.
L2 + L3b integrated · Independent 3P with in-house compute | Together AI · Fireworks · Groq | A single company straddles L2 / L3b; token cost cannot be cleanly separated in the books, and external pricing does not expose per-layer markups.
L1 → L3 / L4 · NVIDIA software tax | NVIDIA NIM · DGX Cloud | The hardware vendor jumps over L2 straight to inference services and API platforms; CUDA lock-in extends upward.
L3a → L4 Path 3 · 1P self-operated end-product | OpenAI ChatGPT · Anthropic Claude.ai · Google Gemini app | An AI Lab runs its own end-product, bundling 1P API + application under one roof; gross margin is not observable from outside.
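The un-blending problem in miniature: a reported blended gross margin is a revenue-weighted mix of per-layer margins, so fixing an assumed revenue split and one layer's benchmark margin backs out the other layer's implied margin. A minimal sketch under loudly assumed inputs:

```python
def unblend(blended: float, weight_a: float, margin_a: float) -> float:
    """Solve blended = w_a*m_a + (1 - w_a)*m_b for the unobserved m_b."""
    return (blended - weight_a * margin_a) / (1 - weight_a)

# Hypothetical L2 + L3b integrated shop: 60% reported blended margin (assumed),
# 50/50 revenue split (assumed), L2 half pegged at the ~50% New Cloud benchmark.
implied_l3b = unblend(blended=0.60, weight_a=0.50, margin_a=0.50)
print(f"implied L3b gross margin: {implied_l3b:.0%}")  # -> 70% under these assumptions
```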