From physical compute to the end user: a 4-layer value chain stratified by "Unit of Trade"
The through-line of this research is the production and consumption of tokens. The ultimate goal: trace the chain from underlying compute to end-user price, see clearly how value is distributed at every link, and identify who takes the lion's share.
The key change in v0.2 was establishing an organizing principle: each layer uses its "output unit of trade" as the stratification marker. This turns "layer" from a "role label" into a "rigid criterion" — same transaction unit means same layer.
What defines each layer is not "what it does" but "what unit of trade it outputs." The unit of trade converts between layers; the conversion point is the layer boundary.
Top to bottom, arrows label the unit of trade passed between each layer. L3 and L4 each have an internal sub-structure (1P/3P, wholesale/retail) — expanded below.
The bottom layer of hardware reality — infrastructure (data center / power / networking) + the AI chips themselves. Outputs "physical carriers of compute."
Physical space, stable power, high-speed interconnect, and electrical & cooling equipment — the AI factory's "land + nervous system."
| Sub-item | Share | Key Suppliers |
|---|---|---|
| Electrical (UPS / PDU / switchgear / distribution) | 40–50% | Vertiv · Schneider · Eaton · ABB |
| Cooling (mostly liquid) | 15–20% | Vertiv · CoolIT · nVent · Boyd |
| Networking (switches + optical modules + NIC) | 12–18% | Arista · Broadcom · NVIDIA · Coherent · Innolight |
| Building shell | 15–20% | Colocation providers + general contractors |
| Design / management / permits | 5–10% | — |
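To make the share ranges concrete, here is a minimal sketch that allocates a hypothetical build budget across the sub-items at the midpoints of the ranges above. The $100M total and the normalization step are assumptions for illustration, not source figures:

```python
# Allocate a hypothetical data-center capex across the sub-items,
# using the midpoint of each share range from the table above.
# The $100M total is an assumed figure, not from the source.
MIDPOINT_SHARES = {
    "electrical": 0.45,       # midpoint of 40-50%
    "cooling": 0.175,         # midpoint of 15-20%
    "networking": 0.15,       # midpoint of 12-18%
    "building_shell": 0.175,  # midpoint of 15-20%
    "design_mgmt": 0.075,     # midpoint of 5-10%
}

def allocate(total_capex: float) -> dict[str, float]:
    # The midpoints sum to ~1.025, so normalize before allocating.
    s = sum(MIDPOINT_SHARES.values())
    return {k: total_capex * v / s for k, v in MIDPOINT_SHARES.items()}

budget = allocate(100_000_000)  # assumed $100M build
```

Normalizing matters because the published ranges overlap: summed midpoints exceed 100%, so raw multiplication would over-allocate the budget.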
Turn silicon into AI compute units, sold or self-used in the form of "cards" or "systems."
| Item | Capital Investment | Depreciation Period | Annualized Cost |
|---|---|---|---|
| Infrastructure (DC building) | ~$15M | 25 yr | ~$0.6M |
| GPUs (~750 H100 @ $35k) | ~$26M | 5 yr | ~$5.2M |
| Power (1 MW × $0.08/kWh × 8760h) | — | — | ~$0.7M |
| Total | ~$41M | — | ~$6.5M / year |
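The table's annualized figures imply an all-in floor for the price of a GPU-hour. A quick sketch of that arithmetic using the table's own numbers; the `utilization` parameter is an assumption the source does not state:

```python
# Derive an all-in cost per GPU-hour from the table's annualized costs.
# Dollar figures come from the table; `utilization` is an assumption.
HOURS_PER_YEAR = 8760
NUM_GPUS = 750

ANNUALIZED = {
    "infrastructure": 0.6e6,  # $15M DC building over 25 yr
    "gpus": 5.2e6,            # ~$26M of H100s over 5 yr
    "power": 0.7e6,           # 1 MW x $0.08/kWh x 8760 h
}

def cost_per_gpu_hour(utilization: float = 0.7) -> float:
    total = sum(ANNUALIZED.values())  # ~$6.5M / year, matching the table
    billable_hours = NUM_GPUS * HOURS_PER_YEAR * utilization
    return total / billable_hours
```

At 100% utilization this floors out near $1/GPU-hour (6.5e6 / (750 × 8760) ≈ 0.99); any realistic utilization below that pushes the break-even rental price proportionally higher.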
Directly owns and operates L1 assets, converting hardware into "machine-hours" sold externally.
Converts GPU Hours into a callable Token API. Internally formed by the convergence of two streams: training and inference.
Both kinds ultimately "output a Token API," but the costs they bear are entirely different — a critical cut that any subsequent "value-chain verification" must make cleanly.
Bears training sunk cost + in-house inference marginal cost. Strong pricing power, must amortize training investment.
Bears only inference cost (open weights are free; licensed weights are paid). Essentially Inference-as-a-Service.
Engines powering 3P: Inferact (vLLM, $800M) · RadixArk (SGLang, $400M) · NVIDIA TensorRT-LLM
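The asymmetry in cost base can be made concrete with a toy per-token model: a 1P lab must recover training over lifetime token volume, while a 3P host on open weights pays only inference. A minimal sketch; every number and function name below is an illustrative assumption, not a source figure:

```python
# Toy per-token cost model contrasting the two cost bases described above.
# Every numeric input is an assumed placeholder, not a figure from the source.

def cost_per_mtok_1p(training_cost: float, lifetime_mtok: float,
                     inference_cost_per_mtok: float) -> float:
    """1P: amortized training sunk cost + in-house inference marginal cost."""
    return training_cost / lifetime_mtok + inference_cost_per_mtok

def cost_per_mtok_3p(inference_cost_per_mtok: float,
                     license_fee_per_mtok: float = 0.0) -> float:
    """3P: inference only; open weights cost nothing, licensed weights add a fee."""
    return inference_cost_per_mtok + license_fee_per_mtok

# Hypothetical: a $100M training run amortized over 100M Mtok served,
# with $0.50/Mtok inference cost for both providers.
one_p = cost_per_mtok_1p(100e6, 100e6, 0.5)  # 1P must price above this
three_p = cost_per_mtok_3p(0.5)              # 3P floor sits much lower
```

Under these assumed numbers the 1P cost floor is triple the 3P floor, which is why the text treats the 1P/3P cut as one any value-chain verification must make cleanly.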
The Token API is consumed at this layer. After entering from L3, it splits into three parallel paths to different endpoints: some is resold through aggregators, some continues to circulate as a Token API sold directly to developers, and some is packaged by application-layer companies into "Subscription + Credit" for end users.
Via an Aggregator or Token Brokerage, N 1P / 3P APIs are aggregated behind a single interface and then resold at a markup.
An AI Lab (1P) or a 3P provider sells the Token API directly to developers, with no middleman markup.
The Token API enters application-layer companies, where it is wrapped into vertical products by industry. These companies ultimately sell to end users as "Subscription + Credit" — the highest-markup segment of the entire value chain happens here.
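How the three paths price out can be sketched with a toy markup model; every rate and dollar figure below is an illustrative assumption, not observed pricing:

```python
# Toy pricing along the three distribution paths described above.
# All percentages and prices are illustrative assumptions.

def aggregator_price(api_price_per_mtok: float,
                     broker_markup: float = 0.10) -> float:
    """Path 1 (Aggregator / Token Brokerage): API price plus a resale markup."""
    return api_price_per_mtok * (1 + broker_markup)

def direct_price(api_price_per_mtok: float) -> float:
    """Path 2 (direct 1P/3P sale to developers): no middleman markup."""
    return api_price_per_mtok

def app_gross_margin(subscription_revenue: float, tokens_consumed_mtok: float,
                     api_price_per_mtok: float) -> float:
    """Path 3 (application layer): margin between Subscription + Credit
    revenue and the underlying token cost."""
    token_cost = tokens_consumed_mtok * api_price_per_mtok
    return (subscription_revenue - token_cost) / subscription_revenue

# Hypothetical: a $20/mo subscriber consuming 2 Mtok at a $3/Mtok API price
# leaves a 70% gross margin at the application layer.
margin = app_gross_margin(20.0, 2.0, 3.0)
```

The contrast is the point: a broker clips a thin percentage on resold tokens, while the application layer captures whatever gap exists between a flat subscription and actual token consumption, which is why the highest markup in the chain sits there.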
| Vertical | Representative Products |
|---|---|
| General Chatbot | ChatGPT · Claude.ai · Gemini · Poe |
| Coding | Cursor · Windsurf · GitHub Copilot · Cognition (Devin) · Replit |
| Legal | Harvey · Robin AI · Spellbook |
| Presentation / Slides | Gamma · Tome · Beautiful.ai · Prezi · Minds |
| Customer Service | Crescendo · Decagon · Sierra |
| Search | Tavily · Perplexity · You.com |
| Sales / Marketing | Attentive · Postscript · Jasper · Copy.ai |
| Finance | Surf |
| AI + Payment | Coinbase AgentKit · Stripe (Tempo) · MPP · Skyfire · Kite AI · x402 Protocol |
| Multi-channel IM | Tinyfish · Locus |
| Personal Agent | Magic · Character.ai · Pi |
| Notes / Writing | Notion AI · Granola |
The unit of trade defines crisp layer boundaries, but the leading companies frequently swallow up adjacent layers, folding upstream and downstream into a single house. When doing "value-chain verification," you must first separate these companies' financials from any single-layer markup calculation — otherwise gross margins get blended into a single pot.
| Direction | Representatives | Economic Implication |
|---|---|---|
| L3 → L1: AI Lab builds own compute | OpenAI Stargate · Google TPU+DC · Meta 350k H100 · xAI Colossus | Big labs skip L2 and procure bare metal / build their own data centers directly. L2 New Cloud's bargaining ceiling is pushed down. |
| L2 → L3b: Cloud providers move up to API | AWS Bedrock · Azure OpenAI · GCP Vertex | L2 uses its distribution to package closed-source models as 3P APIs and resell them, blurring the 1P / 3P boundary and absorbing part of L3 profit. |
| L2 + L3b integrated: Independent 3P with in-house compute | Together AI · Fireworks · Groq | A single company straddles L2 / L3b. Token cost cannot be cleanly separated in the books, and external pricing does not expose per-layer markups. |
| L1 → L3 / L4: NVIDIA software tax | NVIDIA NIM · DGX Cloud | A hardware vendor jumps over L2 and goes straight to inference services and API platforms. CUDA lock-in extends upward. |
| L3a → L4 (Path 3): 1P self-operated end-product | OpenAI ChatGPT · Anthropic Claude.ai · Google Gemini app | An AI Lab runs its own end-product, bundling 1P API + application under one roof. Gross margin is not observable from outside. |