May 23, 2026 engineering infrastructure data

Ten Exchanges, Ten Dialects

We added a 10th exchange API to our funding rate oracle this morning. It should have taken twenty minutes. Instead, it surfaced a $256 billion discrepancy, three categories of unit mismatch, and a silent normalization bug that had been degrading our composite for weeks. This is what data aggregation actually looks like — and why APIs, as currently designed, are not ready for machine consumption.

This is a companion piece to our thesis on attested state infrastructure. That paper argues theoretically that the maintenance cost of data aggregation — not the build cost — is what makes shared infrastructure economically rational. This post is the lab notebook.

1. The Task

Our funding rate oracle, MSFR, aggregates perpetual swap funding rates across cryptocurrency exchanges. Funding rates are the mechanism that anchors perpetual futures to spot price — when longs pay shorts, the rate is positive; when shorts pay longs, negative. The aggregate rate across exchanges is a critical input for sentiment indices, stress detection, and carry trade strategies.

Adding a new exchange to the oracle should be mechanical: read their API docs, write a fetcher, map the fields, add it to the composite calculation. The data is conceptually identical across venues — a single number, updated periodically, representing the cost of holding a leveraged position.

One number. Ten providers. How different can it be?

2. Ten Dialects for the Same Number

Every exchange reports the same economic concept — the funding rate on a BTC perpetual swap — but each speaks its own dialect. Not a different language. A dialect: close enough to read, different enough to break.

Exchange	Rate field	Period	OI field	OI unit
Binance	`fundingRate`	8 hours	`openInterest`	Coins (BTC)
Bybit	`fundingRate`	8 hours	`openInterest`	Coins (BTC)
OKX	`fundingRate`	8 hours	`oi` / `oiCcy` / `oiUsd`	Contracts
Deribit	`current_funding`	8 hours	`open_interest`	USD
Kraken	`relativeFundingRate`	1 hour	—	—
Coinbase	`predicted_funding`	1 hour	`open_interest`	USD
Hyperliquid	`fundingRate`	1 hour	Computed	Coins
dYdX	`nextFundingRate`	1 hour	`openInterest`	Coins
Bitget	`fundingRate`	8 hours	—	—
Bitstamp	`funding_rate`	4 hours	`open_interest`	Coins

Ten providers. Six different field names for the rate. Three different settlement periods. Three different units for open interest (coins, contracts, USD). Two exchanges that expose no open interest at all on their funding endpoint, and a third (Hyperliquid) where it has to be computed. One exchange (OKX) that exposes three different OI fields in three different units on the same response.

This is one data point (the current funding rate) from one asset class (BTC perpetuals). A single conceptual number, from ten sources, requiring ten different parsing strategies, a period normalization for every venue that does not settle every 8 hours, and a unit conversion for every venue that does not report open interest in USD, before you can compute a meaningful aggregate.
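One way to keep these dialect differences explicit is a declarative per-venue table that every fetcher consults. The sketch below transcribes the table above into Python. The `VenueSpec` class, its attribute names, and the assumption that each payload has already been parsed and flattened into a dictionary are illustrative, not the oracle's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VenueSpec:
    """Hypothetical per-venue description of where the funding data lives."""
    rate_field: str            # key that holds the funding rate
    period_hours: int          # settlement period the rate refers to
    oi_field: Optional[str]    # key for open interest, None if not exposed
    oi_unit: Optional[str]     # "coins", "contracts", "usd", or None


# Transcribed from the table above.
VENUES = {
    "binance":     VenueSpec("fundingRate", 8, "openInterest", "coins"),
    "bybit":       VenueSpec("fundingRate", 8, "openInterest", "coins"),
    # OKX also returns `oi` (contracts) and `oiCcy` on the same response.
    "okx":         VenueSpec("fundingRate", 8, "oiUsd", "usd"),
    "deribit":     VenueSpec("current_funding", 8, "open_interest", "usd"),
    "kraken":      VenueSpec("relativeFundingRate", 1, None, None),
    "coinbase":    VenueSpec("predicted_funding", 1, "open_interest", "usd"),
    # Hyperliquid OI is computed rather than read from a single field.
    "hyperliquid": VenueSpec("fundingRate", 1, None, "coins"),
    "dydx":        VenueSpec("nextFundingRate", 1, "openInterest", "coins"),
    "bitget":      VenueSpec("fundingRate", 8, None, None),
    "bitstamp":    VenueSpec("funding_rate", 4, "open_interest", "coins"),
}


def extract_rate(venue: str, payload: dict) -> float:
    """Pull the raw funding rate out of one venue's flattened payload."""
    spec = VENUES[venue]
    return float(payload[spec.rate_field])
```

Keeping the unit next to the field name means a new venue cannot be added without someone stating, in one place, what its numbers mean.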

This is not a cryptocurrency problem. This is every API aggregation problem.

3. The Unit Catastrophe

The funding rate itself normalizes straightforwardly: multiply hourly rates by 8 (and 4-hour rates by 2) to get an 8-hour equivalent, and divide absolute rates by mark price to get relative rates. Tedious but predictable.
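A minimal sketch of that rate normalization, assuming simple linear scaling between periods as described above. The function name and parameters are hypothetical.

```python
from typing import Optional


def to_8h_relative(rate: float, period_hours: float,
                   absolute: bool = False,
                   mark_price: Optional[float] = None) -> float:
    """Return a relative funding rate scaled linearly to an 8-hour period."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if absolute:
        # Absolute rates are quoted in price units; divide by mark price
        # to express them as a fraction of position value.
        if mark_price is None or mark_price <= 0:
            raise ValueError("an absolute rate needs a positive mark price")
        rate = rate / mark_price
    return rate * (8.0 / period_hours)
```

For example, `to_8h_relative(0.00001, period_hours=1)` scales an hourly rate by 8, while an 8-hour rate passes through unchanged.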

Open interest is where things break.

An open-interest-weighted composite should give more weight to exchanges with more capital at stake. But "how much capital is at stake" is expressed in different units from one exchange to the next, and the units are never labeled in the response:

Binance returns 105,490 — in BTC. Multiply by mark price: $7.96 billion.
Deribit returns 1,023,233,970 — in USD. Already correct: $1.02 billion.
OKX returns 3,401,143 — in contracts. Each contract is 0.01 BTC. Raw value times mark price: $256 billion.

$256 billion. OKX's actual BTC perpetual open interest is $2.57 billion. The raw API value, multiplied naively by mark price the way every other coin-denominated venue requires, produces a number 100× too large. This doesn't throw an error. It produces a number. It looks like a number. It quietly dominates the composite weighting and makes every downstream calculation wrong.

The fix is to use OKX's `oiUsd` field instead of their `oi` field. This field exists. It is not the default field. It is not the field named in their funding rate documentation. It is the third of three open interest fields on the same endpoint, each expressing the same value in a different unit, none of them labeled as the canonical choice for cross-exchange comparison.

There is no standard for how exchanges report open interest. There is no header, no metadata field, no schema annotation that tells a consumer "this number is in contracts, not coins." The consumer must know, per exchange, per instrument type, which field to use and what unit it is in. This knowledge lives in documentation (when accurate), in forum posts (when discoverable), or in production incidents (when neither).
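The arithmetic in this section can be reproduced with a small unit-aware converter. This is a sketch only: `oi_to_usd` is a hypothetical helper, and the mark price of 75,457 USD is an assumption back-derived from the Binance figures above (7.96 billion USD divided by 105,490 BTC), not a value quoted in this post.

```python
CONTRACT_SIZE_BTC = 0.01   # OKX BTC perpetual contract size, as stated above
MARK_PRICE = 75_457.0      # ASSUMPTION: back-derived from the Binance figures


def oi_to_usd(raw: float, unit: str, mark_price: float,
              contract_size: float = 1.0) -> float:
    """Convert a raw open-interest figure to USD given its declared unit."""
    if unit == "usd":
        return raw
    if unit == "coins":
        return raw * mark_price
    if unit == "contracts":
        return raw * contract_size * mark_price
    raise ValueError(f"unknown open-interest unit: {unit!r}")


binance = oi_to_usd(105_490, "coins", MARK_PRICE)
deribit = oi_to_usd(1_023_233_970, "usd", MARK_PRICE)
# Treating OKX contracts as coins reproduces the 100x error.
okx_naive = oi_to_usd(3_401_143, "coins", MARK_PRICE)
okx_fixed = oi_to_usd(3_401_143, "contracts", MARK_PRICE, CONTRACT_SIZE_BTC)

for name, usd in [("binance", binance), ("deribit", deribit),
                  ("okx (naive)", okx_naive), ("okx (fixed)", okx_fixed)]:
    print(f"{name:12s} ${usd / 1e9:,.2f}B")
```

Under that assumed mark price the four lines come out at roughly $7.96B, $1.02B, $256.64B and $2.57B. Nothing in the converter can tell that the third one is wrong; only the declared unit can.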

4. The Invisible Bug

Here is a subtler failure mode. Our oracle had been collecting mark prices from all nine previously integrated exchanges for weeks. Every fetcher returned a mark field. The data was correct. The data was stored. The data was never exposed in the API response.

{
  "exchange": "binance",
  "rate": 4.491e-05,
  "rate_pct": 0.004491,
  "predicted": 4.491e-05,
  "open_interest": 105468.085,
  "next_settlement_utc": "2026-05-23T08:00:00+00:00"
}

No mark field. The response formatting function — written months earlier, when we had four exchanges — constructed a new dictionary with explicitly listed fields. Mark price was not in the list. When we added exchanges five through nine, the fetchers dutifully collected mark prices. The response builder dutifully excluded them.

This meant we could not normalize open interest to USD in the response layer. The raw data existed one function call deeper, invisible to consumers and invisible to us until we needed it for a different purpose. A correct system, silently incomplete.

This class of bug — data collected but not surfaced — is endemic to aggregation systems. It does not fail loudly. It does not return errors. It produces a response that looks complete because the consumer does not know what is missing. It is only discovered when a new use case demands a field that was always available but never exposed. In our case, we discovered it while building the OI normalization that required mark prices to convert coin-denominated OI to USD.
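A minimal illustration of how a whitelist-style response builder hides collected data, together with a check that would surface the gap. The function names are hypothetical, the field list mirrors the response shown above, and the `mark` value is a placeholder.

```python
SURFACED_FIELDS = ["exchange", "rate", "rate_pct", "predicted",
                   "open_interest", "next_settlement_utc"]


def build_response(raw: dict) -> dict:
    """Whitelist-style builder: only explicitly listed keys are surfaced."""
    return {key: raw[key] for key in SURFACED_FIELDS if key in raw}


def dropped_fields(raw: dict, response: dict) -> set:
    """Keys the fetcher collected that the response never exposes."""
    return set(raw) - set(response)


raw = {
    "exchange": "binance",
    "rate": 4.491e-05,
    "rate_pct": 0.004491,
    "predicted": 4.491e-05,
    "open_interest": 105468.085,
    "next_settlement_utc": "2026-05-23T08:00:00+00:00",
    "mark": 75_457.0,  # hypothetical value; collected but never whitelisted
}

print(dropped_fields(raw, build_response(raw)))
```

Here the check reports `{'mark'}`. Logging that set in a test or at startup turns a silent omission into a visible decision.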

5. The Stale Weight Problem

Before this morning, our funding rate composite used static weights reviewed quarterly:

EXCHANGE_WEIGHTS = {
    "binance": 0.30,   # 21.84% OI — capped
    "gate":    0.20,   # 11.36% OI
    "bybit":   0.18,   # 9.46% OI
    "okx":     0.10,   # 5.46% OI
    "bitget":  0.12,   # 6.02% OI
    "hyperliquid": 0.10  # 4.99% OI
}

These weights were derived from OI data collected on a specific date. They were accurate on that date. They silently drifted every day thereafter. Open interest shifts between exchanges as market conditions, fee promotions, regulatory changes, and liquidity incentives evolve. A weight that was correct in Q1 may be materially wrong in Q2. No alarm fires. The composite just gets quietly less representative.

The fix was to compute weights dynamically from live OI data, which we could only do correctly after solving the unit normalization problem from Section 3. The two problems were coupled: you cannot weight by OI if your OI numbers are in incompatible units. The stale weights were not laziness. They were a workaround for the unit problem. The unit problem persisted because mark prices were not in the response. The mark prices were not in the response because the response builder was written for four exchanges and never updated. Each problem nested inside the previous one.

Technical debt in aggregation systems does not accumulate linearly. It compounds.
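The dynamic weighting described in this section could look like the sketch below, assuming every venue's OI has already been converted to USD. The 30% cap is the per-venue cap mentioned elsewhere in this post. How the excess is redistributed (proportionally among uncapped venues) is an assumption, and both function names are hypothetical.

```python
def capped_weights(oi_usd: dict, cap: float = 0.30) -> dict:
    """OI-proportional weights summing to 1, with no venue above `cap`."""
    free = {v: x for v, x in oi_usd.items() if x is not None and x > 0}
    if not free:
        raise ValueError("no venue reported usable open interest")
    if cap * len(free) < 1.0:
        raise ValueError("cap is too low for this many venues")
    weights, budget = {}, 1.0
    while free:
        total = sum(free.values())
        over = [v for v, x in free.items() if budget * x / total > cap]
        if not over:
            # Nobody left above the cap: split the remaining budget by OI.
            for v, x in free.items():
                weights[v] = budget * x / total
            break
        for v in over:
            weights[v] = cap   # pin the venue at the cap
            budget -= cap      # its excess goes to the remaining venues
            del free[v]
    return weights


def composite_rate(rates: dict, weights: dict) -> float:
    """Weighted mean over venues that have both a rate and a weight."""
    common = [v for v in rates if v in weights]
    norm = sum(weights[v] for v in common)
    if norm == 0:
        raise ValueError("no venue has both a rate and a weight")
    return sum(rates[v] * weights[v] for v in common) / norm
```

Venues without usable OI simply drop out of the weighting, and `composite_rate` renormalizes over whichever venues supplied a rate, so a missing feed does not leave the weights summing to less than one.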

6. This Is Not a Funding Rate Problem

Everything described above — the dialect differences, the unit mismatches, the invisible data, the stale weights — applies to any data aggregation task that pulls from multiple provider APIs.

Weather data. One provider reports temperature in Celsius, another in Fahrenheit. One includes a heat index, another does not. Wind speed in knots vs. meters per second vs. miles per hour. Station elevation in meters vs. feet. Precipitation in millimeters vs. inches. Time zones as UTC offsets vs. IANA identifiers vs. local time with no zone indicator.

Shipping and logistics. Container positions in decimal degrees vs. degrees-minutes-seconds. Vessel speed in knots vs. km/h. Draft in meters vs. feet. Port codes in LOCODE vs. proprietary identifiers. ETA as Unix timestamps vs. ISO 8601 vs. local port time.

Financial market data. Prices in different quote currencies. Volumes in base currency vs. quote currency vs. contracts. Timestamps in exchange-local time vs. UTC vs. Unix milliseconds vs. Unix seconds. Bid/ask depth in different bucket sizes. Trade IDs as integers vs. strings vs. UUIDs.

Economic indicators. Employment figures as raw counts vs. seasonally adjusted vs. percent change. GDP in nominal vs. real vs. PPP-adjusted terms. Rate decisions as basis points vs. percentage vs. absolute rate. Release dates in different fiscal calendar conventions.

The pattern is universal: multiple providers expose conceptually identical data through APIs that were designed for human developers reading documentation, not for machines performing automated aggregation. Each provider chose field names, units, update frequencies, and response structures that made sense for their product. None coordinated with the others. The result is that "the same data" from ten sources requires ten bespoke integration layers, ten unit conversion strategies, and ten ongoing maintenance commitments.

7. Why AI Does Not Solve This

Our thesis argues that AI-assisted development has compressed the build cost of data integrations by 5-10×. A competent developer with an AI coding agent can scaffold an exchange API client in minutes. This is true. We did it this morning.

What AI cannot do is anticipate the semantic traps.

An AI agent reading OKX's API documentation will find a field called `oi` on the open interest endpoint. The field contains a number. The number is positive. The agent will use it. The agent will not know, and cannot know from the documentation alone, that this number is in contracts, that each contract represents 0.01 BTC, and that the correct field for cross-exchange USD comparison is `oiUsd`, the third of three fields, never mentioned in the funding rate docs, discoverable only by examining the raw response object and knowing to look for it.

The AI can build the integration. The AI cannot debug the silent 100× error that only manifests when the output is compared against independent data sources. That comparison is the maintenance burden. It requires domain knowledge that lives outside the documentation — in production incidents, in forum posts, in the accumulated institutional memory of teams that have been bitten before.

The build cost has compressed. The maintenance cost has not. And the maintenance cost is the dominant cost. It is continuous, cumulative, and grows faster than the number of sources. Every new source adds not just its own integration, but new interaction effects with every existing source. Ten sources is not 10× the maintenance of one source. It is closer to 10², because every normalization decision, every unit conversion, every field mapping must be consistent across all ten, and a change in any one can silently break the aggregate.

8. The Economic Argument for Shared Infrastructure

Consider the alternative. An agent needs the current aggregate funding rate across major exchanges. It has two options:

Option A: Build it

Integrate 10 exchange APIs

Write 10 fetchers. Normalize every rate to a common settlement period. Convert every open interest figure to a common unit. Handle the exchanges that don't expose OI. Detect when OKX changes their field schema. Monitor Kraken's absolute-to-relative rate conversion. Watch for Coinbase updating their funding period from 1 hour to something else. Maintain this forever.

Option B: Query an oracle

One endpoint. One schema. One number.

GET /oracle/funding/btc/usd — returns a normalized, OI-weighted, USD-denominated composite rate across 10 exchanges, with per-exchange breakdown, divergence metrics, and a cryptographic signature proving provenance. $0.05 per query.

Option A costs minutes to scaffold and months to maintain correctly. Option B costs five cents and is correct now, because the maintenance has already been done — this morning, by us, debugging OKX's $256 billion OI number so that no downstream consumer ever has to.

This is the economic structure described in the thesis: high fixed costs of aggregation and maintenance, near-zero marginal cost of serving additional consumers, non-rival consumption. The rational outcome is shared infrastructure. Not because building is hard — AI has made building cheap — but because the maintenance does not compress, and duplicating it across thousands of independent consumers is pure economic waste.

9. What We Shipped

This morning's session produced seven commits:

Coinbase INTX added as 10th exchange — hourly funding normalized to 8-hour equivalents, OI in USD natively.
Open interest normalized to USD across all 10 venues — coin-denominated venues multiplied by mark price, OKX switched to native oiUsd field, 30% per-venue cap.
Dynamic OI weighting replaced static quarterly weights — weights now computed from live data every 60 seconds.
Funding rate composite rewired from 6 independent fetchers to a single internal oracle call consuming all 10 venues.
Cross-venue funding dispersion added as a stress signal — the spread between the highest and lowest funding rate across venues, averaged across three asset pairs.
All machine-facing documentation updated — OpenAPI spec, agent discovery files, methodology document, eight files total.
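The dispersion signal in the list above reduces to a short calculation. The sketch below assumes rates have already been normalized to a common period. The function names and the input shape (a mapping of asset pair to per-venue rates) are hypothetical.

```python
from statistics import mean


def venue_spread(rates_by_venue: dict) -> float:
    """Highest minus lowest funding rate across venues for one pair."""
    values = [r for r in rates_by_venue.values() if r is not None]
    if len(values) < 2:
        raise ValueError("need at least two venues to measure a spread")
    return max(values) - min(values)


def funding_dispersion(rates_by_pair: dict) -> float:
    """Average the per-pair spread across all supplied asset pairs."""
    return mean(venue_spread(rates) for rates in rates_by_pair.values())
```

Because the spread uses only the extremes, a single mis-normalized venue moves the signal directly, which is one reason the unit fixes had to land first.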

Total BTC perpetual open interest across 10 venues, properly normalized: $18.3 billion.

Before normalization, the naive sum was over $260 billion — a number that looks plausible to a machine and absurd to a human. Machines do not have the luxury of recognizing absurdity. They compute on what they receive.

10. The Generalization

APIs are interfaces designed for human developers. They assume a human will read the documentation, understand the context, recognize the units, and build appropriate conversion logic. They are not normalized for machine consumption. They do not self-describe their units, their update frequencies, their edge cases, or their breaking changes.

As software systems increasingly consume data from multiple providers — and as AI agents accelerate the rate at which new integrations are attempted — the gap between "API available" and "data usable in aggregate" becomes the binding constraint. The data exists. It is accessible. It is not interoperable.

Attested state infrastructure closes this gap: aggregate the sources once, normalize the units once, maintain the integrations once, and serve the result to every consumer with a cryptographic proof that the methodology was followed. The aggregation cost is borne once. The maintenance cost is borne once. Every consumer after the first is marginal cost.

We spent a morning adding one exchange and fixing the normalization bugs that had accumulated across the previous nine. The work was not difficult. It was specific, tedious, undocumented, and invisible until it produced a number that was 100× wrong. That is the real cost of data aggregation — and it is why shared, attested infrastructure is not a convenience but an economic inevitability.

About Mycelia Signal

Sovereign cryptographic oracle delivering real-time signed data across 100+ endpoints. Crypto, FX, commodities, weather, marine, and proprietary indices. Pay-per-query via Lightning or USDC on Base. No API keys. No subscriptions.

myceliasignal.com