# From Raw Data to Insights
The collector generates millions of raw timing records. Turning them into something a browser can visualize requires a multi-stage pipeline.
## Stage 1: Export to Parquet
The archiver stores data in DuckDB and exports it as Parquet files:
| File | Size | Contents |
|---|---|---|
| `timings.parquet` | ~840 MB (22 shards) | One row per (message, peer) — the core timing data |
| `messages.parquet` | ~10 MB | Message metadata: hash, type, timestamp, payload |
| `metadata.parquet` | ~5 MB | Peer metadata: pubkey, alias, addresses |
## Stage 2: Preprocessing
A Python script (`preprocess.py`) transforms the raw data into visualization-ready JSON:
- Arrival percentiles — For each message, rank peers by arrival time. A peer's `avg_arrival_pct` across all messages determines its radial position in the visualization.
- First-responder scores — Peers that consistently deliver messages before others get high scores. These are candidates for being topologically close to message originators.
- Message selection — From ~416,000 total messages, we select ~181 "interesting" ones: messages received by at least 50 peers, deduplicated, with clear propagation patterns.
- Community assignment — Peers are grouped into communities using a combination of:
  - Known hubs: ~15 manually identified pubkeys (major nodes like ACINQ, Bitfinex, River)
  - Alias matching: nodes with "LNT" in their alias are grouped together
  - Unknown: the remaining ~970 of 978 peers fall into the catch-all "unknown" community
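The percentile and first-responder steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the ranking logic, not `preprocess.py`'s actual code; the tuple layout and function name are assumptions:

```python
from collections import defaultdict

def arrival_percentiles(timings):
    """timings: iterable of (msg_hash, peer_id, arrival_ms) tuples.

    Returns (avg_arrival_pct, first_counts): each peer's mean arrival
    percentile across messages, and how often it delivered a message first.
    """
    by_msg = defaultdict(list)
    for msg, peer, t in timings:
        by_msg[msg].append((t, peer))

    pcts = defaultdict(list)         # peer -> per-message arrival percentiles
    first_counts = defaultdict(int)  # peer -> times it was first to deliver
    for arrivals in by_msg.values():
        arrivals.sort()              # rank peers by arrival time
        n = len(arrivals)
        first_counts[arrivals[0][1]] += 1
        for rank, (_, peer) in enumerate(arrivals):
            # 0.0 = always first, 1.0 = always last for this message
            pcts[peer].append(rank / (n - 1) if n > 1 else 0.0)

    avg_arrival_pct = {p: sum(v) / len(v) for p, v in pcts.items()}
    return avg_arrival_pct, first_counts

# Toy usage: two messages, two peers, each first once.
avg, firsts = arrival_percentiles(
    [("m1", "a", 10), ("m1", "b", 20), ("m2", "a", 5), ("m2", "b", 3)]
)
```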
## Stage 3: JSON Output
The pipeline produces 7 JSON files that the frontend loads directly:
- `peers.json` — Per-peer stats, coordinates, community assignment
- `wavefronts.json` — Per-message arrival sequences (the largest file at 14 MB)
- `messages.json` — Message metadata for the selector
- `communities.json` — Community definitions with colors and labels
- `fingerprints.json` — Peer timing fingerprints
- `leaks.json` — First-responder and colocation analysis
- `summary.json` — Aggregate statistics
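The output step itself can be as simple as serializing each structure with the standard library. A minimal sketch, with the directory name and the keys inside each file assumed for illustration:

```python
import json
from pathlib import Path

# Hypothetical sketch of the output stage; the payload shapes are
# assumptions based on the file descriptions above.
def write_outputs(out_dir, peers, wavefronts, summary):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, payload in [
        ("peers.json", peers),
        ("wavefronts.json", wavefronts),
        ("summary.json", summary),
    ]:
        # Compact separators matter for the multi-megabyte wavefronts file.
        (out / name).write_text(json.dumps(payload, separators=(",", ":")))

write_outputs(
    "viz_data",
    {"peer1": {"avg_arrival_pct": 0.12}},
    {},
    {"n_peers": 978},
)
```

Since the frontend loads these files directly, keeping them as plain static JSON means any web server (or a CDN) can serve the visualization with no backend.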