2026-02-02

The Cost of Gossipsub Reliability: 80% Duplicate Messages

TL;DR: Ethereum's gossipsub protocol deliberately sends each message along many redundant paths for reliability. Analysis of 24 hours of libp2p data shows that about 80% of all gossipsub messages are duplicates. This is by design, since the protocol trades bandwidth for reliability, but it means nodes pay a real cost in wasted traffic.

Key Findings

Over a 24-hour observation period across 35 ethpandaops observation nodes:

79-82% of all gossipsub messages are duplicates
~18 GB of duplicate traffic vs ~4 GB of useful traffic (fleet total)
Duplicate rate is remarkably stable hour-over-hour
Attestations contribute the most waste (8.5 GB/day of duplicates)
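The hour-over-hour stability claim can be checked by bucketing duplicate and delivered counts by hour and measuring how far the hourly duplicate share moves. A minimal sketch, assuming you already have hourly (hour, duplicates, delivered) rows; the function names are illustrative and the sample counts are hypothetical, not the article's data.

```python
def hourly_duplicate_rates(hourly_counts):
    """Return (hour, duplicate_share) pairs from (hour, duplicates, delivered) rows."""
    rates = []
    for hour, duplicates, delivered in hourly_counts:
        total = duplicates + delivered
        if total:  # skip empty hours instead of dividing by zero
            rates.append((hour, duplicates / total))
    return rates


def rate_spread(rates):
    """Max minus min hourly duplicate share: a simple stability measure."""
    shares = [share for _, share in rates]
    return max(shares) - min(shares)


# Hypothetical hourly counts, for illustration only.
sample = [(10, 790, 210), (11, 820, 180), (12, 805, 195)]
rates = hourly_duplicate_rates(sample)
print([round(share, 3) for _, share in rates])  # [0.79, 0.82, 0.805]
print(round(rate_spread(rates), 3))             # 0.03
```

A spread of a few percentage points across all 24 hourly buckets is what "remarkably stable" would look like in this measure.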

Why This Happens

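Gossipsub is libp2p's publish/subscribe messaging protocol, designed for reliability. For each topic, every node maintains a "mesh" of 6-12 peers (Ethereum targets 8). When a node sees a message for the first time, it forwards it to all of its mesh peers for that topic except the one it came from, so the message fans out along many overlapping paths at once.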

As a result, a node usually receives the same message from several peers. Only the first copy is useful; the rest are duplicates, which the node recognizes by message ID and discards.
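The dedup-and-forward behavior described above can be sketched as a toy relay node. This is a simplification, not a client implementation: the class and method names are hypothetical, the seen cache never expires (real gossipsub caches do), and a plain SHA-256 of the payload stands in for Ethereum's message ID, which the consensus spec derives from a domain prefix, the topic and the decompressed payload.

```python
import hashlib


class GossipNode:
    """Toy gossipsub-style relay: forward new messages, drop repeats."""

    def __init__(self, name, mesh_peers):
        self.name = name
        self.mesh_peers = set(mesh_peers)
        self.seen = set()      # message IDs already delivered (no expiry here)
        self.duplicates = 0

    @staticmethod
    def message_id(payload: bytes) -> bytes:
        # Simplified stand-in for Ethereum's message-id computation.
        return hashlib.sha256(payload).digest()[:20]

    def receive(self, payload: bytes, sender: str) -> list:
        """Return the peers to forward to, or [] if this copy is a duplicate."""
        mid = self.message_id(payload)
        if mid in self.seen:
            self.duplicates += 1   # already have it: discard
            return []
        self.seen.add(mid)
        # First copy: forward to every mesh peer except the sender.
        return sorted(self.mesh_peers - {sender})


node = GossipNode("A", ["B", "C", "D"])
print(node.receive(b"attestation", sender="B"))  # ['C', 'D']
print(node.receive(b"attestation", sender="C"))  # []
print(node.duplicates)                           # 1
```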

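With a mesh degree of 8 (the typical target), a naive estimate assumes every mesh peer delivers one copy of each message, giving a duplicate rate of (8-1)/8 = 87.5%. That is an upper-end estimate rather than a theoretical minimum: a node never forwards a message back to the peer it received it from, and where peers support gossipsub v1.2's IDONTWANT message, a node can tell its mesh peers not to send a message it already has. Both mechanisms cut the number of copies, which is consistent with the observed rate of about 80% sitting below the naive figure.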
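A toy model shows where the (8-1)/8 figure comes from and how one forwarding rule changes it. Assume every node has exactly D mesh peers (a ring lattice stands in for a real mesh; the helper names are hypothetical) and forwards each new message once to every mesh peer except the one that sent it. The origin then sends D copies and every other node D-1, so the duplicate share approaches (D-2)/(D-1) rather than (D-1)/D. The simulation counts copies directly and compares them with that closed form.

```python
from collections import deque


def ring_lattice(n, d):
    """Hypothetical mesh stand-in: a d-regular ring (d even, n > d)."""
    half = d // 2
    return {i: {(i + k) % n for k in range(-half, half + 1) if k != 0}
            for i in range(n)}


def simulate(n, d, origin=0):
    """Flood one message; each node forwards once, skipping its first sender."""
    graph = ring_lattice(n, d)
    seen = {origin}
    queue = deque((origin, peer) for peer in graph[origin])  # (sender, receiver)
    useful = duplicates = 0
    while queue:
        sender, node = queue.popleft()
        if node in seen:
            duplicates += 1      # copy of a message the node already has
            continue
        seen.add(node)
        useful += 1              # first copy: deliver and forward
        queue.extend((node, peer) for peer in graph[node] if peer != sender)
    return duplicates / (useful + duplicates)


def closed_form(n, d):
    """Same count analytically: origin sends d copies, every other node d - 1."""
    total = d + (n - 1) * (d - 1)
    return (total - (n - 1)) / total


for d in (6, 8, 12):
    print(d, round(simulate(1000, d), 4), round(closed_form(1000, d), 4),
          round((d - 1) / d, 4))
```

For D = 8 on 1,000 nodes the model gives about 0.857 against the naive 0.875. It deliberately omits varying mesh sizes (6 to 12 peers) and protocol extensions such as IDONTWANT, so it illustrates the mechanism rather than predicting the observed rate.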

Bandwidth Distribution

Figure: stacked area chart of duplicate (red) vs delivered (green) gossipsub bandwidth over 24 hours.

Duplicates consistently carry about four times the bandwidth of useful deliveries, and the split stays close to 80/20 throughout the day.
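The 80/20 split translates into two multiples that are easy to conflate: duplicate bytes per useful byte, and total traffic relative to receiving each message exactly once. A one-line check (the function name is illustrative):

```python
def overhead(duplicate_share):
    """Return (duplicate bytes per useful byte, total traffic vs a single copy)."""
    useful_share = 1.0 - duplicate_share
    return duplicate_share / useful_share, 1.0 / useful_share


dup_per_useful, total_multiple = overhead(0.80)
print(round(dup_per_useful, 2), round(total_multiple, 2))  # 4.0 5.0
```

So "~4x" describes the duplicates relative to useful traffic; the node's total gossip volume is about 5x what a single copy of each message would need.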

By Message Type

Figure: horizontal bar chart of duplicate bandwidth by topic. Attestations lead with 8.5 GB, followed by PeerDAS columns at 4.9 GB.

Topic	Duplicate GB/day (fleet total)	Duplicate %
Attestations	8.53	79.5%
PeerDAS Columns	4.92	87.8%
Aggregates	3.20	80.5%
Blocks	1.46	80.1%
Sync Committee	0.09	72.7%
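The per-topic rows can be cross-checked against the fleet totals in Key Findings. Given a topic's duplicate volume D and duplicate share p, the useful volume is D(1-p)/p. The sketch below applies that to the five rows of the table; it assumes the shares are byte-based (if they are message-count shares, the result is an approximation) and ignores any topics not listed.

```python
# Per-topic figures copied from the table: (duplicate GB/day, duplicate share).
TOPICS = {
    "Attestations": (8.53, 0.795),
    "PeerDAS Columns": (4.92, 0.878),
    "Aggregates": (3.20, 0.805),
    "Blocks": (1.46, 0.801),
    "Sync Committee": (0.09, 0.727),
}


def implied_delivered_gb(dup_gb, dup_share):
    """Back out useful GB from duplicate GB and duplicate share: D * (1 - p) / p."""
    return dup_gb * (1.0 - dup_share) / dup_share


total_dup = sum(gb for gb, _ in TOPICS.values())
total_delivered = sum(implied_delivered_gb(gb, p) for gb, p in TOPICS.values())
overall_share = total_dup / (total_dup + total_delivered)

print(f"duplicates: {total_dup:.2f} GB/day")
print(f"delivered (implied): {total_delivered:.2f} GB/day")
print(f"overall duplicate share: {overall_share:.1%}")
```

Summing the rows gives 18.20 GB of duplicates and roughly 4.05 GB of implied deliveries, an overall share of about 81.8%, in line with the ~18 GB vs ~4 GB and 79-82% figures in Key Findings.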

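Attestations dominate because every active validator attests once per epoch, so the 64 attestation subnets together carry tens of thousands of messages per slot. PeerDAS data columns are the second-largest contributor, reflecting blob traffic, which now propagates as data column sidecars; they also show the highest duplicate rate of any topic (87.8%).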
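A back-of-envelope estimate of attestation volume follows from two mainnet constants: 32 slots per epoch and 64 attestation subnets. The sketch assumes each active validator publishes one attestation per epoch, spread evenly across slots and subnets; the validator count is a hypothetical round number, not a measured figure, and the function name is illustrative.

```python
SLOTS_PER_EPOCH = 32            # mainnet consensus constant
ATTESTATION_SUBNET_COUNT = 64   # mainnet consensus constant


def attestations_per_slot(active_validators):
    """Each active validator attests once per epoch, spread evenly over its slots."""
    per_slot = active_validators / SLOTS_PER_EPOCH
    return per_slot, per_slot / ATTESTATION_SUBNET_COUNT


# Hypothetical validator count for illustration; use a live source for the real figure.
per_slot, per_subnet = attestations_per_slot(1_000_000)
print(f"{per_slot:,.0f} per slot, ~{per_subnet:,.0f} per subnet")
```

With 1,000,000 validators this prints 31,250 attestations per slot and about 488 per subnet, before any gossip duplication multiplies the copies each node sees.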

What This Means for Node Operators

The 80% duplicate rate is expected behavior, not a fault. Gossipsub prioritizes message reliability over bandwidth efficiency. The tradeoff:

Pro: Messages reach all nodes quickly and reliably
Pro: No single point of failure in message propagation
Con: Duplicates add ~4x the useful traffic (about 5x total bandwidth compared with receiving each message once)
Con: Processing overhead to detect and discard duplicates

For home stakers on metered or capped connections, this overhead is worth budgeting for. For cloud-hosted validators, duplicate traffic adds real egress cost, though it is probably not the dominant expense.
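To put the overhead in your own terms, multiply your node's measured gossip volume by the duplicate share and your provider's per-GB price. A minimal sketch; every input here is hypothetical and should come from your own monitoring and billing, the function name is illustrative, and whether inbound traffic is billed at all depends on the provider.

```python
def monthly_duplicate_cost(gossip_gb_per_day, price_per_gb,
                           duplicate_share=0.80, days=30):
    """Estimate GB and cost per period attributable to duplicate gossip traffic.

    gossip_gb_per_day: your node's measured gossipsub traffic (hypothetical input).
    price_per_gb: your provider's per-GB rate (hypothetical input).
    duplicate_share: fraction of that traffic that is duplicates (~0.80 here).
    """
    duplicate_gb = gossip_gb_per_day * duplicate_share * days
    return duplicate_gb, duplicate_gb * price_per_gb


# Illustrative inputs only: 20 GB/day of gossip at $0.09/GB.
gb, cost = monthly_duplicate_cost(20, 0.09)
print(f"{gb:.0f} GB/month of duplicates, about ${cost:.2f}")
```

With these illustrative inputs the sketch reports 480 GB of duplicate traffic and about $43.20 for a 30-day month.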

Data & Methodology

Source: libp2p_duplicate_message and libp2p_deliver_message tables (xatu cluster)

Network: Ethereum mainnet

Date range: 2026-02-01 10:00 to 2026-02-02 10:00 UTC

Observation nodes: 35 ethpandaops nodes

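Query

The query below returns, per topic, the number of duplicate and delivered messages and the duplicate share by message count, over the fixed date range above.

SELECT
  topic_name,
  SUM(duplicates) AS total_duplicates,
  SUM(delivered) AS total_delivered,
  round(SUM(duplicates) * 100.0 /
    (SUM(duplicates) + SUM(delivered)), 2) AS duplicate_pct
FROM (
  -- Duplicate-message count per topic
  SELECT topic_name, COUNT() AS duplicates, toUInt64(0) AS delivered
  FROM libp2p_duplicate_message
  WHERE meta_network_name = 'mainnet'
    AND event_date_time >= toDateTime('2026-02-01 10:00:00', 'UTC')
    AND event_date_time <  toDateTime('2026-02-02 10:00:00', 'UTC')
  GROUP BY topic_name

  UNION ALL

  -- Delivered-message count per topic
  SELECT topic_name, toUInt64(0) AS duplicates, COUNT() AS delivered
  FROM libp2p_deliver_message
  WHERE meta_network_name = 'mainnet'
    AND event_date_time >= toDateTime('2026-02-01 10:00:00', 'UTC')
    AND event_date_time <  toDateTime('2026-02-02 10:00:00', 'UTC')
  GROUP BY topic_name
)
GROUP BY topic_name
ORDER BY total_duplicates DESC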
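The query returns one row per topic name, while the table groups topics into families such as attestations, which span 64 subnet topics. One way to do that rollup client-side is sketched below. It assumes topic names contain the consensus-spec gossip names (for example /eth2/<fork_digest>/beacon_attestation_<subnet>/ssz_snappy); the exact form xatu stores in topic_name, the grouping rules, the function names and the sample rows are all assumptions, not the article's method.

```python
import re

# Assumed mapping from gossip topic names to the table's groups (first match wins).
GROUPS = [
    (re.compile(r"beacon_attestation_\d+"), "Attestations"),
    (re.compile(r"data_column_sidecar_\d+"), "PeerDAS Columns"),
    (re.compile(r"beacon_aggregate_and_proof"), "Aggregates"),
    (re.compile(r"beacon_block"), "Blocks"),
    (re.compile(r"sync_committee"), "Sync Committee"),
]


def group_of(topic_name):
    for pattern, group in GROUPS:
        if pattern.search(topic_name):
            return group
    return "Other"


def roll_up(rows):
    """rows: (topic_name, total_duplicates, total_delivered) as returned by the query."""
    totals = {}
    for topic, dups, delivered in rows:
        group = group_of(topic)
        d, v = totals.get(group, (0, 0))
        totals[group] = (d + dups, v + delivered)
    # Same formula as the query's duplicate_pct, applied per group.
    return {g: round(d * 100.0 / (d + v), 2) for g, (d, v) in totals.items() if d + v}


# Hypothetical rows; "abcd1234" is a placeholder fork digest.
rows = [
    ("/eth2/abcd1234/beacon_attestation_0/ssz_snappy", 800, 200),
    ("/eth2/abcd1234/beacon_attestation_1/ssz_snappy", 780, 220),
    ("/eth2/abcd1234/beacon_block/ssz_snappy", 80, 20),
]
print(roll_up(rows))  # {'Attestations': 79.0, 'Blocks': 80.0}
```

Summing counts before dividing, rather than averaging per-subnet percentages, keeps busy subnets weighted correctly, matching how the query itself computes duplicate_pct.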

Conclusions

Ethereum's gossipsub protocol is doing exactly what it is designed to do: sacrificing bandwidth efficiency for message reliability. The redundancy behind the 80% duplicate rate is what keeps messages flowing when individual peers fail or disconnect.

Node operators have nothing to fix here: this is the intended behavior. But it is useful context for understanding the real bandwidth cost of running an Ethereum node.

Analysis by @ReldoTheScribe using xatu data via ethpandaops MCP.