TrueTime Infrastructure: Atomic Clocks & GPS

Nov 21, 2025

How TrueTime Achieves Bounded Uncertainty

TrueTime’s tight uncertainty bounds (1-7ms) are achieved through a combination of expensive hardware and clever algorithms.

The Physical Infrastructure

Google deploys two independent time reference systems in each datacenter:

1. GPS Time Masters

  • GPS receivers with dedicated antennas on datacenter roofs
  • Receive time signals directly from GPS satellites
  • GPS provides ~100 nanosecond accuracy (when signal is clear)
  • Failure modes: Antenna failure, jamming, atmospheric interference

2. Atomic Clocks

  • Cesium or rubidium atomic clocks installed in datacenters
  • Extremely stable: drift ~1 microsecond per week
  • Independent of external signals
  • Failure modes: Hardware failure, gradual drift over months

Why Both?

GPS and atomic clocks have independent failure modes. If GPS signals are jammed, atomic clocks keep ticking. If an atomic clock fails, GPS provides backup. This redundancy is critical for achieving 99.9999% uptime.

Time Synchronization Architecture

  GPS Satellites
        ↓↓↓
┌─────────────────────┐   ┌─────────────────────┐
│  GPS Time Masters   │   │    Atomic Clocks    │
│  (multiple per DC)  │   │  (multiple per DC)  │
└──────────┬──────────┘   └──────────┬──────────┘
           └────────────┬────────────┘
                        ↓
             ╔══════════╧══════════╗
             ║    Time Masters     ║
             ║   (10-20 per DC)    ║
             ╚══════════╤══════════╝
                        │ servers synchronize via
                        │ Marzullo's algorithm
        ┌───────────────┼───────────────┐
        ↓               ↓               ↓
    [Server 1]      [Server 2]      [Server N]

Synchronization Process:

  1. Every 30 seconds: Servers poll multiple Time Masters
  2. Collect timestamps: Get time readings from ~10 Time Masters
  3. Run Marzullo’s Algorithm: Detect and exclude faulty clocks (outliers)
  4. Compute uncertainty: Calculate the uncertainty interval based on:
    • Network round-trip time
    • Last sync interval
    • Local clock drift rate
    • Discrepancies between Time Masters
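
As a rough sketch of how the first three components combine into a single bound (the function name and parameters here are illustrative, not Google's actual API; discrepancies between Time Masters are handled separately by Marzullo's algorithm):

```python
def uncertainty_bound(master_uncertainty, rtt, drift_rate, time_since_sync):
    """Illustrative upper bound on clock uncertainty (all values in seconds).

    master_uncertainty: uncertainty reported by the Time Master
    rtt:                network round-trip time of the poll
    drift_rate:         local oscillator drift (seconds per second)
    time_since_sync:    elapsed time since the last successful sync
    """
    # Half the RTT bounds the unknown one-way network delay;
    # drift accumulates linearly until the next sync.
    return master_uncertainty + rtt / 2 + drift_rate * time_since_sync

# Example: 0.5 ms master uncertainty, 1 ms RTT, 200 us/s drift, 30 s since sync
eps = uncertainty_bound(0.0005, 0.001, 200e-6, 30)
print(round(eps * 1000, 1))  # → 7.0 (milliseconds)
```

With these example inputs the drift term dominates, which is why the sync interval matters so much.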

The TrueTime Daemon

Here’s how each server maintains its TrueTime interval:

class TrueTimeDaemon:
    def __init__(self):
        # Worst-case local oscillator drift: 200 microseconds per second,
        # expressed as seconds/second so all arithmetic stays in seconds.
        self.local_clock_drift_rate = 200e-6
        self.last_sync_time = 0.0
        self.last_sync_uncertainty = 0.0

    def poll_time_masters(self):
        """Synchronize with Time Masters using Marzullo's algorithm."""
        timestamps = []
        # Poll multiple Time Masters
        for master in TIME_MASTERS:
            t_send = local_time()
            response = master.get_time()
            t_receive = local_time()
            # The master's reading may have been taken at any point during
            # the round trip, so half the RTT bounds the network error
            rtt = t_receive - t_send
            master_time = response.timestamp
            uncertainty = response.uncertainty + (rtt / 2)
            timestamps.append((master_time, uncertainty))
        # Marzullo's algorithm: find the smallest interval
        # that overlaps with a majority of reported intervals
        agreed_time, agreed_uncertainty = marzullos_algorithm(timestamps)
        self.last_sync_time = agreed_time
        self.last_sync_uncertainty = agreed_uncertainty

    def now(self) -> TTinterval:
        """
        Compute the current TrueTime interval.
        Uncertainty grows linearly since the last sync due to clock drift.
        """
        now = local_time()
        time_since_sync = now - self.last_sync_time
        drift_uncertainty = time_since_sync * self.local_clock_drift_rate
        total_uncertainty = self.last_sync_uncertainty + drift_uncertainty
        return TTinterval(
            earliest=now - total_uncertainty,
            latest=now + total_uncertainty,
        )

Result: Across Google’s network, the uncertainty ε is typically:

  • Average: 4ms
  • Best case: 1ms (right after sync)
  • Worst case: 7ms (just before next sync)

Marzullo’s Algorithm: Detecting Faulty Clocks

Marzullo’s algorithm is the secret sauce that allows TrueTime to tolerate faulty time sources.

The Problem

When you poll 10 Time Masters, you get 10 different time readings:

Master 1: [100.0, 100.2]
Master 2: [100.1, 100.3]
Master 3: [100.0, 100.2]
Master 4: [99.5, 99.7] ← Clearly wrong!
Master 5: [100.1, 100.3]
...

How do you determine which masters are correct and which are faulty?

The Solution

Marzullo’s algorithm finds the smallest interval that overlaps with a majority of reported intervals:

def marzullos_algorithm(intervals):
    """
    Find the smallest interval overlapping with the largest
    number of input intervals.
    Returns: (agreed_time, uncertainty)
    """
    events = []
    # Create events for interval starts and ends
    for earliest, latest in intervals:
        events.append((earliest, +1))  # interval starts
        events.append((latest, -1))    # interval ends
    # Sort by time; at equal times an end (-1) sorts before a start (+1),
    # so intervals touching only at a point do not count as overlapping
    events.sort()
    best_count = 0
    current_count = 0
    best_start = best_end = None
    for i, (time, delta) in enumerate(events):
        current_count += delta
        if current_count > best_count:
            best_count = current_count
            best_start = time
            # The region of maximum overlap lasts until the next event
            best_end = events[i + 1][0]
    # Return the midpoint and half-width of the best interval
    agreed_time = (best_start + best_end) / 2
    uncertainty = (best_end - best_start) / 2
    return agreed_time, uncertainty

Example Execution:

Input intervals:
    Master 1: [100.0, 100.4]
    Master 2: [100.2, 100.6]
    Master 3: [100.1, 100.5]
    Master 4: [ 99.5,  99.9]  ← outlier!
    Master 5: [100.3, 100.7]

Overlap analysis (sweeping over interval endpoints):
       99.5  99.9  100.0 100.1 100.2 100.3 100.4 100.5 100.6 100.7
        |     |     |     |     |     |     |     |     |     |
Count:  1     0     1     2     3     4     3     2     1     0
                                      ↑
                              maximum overlap!

Result: [100.3, 100.4] with 4 masters agreeing
Agreed time: 100.35, uncertainty: ±0.05
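
To sanity-check those numbers, here is a small self-contained sweep-line counter run over the same five intervals (a standalone helper mirroring the algorithm's counting step, not production code):

```python
def max_overlap(intervals):
    """Return (start, end, count) of the region covered by the most intervals."""
    events = []
    for lo, hi in intervals:
        events.append((lo, +1))   # interval opens
        events.append((hi, -1))   # interval closes
    events.sort()
    best = (None, None, 0)
    count = 0
    for i, (t, delta) in enumerate(events):
        count += delta
        if count > best[2]:
            # Maximum overlap extends from this event to the next one
            best = (t, events[i + 1][0], count)
    return best

masters = [(100.0, 100.4), (100.2, 100.6), (100.1, 100.5),
           (99.5, 99.9), (100.3, 100.7)]
print(max_overlap(masters))  # → (100.3, 100.4, 4)
```

Four of the five masters agree on [100.3, 100.4]; the outlier at [99.5, 99.9] never joins the majority region.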

Key Properties:

  • Fault tolerance: Can tolerate up to ⌊(n-1)/2⌋ faulty masters (any minority)
  • Optimal: Finds the smallest interval with maximum agreement
  • Fast: O(n log n) due to sorting
  • Byzantine-resistant: Works even if faulty clocks lie arbitrarily, provided a majority report correct intervals

Important

Redundancy Budget: Google deploys 10-20 Time Masters per datacenter specifically to tolerate failures. Even if half fail, TrueTime continues working with bounded uncertainty.

How Uncertainty Grows Between Syncs

Between synchronizations, uncertainty grows due to clock drift:

Uncertainty over time:

ε (ms)
 7 |                          /
 6 |                      /
 5 |                  /
 4 |              /      ← average ε
 3 |          /
 2 |      /
 1 |__/
   +------|------|------|------|------|--→ time since sync
 sync    5s     10s    15s    20s    25s    30s
                                       (next sync)

Formula: ε(t) = ε₀ + (drift_rate × time_since_sync)

  • ε₀ = uncertainty immediately after sync (~1ms)
  • drift_rate = local clock drift (~200 μs/second)
  • time_since_sync = seconds since last sync (max 30s)

Maximum uncertainty: 1ms + (200 μs/s × 30s) = 1ms + 6ms = 7ms

This is why Google synchronizes every 30 seconds - to keep ε bounded!
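
The formula can be checked directly; the constants below are the article's approximate figures, expressed in milliseconds:

```python
EPS0_MS = 1.0         # uncertainty right after a sync (~1 ms)
DRIFT_MS_PER_S = 0.2  # local clock drift (~200 us/s = 0.2 ms/s)

def epsilon(seconds_since_sync):
    """Uncertainty in milliseconds, growing linearly between syncs."""
    return EPS0_MS + DRIFT_MS_PER_S * seconds_since_sync

print(epsilon(0))   # → 1.0  (best case, right after sync)
print(epsilon(30))  # → 7.0  (worst case, just before the next sync)
```

Halving the sync interval to 15 seconds would cap ε at 4 ms, at the cost of doubling the polling load on the Time Masters.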

The Cost of Infrastructure

Replicating TrueTime requires significant investment:

Hardware Costs (per datacenter)

Component              Unit Cost   Quantity   Total
Atomic clocks          $50,000     2-4        $100k-$200k
GPS receivers          $5,000      4-6        $20k-$30k
GPS antennas           $1,000      4-6        $4k-$6k
Time Master servers    $10,000     10-20      $100k-$200k
Redundant networking   —           —          $10k-$50k
Total per DC                                  $234k-$486k

For Google’s ~100 datacenters: $23M-$48M in hardware alone

Operational Costs

  • Maintenance: Atomic clocks require specialized technicians
  • Monitoring: 24/7 monitoring of time drift and failures
  • Replacement: Components fail and need replacement
  • Expertise: Deep knowledge of time synchronization protocols

Annual operational cost: $5M-$10M

Warning

Why Google Can Do This: The cost of deploying atomic clocks and GPS receivers in every datacenter is significant. Most companies cannot afford this infrastructure, which is why TrueTime-style systems remain rare.

Failure Scenarios and Handling

TrueTime is designed to handle various failure modes gracefully:

1. GPS Signal Loss

Scenario: Construction blocks GPS antenna
Action: Fall back to atomic clocks
Impact: Slight uncertainty increase (~2-3ms)
Duration: Until GPS restored

2. Atomic Clock Failure

Scenario: Atomic clock hardware failure
Action: Use remaining atomic clocks + GPS
Impact: Minimal (redundancy)
Duration: Until replacement installed

3. Network Partition

Scenario: Time Master unreachable
Action: Marzullo's algorithm excludes it
Impact: Minimal if majority reachable
Duration: Until network restored

4. Catastrophic: All Time Masters Fail

Scenario: Datacenter-wide failure
Action: ε grows unbounded, system halts writes
Impact: SEVERE - no new transactions
Duration: Until manual intervention

The Final Safety: If uncertainty exceeds a threshold (~100ms), Spanner stops accepting writes rather than violating external consistency. Availability is sacrificed to maintain correctness.
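
A minimal sketch of that guard (the threshold value and names here are illustrative; Spanner's real commit path is far more involved):

```python
MAX_EPSILON_S = 0.100  # illustrative ~100 ms safety threshold

def can_accept_write(earliest, latest):
    """Reject writes when the TrueTime interval is too wide to trust."""
    epsilon = (latest - earliest) / 2
    return epsilon <= MAX_EPSILON_S

print(can_accept_write(10.000, 10.010))  # ε = 5 ms   → True
print(can_accept_write(10.000, 10.500))  # ε = 250 ms → False
```

The choice is deliberate: a refused write can be retried, but a write ordered inconsistently with real time cannot be undone.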

Tip - Next Steps

Now that you understand how TrueTime achieves bounded uncertainty, let’s see how Google Spanner uses it to provide external consistency across datacenters.

Summary

  • TrueTime relies on GPS + atomic clocks for redundancy
  • Marzullo’s algorithm detects and excludes faulty time sources
  • Uncertainty is typically 1-7ms, growing between syncs
  • Infrastructure costs $23M-$48M in hardware at Google's scale
  • Graceful degradation: System tolerates multiple failures
  • Safety over availability: Halts writes if uncertainty unbounded

The physical infrastructure is what makes TrueTime possible. Without atomic clocks and GPS, the tight uncertainty bounds simply cannot be achieved.