How TrueTime Achieves Bounded Uncertainty
TrueTime’s tight uncertainty bounds (1-7ms) are achieved through a combination of expensive hardware and clever algorithms.
The Physical Infrastructure
Google deploys two independent time reference systems in each datacenter:
1. GPS Time Masters
- GPS receivers with dedicated antennas on datacenter roofs
- Receive time signals directly from GPS satellites
- GPS provides ~100 nanosecond accuracy (when signal is clear)
- Failure modes: Antenna failure, jamming, atmospheric interference
2. Atomic Clocks
- Cesium or rubidium atomic clocks installed in datacenters
- Extremely stable: drift ~1 microsecond per week
- Independent of external signals
- Failure modes: Hardware failure, gradual drift over months
Why Both?
GPS and atomic clocks have independent failure modes. If GPS signals are jammed, atomic clocks keep ticking. If an atomic clock fails, GPS provides backup. This redundancy is critical for achieving 99.9999% uptime.
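As a back-of-envelope illustration of why independent failure modes matter, consider the sketch below. The per-source availability figures are invented purely for the arithmetic; they are not Google's actual numbers.

```python
# Hypothetical figures purely for illustration: suppose each time source
# is independently unavailable 0.1% of the time.
p_gps_down    = 0.001
p_atomic_down = 0.001

# With independent failure modes, time is lost only if both fail at once
p_both_down  = p_gps_down * p_atomic_down   # 0.000001
availability = 1 - p_both_down              # 0.999999 → "six nines"
print(f"{availability:.6%}")                # 99.999900%
```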
Time Synchronization Architecture
```
        GPS Satellites
           ↓  ↓  ↓
┌───────────────────────┐
│   GPS Time Masters    │
│   (multiple per DC)   │
└───────────┬───────────┘
            │
┌───────────────────────┐
│     Atomic Clocks     │
│   (multiple per DC)   │
└───────────┬───────────┘
            │
╔═══════════╧═══════════╗
║     Time Masters      ║
║    (10-20 per DC)     ║
╚═══════════╤═══════════╝
            │
  Synchronize via Marzullo's Algorithm
            │
    ┌───────┼───────┐
    ↓       ↓       ↓
[Server 1] [Server 2] ... [Server N]
```

Synchronization Process:
- Every 30 seconds: Servers poll multiple Time Masters
- Collect timestamps: Get time readings from ~10 Time Masters
- Run Marzullo’s Algorithm: Detect and exclude faulty clocks (outliers)
- Compute uncertainty: Calculate the uncertainty interval based on:
  - Network round-trip time
  - Last sync interval
  - Local clock drift rate
  - Discrepancies between Time Masters
The TrueTime Daemon
Here’s how each server maintains its TrueTime interval:
```python
class TrueTimeDaemon:
    def __init__(self):
        self.local_clock_drift_rate = 200   # worst-case local drift, microseconds per second
        self.last_sync_time = 0
        self.last_sync_uncertainty = 0      # seconds

    def poll_time_masters(self):
        """Synchronize with Time Masters using Marzullo's algorithm."""
        timestamps = []

        # Poll multiple Time Masters
        for master in TIME_MASTERS:
            t_send = local_time()
            response = master.get_time()
            t_receive = local_time()

            # Account for network delay: the master's reading could have been
            # taken anywhere within the round trip
            rtt = t_receive - t_send
            master_time = response.timestamp
            uncertainty = response.uncertainty + (rtt / 2)

            timestamps.append((master_time, uncertainty))

        # Marzullo's algorithm: find the smallest interval that overlaps
        # with a majority of the reported [earliest, latest] intervals
        intervals = [(t - u, t + u) for t, u in timestamps]
        agreed_time, agreed_uncertainty = marzullos_algorithm(intervals)

        self.last_sync_time = agreed_time
        self.last_sync_uncertainty = agreed_uncertainty

    def now(self) -> TTinterval:
        """
        Compute the current TrueTime interval.

        Uncertainty grows linearly since the last sync due to clock drift.
        """
        time_since_sync = local_time() - self.last_sync_time
        drift_uncertainty = time_since_sync * self.local_clock_drift_rate / 1_000_000  # μs → s

        total_uncertainty = self.last_sync_uncertainty + drift_uncertainty

        now = local_time()
        return TTinterval(
            earliest=now - total_uncertainty,
            latest=now + total_uncertainty,
        )
```

Result: Across Google's network, the uncertainty ε is typically:
- Average: 4ms
- Best case: 1ms (right after sync)
- Worst case: 7ms (just before next sync)
Marzullo’s Algorithm: Detecting Faulty Clocks
Marzullo’s algorithm is the secret sauce that allows TrueTime to tolerate faulty time sources.
The Problem
When you poll 10 Time Masters, you get 10 different time readings:
```
Master 1: [100.0, 100.2]
Master 2: [100.1, 100.3]
Master 3: [100.0, 100.2]
Master 4: [ 99.5,  99.7]   ← clearly wrong!
Master 5: [100.1, 100.3]
...
```

How do you determine which masters are correct and which are faulty?
The Solution
Marzullo’s algorithm finds the smallest interval that overlaps with a majority of reported intervals:
```python
def marzullos_algorithm(intervals):
    """
    Find the smallest interval overlapping with the largest number of inputs.

    intervals: list of (earliest, latest) pairs, one per Time Master.
    Returns: (agreed_time, uncertainty)
    """
    events = []

    # Create events for interval starts and ends
    for earliest, latest in intervals:
        events.append((earliest, +1))   # interval starts
        events.append((latest, -1))     # interval ends

    # Sort by time; at equal times, process starts before ends so that
    # intervals touching at a single point still count as overlapping
    events.sort(key=lambda e: (e[0], -e[1]))

    best_count = 0
    current_count = 0
    best_start = None
    best_end = None

    # Sweep the events, tracking how many intervals are open at once
    for i, (time, delta) in enumerate(events):
        current_count += delta

        if current_count > best_count:
            # A new maximum overlap begins here and lasts until the next event
            best_count = current_count
            best_start = time
            best_end = events[i + 1][0]

    # Return the midpoint and half-width of the best-supported interval
    agreed_time = (best_start + best_end) / 2
    uncertainty = (best_end - best_start) / 2

    return agreed_time, uncertainty
```

Example Execution:
```
Input intervals:
  Master 1: [100.0, 100.4]
  Master 2: [100.2, 100.6]
  Master 3: [100.1, 100.5]
  Master 4: [ 99.5,  99.9]   ← outlier!
  Master 5: [100.3, 100.7]

Overlap count along the timeline:
   99.5    100.0   100.1   100.2   100.3   100.4   100.7
    |        |       |       |       |       |       |
    1        1       2       3       4       3       1
                                     ↑
                             maximum overlap

Result: [100.3, 100.4] with 4 masters agreeing
Agreed time: 100.35, uncertainty: ±0.05
```

Key Properties:
- Fault tolerance: Tolerates up to ⌊(n-1)/2⌋ faulty masters, as long as a majority of sources report intervals containing the true time
- Optimal: Finds the smallest interval consistent with the maximum number of sources
- Fast: O(n log n), dominated by the sort
- Resistant to arbitrary faults: A faulty clock can report any interval it likes; with a correct majority, the result stays anchored to their agreement
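As a sanity check, a sketch like the `marzullos_algorithm` above can be run directly on the five intervals from the example execution (the values are the illustrative ones used in this section):

```python
intervals = [
    (100.0, 100.4),  # Master 1
    (100.2, 100.6),  # Master 2
    (100.1, 100.5),  # Master 3
    ( 99.5,  99.9),  # Master 4 (the outlier)
    (100.3, 100.7),  # Master 5
]

agreed_time, uncertainty = marzullos_algorithm(intervals)
print(agreed_time, uncertainty)   # ≈ 100.35 ±0.05; Master 4 is effectively ignored
```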
Important
Redundancy Budget: Google deploys 10-20 Time Masters per datacenter specifically to tolerate failures. Even if half fail, TrueTime continues working with bounded uncertainty.
How Uncertainty Grows Between Syncs
Between synchronizations, uncertainty grows due to clock drift:
Uncertainty over time:
```
ε (ms)
 7 |                              /
 6 |                         /
 5 |                    /
 4 |               /        ← average ε
 3 |          /
 2 |     /
 1 |/
   +-----|-----|-----|-----|-----|-----|--> time
  sync   5s    10s   15s   20s   25s   30s
                                    (next sync)
```

Formula: ε(t) = ε₀ + (drift_rate × time_since_sync)
- ε₀ = uncertainty immediately after sync (~1ms)
- drift_rate = local clock drift (~200 μs/second)
- time_since_sync = seconds since last sync (max 30s)
Maximum uncertainty: 1ms + (200 μs/s × 30s) = 1ms + 6ms = 7ms
This is why Google synchronizes every 30 seconds - to keep ε bounded!
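To make the sawtooth concrete, here is a small self-contained sketch of the formula above. The constants (1ms base uncertainty, 200 μs/s drift, 30s sync interval) come from this section; the function name is an illustrative choice.

```python
def epsilon_ms(seconds_since_sync: float,
               base_uncertainty_ms: float = 1.0,
               drift_us_per_s: float = 200.0) -> float:
    """ε(t) = ε₀ + drift_rate × time_since_sync, expressed in milliseconds."""
    return base_uncertainty_ms + (drift_us_per_s * seconds_since_sync) / 1000.0

# Uncertainty over one 30-second sync cycle:
for t in (0, 15, 30):
    print(f"{t:2d}s after sync: ε ≈ {epsilon_ms(t):.1f} ms")
# 0s → 1.0 ms (best case), 15s → 4.0 ms (average), 30s → 7.0 ms (worst case)
```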
The Cost of Infrastructure
Replicating TrueTime requires significant investment:
Hardware Costs (per datacenter)
| Component | Unit cost | Quantity | Total |
|---|---|---|---|
| Atomic clocks | $50,000 | 2-4 | $100k-$200k |
| GPS receivers | $5,000 | 4-6 | $20k-$30k |
| GPS antennas | $1,000 | 4-6 | $4k-$6k |
| Time Master servers | $10,000 | 10-20 | $100k-$200k |
| Redundant networking | $10,000 | - | $10k-$50k |
| Total per DC | | | $234k-$486k |
For Google’s ~100 datacenters: $23M-$48M in hardware alone
Operational Costs
- Maintenance: Atomic clocks require specialized technicians
- Monitoring: 24/7 monitoring of time drift and failures
- Replacement: Components fail and need replacement
- Expertise: Deep knowledge of time synchronization protocols
Annual operational cost: $5M-$10M
Warning
Why Google Can Do This: The cost of deploying atomic clocks and GPS receivers in every datacenter is significant. Most companies cannot afford this infrastructure, which is why TrueTime-style systems remain rare.
Failure Scenarios and Handling
TrueTime is designed to handle various failure modes gracefully:
1. GPS Signal Loss
```
Scenario: Construction blocks GPS antenna
Action:   Fall back to atomic clocks
Impact:   Slight uncertainty increase (~2-3ms)
Duration: Until GPS restored
```

2. Atomic Clock Failure

```
Scenario: Atomic clock hardware failure
Action:   Use remaining atomic clocks + GPS
Impact:   Minimal (redundancy)
Duration: Until replacement installed
```

3. Network Partition

```
Scenario: Time Master unreachable
Action:   Marzullo's algorithm excludes it
Impact:   Minimal if majority reachable
Duration: Until network restored
```

4. Catastrophic: All Time Masters Fail

```
Scenario: Datacenter-wide failure
Action:   ε grows unbounded, system halts writes
Impact:   SEVERE - no new transactions
Duration: Until manual intervention
```

The Final Safety: If uncertainty exceeds a threshold (~100ms), Spanner stops accepting writes rather than violating external consistency. Availability is sacrificed to maintain correctness.
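A minimal sketch of that last safety rule, reusing the `TrueTimeDaemon` from earlier: the ~100ms threshold is the one mentioned above, while the constant name and function are illustrative assumptions, not Spanner's actual implementation.

```python
# Illustrative only: the ~100ms threshold comes from the text above;
# everything else is a hypothetical sketch, not Spanner's real code.
MAX_EPSILON = 0.100  # seconds

def can_accept_writes(daemon: TrueTimeDaemon) -> bool:
    """Refuse new writes once the uncertainty bound is no longer trustworthy."""
    interval = daemon.now()
    epsilon = (interval.latest - interval.earliest) / 2
    return epsilon <= MAX_EPSILON
```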
Tip - Next Steps
Now that you understand how TrueTime achieves bounded uncertainty, let’s see how Google Spanner uses it to provide external consistency across datacenters.
Summary
- TrueTime relies on GPS + atomic clocks for redundancy
- Marzullo’s algorithm detects and excludes faulty time sources
- Uncertainty is typically 1-7ms, growing between syncs
- Infrastructure costs roughly $23M-$48M in hardware, plus $5M-$10M per year in operations, at Google's scale
- Graceful degradation: System tolerates multiple failures
- Safety over availability: Halts writes if uncertainty unbounded
The physical infrastructure is what makes TrueTime possible. Without atomic clocks and GPS, the tight uncertainty bounds simply cannot be achieved.