Alternatives to TrueTime & Real-World Impact

Beyond Spanner: Can You Build This?

The Reality: It’s Hard

Replicating TrueTime requires:

Hardware Investment:

$50,000+ per datacenter for atomic clocks
GPS receivers and antennas at every location
Redundant power and networking
Maintenance and monitoring

Expertise:

Deep understanding of time synchronization
Marzullo’s algorithm implementation
Fault-tolerant distributed systems
GPS and atomic clock failure modes

Scale: Only makes economic sense at Google’s scale (hundreds of datacenters)

Warning

For Mere Mortals: Most companies should use managed Spanner (Google Cloud Spanner) rather than trying to build TrueTime themselves. The infrastructure investment alone is prohibitive.

Alternatives Without TrueTime

If you can’t use TrueTime, consider these alternatives:

1. Hybrid Logical Clocks (HLC)

HLC combines physical time with logical counters to provide causal ordering:

```python
import time

def wall_clock_time():
    """Physical clock reading in milliseconds (assumed unit; NTP-synchronized in practice)."""
    return time.time_ns() // 1_000_000

# HLC combines physical time with logical counters
class HLC:
    def __init__(self, clock=wall_clock_time):
        self.clock = clock          # injectable so the example below is deterministic
        self.physical_time = 0      # largest physical time seen so far
        self.logical_counter = 0    # breaks ties within the same physical time

    def now(self):
        """Timestamp a local or send event."""
        pt = self.clock()
        if pt > self.physical_time:
            self.physical_time = pt
            self.logical_counter = 0
        else:
            # Clock hasn't advanced (or went backwards): bump the counter
            self.logical_counter += 1
        return (self.physical_time, self.logical_counter)

    def update(self, remote_time):
        """Update on receiving message from another node; returns the receive timestamp."""
        remote_pt, remote_lc = remote_time
        old_pt = self.physical_time  # must be saved before it is overwritten

        # Take max of local clock, remote time, and previous HLC time
        self.physical_time = max(self.clock(), remote_pt, old_pt)

        if self.physical_time == old_pt == remote_pt:
            self.logical_counter = max(self.logical_counter, remote_lc) + 1
        elif self.physical_time == remote_pt:
            self.logical_counter = remote_lc + 1
        elif self.physical_time == old_pt:
            self.logical_counter += 1
        else:
            self.logical_counter = 0
        return (self.physical_time, self.logical_counter)

# Example usage: both nodes read a fake physical clock stuck at 100
node_a = HLC(clock=lambda: 100)
node_b = HLC(clock=lambda: 100)

# Node A creates event
event_a = node_a.now()  # (100, 0)

# Node B receives message from A
event_b = node_b.update(event_a)  # (100, 1) - logical counter incremented

# Ordering: event_a < event_b (causal order preserved; tuples compare lexicographically)
assert event_a < event_b
```

Pros:

Works with standard NTP (no atomic clocks needed)
Preserves causal ordering
Low overhead
Simple to implement

Cons:

Does NOT provide external consistency
Orders concurrent (causally unrelated) events arbitrarily rather than by real time
Timestamps can run ahead of real time when some node’s clock is fast (bounded by the clock skew)

Used by: CockroachDB, YugabyteDB, MongoDB (cluster time)

Important

Key Difference: HLC provides causal consistency (if A causes B, then A < B), but NOT external consistency (if A completes before B starts in real time, then A < B). TrueTime provides external consistency.
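To make the distinction concrete, here is a minimal sketch with two simulated nodes whose clocks disagree by 10 time units. `SimpleHLC` and the fake clock lambdas are hypothetical, local-events-only stand-ins, not any database’s implementation. Transaction A finishes before transaction B starts in real time, but because no message passes between the nodes, HLC orders B first:

```python
class SimpleHLC:
    """Minimal HLC (local events only); `clock` is an injected fake clock."""

    def __init__(self, clock):
        self.clock = clock
        self.pt, self.lc = 0, 0

    def now(self):
        pt = self.clock()
        if pt > self.pt:
            self.pt, self.lc = pt, 0
        else:
            self.lc += 1
        return (self.pt, self.lc)

real_time = 0                              # "true" time, invisible to the nodes
fast = SimpleHLC(lambda: real_time + 10)   # node whose clock runs 10 units ahead
slow = SimpleHLC(lambda: real_time)        # node with an accurate clock

real_time = 100
ts_a = fast.now()   # tx A commits at real time 100 -> (110, 0)

real_time = 105     # tx B starts after A finished; no message between the nodes
ts_b = slow.now()   # -> (105, 0)

print(ts_a, ts_b)   # (110, 0) (105, 0)
print(ts_b < ts_a)  # True: B is ordered before A despite starting later
```

Had the `slow` node received a message carrying `ts_a` before stamping B, the HLC receive rule would have pushed B’s timestamp above A’s. That is the causal guarantee, and it is also why the guarantee does not extend to unrelated transactions.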

2. Centralized Timestamp Oracle

Simple but effective: one server (or Raft group) assigns monotonic timestamps.

```python
import threading

class TimestampOracle:
    """Centralized timestamp allocation."""

    def __init__(self):
        self.current = 0
        self.lock = threading.Lock()

    def get_timestamp(self):
        """Get next timestamp (atomic)."""
        with self.lock:
            self.current += 1
            return self.current

# Centralized oracle (or Raft consensus group)
oracle = TimestampOracle()

# Every transaction asks oracle for timestamp
tx1_ts = oracle.get_timestamp()  # 1
tx2_ts = oracle.get_timestamp()  # 2
# Guaranteed: tx1_ts < tx2_ts

# External consistency achieved!
```

Pros:

Simple to implement
Guarantees monotonic ordering
No special hardware needed
Provides external consistency

Cons:

Single point of failure (unless using Raft/Paxos)
Network hop to oracle for every transaction
Scalability bottleneck: every transaction hits the same oracle, making it a hotspot

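Used by: Percolator (Google’s earlier transaction system for incremental indexing on Bigtable) and systems modeled on it, such as TiDB (whose Placement Driver serves as the timestamp oracle)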

Optimization: Timestamp Batching

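```python
class BatchedOracle(TimestampOracle):
    """Oracle that pre-allocates timestamp ranges (reuses TimestampOracle's lock and counter)."""

    def allocate_range(self, size=1000):
        """Allocate the half-open range [start, start + size)."""
        with self.lock:
            start = self.current + 1   # first timestamp not yet issued
            self.current += size       # mark the whole range as issued
            return (start, start + size)

# Each server requests a range
oracle = BatchedOracle()
server_range = oracle.allocate_range(1000)  # (1, 1001) on a fresh oracle: timestamps 1..1000

# Server can assign 1000 timestamps locally without a network call
# Reduces oracle load by up to 1000x!
# Caveat: servers drawing from different ranges no longer follow real-time
# order, so batching gives up the single oracle's external-consistency guarantee
```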
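A sketch of how a server might consume its pre-allocated range, assuming half-open ranges and a hypothetical `LocalTimestampCache` wrapper (neither is taken from a specific system). It counts oracle round trips and also shows the cost of batching: timestamps from different servers’ ranges no longer follow real-time order.

```python
import threading

class RangeOracle:
    """Central oracle handing out half-open timestamp ranges [start, end)."""

    def __init__(self):
        self.next_ts = 1
        self.lock = threading.Lock()
        self.calls = 0  # counts round trips, for illustration

    def allocate_range(self, size):
        with self.lock:
            self.calls += 1
            start = self.next_ts
            self.next_ts += size
            return start, start + size

class LocalTimestampCache:
    """Hypothetical per-server cache that serves timestamps from its current range."""

    def __init__(self, oracle, batch_size=1000):
        self.oracle = oracle
        self.batch_size = batch_size
        self.next_ts, self.end = 0, 0  # empty range forces a refill on first use

    def get_timestamp(self):
        if self.next_ts >= self.end:  # range exhausted: one round trip to the oracle
            self.next_ts, self.end = self.oracle.allocate_range(self.batch_size)
        ts = self.next_ts
        self.next_ts += 1
        return ts

oracle = RangeOracle()
server_a = LocalTimestampCache(oracle)
server_b = LocalTimestampCache(oracle)

a_stamps = [server_a.get_timestamp() for _ in range(2500)]
print(oracle.calls)  # 3: 2,500 timestamps cost three range requests

# Cross-server ordering is lost:
tx1 = server_b.get_timestamp()  # server B's first range is [3001, 4001)
tx2 = server_a.get_timestamp()  # server A still holds 2501..3000
print(tx1, tx2)  # 3001 2501: tx2 started after tx1 but got a smaller timestamp
```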

3. Spanner as a Service (Recommended)

The pragmatic choice for most organizations:

Google Cloud Spanner:

Managed TrueTime (included)
Pay-per-use pricing
Global distribution
ACID guarantees
SQL interface

Pricing Example:

```
Workload: 10 nodes, 1TB data, 1M transactions/day

Costs:
  - Nodes: 10 × $0.90/hr × 730 hrs/month = $6,570/month
  - Storage: 1000 GB × $0.30/GB-month = $300/month
  - Network: ~$100/month

Total: ~$7,000/month = $84,000/year

Compare to building TrueTime: $1M+ upfront + $200k/year ops
```

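Break-even: Building makes sense only if spending >$500k/year on Spanner. At that level, the roughly $300k/year saved over the $200k/year in-house operating cost repays the $1M upfront in about 3.3 years; at the example’s ~$84k/year, building never pays off.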
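The pricing arithmetic and break-even reasoning above can be checked with a few lines of Python. The rates are the example’s illustrative figures, and `spanner_monthly_cost` and `payback_years` are hypothetical helpers, not part of any Google Cloud API:

```python
def spanner_monthly_cost(nodes, node_hourly=0.90, storage_gb=1000,
                         storage_per_gb_month=0.30, network=100.0,
                         hours_per_month=730):
    """Monthly Cloud Spanner estimate using the example's illustrative rates."""
    return (nodes * node_hourly * hours_per_month
            + storage_gb * storage_per_gb_month
            + network)

def payback_years(spanner_per_year, build_upfront=1_000_000,
                  build_ops_per_year=200_000):
    """Years until building TrueTime in-house repays its upfront cost.

    Returns None when the managed service costs less than in-house ops alone.
    """
    savings = spanner_per_year - build_ops_per_year
    if savings <= 0:
        return None
    return build_upfront / savings

monthly = spanner_monthly_cost(nodes=10)
print(round(monthly))                     # 6970
print(round(monthly * 12))                # 83640
print(payback_years(monthly * 12))        # None: never repays $200k/year ops
print(round(payback_years(500_000), 2))   # 3.33
```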

Tip

Best Option: For most companies, use Google Cloud Spanner and get TrueTime “for free” without the infrastructure burden.

4. CockroachDB: HLC + Causality

CockroachDB uses Hybrid Logical Clocks without TrueTime:

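```python
# CockroachDB approach (simplified sketch, not CockroachDB's actual code)
class CockroachTransaction:
    def __init__(self, hlc, storage):
        self.hlc = hlc                    # this node's HLC (e.g. the HLC class above)
        self.storage = storage            # hypothetical MVCC store: dict of (key, ts) -> value
        self.writes = {}                  # writes buffered until commit
        self.timestamp = self.hlc.now()   # transaction timestamp from the HLC

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # No commit wait needed!
        # But: only causal consistency, not external
        for key, value in self.writes.items():
            self.storage[(key, self.timestamp)] = value
        return self.timestamp
```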

Trade-off:

✅ No atomic clocks needed
✅ No commit wait latency
⚠️ Weaker consistency than Spanner
⚠️ Concurrent transactions may have ambiguous ordering

Best for: Applications that need strong causal consistency but not strict external consistency

Comparison Table

Approach	Consistency	Hardware	Added latency	Complexity
TrueTime	External	Atomic clocks + GPS	+4-7ms	Very High
HLC	Causal	Standard NTP	None	Low
Timestamp Oracle	External	None	+1-5ms	Medium
Cloud Spanner	External	Managed	+4-7ms	None (managed)
CockroachDB	Causal	Standard NTP	None	Medium

Real-World Impact

Google’s Usage

Spanner powers critical Google services:

Gmail: Email storage and indexing
Google Play: App metadata and purchases
Google Photos: Photo metadata
AdWords: Financial transactions

Workload: Trillions of transactions per day across hundreds of datacenters

Production Statistics

Public data from Google Cloud Spanner:

Metric	Value
Availability	99.999% SLA (multi-region); 99.99% (regional)
Read Latency	1-2ms (single region)
Write Latency	5-10ms (single region)
Cross-region Write	100-200ms
Max Throughput	10,000+ QPS per node

The Academic Impact

The Spanner paper (2012) fundamentally changed how computer scientists think about distributed systems:

Before: “You can’t have strong consistency in a distributed system without sacrificing availability or performance”
After: “Strong consistency is possible with TrueTime, but it requires expensive infrastructure”

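It showed that, with sufficient investment in physical infrastructure, strong consistency can be made practical at global scale. Spanner does not escape the CAP theorem (during a partition it favors consistency), but it makes partitions rare enough that it is effectively highly available in practice.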

Citations: The Spanner paper has been cited 5,000+ times in academic literature.

Lessons from TrueTime

TrueTime is a masterclass in solving hard problems through engineering rather than algorithms. Instead of trying to achieve perfect time synchronization (impossible), Google:

Acknowledged uncertainty explicitly
Bounded the uncertainty through expensive hardware
Made the API expose uncertainty to applications
Built strong guarantees on top of bounded uncertainty
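The Spanner paper’s TrueTime API returns an interval from `TT.now()` and offers `TT.after(t)` and `TT.before(t)`; commit wait picks a timestamp and waits until `TT.after(s)` holds. Below is a minimal simulation of that idea, assuming a fixed uncertainty bound `epsilon` around the local clock. Real TrueTime derives its bound from GPS and atomic-clock time masters, and `TrueTimeSketch` and `commit_wait` are hypothetical names:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be >= earliest
    latest: float    # ...and <= latest

class TrueTimeSketch:
    """Simulated TrueTime: local clock +/- an assumed uncertainty bound (seconds)."""

    def __init__(self, epsilon=0.004, clock=time.time):
        self.epsilon = epsilon
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True if timestamp t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True if timestamp t has definitely not arrived yet."""
        return self.now().latest < t

def commit_wait(tt, poll_interval=0.0005):
    """Choose commit timestamp s, then block until s is guaranteed to be in the past."""
    s = tt.now().latest
    while not tt.after(s):
        time.sleep(poll_interval)
    return s

tt = TrueTimeSketch(epsilon=0.004)
start = time.monotonic()
s = commit_wait(tt)
waited = time.monotonic() - start  # at least about 2 * epsilon = 8 ms
```

Because `after(s)` only becomes true once the local clock passes `s + epsilon`, and `s` itself was chosen as `now + epsilon`, the wait is about 2ε. That is why shrinking the uncertainty bound with better hardware directly cuts commit latency.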

Key Takeaways

Time is hard: Distributed systems cannot perfectly agree on current time
Bounded uncertainty: Knowing “how wrong” your clock might be is as valuable as accuracy
Hardware matters: GPS + atomic clocks enable 1-7ms uncertainty bounds
Commit wait: Trading latency for consistency unlocks external consistency
Not for everyone: TrueTime requires Google-scale infrastructure investment

Tip - Further Study

Read the original Spanner paper, “Spanner: Google’s Globally-Distributed Database” (Corbett et al., OSDI 2012), for deeper technical details on TrueTime’s implementation and the proof of external consistency.

Conclusion

TrueTime represents a fundamental trade-off: expensive infrastructure for strong guarantees. For most organizations, the pragmatic approach is to use managed services like Google Cloud Spanner and benefit from TrueTime without the infrastructure burden.

Understanding TrueTime helps you:

Appreciate the complexity of distributed databases
Make informed decisions about consistency models
Recognize when to use managed Spanner vs. build your own
Understand the trade-offs between consistency and performance

The lesson: Sometimes the best solution to distributed systems problems isn’t a clever algorithm; it’s investing in the physical infrastructure to make the problem simpler.