Modern Indexing: GIN and Spatial Indexes

Beyond Simple Key-Value Lookups

So far, we’ve explored indexes optimized for simple data types: integers, strings, dates. But modern applications store rich, complex data:

JSON documents with nested fields
Arrays of tags or categories
Geographic coordinates for location-based queries
Full-text documents for search

B+ Trees, LSM Trees, and Hash indexes struggle with this kind of data. Enter specialized indexes designed for these exact use cases.

GIN: Generalized Inverted Index

When you search Google for “database indexing”, Google doesn’t scan every webpage word-by-word. It uses an inverted index - a mapping from words to the documents containing them.

PostgreSQL’s GIN (Generalized Inverted Index) brings this same power to your database.

How Inverted Indexes Work

Instead of mapping Row → Data, an inverted index maps Value → List of Rows.

Example: Blog posts with tags

1
Post 1: ['databases', 'performance', 'indexing']
2
Post 2: ['databases', 'sql']
3
Post 3: ['performance', 'caching']

Traditional B+ Tree: Would index the entire array as a single value (useless for finding posts by individual tag)

GIN Inverted Index:

1
'databases'   → [Post 1, Post 2]
2
'performance' → [Post 1, Post 3]
3
'indexing'    → [Post 1]
4
'sql'         → [Post 2]
5
'caching'     → [Post 3]

Now you can instantly find all posts tagged with ‘databases’!

Tip

Query Speed: Finding all documents containing a specific word is O(1) with an inverted index, versus O(N) with a full table scan.

Use Cases for GIN

1. Full-Text Search

1
CREATE INDEX articles_content_gin ON articles
2
USING GIN (to_tsvector('english', content));
3

4
SELECT * FROM articles
5
WHERE to_tsvector('english', content) @@ to_tsquery('database & indexing');

2. JSONB Indexing

1
CREATE INDEX user_metadata_gin ON users USING GIN (metadata);
2

3
SELECT * FROM users
4
WHERE metadata @> '{"premium": true}';

3. Array Containment

1
CREATE INDEX post_tags_gin ON posts USING GIN (tags);
2

3
SELECT * FROM posts WHERE tags @> ARRAY['databases'];

GIN Trade-Offs

Pros:

Extremely fast for containment queries (@>, @@)
Supports complex data types (arrays, JSON, full-text)
Compresses posting lists for space efficiency

Cons:

Slower bulk inserts (must update many index entries per row)
Larger index size than B+ Trees
Not suitable for equality or range queries on simple types

Did You Know?

GIN uses posting list compression techniques like VarByte encoding to reduce index size by 50-70% compared to naive implementations.

Spatial Indexes: R-Trees and Geohashes

“Find all gas stations within 5km of my current location.”

This is a proximity query - fundamentally different from equality or range queries. Standard indexes fail because:

Latitude/longitude are 2D coordinates (B+ Trees are 1D)
“Closeness” doesn’t have a simple sort order

R-Trees (Rectangle Trees)

R-Trees organize spatial data using Minimum Bounding Rectangles (MBRs).

How It Works:

Group nearby objects into rectangles
Group these rectangles into larger rectangles
Build a tree hierarchy of nested rectangles
To search: traverse rectangles that overlap with your query region

Example: Finding coffee shops near you

1
CREATE INDEX locations_gist ON places USING GIST (location);
2

3
SELECT * FROM places
4
WHERE ST_DWithin(
5
    location,
6
    ST_MakePoint(-122.4194, 37.7749),  -- San Francisco
7
    5000  -- 5km radius in meters
8
);

Used By: PostGIS (PostgreSQL extension), Oracle Spatial, MySQL Spatial

Geohashes

Geohashes convert 2D coordinates into a single string by interleaving latitude and longitude bits.

Example:

1
(lat: 37.7749, lon: -122.4194) → "9q8yy"

Key Property: Points that are geographically close have similar geohash prefixes!

1
San Francisco:     9q8yy9mf
2
Oakland (nearby):  9q9p1b9h  (shares "9q" prefix)
3
New York (far):    dr5ru7p4  (completely different)

This allows using a standard B+ Tree for proximity queries by searching for geohash prefixes.

Tip

Real-World Scale: Uber uses H3 (hexagonal hierarchical geospatial indexing system) to handle 100+ billion trips and location queries.

Spatial Index Trade-Offs

Pros:

Enable fast proximity/containment queries
Handle complex shapes (polygons, lines, not just points)
Essential for GIS applications

Cons:

More complex than B+ Trees
Higher memory overhead for MBRs
Update performance degrades with high overlap

Real-World Examples

PostgreSQL GIN for JSON

1
-- E-commerce: product catalog with varying attributes
2
CREATE TABLE products (
3
    id SERIAL PRIMARY KEY,
4
    name TEXT,
5
    attributes JSONB
6
);
7

8
CREATE INDEX attributes_gin ON products USING GIN (attributes);
9

10
-- Find all laptops with 16GB RAM
11
SELECT * FROM products
12
WHERE attributes @> '{"category": "laptop", "ram_gb": 16}';

Result: Millisecond queries on millions of products with complex, schema-less attributes.

PostGIS for Delivery Routing

1
-- Find all delivery drivers within 10km of an order
2
SELECT driver_id, ST_Distance(location, order_location) AS distance
3
FROM drivers
4
WHERE ST_DWithin(location, order_location, 10000)
5
ORDER BY distance
6
LIMIT 5;

Used By: Uber, DoorDash, Lyft for real-time driver matching.

MongoDB Geospatial Queries

MongoDB includes built-in support for 2D and 2D sphere indexes:

1
db.places.createIndex({ location: "2dsphere" });
2

3
db.places.find({
4
  location: {
5
    $near: {
6
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
7
      $maxDistance: 5000
8
    }
9
  }
10
});

Comparison Table

Index Type	Best For	Not Suitable For
B+ Tree	Equality, ranges, sorting	Complex types, containment
GIN	Full-text, JSON, arrays	Simple equality, ranges
R-Tree	Proximity, containment	Exact matches, 1D data
Hash	Exact equality	Everything else
LSM	Write-heavy workloads	Random point lookups

When to Use Modern Indexes

Use GIN When:

Searching within JSON documents
Full-text search requirements
Array containment queries
Tag-based filtering

Use Spatial Indexes When:

Location-based queries (nearby restaurants, drivers, etc.)
GIS applications (mapping, routing)
Geometry containment (is this point in this polygon?)
Any 2D/3D proximity search

Warning

Don’t Over-Index: GIN and spatial indexes are larger and slower to maintain than B+ Trees. Only use them for queries that actually need their specialized capabilities.

Conclusion

Modern indexes expand databases beyond simple tables and rows. They enable:

Searching unstructured data (JSON, documents)
Building location-aware applications
Implementing full-text search without external services

Understanding these specialized indexes helps you:

Build schema-less features with JSONB + GIN
Implement proximity search for ride-sharing or delivery apps
Add full-text search without Elasticsearch
Choose PostgreSQL over NoSQL for complex query needs

Series Conclusion

We’ve journeyed from the fundamentals of indexing through WAL, B+ Trees, LSM Trees, Hash indexes, and finally specialized modern indexes. Each has its place:

B+ Trees: The reliable default for general OLTP
WAL: The foundation ensuring durability
LSM Trees: The write-optimized choice for ingestion
Hash Indexes: The specialist for exact lookups
GIN/Spatial: The specialists for complex data

Choose wisely, and your database will reward you with excellent performance!