Beyond Simple Key-Value Lookups
So far, we’ve explored indexes optimized for simple data types: integers, strings, dates. But modern applications store rich, complex data:
- JSON documents with nested fields
- Arrays of tags or categories
- Geographic coordinates for location-based queries
- Full-text documents for search
B+ Trees, LSM Trees, and Hash indexes struggle with this kind of data. Enter specialized indexes designed for these exact use cases.
GIN: Generalized Inverted Index
When you search Google for “database indexing”, Google doesn’t scan every webpage word-by-word. It uses an inverted index - a mapping from words to the documents containing them.
PostgreSQL’s GIN (Generalized Inverted Index) brings this same power to your database.
How Inverted Indexes Work
Instead of mapping Row → Data, an inverted index maps Value → List of Rows.
Example: Blog posts with tags
Post 1: ['databases', 'performance', 'indexing']Post 2: ['databases', 'sql']Post 3: ['performance', 'caching']Traditional B+ Tree: Would index the entire array as a single value (useless for finding posts by individual tag)
GIN Inverted Index:
'databases' → [Post 1, Post 2]'performance' → [Post 1, Post 3]'indexing' → [Post 1]'sql' → [Post 2]'caching' → [Post 3]Now you can instantly find all posts tagged with ‘databases’!
Tip
Query Speed: Finding all documents containing a specific word is O(1) with an inverted index, versus O(N) with a full table scan.
Use Cases for GIN
1. Full-Text Search
CREATE INDEX articles_content_gin ON articlesUSING GIN (to_tsvector('english', content));
SELECT * FROM articlesWHERE to_tsvector('english', content) @@ to_tsquery('database & indexing');2. JSONB Indexing
CREATE INDEX user_metadata_gin ON users USING GIN (metadata);
SELECT * FROM usersWHERE metadata @> '{"premium": true}';3. Array Containment
CREATE INDEX post_tags_gin ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['databases'];GIN Trade-Offs
Pros:
- Extremely fast for containment queries (
@>,@@) - Supports complex data types (arrays, JSON, full-text)
- Compresses posting lists for space efficiency
Cons:
- Slower bulk inserts (must update many index entries per row)
- Larger index size than B+ Trees
- Not suitable for equality or range queries on simple types
Did You Know?
GIN uses posting list compression techniques like VarByte encoding to reduce index size by 50-70% compared to naive implementations.
Spatial Indexes: R-Trees and Geohashes
“Find all gas stations within 5km of my current location.”
This is a proximity query - fundamentally different from equality or range queries. Standard indexes fail because:
- Latitude/longitude are 2D coordinates (B+ Trees are 1D)
- “Closeness” doesn’t have a simple sort order
R-Trees (Rectangle Trees)
R-Trees organize spatial data using Minimum Bounding Rectangles (MBRs).
How It Works:
- Group nearby objects into rectangles
- Group these rectangles into larger rectangles
- Build a tree hierarchy of nested rectangles
- To search: traverse rectangles that overlap with your query region
Example: Finding coffee shops near you
CREATE INDEX locations_gist ON places USING GIST (location);
SELECT * FROM placesWHERE ST_DWithin( location, ST_MakePoint(-122.4194, 37.7749), -- San Francisco 5000 -- 5km radius in meters);Used By: PostGIS (PostgreSQL extension), Oracle Spatial, MySQL Spatial
Geohashes
Geohashes convert 2D coordinates into a single string by interleaving latitude and longitude bits.
Example:
(lat: 37.7749, lon: -122.4194) → "9q8yy"Key Property: Points that are geographically close have similar geohash prefixes!
San Francisco: 9q8yy9mfOakland (nearby): 9q9p1b9h (shares "9q" prefix)New York (far): dr5ru7p4 (completely different)This allows using a standard B+ Tree for proximity queries by searching for geohash prefixes.
Tip
Real-World Scale: Uber uses H3 (hexagonal hierarchical geospatial indexing system) to handle 100+ billion trips and location queries.
Spatial Index Trade-Offs
Pros:
- Enable fast proximity/containment queries
- Handle complex shapes (polygons, lines, not just points)
- Essential for GIS applications
Cons:
- More complex than B+ Trees
- Higher memory overhead for MBRs
- Update performance degrades with high overlap
Real-World Examples
PostgreSQL GIN for JSON
-- E-commerce: product catalog with varying attributesCREATE TABLE products ( id SERIAL PRIMARY KEY, name TEXT, attributes JSONB);
CREATE INDEX attributes_gin ON products USING GIN (attributes);
-- Find all laptops with 16GB RAMSELECT * FROM productsWHERE attributes @> '{"category": "laptop", "ram_gb": 16}';Result: Millisecond queries on millions of products with complex, schema-less attributes.
PostGIS for Delivery Routing
-- Find all delivery drivers within 10km of an orderSELECT driver_id, ST_Distance(location, order_location) AS distanceFROM driversWHERE ST_DWithin(location, order_location, 10000)ORDER BY distanceLIMIT 5;Used By: Uber, DoorDash, Lyft for real-time driver matching.
MongoDB Geospatial Queries
MongoDB includes built-in support for 2D and 2D sphere indexes:
db.places.createIndex({ location: "2dsphere" });
db.places.find({ location: { $near: { $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] }, $maxDistance: 5000 } }});Comparison Table
| Index Type | Best For | Not Suitable For |
|---|---|---|
| B+ Tree | Equality, ranges, sorting | Complex types, containment |
| GIN | Full-text, JSON, arrays | Simple equality, ranges |
| R-Tree | Proximity, containment | Exact matches, 1D data |
| Hash | Exact equality | Everything else |
| LSM | Write-heavy workloads | Random point lookups |
When to Use Modern Indexes
Use GIN When:
- Searching within JSON documents
- Full-text search requirements
- Array containment queries
- Tag-based filtering
Use Spatial Indexes When:
- Location-based queries (nearby restaurants, drivers, etc.)
- GIS applications (mapping, routing)
- Geometry containment (is this point in this polygon?)
- Any 2D/3D proximity search
Warning
Don’t Over-Index: GIN and spatial indexes are larger and slower to maintain than B+ Trees. Only use them for queries that actually need their specialized capabilities.
Conclusion
Modern indexes expand databases beyond simple tables and rows. They enable:
- Searching unstructured data (JSON, documents)
- Building location-aware applications
- Implementing full-text search without external services
Understanding these specialized indexes helps you:
- Build schema-less features with JSONB + GIN
- Implement proximity search for ride-sharing or delivery apps
- Add full-text search without Elasticsearch
- Choose PostgreSQL over NoSQL for complex query needs
Series Conclusion
We’ve journeyed from the fundamentals of indexing through WAL, B+ Trees, LSM Trees, Hash indexes, and finally specialized modern indexes. Each has its place:
- B+ Trees: The reliable default for general OLTP
- WAL: The foundation ensuring durability
- LSM Trees: The write-optimized choice for ingestion
- Hash Indexes: The specialist for exact lookups
- GIN/Spatial: The specialists for complex data
Choose wisely, and your database will reward you with excellent performance!