Many database performance issues can be solved with proper index design. However, indexes shouldn’t be added blindly—they require understanding of data structures and query patterns. This article systematically covers everything from index internals to practical optimization techniques.
What is an Index
Book Index Analogy
An index is like the index at the back of a book.
| Search Type | Method | Time Complexity |
|---|---|---|
| Without index | Scan every page from page 1 | O(n) |
| With index | Look up keyword in index → Get page number | O(log n) |
Basic Index Operation
-- Without index: Full Table Scan
SELECT * FROM users WHERE email = 'user@example.com';
-- Scans all 1 million rows
-- With index: Index Scan
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Uses index for fast search
Index Types
1. B-Tree Index
The most common index structure, PostgreSQL’s default.
flowchart TB
Root["[50]"] --> L["[25]"]
Root --> R["[75]"]
L --> LL["[10,20]"]
L --> LR["[30,40]"]
R --> RL["[60,70]"]
R --> RR["[80,90]"]
LL --> D1["Data pages"]
LR --> D2["Data pages"]
RL --> D3["Data pages"]
RR --> D4["Data pages"]
Characteristics:
- Equality search (=): O(log n)
- Range search (<, >, BETWEEN): O(log n + m)
- Access to sorted data: Efficient
-- Operations suited for B-Tree
SELECT * FROM orders WHERE created_at > '2025-12-01'; -- Range search
SELECT * FROM products ORDER BY price; -- Sorting
SELECT * FROM users WHERE id = 12345; -- Equality search
2. Hash Index
Index specialized for equality searches only.
Hash index structure:
hash('user@example.com') = 42
| Bucket | Value |
|---|---|
| [0] | NULL |
| [1] | NULL |
| … | … |
| [42] | (user@example.com, row_pointer) |
| … | … |
| [99] | NULL |
Characteristics:
- Equality search (=): O(1)
- Range search: Not possible
- Sorting: Not possible
-- PostgreSQL 10+ improved Hash indexes
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Suitable operations
SELECT * FROM users WHERE email = 'user@example.com'; -- ✓
-- Unsuitable operations
SELECT * FROM users WHERE email LIKE 'user%'; -- ✗
SELECT * FROM users ORDER BY email; -- ✗
3. GIN (Generalized Inverted Index)
Index suited for arrays, JSONB, and full-text search.
-- GIN index on JSONB column
CREATE INDEX idx_products_metadata ON products USING gin(metadata);
-- Efficient search
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';
-- Array column
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
SELECT * FROM posts WHERE tags @> ARRAY['javascript', 'react'];
Index Type Comparison
| Index | Equality | Range | Sort | Use Case |
|---|---|---|---|---|
| B-Tree | ◎ | ◎ | ◎ | General purpose |
| Hash | ◎ | ✗ | ✗ | Equality only |
| GIN | ◎ | △ | ✗ | Array/JSONB/Full-text |
| GiST | ○ | ◎ | ✗ | Geospatial/Range |
| BRIN | △ | ◎ | △ | Large-scale/Time-series |
Composite Indexes
Column Order Matters
In composite indexes, column order significantly affects query performance.
-- Composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Queries that use this index:
SELECT * FROM orders WHERE user_id = 123; -- ✓
SELECT * FROM orders WHERE user_id = 123 AND status = 'paid'; -- ✓
SELECT * FROM orders WHERE user_id = 123 ORDER BY status; -- ✓
-- Queries that don't use this index:
SELECT * FROM orders WHERE status = 'paid'; -- ✗ Leading column missing
Column Order Decision Criteria
-- Basic principle: Higher cardinality columns first
-- users table
-- user_id: 1 million types (high cardinality)
-- status: 3 types (low cardinality)
-- country: 200 types (medium cardinality)
-- Recommended: High cardinality → Low cardinality
CREATE INDEX idx_users_composite ON users(user_id, country, status);
-- However, also consider query patterns
-- If WHERE status = 'active' is frequent, status might go first
Covering Index (INCLUDE clause)
-- PostgreSQL 11+ INCLUDE clause
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (total_amount, created_at);
-- Enables Index Only Scan
SELECT user_id, status, total_amount, created_at
FROM orders
WHERE user_id = 123 AND status = 'paid';
-- Completes with index only, no table access
Reading Execution Plans
EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'paid';
How to read execution plans:
Index Scan using idx_orders_user_status on orders
(cost=0.43..8.45 rows=1 width=100)
(actual time=0.025..0.026 rows=1 loops=1)
Index Cond: ((user_id = 123) AND (status = 'paid'))
Planning Time: 0.150 ms
Execution Time: 0.050 ms
| Field | Description |
|---|---|
| cost=0.43..8.45 | 0.43: Cost to return first row (startup cost), 8.45: Total cost to return all rows |
| rows=1 | Estimated row count |
| width=100 | Estimated bytes per row |
| actual time=0.025..0.026 | Actual execution time (ms) |
| rows=1 (actual) | Actual rows returned |
| loops=1 | Number of times operation executed |
Scan Type Comparison
-- Seq Scan: Full table scan
Seq Scan on orders (cost=0.00..18584.00 rows=100000 width=100)
Filter: (status = 'paid')
-- Index Scan: Uses index to access table
Index Scan using idx_orders_status on orders (cost=0.43..8.45 rows=1)
Index Cond: (status = 'paid')
-- Index Only Scan: Completes with index only (fastest)
Index Only Scan using idx_orders_covering on orders (cost=0.43..4.45 rows=1)
Index Cond: (status = 'paid')
-- Bitmap Index Scan: Combines multiple indexes
Bitmap Heap Scan on orders (cost=5.00..100.00 rows=100)
-> Bitmap Index Scan on idx_orders_status (cost=0.00..4.50 rows=100)
Index Cond: (status = 'paid')
Index Design Best Practices
1. Query Pattern Analysis
-- Check frequently executed queries with pg_stat_statements
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
-- Identify slow queries
SELECT query, mean_time, calls
FROM pg_stat_statements
WHERE mean_time > 100 -- Over 100ms
ORDER BY mean_time DESC;
2. Monitor Index Usage
-- Detect unused indexes
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
3. Partial Indexes
Index only specific data.
-- Index only active users
CREATE INDEX idx_users_active ON users(email)
WHERE status = 'active';
-- Recent data only
CREATE INDEX idx_orders_recent ON orders(created_at)
WHERE created_at > '2024-01-01';
-- Non-NULL data only
CREATE INDEX idx_products_sku ON products(sku)
WHERE sku IS NOT NULL;
4. Expression Indexes
Index computed results or function outputs.
-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users(lower(email));
SELECT * FROM users WHERE lower(email) = 'user@example.com';
-- Part of date
CREATE INDEX idx_orders_year_month ON orders(
date_trunc('month', created_at)
);
-- Specific JSONB key
CREATE INDEX idx_users_settings_theme ON users((settings->>'theme'));
SELECT * FROM users WHERE settings->>'theme' = 'dark';
Index Anti-patterns
1. Excessive Indexes
-- Anti-pattern: Index on every column
CREATE INDEX idx_users_id ON users(id); -- Already created by PK
CREATE INDEX idx_users_name ON users(name); -- Really needed?
CREATE INDEX idx_users_email ON users(email); -- Really needed?
CREATE INDEX idx_users_created ON users(created_at);
CREATE INDEX idx_users_updated ON users(updated_at);
CREATE INDEX idx_users_status ON users(status);
-- Problems:
-- - INSERT/UPDATE/DELETE become slower
-- - Wasted storage space
-- - Increased VACUUM processing load
2. Low Selectivity Indexes
-- Anti-pattern: Column with few distinct values
CREATE INDEX idx_users_gender ON users(gender); -- Only 3 types: M/F/Other
-- Problems:
-- - Even with index, accesses many rows
-- - Seq Scan often more efficient
-- Solution: Combine with other columns
CREATE INDEX idx_users_gender_age ON users(gender, birth_date);
3. Functions in WHERE Clause
-- Anti-pattern: Index not used
SELECT * FROM users WHERE YEAR(created_at) = 2025;
-- Index-friendly rewrite
SELECT * FROM users
WHERE created_at >= '2025-12-01'
AND created_at < '2026-01-01';
-- Or create expression index
CREATE INDEX idx_users_created_year ON users(EXTRACT(YEAR FROM created_at));
4. Type Mismatch
-- Anti-pattern: Implicit type conversion
-- user_id is INTEGER type
SELECT * FROM users WHERE user_id = '123'; -- String literal
-- Index-friendly version
SELECT * FROM users WHERE user_id = 123; -- Integer literal
Summary
Database indexes can achieve dramatic performance improvements when designed and managed properly.
Design Principles
- Query patterns first: Design based on actual queries
- Composite indexes: Consider selectivity and query patterns for column order
- Partial indexes: Index only necessary data
- Avoid excessive indexes: Balance with write costs
Monitoring and Optimization
- EXPLAIN ANALYZE: Regular execution plan review
- pg_stat_statements: Identify slow queries
- Unused indexes: Regular inventory
- REINDEX: Rebuild bloated indexes
Indexes aren’t “design and forget”—continuous monitoring and optimization are essential.