Database Index Mechanics and Optimization - Query Acceleration Principles

22 min read | 2025.12.02

Many database performance issues can be solved with proper index design. However, indexes shouldn’t be added blindly—they require understanding of data structures and query patterns. This article systematically covers everything from index internals to practical optimization techniques.

What is an Index

Book Index Analogy

An index is like the index at the back of a book.

Search TypeMethodTime Complexity
Without indexScan every page from page 1O(n)
With indexLook up keyword in index → Get page numberO(log n)

Basic Index Operation

-- Without index: Full Table Scan
SELECT * FROM users WHERE email = 'user@example.com';
-- Scans all 1 million rows

-- With index: Index Scan
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Uses index for fast search

Index Types

1. B-Tree Index

The most common index structure, PostgreSQL’s default.

flowchart TB
    Root["[50]"] --> L["[25]"]
    Root --> R["[75]"]
    L --> LL["[10,20]"]
    L --> LR["[30,40]"]
    R --> RL["[60,70]"]
    R --> RR["[80,90]"]
    LL --> D1["Data pages"]
    LR --> D2["Data pages"]
    RL --> D3["Data pages"]
    RR --> D4["Data pages"]

Characteristics:

  • Equality search (=): O(log n)
  • Range search (<, >, BETWEEN): O(log n + m)
  • Access to sorted data: Efficient
-- Operations suited for B-Tree
SELECT * FROM orders WHERE created_at > '2025-12-01';  -- Range search
SELECT * FROM products ORDER BY price;                 -- Sorting
SELECT * FROM users WHERE id = 12345;                  -- Equality search

2. Hash Index

Index specialized for equality searches only.

Hash index structure:

hash('user@example.com') = 42

BucketValue
[0]NULL
[1]NULL
[42](user@example.com, row_pointer)
[99]NULL

Characteristics:

  • Equality search (=): O(1)
  • Range search: Not possible
  • Sorting: Not possible
-- PostgreSQL 10+ improved Hash indexes
CREATE INDEX idx_users_email_hash ON users USING hash(email);

-- Suitable operations
SELECT * FROM users WHERE email = 'user@example.com';  -- ✓
-- Unsuitable operations
SELECT * FROM users WHERE email LIKE 'user%';          -- ✗
SELECT * FROM users ORDER BY email;                    -- ✗

3. GIN (Generalized Inverted Index)

Index suited for arrays, JSONB, and full-text search.

-- GIN index on JSONB column
CREATE INDEX idx_products_metadata ON products USING gin(metadata);

-- Efficient search
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';

-- Array column
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
SELECT * FROM posts WHERE tags @> ARRAY['javascript', 'react'];

Index Type Comparison

IndexEqualityRangeSortUse Case
B-TreeGeneral purpose
HashEquality only
GINArray/JSONB/Full-text
GiSTGeospatial/Range
BRINLarge-scale/Time-series

Composite Indexes

Column Order Matters

In composite indexes, column order significantly affects query performance.

-- Composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Queries that use this index:
SELECT * FROM orders WHERE user_id = 123;                    -- ✓
SELECT * FROM orders WHERE user_id = 123 AND status = 'paid'; -- ✓
SELECT * FROM orders WHERE user_id = 123 ORDER BY status;    -- ✓

-- Queries that don't use this index:
SELECT * FROM orders WHERE status = 'paid';  -- ✗ Leading column missing

Column Order Decision Criteria

-- Basic principle: Higher cardinality columns first

-- users table
-- user_id: 1 million types (high cardinality)
-- status: 3 types (low cardinality)
-- country: 200 types (medium cardinality)

-- Recommended: High cardinality → Low cardinality
CREATE INDEX idx_users_composite ON users(user_id, country, status);

-- However, also consider query patterns
-- If WHERE status = 'active' is frequent, status might go first

Covering Index (INCLUDE clause)

-- PostgreSQL 11+ INCLUDE clause
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (total_amount, created_at);

-- Enables Index Only Scan
SELECT user_id, status, total_amount, created_at
FROM orders
WHERE user_id = 123 AND status = 'paid';

-- Completes with index only, no table access

Reading Execution Plans

EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'paid';

How to read execution plans:

Index Scan using idx_orders_user_status on orders
    (cost=0.43..8.45 rows=1 width=100)
    (actual time=0.025..0.026 rows=1 loops=1)
    Index Cond: ((user_id = 123) AND (status = 'paid'))
Planning Time: 0.150 ms
Execution Time: 0.050 ms
FieldDescription
cost=0.43..8.450.43: Cost to return first row (startup cost), 8.45: Total cost to return all rows
rows=1Estimated row count
width=100Estimated bytes per row
actual time=0.025..0.026Actual execution time (ms)
rows=1 (actual)Actual rows returned
loops=1Number of times operation executed

Scan Type Comparison

-- Seq Scan: Full table scan
Seq Scan on orders (cost=0.00..18584.00 rows=100000 width=100)
  Filter: (status = 'paid')

-- Index Scan: Uses index to access table
Index Scan using idx_orders_status on orders (cost=0.43..8.45 rows=1)
  Index Cond: (status = 'paid')

-- Index Only Scan: Completes with index only (fastest)
Index Only Scan using idx_orders_covering on orders (cost=0.43..4.45 rows=1)
  Index Cond: (status = 'paid')

-- Bitmap Index Scan: Combines multiple indexes
Bitmap Heap Scan on orders (cost=5.00..100.00 rows=100)
  -> Bitmap Index Scan on idx_orders_status (cost=0.00..4.50 rows=100)
     Index Cond: (status = 'paid')

Index Design Best Practices

1. Query Pattern Analysis

-- Check frequently executed queries with pg_stat_statements
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

-- Identify slow queries
SELECT query, mean_time, calls
FROM pg_stat_statements
WHERE mean_time > 100  -- Over 100ms
ORDER BY mean_time DESC;

2. Monitor Index Usage

-- Detect unused indexes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

3. Partial Indexes

Index only specific data.

-- Index only active users
CREATE INDEX idx_users_active ON users(email)
WHERE status = 'active';

-- Recent data only
CREATE INDEX idx_orders_recent ON orders(created_at)
WHERE created_at > '2024-01-01';

-- Non-NULL data only
CREATE INDEX idx_products_sku ON products(sku)
WHERE sku IS NOT NULL;

4. Expression Indexes

Index computed results or function outputs.

-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users(lower(email));
SELECT * FROM users WHERE lower(email) = 'user@example.com';

-- Part of date
CREATE INDEX idx_orders_year_month ON orders(
    date_trunc('month', created_at)
);

-- Specific JSONB key
CREATE INDEX idx_users_settings_theme ON users((settings->>'theme'));
SELECT * FROM users WHERE settings->>'theme' = 'dark';

Index Anti-patterns

1. Excessive Indexes

-- Anti-pattern: Index on every column
CREATE INDEX idx_users_id ON users(id);           -- Already created by PK
CREATE INDEX idx_users_name ON users(name);       -- Really needed?
CREATE INDEX idx_users_email ON users(email);     -- Really needed?
CREATE INDEX idx_users_created ON users(created_at);
CREATE INDEX idx_users_updated ON users(updated_at);
CREATE INDEX idx_users_status ON users(status);

-- Problems:
-- - INSERT/UPDATE/DELETE become slower
-- - Wasted storage space
-- - Increased VACUUM processing load

2. Low Selectivity Indexes

-- Anti-pattern: Column with few distinct values
CREATE INDEX idx_users_gender ON users(gender);  -- Only 3 types: M/F/Other

-- Problems:
-- - Even with index, accesses many rows
-- - Seq Scan often more efficient

-- Solution: Combine with other columns
CREATE INDEX idx_users_gender_age ON users(gender, birth_date);

3. Functions in WHERE Clause

-- Anti-pattern: Index not used
SELECT * FROM users WHERE YEAR(created_at) = 2025;

-- Index-friendly rewrite
SELECT * FROM users
WHERE created_at >= '2025-12-01'
  AND created_at < '2026-01-01';

-- Or create expression index
CREATE INDEX idx_users_created_year ON users(EXTRACT(YEAR FROM created_at));

4. Type Mismatch

-- Anti-pattern: Implicit type conversion
-- user_id is INTEGER type
SELECT * FROM users WHERE user_id = '123';  -- String literal

-- Index-friendly version
SELECT * FROM users WHERE user_id = 123;    -- Integer literal

Summary

Database indexes can achieve dramatic performance improvements when designed and managed properly.

Design Principles

  1. Query patterns first: Design based on actual queries
  2. Composite indexes: Consider selectivity and query patterns for column order
  3. Partial indexes: Index only necessary data
  4. Avoid excessive indexes: Balance with write costs

Monitoring and Optimization

  1. EXPLAIN ANALYZE: Regular execution plan review
  2. pg_stat_statements: Identify slow queries
  3. Unused indexes: Regular inventory
  4. REINDEX: Rebuild bloated indexes

Indexes aren’t “design and forget”—continuous monitoring and optimization are essential.

← Back to list