← Back to Databases

Database Indexing: How It Works & Why It Matters for Performance

Database indexing creates a separate data structure that maps column values to row locations, allowing the database engine to find data without scanning every row. Without indexes, a query searching 10 million records might take seconds; with the right index, that same query executes in milliseconds.

What Is Database Indexing?

Think of a database index like a book's index. Instead of reading every page to find mentions of "performance," you flip to the index, find the page numbers, and jump directly there. Database indexes work the same way—they're auxiliary data structures that store sorted copies of selected columns, paired with pointers to the actual row data.

When you create an index on a column, the database maintains a separate lookup table. Every time you insert, update, or delete data, the index gets updated too. This costs write performance but dramatically improves read performance for queries that filter, join, or sort on indexed columns.

Most databases use B-tree indexes by default because they're balanced, sorted, and extremely efficient for range queries. However, other index types exist for specialized use cases—hash indexes for equality checks, bitmap indexes for low-cardinality columns, and full-text indexes for searching text content.

How B-Tree Indexes Work Internally

A B-tree (balanced tree) is a self-balancing structure where data is organized in nodes. Each node can hold multiple keys and pointers to child nodes. The database navigates down the tree by comparing your search value against keys in each node, eliminating half the remaining data at each step.

Here's why this matters: searching through a sorted array of 1 million items takes roughly 20 comparisons using binary search. A B-tree with the same data takes about 3-4 disk reads. Since disk I/O is the slowest operation in databases, minimizing disk reads is critical for performance.

B-trees stay balanced automatically. When a node gets too full, it splits. When nodes have too few entries, they merge. This prevents the tree from becoming lopsided, guaranteeing O(log n) lookup time regardless of data size or insertion order.

-- Example: Creating a B-tree index (default in most databases)
CREATE INDEX idx_user_email ON users(email);

-- The database creates a structure like:
--        [M]
--       /   \
--    [A-L]  [N-Z]
--    /  |  \  /  |  \
--   ... leaf nodes with pointers to rows ...

Common Index Types and When to Use Them

Primary Key Index

Created automatically when you define a primary key. It enforces uniqueness and is the fastest way to retrieve a single row by ID. Most databases use a clustered index for the primary key, meaning the actual data is stored in the primary key's B-tree order.

Unique Index

Similar to a primary key but allows NULL values (depending on the database). Enforces uniqueness while enabling fast lookups. Useful for columns like email addresses or usernames where duplicates aren't allowed.

CREATE UNIQUE INDEX idx_user_username ON users(username);

Composite Index

An index on multiple columns. Useful when your queries frequently filter on two or more columns together. The order matters—the leftmost columns are the most selective, so put the most restrictive conditions first.

-- Good for queries: WHERE country = 'US' AND state = 'CA'
CREATE INDEX idx_location ON users(country, state);

-- Bad for queries: WHERE state = 'CA' (doesn't use first column efficiently)

Full-Text Index

Optimized for searching within text content. It tokenizes text into words, stores them in an inverted index, and supports fuzzy matching and relevance ranking. Essential for search features.

CREATE FULLTEXT INDEX idx_article_content ON articles(title, body);

Partial/Filtered Index

An index on only a subset of rows matching a condition. Saves storage and update overhead when you frequently query on a specific condition, like active users or non-deleted records.

-- PostgreSQL example: only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

The Write-Read Tradeoff

Every index you create slows down INSERT, UPDATE, and DELETE operations because the database must update the index alongside the main table. On a table with 5 indexes, a single row insertion might require 6 separate write operations.

However, if that index accelerates 100 queries per second by 50ms each, you're saving 5 seconds of combined query time while only adding microseconds to occasional writes. The math almost always favors read optimization in production systems.

The real cost emerges with over-indexing. A table with 20 indexes wastes memory, slows down writes unnecessarily, and confuses the query optimizer. Most well-designed tables have 3-8 indexes.

Index Performance Monitoring and Optimization

You can't optimize what you don't measure. Most databases provide tools to identify missing or unused indexes.

Finding Slow Queries

Enable the slow query log to capture queries exceeding a threshold (e.g., 100ms). These are your optimization targets.

-- MySQL: Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;

-- PostgreSQL: Enable query logging
SET log_min_duration_statement = 100;

Analyzing Query Plans

Use EXPLAIN or EXPLAIN ANALYZE to see how the database executes a query. Look for "sequential scans" on large tables—that's a sign a missing index could help.

-- PostgreSQL example
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

-- Output shows: Seq Scan on users (cost=0.00..35.50)
-- Add index: CREATE INDEX idx_email ON users(email);
-- Rerun EXPLAIN: Index Scan using idx_email (cost=0.28..8.30)

Removing Unused Indexes

Databases track index usage. Remove indexes that haven't been accessed recently—they're just wasting space and slowing writes.

-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY idx_blks_read DESC;

Indexing Best Practices

Index foreign keys: Joins are faster when both sides are indexed.
Index WHERE clauses: If a query filters on a column, index it.
Index JOIN columns: Columns used in ON clauses benefit from indexes.
Consider column selectivity: Indexing a column with only 2 unique values (like gender) wastes resources. Aim for high-cardinality columns.
Order composite indexes wisely: Put the most restrictive columns first to eliminate more rows early.
Avoid over-indexing: More indexes don't always mean better performance. Test and measure.
Monitor index fragmentation: Over time, indexes become fragmented. Rebuild or reorganize them periodically.

Real-World Example: E-Commerce Database

Imagine an e-commerce platform with a `orders` table (100 million rows). Without indexes:

Finding orders by user ID: 2-5 seconds (full table scan)
Finding orders by status: 3-7 seconds
Joining with users table: 10-30 seconds

With strategic indexes:

CREATE INDEX idx_user_id ON orders(user_id);
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_created_at ON orders(created_at);
CREATE INDEX idx_user_status ON orders(user_id, status);

Now the same queries execute in 5-50 milliseconds. The write overhead is negligible compared to the read performance gain. This is why indexing is often the first optimization step for slow databases.

When NOT to Create an Index

Indexes aren't always the answer. Don't index:

Low-cardinality columns: Gender, boolean flags, status with only a few values. A full table scan might be faster.
Columns that are rarely queried: Dead weight on write performance.
Highly volatile tables: If you're constantly deleting and reinserting, index maintenance overhead crushes benefits.
Small tables: Below a few thousand rows, sequential scans are often faster than index lookups.

Always test. Create an index, run your queries with and without it, and measure actual performance. Numbers don't lie.

Indexing in Different Databases

While B-tree indexes are universal, each database system has unique features and syntax. PostgreSQL supports partial indexes and expression indexes. MySQL allows covering indexes. SQL Server has filtered indexes and columnstore indexes for analytics. Understanding your specific database's indexing capabilities helps you optimize more effectively.

For more on database architecture fundamentals, check out our guide on query optimization strategies and our article on database normalization. These concepts work together with indexing to build high-performance systems.

Frequently Asked Questions

Q: Will adding an index always improve query performance?

Not always. If the query optimizer determines that a sequential scan is faster than an index lookup (common with small result sets or low-cardinality data), it might ignore the index. Additionally, poorly designed indexes or indexes on columns with low selectivity can actually slow things down. Always verify with EXPLAIN and benchmark.

Q: How many indexes should a table have?

There's no universal number, but most production tables have 3-8 indexes. The right number depends on read vs. write ratio, query patterns, and data size. Start with indexes on primary keys, foreign keys, and frequently filtered columns. Add more if profiling shows they help. Remove any unused indexes.

Q: Can indexes help with ORDER BY and GROUP BY?

Yes. If you sort or group by an indexed column, the database can traverse the index in order without a separate sort operation, saving significant time. Composite indexes are especially useful here—an index on (status, created_at) helps queries that filter on status and sort by created_at.

Q: What's the difference between clustered and non-clustered indexes?

A clustered index determines the physical order of data rows. Most tables have one clustered index (usually the primary key). Non-clustered indexes are separate structures that point back to the main table. SQL Server and some other databases distinguish between these; PostgreSQL doesn't explicitly—it uses different terminology. The concept remains: clustered indexes store actual data while non-clustered indexes store pointers.