GuidesDatabase Normalization

Database Normalization

Learn database design principles, normal forms (1NF, 2NF, 3NF), and how to structure relational data efficiently.

10 minute read Advanced

Why Normalize Databases?

Database normalization is the process of organizing data to minimize redundancy and dependency. It divides large tables into smaller ones and defines relationships between them.

Benefits of Normalization

  • Eliminates redundancy: Store each piece of data once
  • Ensures consistency: Updates happen in one place
  • Reduces storage: Less duplicate data
  • Easier maintenance: Clear structure, less confusion
  • Better data integrity: Fewer anomalies

Problems with Unnormalized Data

Unnormalized Orders Table
OrderID CustomerName CustomerEmail Products Prices
1 John Doe [email protected] Laptop, Mouse $999, $25
2 John Doe [email protected] Keyboard $75

Problems:
- Customer data duplicated (update anomaly)
- Multiple products in one cell (violates atomicity)
- Hard to query individual products

Anomalies

Update Anomaly: Change customer email in multiple rows
Insert Anomaly: Can't add customer without an order
Delete Anomaly: Deleting order loses customer data

Keys Explained

Primary Key

Uniquely identifies each row in a table. Cannot be NULL.

CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL
);

Foreign Key

Links tables together. References primary key in another table.

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    user_id INT,
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

Composite Key

Primary key made of multiple columns:

CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id)
);

First Normal Form (1NF)

Rule: Each column must contain atomic (indivisible) values, and each row must be unique.

Before 1NF (Violates Atomicity)

StudentID Name Courses
1 Alice Math, Physics, Chemistry

After 1NF (Atomic Values)

StudentID Name Course
1 Alice Math
1 Alice Physics
1 Alice Chemistry

Requirements for 1NF:

  • Each column contains single values (no lists, no arrays)
  • All entries in a column are the same type
  • Each row is unique (has a primary key)
  • Order of rows doesn't matter

Second Normal Form (2NF)

Rule: Must be in 1NF AND all non-key columns must depend on the ENTIRE primary key (no partial dependencies).

Only applies to tables with composite primary keys.

Before 2NF (Partial Dependency)

StudentID CourseID StudentName CourseName
1 101 Alice Math
1 102 Alice Physics

Problem: StudentName depends only on StudentID, not on (StudentID, CourseID)

After 2NF (Split Into Three Tables)

Students Table:

StudentID StudentName
1 Alice

Courses Table:

CourseID CourseName
101 Math
102 Physics

Enrollments Table:

StudentID CourseID
1 101
1 102

Third Normal Form (3NF)

Rule: Must be in 2NF AND no transitive dependencies (non-key columns depend only on primary key, not on other non-key columns).

Before 3NF (Transitive Dependency)

EmployeeID Name DepartmentID DepartmentName
1 John 5 Sales
2 Jane 5 Sales

Problem: DepartmentName depends on DepartmentID, not directly on EmployeeID

After 3NF (Remove Transitive Dependency)

Employees Table:

EmployeeID Name DepartmentID
1 John 5
2 Jane 5

Departments Table:

DepartmentID DepartmentName
5 Sales
Normal Form Checklist

1NF: Atomic values, unique rows
2NF: 1NF + no partial dependencies
3NF: 2NF + no transitive dependencies
Most applications stop at 3NF (good balance)

When to Denormalize

Sometimes, intentionally violating normalization improves performance:

Reasons to Denormalize

  • Read-heavy applications: Reduce JOINs for faster queries
  • Reporting databases: Pre-aggregate data
  • Caching layers: Store computed values
  • NoSQL databases: Optimized for document storage
Denormalization Example

Add order_total column to Orders table instead of calculating from OrderItems each time:

-- Normalized (requires JOIN and SUM)
SELECT o.order_id, SUM(oi.price * oi.quantity) as total
FROM Orders o
JOIN OrderItems oi ON o.order_id = oi.order_id
GROUP BY o.order_id;

-- Denormalized (direct access)
SELECT order_id, order_total FROM Orders;

Trade-offs

Normalized Denormalized
Storage Less space More space
Consistency Easier Harder
Read Speed Slower (JOINs) Faster
Write Speed Faster Slower (multiple updates)

Practical Guidelines

For OLTP (Transactional) Systems

  • Normalize to 3NF for data integrity
  • Use foreign keys to enforce relationships
  • Index frequently queried columns
  • Denormalize only for proven performance issues

For OLAP (Analytics) Systems

  • Star schema (fact tables + dimension tables)
  • Denormalize for query performance
  • Use materialized views for aggregations
  • ETL processes handle data consistency

Common Patterns

One-to-Many:

Users (user_id, name)
Orders (order_id, user_id, order_date)
-- One user has many orders

Many-to-Many (requires junction table):

Students (student_id, name)
Courses (course_id, name)
Enrollments (student_id, course_id, enrolled_date)
-- Many students in many courses

One-to-One (rare):

Users (user_id, username)
UserProfiles (user_id, bio, avatar)
-- Each user has exactly one profile
Beyond 3NF

BCNF (Boyce-Codd NF): Stricter version of 3NF
4NF: Eliminates multi-valued dependencies
5NF: Eliminates join dependencies
Most projects don't need beyond 3NF

Database normalization is about finding the right balance between data integrity and performance. Start normalized, denormalize when measurements prove it's necessary.