Database Normalization Explained: 1NF, 2NF, 3NF, and BCNF with examples

Database normalization is the process of organizing a relational schema so each fact is stored in the right place and data anomalies are reduced.

If you only remember one idea, remember this: normalization helps you avoid storing the same business fact in many rows where it can later become inconsistent.

This article follows Relational Data Modeling and prepares you for SQL Index Performance.

What is database normalization?
#

Normalization is a design technique used in relational databases to reduce:

duplicated data;
inconsistent updates;
insert problems;
delete problems;
hidden dependencies.

It does this by splitting badly structured tables into smaller, well-defined tables connected by keys.

Why normalization matters
#

A non-normalized schema often works at the beginning because it looks simple. But once the data grows, the same shortcut creates maintenance problems.

Normalization improves:

data consistency;
schema clarity;
maintainability;
integrity enforcement;
long-term evolution of the database.

The three classic anomalies
#

Insert anomaly
#

You cannot add one fact without inventing another.

Imagine a table that mixes instructors and courses:

instructor_name	birth_date	course_code
Alessandro Ferrini	1986-11-03	DBMS-101
Donata Bini	1992-12-09	PYTH-206

If a new instructor has not been assigned a course yet, where do you store them?

Update anomaly
#

The same value appears in many rows and must be updated everywhere.

full_name	hire_date	duration	email
Alessandro Ferrini	2024-10-23	3	im@alessandroferrini.com
Alessandro Ferrini	2025-01-20	20	aferrini@example.com

Now the same person has two emails in the database.

Delete anomaly
#

Removing one row accidentally removes another important fact.

If the only row containing an instructor also contains their only course assignment, deleting that assignment may delete the instructor record entirely.

Key concepts before normal forms
#

Candidate key
#

A candidate key is any minimal set of attributes that uniquely identifies a row.

Primary key
#

The primary key is the candidate key chosen as the official row identifier.

Functional dependency
#

A functional dependency means that one attribute determines another.

If city -> region, then knowing the city is enough to know the region.

Partial dependency
#

A partial dependency exists when a non-key column depends on only part of a composite key.

Transitive dependency
#

A transitive dependency exists when a non-key column depends on another non-key column instead of depending directly on the primary key.

First Normal Form (1NF)
#

Rule
#

A table is in 1NF when:

each column stores atomic values;
there are no repeating groups;
each row can be uniquely identified.

Bad example
#

order_id	customer	products	order_date
1	Alice Rossi	Pen, Notebook	2026-01-10
2	Marco Bianchi	Mouse	2026-01-11

The products column stores a list, not a single value.

Better 1NF version
#

order_id	customer	product	order_date
1	Alice Rossi	Pen	2026-01-10
1	Alice Rossi	Notebook	2026-01-10
2	Marco Bianchi	Mouse	2026-01-11

Now each field is atomic.

Second Normal Form (2NF)
#

Rule
#

A table is in 2NF when:

it is already in 1NF;
every non-key attribute depends on the whole primary key.

2NF matters mainly when the primary key is composite.

Bad example
#

Suppose the key is (product_id, sale_point_id):

product_id	sale_point_id	product_name	region	qty
1	10	Caffe Arabica	Lombardia	300
1	11	Caffe Arabica	Piemonte	250
2	10	Te Verde	Lombardia	200

Problems:

product_name depends only on product_id;
region depends only on sale_point_id.

That means the table contains partial dependencies.

Better 2NF version
#

products

product_id	product_name
1	Caffe Arabica
2	Te Verde

sale_points

sale_point_id	region
10	Lombardia
11	Piemonte

sales

product_id	sale_point_id	qty
1	10	300
1	11	250
2	10	200

Now each non-key attribute depends on the proper key.

Third Normal Form (3NF)
#

Rule
#

A table is in 3NF when:

it is already in 2NF;
it has no transitive dependencies.

Bad example
#

client_id	client_name	city	region
1	Alice Rossi	Prato	Toscana
2	Marco Bianchi	Milano	Lombardia

If each city belongs to one region, then:

client_id -> city -> region

So region does not depend directly on client_id; it depends on city.

Better 3NF version
#

clients

client_id	client_name	city
1	Alice Rossi	Prato
2	Marco Bianchi	Milano

cities

city	region
Prato	Toscana
Milano	Lombardia

Now each table stores one type of fact.

Boyce-Codd Normal Form (BCNF)
#

Rule
#

A table is in BCNF when every determinant in a non-trivial functional dependency is a candidate key.

Example
#

professor	room	course
Prof. Bianchi	Aula Magna	DBMS-101
Prof. Bianchi	Aula Magna	DBMS-102
Prof. Rossi	Aula 2	PYTH-206

If room -> professor, but room is not a candidate key for the table, the design violates BCNF.

Better BCNF version
#

rooms

room	professor
Aula Magna	Prof. Bianchi
Aula 2	Prof. Rossi

courses

course	room
DBMS-101	Aula Magna
DBMS-102	Aula Magna
PYTH-206	Aula 2

Do you always need 4NF and 5NF?
#

Usually no.

For most transactional systems, getting to 3NF or BCNF is already enough. Fourth and fifth normal forms are useful in more specialized cases involving multivalued dependencies or join dependencies, but they are far less common in everyday application design.

Normalization vs denormalization
#

Normalization is about correctness and maintainability.

Denormalization is a deliberate trade-off for performance or reporting convenience.

Examples of denormalization:

storing a precomputed total;
duplicating a descriptive field to avoid a join in a read-heavy system;
using summary tables for dashboards.

The key point is that denormalization should be intentional, documented, and measurable.

Common mistakes when studying normalization
#

Treating normalization as only theory
#

Normalization is not just for exams. It explains why some schemas become painful in production.

Splitting tables too early without understanding dependencies
#

You should split a table because of a dependency problem, not because “more tables feels more relational”.

Confusing 1NF with full normalization
#

Atomic columns are only the first step. A table can satisfy 1NF and still be poorly designed.

Ignoring query patterns completely
#

A normalized schema is usually the right starting point, but performance tuning still matters later. That is why indexing comes after modeling and normalization, not before.

FAQ
#

What is normalization in DBMS?
#

It is the process of structuring relational tables so that redundancy and data anomalies are reduced.

What are 1NF, 2NF, and 3NF?
#

They are successive normal forms:

1NF removes repeating groups and non-atomic values;
2NF removes partial dependencies;
3NF removes transitive dependencies.

What is the difference between 3NF and BCNF?
#

BCNF is stricter. A table can satisfy 3NF but still violate BCNF when a determinant is not a candidate key.

Is normalization always good?
#

It is almost always the best design starting point. Later, if a proven performance issue appears, you can denormalize carefully.

Next step
#

After normalization, study SQL Index Performance to understand how a clean schema interacts with indexes, query plans, and real workload tuning.

What is database normalization?#

Why normalization matters#

The three classic anomalies#

Insert anomaly#

Update anomaly#

Delete anomaly#

Key concepts before normal forms#

Candidate key#

Primary key#

Functional dependency#

Partial dependency#

Transitive dependency#

First Normal Form (1NF)#

Rule#

Bad example#

Better 1NF version#

Second Normal Form (2NF)#

Rule#

Bad example#

Better 2NF version#

Third Normal Form (3NF)#

Rule#

Bad example#

Better 3NF version#

Boyce-Codd Normal Form (BCNF)#

Rule#

Example#

Better BCNF version#

Do you always need 4NF and 5NF?#

Normalization vs denormalization#

Common mistakes when studying normalization#

Treating normalization as only theory#

Splitting tables too early without understanding dependencies#

Confusing 1NF with full normalization#

Ignoring query patterns completely#

FAQ#

What is normalization in DBMS?#

What are 1NF, 2NF, and 3NF?#

What is the difference between 3NF and BCNF?#

Is normalization always good?#

Next step#

What is database normalization?
#

Why normalization matters
#

The three classic anomalies
#

Insert anomaly
#

Update anomaly
#

Delete anomaly
#

Key concepts before normal forms
#

Candidate key
#

Primary key
#

Functional dependency
#

Partial dependency
#

Transitive dependency
#

First Normal Form (1NF)
#

Rule
#

Bad example
#

Better 1NF version
#

Second Normal Form (2NF)
#

Rule
#

Bad example
#

Better 2NF version
#

Third Normal Form (3NF)
#

Rule
#

Bad example
#

Better 3NF version
#

Boyce-Codd Normal Form (BCNF)
#

Rule
#

Example
#

Better BCNF version
#

Do you always need 4NF and 5NF?
#

Normalization vs denormalization
#

Common mistakes when studying normalization
#

Treating normalization as only theory
#

Splitting tables too early without understanding dependencies
#

Confusing 1NF with full normalization
#

Ignoring query patterns completely
#

FAQ
#

What is normalization in DBMS?
#

What are 1NF, 2NF, and 3NF?
#

What is the difference between 3NF and BCNF?
#

Is normalization always good?
#

Next step
#