Skip to main content

Database Normalization Explained: 1NF, 2NF, 3NF, and BCNF with examples

·1114 words·6 mins
Author
Alessandro Ferrini

Database normalization is the process of organizing a relational schema so each fact is stored in the right place and data anomalies are reduced.

If you only remember one idea, remember this: normalization helps you avoid storing the same business fact in many rows where it can later become inconsistent.

This article follows Relational Data Modeling and prepares you for SQL Index Performance.

What is database normalization?
#

Normalization is a design technique used in relational databases to reduce:

  • duplicated data;
  • inconsistent updates;
  • insert problems;
  • delete problems;
  • hidden dependencies.

It does this by splitting badly structured tables into smaller, well-defined tables connected by keys.

Why normalization matters
#

A non-normalized schema often works at the beginning because it looks simple. But once the data grows, the same shortcut creates maintenance problems.

Normalization improves:

  • data consistency;
  • schema clarity;
  • maintainability;
  • integrity enforcement;
  • long-term evolution of the database.

The three classic anomalies
#

Insert anomaly
#

You cannot add one fact without inventing another.

Imagine a table that mixes instructors and courses:

instructor_namebirth_datecourse_code
Alessandro Ferrini1986-11-03DBMS-101
Donata Bini1992-12-09PYTH-206

If a new instructor has not been assigned a course yet, where do you store them?

Update anomaly
#

The same value appears in many rows and must be updated everywhere.

full_namehire_datedurationemail
Alessandro Ferrini2024-10-233im@alessandroferrini.com
Alessandro Ferrini2025-01-2020aferrini@example.com

Now the same person has two emails in the database.

Delete anomaly
#

Removing one row accidentally removes another important fact.

If the only row containing an instructor also contains their only course assignment, deleting that assignment may delete the instructor record entirely.

Key concepts before normal forms
#

Candidate key
#

A candidate key is any minimal set of attributes that uniquely identifies a row.

Primary key
#

The primary key is the candidate key chosen as the official row identifier.

Functional dependency
#

A functional dependency means that one attribute determines another.

If city -> region, then knowing the city is enough to know the region.

Partial dependency
#

A partial dependency exists when a non-key column depends on only part of a composite key.

Transitive dependency
#

A transitive dependency exists when a non-key column depends on another non-key column instead of depending directly on the primary key.

First Normal Form (1NF)
#

Rule
#

A table is in 1NF when:

  • each column stores atomic values;
  • there are no repeating groups;
  • each row can be uniquely identified.

Bad example
#

order_idcustomerproductsorder_date
1Alice RossiPen, Notebook2026-01-10
2Marco BianchiMouse2026-01-11

The products column stores a list, not a single value.

Better 1NF version
#

order_idcustomerproductorder_date
1Alice RossiPen2026-01-10
1Alice RossiNotebook2026-01-10
2Marco BianchiMouse2026-01-11

Now each field is atomic.

Second Normal Form (2NF)
#

Rule
#

A table is in 2NF when:

  • it is already in 1NF;
  • every non-key attribute depends on the whole primary key.

2NF matters mainly when the primary key is composite.

Bad example
#

Suppose the key is (product_id, sale_point_id):

product_idsale_point_idproduct_nameregionqty
110Caffe ArabicaLombardia300
111Caffe ArabicaPiemonte250
210Te VerdeLombardia200

Problems:

  • product_name depends only on product_id;
  • region depends only on sale_point_id.

That means the table contains partial dependencies.

Better 2NF version
#

products

product_idproduct_name
1Caffe Arabica
2Te Verde

sale_points

sale_point_idregion
10Lombardia
11Piemonte

sales

product_idsale_point_idqty
110300
111250
210200

Now each non-key attribute depends on the proper key.

Third Normal Form (3NF)
#

Rule
#

A table is in 3NF when:

  • it is already in 2NF;
  • it has no transitive dependencies.

Bad example
#

client_idclient_namecityregion
1Alice RossiPratoToscana
2Marco BianchiMilanoLombardia

If each city belongs to one region, then:

client_id -> city -> region

So region does not depend directly on client_id; it depends on city.

Better 3NF version
#

clients

client_idclient_namecity
1Alice RossiPrato
2Marco BianchiMilano

cities

cityregion
PratoToscana
MilanoLombardia

Now each table stores one type of fact.

Boyce-Codd Normal Form (BCNF)
#

Rule
#

A table is in BCNF when every determinant in a non-trivial functional dependency is a candidate key.

Example
#

professorroomcourse
Prof. BianchiAula MagnaDBMS-101
Prof. BianchiAula MagnaDBMS-102
Prof. RossiAula 2PYTH-206

If room -> professor, but room is not a candidate key for the table, the design violates BCNF.

Better BCNF version
#

rooms

roomprofessor
Aula MagnaProf. Bianchi
Aula 2Prof. Rossi

courses

courseroom
DBMS-101Aula Magna
DBMS-102Aula Magna
PYTH-206Aula 2

Do you always need 4NF and 5NF?
#

Usually no.

For most transactional systems, getting to 3NF or BCNF is already enough. Fourth and fifth normal forms are useful in more specialized cases involving multivalued dependencies or join dependencies, but they are far less common in everyday application design.

Normalization vs denormalization
#

Normalization is about correctness and maintainability.

Denormalization is a deliberate trade-off for performance or reporting convenience.

Examples of denormalization:

  • storing a precomputed total;
  • duplicating a descriptive field to avoid a join in a read-heavy system;
  • using summary tables for dashboards.

The key point is that denormalization should be intentional, documented, and measurable.

Common mistakes when studying normalization
#

Treating normalization as only theory
#

Normalization is not just for exams. It explains why some schemas become painful in production.

Splitting tables too early without understanding dependencies
#

You should split a table because of a dependency problem, not because “more tables feels more relational”.

Confusing 1NF with full normalization
#

Atomic columns are only the first step. A table can satisfy 1NF and still be poorly designed.

Ignoring query patterns completely
#

A normalized schema is usually the right starting point, but performance tuning still matters later. That is why indexing comes after modeling and normalization, not before.

FAQ
#

What is normalization in DBMS?
#

It is the process of structuring relational tables so that redundancy and data anomalies are reduced.

What are 1NF, 2NF, and 3NF?
#

They are successive normal forms:

  • 1NF removes repeating groups and non-atomic values;
  • 2NF removes partial dependencies;
  • 3NF removes transitive dependencies.

What is the difference between 3NF and BCNF?
#

BCNF is stricter. A table can satisfy 3NF but still violate BCNF when a determinant is not a candidate key.

Is normalization always good?
#

It is almost always the best design starting point. Later, if a proven performance issue appears, you can denormalize carefully.

Next step
#

After normalization, study SQL Index Performance to understand how a clean schema interacts with indexes, query plans, and real workload tuning.