Skip to main content

Relational Normalization

·1388 words·7 mins
Author
Alessandro Ferrini

This article builds on the foundation in Introduction to Databases. It explains the relational mindset, why normalization keeps the schema clean, and how the model prepares you for the Relational Data Modeling workflow and SQL Index Performance lessons that follow.

Relational data models and the DBMS role
#

Relational modeling turns business concepts into rows, columns, and constraints so a Database Management System can enforce integrity, manage storage, and present consistent interfaces. During analysis you name entities, describe attributes, and define relationships so the DBMS can guard foreign keys, unique constraints, and cardinality rules. This clarity keeps joins predictable, safeguards migrations, and lets the optimizer reason about queries-see how relational algebra operators surface in relational algebra operations.

Why normalization matters
#

Normalization reorganizes tables so each fact lives in exactly one place. That minimizes redundancy, prevents contradictory rows, and makes the dataset easier to understand. Clean tables also reduce storage costs and produce simpler joins, which helps query planners and downstream developers alike.

Insert anomalies
#

When you try to add a new instructor without enrolling them in a course, you either duplicate irrelevant data or introduce NULL placeholders.

full_namedate_of_birthcourse_code
Alessandro Ferrini1986-11-03DBMS-101
Donata Bini1992-12-09PYTH-206
Fabrizio Moneta2005-04-11PYTH-206

Splitting personal details from enrollment records lets you add the instructor before their first assignment appears.

Update anomalies
#

When the same person appears in multiple rows, updating an email requires touching all of them.

full_namehire_datedurationemail
Alessandro Ferrini2024-10-233im@alessandroferrini.com
Donata Bini2024-11-0912donata.bini@example.com
Fabrizio Moneta2024-07-115f.moneta@example.com
Alessandro Ferrini2025-01-2020aferrini@example.com

Normalization centralizes personal information, so a single update keeps every row aligned.

Deletion anomalies
#

Deleting the DBMS-101 row currently removes Alessandro’s profile entirely, even if he teaches other courses.

full_namedate_of_birthcourse_code
Alessandro Ferrini1986-11-03DBMS-101
Donata Bini1992-12-09PYTH-206
Fabrizio Moneta2005-04-11PYTH-206

Separating enrollments from instructor data makes it safe to delete a course record without losing the person.

Benefits and normal forms
#

Normalization delivers smaller tables with a single source of truth. That results in lower redundancy, stronger data integrity, easier maintenance, and faster planning because the optimizer can reason about focused structures.

Normal forms are the steps that deliver this benefit. 1NF requires atomic values, 2NF removes partial dependencies, 3NF removes transitive dependencies, and BCNF or higher guards more subtle cases such as overlapping candidate keys or multivalued dependencies.

Key definitions
#

  • Candidate key: The minimal attribute set that uniquely identifies each row (for example, codice_fiscale or email on PERSONE).
  • Primary key: The chosen candidate key that serves as the table’s official identifier.
  • Prime attribute: An attribute that participates in at least one candidate key.
  • Non-prime attribute: An attribute that does not participate in any candidate key and therefore should depend on prime attributes.
  • Functional dependency: A -> B means that identical values for A guarantee identical values for B (for example, city -> region in a Comuni table).

Normalization - First Normal Form (1NF)
#

1NF insists that every attribute holds atomic data, forbidding repeating groups and lists, and that each row is identifiable via a primary key. When a field stores several values, constraints, indexes, and predictable joins break down.

order_idclientproductsorder_date
1Alessandro FerriniProdottoA, ProdottoB10-01-2024
2Donata BiniProdottoC11-01-2024
3Fabrizio MonetaProdottoA, ProdottoD12-01-2024

Enforcing 1NF reshapes the table so each row contains a single product entry but repeats the order_id and client values:

order_idclientproductorder_date
1Alessandro FerriniProdottoA10-01-2024
1Alessandro FerriniProdottoB10-01-2024
2Donata BiniProdottoC11-01-2024
3Fabrizio MonetaProdottoA12-01-2024
3Fabrizio MonetaProdottoD12-01-2024

Now the products attribute is atomic and the primary key (for example, (order_id, product)) distinguishes each row without ambiguity.

Normalization - Second Normal Form (2NF)
#

Starting from 1NF, 2NF removes partial dependencies. When the primary key spans multiple columns, every non-key attribute must depend on the whole key rather than just part of it. If an attribute depends on a portion of a composite key, the table suffers from redundancy and update anomalies.

Imagine a fact table whose key is (product_id, sale_point_id) while keeping product and sale point metadata in the same row:

product_idsale_pointproduct_nameregionqtymonth
110Caffe ArabicaLombardia30001-2024
111Caffe ArabicaPiemonte25001-2024
210Te VerdeLombardia20001-2024
211Te VerdePiemonte18001-2024

Here product_name depends only on product_id and region depends only on sale_point_id, so the non-key attributes do not require the full composite key. The cure is to separate the metadata: create a Products table keyed by product_id with the product name, and a SalePoints table keyed by sale_point_id with the region. The fact table then references those tables with foreign keys and retains only the measurable columns (qty, month). Splitting this way eliminates repetition and ensures each column depends on the determinant that defines 2NF.

Normalization - Third Normal Form (3NF)
#

3NF addresses transitive dependencies. If a non-key attribute depends on another non-key attribute rather than on the primary key, the table violates 3NF even if it already satisfies 2NF.

A Client table illustrates the issue:

client_idclient_namecityregion
1Alessandro FerriniPratoToscana
2Donata BiniMilanoLombardia
3Fabrizio MonetaTorinoPiemonte

If each city belongs to exactly one region, we have the functional chain client_id -> city -> region. To obey 3NF, move city and region into a separate Cities table where city is the key. The Client table then references Cities, and every non-key attribute depends solely on client_id, eliminating transitive redundancy.

Normalization - Boyce-Codd Normal Form (BCNF)
#

BCNF strengthens 3NF by requiring that for every non-trivial functional dependency X -> Y, the determinant X must be a candidate key in that table. BCNF prevents cases where a non-key attribute determines another column despite not being a candidate key itself.

Consider a university schedule where each classroom hosts exactly one professor:

professorroomcourse
Prof. BianchiAula MagnaDBMS-101
Prof. BianchiAula MagnaDBMS-102
Prof. RossiAula 2PYTH-206

Here room -> professor, but room is not the table’s primary key (course might be). This violates BCNF because room determines professor without being a candidate key. The fix is to extract a Rooms table keyed by room that records the professor in charge, while Courses keeps (course, room) and refers to Rooms. Now every dependency resides in a table where the left-hand side is a candidate key, restoring BCNF.

Fourth Normal Form (4NF)
#

4NF prohibits a single key from expressing unrelated multivalued attributes. When a row links Instructor -> Languages and Instructor -> Subjects, and those lists are independent, every row repeats each language with each subject. 4NF splits each multivalued dependency into its own relation so languages and subjects no longer form a Cartesian product.

instructorlanguagessubjects
Prof. BianchiIngleseSQL
Prof. BianchiIngleseDatabase Design
Prof. BianchiFranceseSQL
Prof. RossiIngleseData Mining
Prof. RossiIngleseAI
Prof. RossiSpagnoloData Mining
Prof. RossiSpagnoloAI

Split the table into dedicated relations for languages, subjects, and instructors. Each relation captures a single multivalued dependency, eliminating redundant pairings and making it possible to add a new language or subject without duplicating the other list.

Fifth Normal Form (5NF)
#

5NF (Projection-Join Normal Form) removes redundancy that appears only in ternary or higher relations. A table that records (Supplier, Product, Customer) belongs in 5NF when every valid triple can be reconstructed from joins of the binary projections: (Supplier, Product), (Product, Customer), and (Supplier, Customer). Working at this level makes join dependencies explicit and prevents the insertion of combinations that cannot be explained by simpler relations. Use 5NF in analytic schemas when the added integrity justifies the extra joins.

Trade-offs and deliberate denormalization
#

Adding 4NF and 5NF tables increases joins and can hurt latency, so many systems stop at BCNF or 3NF unless analytics demand the extra rigor. When joins do become too costly, denormalize intentionally: duplicate descriptive data, precompute aggregates, or maintain reporting tables that trade redundancy for speed. Always document the purpose of denormalization and automate its refresh so the schema remains predictable.

Carry these normalization checks into the modeling lab and the SQL Origins & Standards lesson so you understand how SQL syntax, planner rules, and index costs relate to the clean schema you just built.