Normalization
Normalization
is the process of splitting the bigger table into smaller table without
changing the functionality. This is done during db design. This is done to
improve the data correctness or redundancy and avoid problems due to anomalies.
(Anomalies = Inconsistency, due to Insert or Update or Delete).
Steps
in Normalization:
-
1st
Normal Form – 1NF
-
2nd
Normal Form – 2NF
-
3rd
Normal Form – 3NF
-
Boyce-Codd
Normal Form – BCNF
-
4th
Normal Form – 4NF
-
5th
Normal Form – 5NF
1st
Normal Form (1NF)
A table (relation) is
in 1NF if:
1. There are no duplicated rows in the table.
2. Each cell is single-valued (no repeating groups or arrays).
1. There are no duplicated rows in the table.
2. Each cell is single-valued (no repeating groups or arrays).
3. Entries in a column (field) are of the same kind.
*The requirement that
there be no duplicated rows in the table means that the table has a key
(although the key might be made up of more than one column, even possibly, of
all the colomns).
2nd
Normal Form (2NF)
A table is in 2NF if
it is in 1NF and if all non-key attributes are dependent on all of the key.
Since a partial dependency occurs when a non-key attribute is dependent on only
a part of the composite key, the definition of 2NF is sometimes phrased
as, “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”
In 2NF the complete
dependencies are identified and move such columns to the separate tables.
3rd
Normal Form (3NF)
A table is in 3NF if
it is in 2NF and if it has no transitive dependencies.
-
Here we identify the partial dependencies
and move such columns to separate tables.
-
Approximate relationships are
created.
Boyce-Codd
Normal Form (BCNF)
A table is in BCNF if
it is in 3NF and if every determinant is a candidate key.
4th
Normal Form (4NF)
A table is in 4NF if
it is in BCNF and if it has no multi-valued dependencies.
5th
Normal Form (5NF)
A table is in 5NF,
also called “Projection-join Normal Form” (PJNF), if it is in 4NF and if every
join dependency in the table is a consequence of the candidate keys of the
table.
Insertion
Anomaly
It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database, in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and human fallibility being what it is, some of the needed additional insertions may be missed.
It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database, in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and human fallibility being what it is, some of the needed additional insertions may be missed.
Deletion
Anomaly
It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database, in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.
It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database, in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.
Update
Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus “update anomalies” can be either of the kinds discussed above.
An update of a database involves modifications that may be additions, deletions, or both. Thus “update anomalies” can be either of the kinds discussed above.
All three kinds
of anomalies are highly undesirable, since thier occurence constitutes
corruption of the database. Properly normalized database are much less
susceptible to corruption than are un-normalized databases.
To normalize
databases, there are certain rules to keep in mind. These pages will illustrate
the basics of normalization in a simplified way.
Normalized
Model
|
Denormalized
Model
|
The number of tables is more.
|
Less number of tables is used in denormalized Model.
|
Used in OLTP Env
|
Used in OLAP or DWH Env
|
Will have limited number of Indexes as it’s a OLTP
system
|
Will have limited indexes as it is a DWH environment.
|
OLTP
|
OLAP
|
Here the data is always modified (Mostly DML’s)
|
Here the data will not be updated frequently(mostly
select statement)
|
The data volume will be from medium to large.
|
Here the data volume is large to very large (Terabyte
to Petabyte)
|
OLTP is for Transaction purpose.
|
It is for Reporting Purpose or Analytical Purpose
|
They also called as source systems.
|
DWH is also called as Target Systems
|
OLTP contains the point in time data or Present data
|
DWH contains the historical data.
|
No comments:
Post a Comment