rightclickdisable

Topics Categories

Normalization

Normalization

Normalization is the process of splitting the bigger table into smaller table without changing the functionality. This is done during db design. This is done to improve the data correctness or redundancy and avoid problems due to anomalies. (Anomalies = Inconsistency, due to Insert or Update or Delete).
Steps in Normalization:
-       1st Normal Form – 1NF
-       2nd Normal Form – 2NF
-       3rd Normal Form – 3NF
-       Boyce-Codd Normal Form – BCNF
-       4th Normal Form – 4NF
-       5th Normal Form – 5NF
1st Normal Form (1NF)
A table (relation) is in 1NF if:
1. There are no duplicated rows in the table.
2. Each cell is single-valued (no repeating groups or arrays).
3. Entries in a column (field) are of the same kind.
*The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be made up of more than one column, even possibly, of all the colomns).
2nd Normal Form (2NF)
A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key. Since a partial dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition of 2NF is sometimes phrased as, “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”
In 2NF the complete dependencies are identified and move such columns to the separate tables.
3rd Normal Form (3NF)
A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
-       Here we identify the partial dependencies and move such columns to separate tables.
-       Approximate relationships are created.
Boyce-Codd Normal Form (BCNF)
A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
 4th Normal Form (4NF)
A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
5th Normal Form (5NF)
A table is in 5NF, also called “Projection-join Normal Form” (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.
Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database, in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and human fallibility being what it is, some of the needed additional insertions may be missed.
Deletion Anomaly
It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database, in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.
Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus “update anomalies” can be either of the kinds discussed above.
All three kinds of anomalies are highly undesirable, since thier occurence constitutes corruption of the database. Properly normalized database are much less susceptible to corruption than are un-normalized databases.
To normalize databases, there are certain rules to keep in mind. These pages will illustrate the basics of normalization in a simplified way.
Normalized Model
Denormalized Model
The number of tables is more.
Less number of tables is used in denormalized Model.
Used in OLTP Env
Used in OLAP or DWH Env
Will have limited number of Indexes as it’s a OLTP system
Will have limited indexes as it is a DWH environment.

OLTP
OLAP
Here the data is always modified (Mostly DML’s)
Here the data will not be updated frequently(mostly select statement)
The data volume will be from medium to large.
Here the data volume is large to very large (Terabyte to Petabyte)
OLTP is for Transaction purpose.
It is for Reporting Purpose or Analytical Purpose
They also called as source systems.
DWH is also called as Target Systems
OLTP contains the point in time data or Present data
DWH contains the historical data.

No comments:

Post a Comment