Print

Without question, the most important task in any database development is the proper definition of the data model.  It is important for the proper fast execution of queries in the database, and goes a long way to defining the business model which underlies the data.  No matter what database manager you use, defining the data model properly is job 1.

There are five main rules of data normalization.  They all have to do with eliminating redundant and inconsistent data in the design of the table structure

We begin with a listing of characteristics of a membership list which includes, members, their employers, their employers' addresses, and children.

Zero Form

members

name

company

address

kid1

kid2

Rob

ABC

Suite 17 Industrial Park

Harvey

Justin

Carol

XYZ

Main Street Office Park

Harvey

Justin

We say this data is not normalized because we have not applied any of the normalization rules to it yet.  Look at the columns kid1, and kid2.  It would be inconvenient to add another column in case one of the members decided to enlarge his/her family.  So we will in the first rule decide to combine the kid columns.

First Normal Form

  1. Eliminate repeating groups in individual tables.
  2. Create a separate table for each set of related data.
  3. Identify each set of related data with a primary key.

In addition to annexing the kid fields, we need to set a unique identifier for each record, so that if another member named Rob joins we can identify him. When we apply the rules of the First Normal Form the result is with the following table:

members

userId

name

company

address

kid

1

Rob

ABC

Suite 17 Industrial Park

Harvey

1

Rob

ABC

Suite 17 Industrial Park

Justin

2

Carol

XYZ

Main Street Office Park

Harvey

2

Carol

XYZ

Main Street Office Park

Justin

Second Normal Form

  1. Create separate tables for sets of values that apply to multiple records.
  2. Relate these tables with a foreign key.

Annex the kid values to a second table, and add a column in that table to relate to the members table. That way, if anyone's family grows, we just need to add another record to the kid table, which relates to the member table- hence the term, relational database.  Use the primary key value to relate these fields:

members

userId

name

company

address

1

Rob

ABC

Suite 17 Industrial Park

2

Carol

XYZ

Main Street Office Park

kids

kidId

relUserId

kid

1

1

Harvey

2

1

Justin

3

2

Harvey

4

2

Justin

Now we have two tables: members and kids, and keys in each.  The foreign key relUserId is used in the kids table to relate the kids to their parents.

However, when we want to add another employee of an existing company, there is duplication with the company name and address.

Third Normal Form

  1. Eliminate fields that do not depend on the key.

Our Company Name and Address have nothing to do with the User Id, so they should have their own Company Id.  These fields relate to the company.

members

userId

name

relCompId

 


Rob

1

2

Carol

2

companies

compId

company

address

1

ABC

Suite 17 Industrial Park

2

XYZ

Main Street Office Park

kids

kidId

relUserId

kid

1

1

Harvey

2

1

Justin

3

2

Harvey

4

2

Justin

Annex the companies and their addresses to a separate table, add unique key fields (compId), and use the fierign key relCompId in the members table to relate those members to their company.

Coincidently, Rob and Carol have kids with the same names.  We could also make a table of possible kids' names, just as we did for the companies.

In some cases this would be 'OK', but in cases where the universe of names is small, it would be preferable to normalize it as well.

Data Relationships

There are three basic data relationships: one-to-one, one-to-many, and many-to-many. If each company had one and only one address, we could make a table for each and link them, creating a one-to-one relationship.

The relationship of the kids to the members is a one-to-many relationship, there each member can have several kids

In our Third Normal Form example we have one user related to many kids. Now, we want to change that structure to allow many members to be related to many kids, and thus we want a many-to-many relationship.

members

userId

name

relCompId

1

Rob

1

2

Carol

2

companies

compId

company

address

1

ABC

Suite 17 Industrial Park

2

XYZ

Main Street Office Park

kids

kidId

kid

1

Harvey

2

Justin

kid_relations

relationId

relatedKidId

relatedUserId

1

1

1

2

1

2

3

2

1

4

2

2

To minimize the duplication of data, now there is a table which contains only keys and foreign keys: the kid-relations table.

Fourth Normal Form

  1. In a many-to-many relationship, independent entities should not be stored in the same table.

Sometimes this rule is disregarded because it applies only to many-to-many relationships..

To give an example, we can select all of Rob's kids by invoking the following SQL statement:

SELECT name, kid FROM members, kids, kid_relations WHERE kid_relations.relatedUserId = 1 AND members.userId = 1 AND kids.kidId = kid_relations.relatedKidId

Fifth Normal Form

The final form or normalization is a check on the deconstruction. It ensures that there have been no additional unnecessary fields or tables created

  1. The original table must be reconstructed from the tables into which it has been broken down.

 

 

Metro NY / NJ SQL Server Consultants

We specialize is custom database software. Call us for a free consultation (973) 635 0080