Why Databases and Normalization with Databases

October 15, 2009 23:53 by vinod

I have been wanting to drop a post for a long time and after Praveens Post on Data Patterns, I thought I also start on some of these fundamentals that will help you understand some of these principles better … Let us get into the actual topic -

In the late 1980s and early 1990s, the RDBMS (Relational Database Management System) was one of the most popular and innovative topics. But the entry fee to purchase an RDBMS system was out of the reach of many small businesses and workgroups. Though in my various jobs I have had an opportunity to work on many database vendors, it also gave me an unique insight to the fact that the principles of Database architecture and development do remain consistent. End of the day we are all talking about storing our customers data efficiently, consistently and securely.

Off late, I have seen notions around getting away from RDBMs. Though I respect some of their points of view, I tend to see this differently. What is the USP for RDBMS engines around the world -
1.1 - Data integrity: The primary goal of the relational model is data integrity. Data integrity prevents incorrect or invalid data from being stored. In an RDBMS, data integrity can be implemented at the server level rather than the application level. This approach offers the advantage of centralized control. When data integrity is changed at the RDBMS level, it is automatically represented at the application level, ensuring consistency and alleviating the need to modify application logic.
1.2 - Structured Query Language (SQL, pronounced sequel): The SQL language was developed by IBM during the mid-1970s. The SQL language provides a common method for accessing and manipulating data in a relational database. This common language has been adapted by RDBMS vendors as an industry standard. The standardization of SQL allows someone to move to a new RDBMS without having to learn a new data access language.
1.3 - Flexibility: Modifications can be made to the structure of the database without having to recompile or shut down and restart the database. New tables can be created on the fly and existing tables can be modified without affecting the operation of the RDBMS.
1.4 - Efficient data storage: Through a process called normalization, redundant data is reduced. Normalization is a primary concept of the relational model.
1.5 - Security: Data security can be implemented at the RDBMS level rather than the application level. As with data integrity, this approach offers the advantage of centralized control at the database level as opposed to the application level.

Moreover, not to mention the availability options that many of these database solutions offer that can be of great help to customers.

Normalization
When we start talking about designs we must take a quick lap around Normalization. Normalization is a set of standard rules that test the soundness of database design. It can help prevent the problems described in the first part of this chapter. By applying these standard rules, you can pinpoint design flaws that may jeopardize data integrity and complicate data maintenance. After a design successfully passes a rule, it is said to be in #normal form (where the # represents 1st, 2nd, or 3rd). Rules are cumulative. For example, for a design to be in 3rd normal form, it must satisfy the requirements of the 3rd normal form as well as the requirements for 2nd and 1st normal forms. Technically speaking, there are other types of normalization rules beyond 3rd normal form. However, for most database designs, the first three normal forms are sufficient. You will seldom need to apply the other types of normalization. Therefore, this section concentrates only on the 1st, 2nd, and 3rd normal forms of database design.

1st normal form: No repeating groups
2nd normal form: No nonkey attributes depend on a portion of the primary key
3rd normal form: No attributes depend on other nonkey attributes

Let me refrain from making this post just a talk around normalization but this sets some of the basics for understanding. As much as we talk about this, there is another school of thought around DeNormalization too. Denormalization means that you are purposely designing your database so that it is not in 3rd normal form. This is done to maximize performance or to simplify end-user reporting. Whenever you denormalize a database, you must be willing to forego the benefits gained from the 3rd normal form. I recommend that you start your initial database design in 3rd normal form. If you find that performance problems exist, selectively step back to 2nd or 1st normal form. Keep in mind that when you denormalize a database, you do so for a specific set of application
requirements. Future requirements may not need or benefit from past denormalization decisions. Only denormalize when you have to.

Though Performance is one of the reasons we move away from Normalization, Another reason to denormalize a database is to simplify ad-hoc reporting. Ad-hoc reporting is the  unstructured reporting and querying performed by end users. End users are often confused when they have to join a significant number of tables. To avoid the confusion, DBAs can create a special set of tables designed for ad-hoc reporting. If the data is used for reporting and not online processing, you can avoid some of the problems associated with a denormalized design. Following are important notes to remember when designing databases:
2.1 - Strive for 3rd normal form to maximize data consistency and minimize update anomalies.
2.2 - When a significant number of tables must be joined to process a query, you may want to selectively denormalize the database to improve performance.

Another practical way to look at this is to look at the application that will consume or update this data. It is very important to see if this application is an OLTP (Transaction processing) or OLAP (Analytical processing) or reporting based solutions. These patterns can change the way we normalize or denormalize our database design.

This is just the starting, we will discuss and outline many of these patterns on our future posts too. Keep reading and do post your valuable comments.


Comments

Add comment


 

Enter the word
captcha word
(hear it spoken)


  Country flag

biuquote
  • Comment
  • Preview
Loading