Relational Data Patterns – Part 1

October 13, 2009 03:36 by blogadmin

Guest blog by Praveen Srivatsa, Founder and Director of AsthraSoft.
Praveen’s detailed profile can be found at: http://blog.area91.in/page/Area91-Guest-Bloggers.aspx

One of the technologies under software development is the concept of relational databases. Late in 1970's Codds rules for database normalization set a benchmark that most RDBMs have tried to support. But the process of designing a database is unusually not as refined in most software companies. Even today, we still have projects that fail due to a badly designed database. There are religious discussions on which database is better than what other database. Inefficient data stores are attributed to the underlying database - I told you Oracle performs much better than SQL Server. Why did we go with an Open Source Database if it can't scale? SQL ruled the TPCC benchmarks for a long time, we should have gone with it.

Over time, databases have started providing much more than just a relational data engine. Storing binary file streams, supporting queued transaction, replication, clustering, hot-stand bys, distributed computing, reporting, data ware housing, data transformation services etc all make their way into database solutions. Most enterprise databases - Oracle, SQL Server, DB2 are like Formula 1 cars. They are in the top league out there and what they can do is what you can push them to do. Other databases, mySQL, PostGreSQL, SQL Express work just fine as a data store. Light weight databases like SQLLite and SQLCE are tiny embedded data stores that are very personalized. OODB (a rage that keeps coming up and disappearing) are worshipped, but have never made it main stream.

But in spite of all this, Relational databases are best at working with structured data. And we are still not mature enough to design good databases all the time. So in this multi-part series, lets take a look at how we go about designing a database and what are the considerations of a DB design that we should keep in mind. Databases are very logical. A DB design talks to you and expresses how the data is related. Reading a DB design is like reading a book. Lets begin by taking a small example and seeing how we go about designing a data store for the same.

"We would like to build a time-sheet system. We want to track people with basic details about the person like name, emailid, manager. We also want to track projects and tasks associated with the project. A manager should be able to assign users to projects as well as to tasks in a project along with estimated efforts in hours. Users should be able to login and enter timesheets against their project tasks on a daily basis. They can also enter time against projects and tasks that are NOT assigned to them. Users time should be tracked as regular and overtime and the manager should get a report of estimated vs actual efforts as well as regular and overtime costing reports"

We take the requirement defined above... and typically that's the level of details that we get as part of a high level requirement definition... and we start off the DB design process. Even though we aspire to get as much details of the users requirements, its not always practical to expect the user to be able to fully define the specifications. In the case of a product, we have to engineer the solutions without keeping a specific user in mind, thus making the process of defining the requirements even tougher. Given the knowledge we start off defining the following:

   1: EmployeeTable with EmpID, EmpName, EmpEMailID, EmpManagerID (EmpID is the PK, EmpManagerID is a FK to EmployeeTable.EmpID)
   2:  
   3: ProjectTable with ProjectID, ProjectName, ProjectStartDate, ProjectEndDate, ProjectManager (ProjectID is the PK, ProjectManager is a FK to EmployeeTable.EmpID)
   4:  
   5: ProjectTaskTable with TaskID, TaskName,ProjectID (TaskID is the PK, ProjectID is the FK to ProjectTable.ProjectID)
   6:  
   7: ProjectAllocationTable with ProjectID, EmpID, StartDate, EndDate (ProjectID + EmpID is the PK, ProjectID is a FK to ProjectTable.ProjectID, EmpID is a FK to EmployeeTable.EmpID)
   8:  
   9: TaskAllocationTable with EmpID, TaskID, StartDate, EndDate, Effort (in hrs) (EmpID + TaskID is the PK, TaskID is a FK to ProjectTasksTable.TaskID, EmpID is a FK to EmployeeTable.EmpID)
  10:  
  11: TimeSheetTable with EmpID, TaskID, Date, Effort (in hrs) (EmpID + TaskID + Date is the PKTaskID is a FK to ProjectTasksTable.TaskID, EmpID is a FK to EmployeeTable.EmpID)

Well we are pretty much done. Whew - that was simple enough and hardly took time. Now lets look at scenarios and see how they are addressed as part of the design.

1. Who defines employees, managers, projects and tasks? Do we need an Admin? Aaahh - we also need user logins and roles and password management.

2. What happens when an employee leaves? What happens to tasks allocated?

3. Can we have all the time sheets going to the manager who then approves it? Or at the least can we ensure that the manger can view all timesheets of his reportees?

4. What is the impact of changing a manager of an employee? Can the past managers view timesheets of an employee (should be able to), can the current manager view past timesheets of the employee(should be able to)? How do we track this?

5. Should timesheets go to the employees manager or to the project manager? What happens in this case if the project manager changes?

6. Can an employee be allotted to a project multiple times (From 10-Feb to 25-Feb and then again from 4-Mar to 10-Mar). How can we define this if the ProjectAllocation table has the ProjectID+EmpID as the primary key

7. Can an employee enter tasks for projects they are not allotted to? If so, which project manager can see that timesheet data?

8. If I want to know total hours worked per employee as a live count - to display in a dashboard and assuming that people update timesheets every hour.... where can we store that data? We would like to show the sorted by EmployeeHours.

9. Oh... and we forgot how to figure out overtimes???

As we can see, even though we think that DB design is a very simple process, there needs to be a logic to the madness in designing and creating database tables and relationships. As we progress through the software development cycle, we get more questions and we just add more columns to the DB design without looking back as the basic tenants of the DB design. Starting from the next post, lets start taking a look at how we can plan our DB design better.

 


Comments

Add comment


 

Enter the word
captcha word
(hear it spoken)


  Country flag

biuquote
  • Comment
  • Preview
Loading