A Database Design Tutorial
Everything in this tutorial is going to be explained using an example database. This example database is a simple database that serves well to illustrate various concepts. It's most likely not a very useful database to use in practice, but using it was not the intention.
This database is a variant on a common theme, a database to keep track of customers, items in stock, the different orders of the mentioned customers, the items on each order and the number of items ordered. In addition we will keep track of one bank account for each customer from which we will get the money for each order.
Why a database?
A database is very useful for various reasons, but if one has to choose its crowning glory it will most likely be the fact that one can get its data in a nearly endless variety of ways. That means one can create reports to show just what one needs. In our example database we can get reports for such complicated questions like who bought two or more of item a, but not item b, and had a total order amount between x and y dollars between date p and date q. Think of going through your paper records to come up with the answer to that one.
Relational database foundations
Edgar Codd is the father of the relational database. It all started in 1969 when Codd was a researcher at IBM. The relational database is built on mathematical foundations. It has to be correctly designed to work faultlessly. The relational in relational database refers to the tables (relations) of which the database is composed. These tables are not standalone tables, no, they are related to each other in very specific ways. This brings us to the two kinds of database laws:
- laws about table structure
- laws about how tables are related to each other
What is a table?
A table is one object or entity about which we want to keep data. Try to make these objects indivisible. For instance, keep data about an employee in one table and data about his dependents in another table. We get compound data using the relationships that exist between tables. One can have a table or tables to handle these relationships. In our example database we have tables for:
- customers
- a customer's bank account details
- orders
- items
- a relationship table to handle the different items on the different orders
Table structure
So, what does a table look like? Well, something like this:
P.K | Surname | Names | D.O.B |
---|---|---|---|
1 | Brown | John Jack | 23 Mar 1962 |
2 | Smith | Peter George | 16 Jun 1953 |
3 | Williams | Agnes Semolina | 7 Jan 1991 |
4 | Jackson | Charlene Gail | 5 Aug 1952 |
You will notice that the table has rows (horizontal) and columns (vertical). In this case the first row contains the column headings. The first column heading is the heading for the primary key. The rest are self explanatory. This brings us to the first law: in one table, column headings may not be repeated. Close on the heels of this is the second law: every row must have a primary key which uniquely identifies that row and is not repeated in that table. In the example the primary key is a single value. Primary keys can also be composite, i.e. consist of more than one value in more than one column. In this last case the total combination of values must be unique, e.g. one can have the primary key values (A,B), (A,C), (B,C) and even (B,A), all nice and legal.
All tables must be in at least third normal format for the database to work faultlessly. So, what are these normal forms?
I found the easiest to remember classification of the normal forms in a book on database design by Fidel Captain. It goes from the smallest data unit, one column in one row, to the biggest, the table.
First normal form
A table is in first normal form if all column values are indivisible. Say you have an order table and in this table you have a column called items. Now for one order you enter the following values in the items column: pen, ink, paper. This table is now not in first normal form and your design is flawed. The items should be in a second table related to the order table.
Furthermore, all possible values for any column must be of the right data type and in the domain of allowed values. That's to say if you decided only to allow integers between 0 and 1,000,000 in a certain column, each value in that column should be an integer between one and a million, both values inclusive.
A column must also be unique in a table. You cannot have more than one column named 'article' in one table.
Second normal form
The second normal form concerns a whole row in any table, i.e. an entity in that table.
A table is in second normal form only if it is in first normal form and every non-key column is fully dependent on the whole primary key. This is only an issue with composite primary keys. Say you have succumbed to the temptation of everything in one table and you have an order_item table with the order_id and the item_id together a composite primary key. Desirous to see everything in one table, you have also included the customer name in this table. Obviously, the customer name has nothing to do with the item_id, which leads to the items table. However, using the order_id to get to the orders table you will find a customer_id there which will take you to the relevant customer in the customers table. That means your table is not in second normal form as the customer name is only dependent on the customer_id part of the primary key. You'll have to remove the customer name from this table. See the example to see how it should be done.
Some people will class what follows as a requirement for the third normal form. As it concerns the integrity of just one row (an entity in a table), I put it here.
Besides the requirement that every column in a row must be dependent on the the whole foreign key, it also must not be dependent on any non-key column. The classical example of this is a derived value. Say the sale table has an item price column and a column indicating how many of the specific item went into that sale, then there should be no total amount column. The reason for this is obvious: if either the price or number of items is updated, the total amount has to be updated as well. Sooner or later someone is going to make a mistake and the data in that row will be corrupt. Such a built in weakness decreases reliability of the data. Rather, let your application accessing the database do the calculation and show the total amount where it has to be shown.
One can summarise the second normal form by saying that the value of any non-key field in any table should be dependent on only the key, the whole key and nothing but the key.
Third normal form
The third normal form has to do with referential integrity between tables.
Foreign keys are used to establish relationships between tables. Third normal form demands that the table must be already in second normal form and, additionally, that every foreign key in the table has a corresponding primary key in the table it references. There can be no orphaned foreign keys.
A consequence of the above is that if any primary key is changed, all foreign keys depending on that primary key should also be changed as a cascade effect of the change in the primary key.
If a row containing a primary key referenced by any number of foreign keys is deleted, all rows containing the relevant foreign keys should also be deleted, as should all rows containing a foreign key depending on the primary key of the second row mentioned. And so on and so on. That means just deleting one row may cause many other rows to be deleted. That will happen when 'on delete cascade' has been specified in the creation of the relevant tables. This can be disasterous. It may be better to not specify cascade deletes and just let the database refuse to delete a row which has a primary key used as a foreign key in another table. All entries with dependent foreign keys will then explicitly have to be deleted before the first entry (row) can be deleted. This decreases the risk of unintended deletions.
Relationships
First of all, realise that the word many in the context of database table relationships can have any value, from 0 (zero) to any positive number. Keeping that in mind, there are three relationships:
- one to one
- one to many
- many to many
How do we establish relationships between tables?
This is done with the primary key of one table which is used in a related table where it is known as a foreign key.
Take the primary key of any table from which you want to establish a relationship between just one member (row) of that table at any one time and use it in the other table. If a member of table A can have a relationship to many members of table B, and similarly a member of table B can have a relationship to many members of table A, then both the primary keys of table A and table B are used as foreign keys in a third table known as a relationship table.
Types of relationships
One to one
One customer has one bank account on record and one bank account can belong to only one customer. Most, if not all, one to one relationship partaking tables can successfully be merged into one table. For this reason one to one relationships are not common. But there may be legitimate reasons to do this. We now take the customer primary key and use it in the bank_account table, but in such a way that each customer primary key can only appear once in the bank_account table. How does one do that? One can use the customer primary key, say customer_id, as an independent foreign key (a foreign key is a primary key from table A appearing in table B and establishing a relationship between a row in table B to a row in table A) and ensure that this key is not used more than once. One tells the database every value in this foreign key column needs to be unique in this table. The second way one can do this is to use the primary key from table A as the primary key of table B. The database automatically sees to it that the primary key is unique. The primary key in table B is therefore also a foreign key pointing to one row in table A.
One to many
One customer can have many orders, but one order can belong to only one customer. In the preceding sentence when we talked about the customer we always used the word one, but when we talked about order/s it was plural in one case. Therefore the customer is the one (owning) side of the relationship and the order is the many side of the relationship. Remember how we establish relationships? So, we take the primary key of customer and use it in the orders table. The only difference with the one to one relationship is that this time the foreign key must not have a unique requirement, it can be repeated over and over. Many orders can belong to one customer.
Many to many
One order can contain many items and one item can be on many orders. Clearly, there is no one side here. So what to do with the primary key? This is where one needs a relationship table. To establish a many to many relationship between tables A and B, one creates a relationship table R and create a one to many relationship between A and R and again between B and R. The relationship table R really needs only two columns, one for the primary key from A as a foreign key in R, let's say a_id, and one for the primary key from B as a foreign key in R, let's say b_id. These two foreign key columns together can then be a composite primary key for table R. Neat, isn't it?
Here's how it all comes together
The image shows the tables and their relationships from the example database.
The column to the left of the arrow pointing down and in to the table is a column
used in another table as a foreign key. The little key indicates a primary key, whether it is used in another table as a foreign key, or not.
The column to the left of the little arrow pointing up and to the outside of the table is a foreign key that references a primary key in some other table.
The little blue T's indicate a text data type, the hashes (#) a numerical data type and the little clocks a time (like a date) data type.
I used DbSchema, which you can download, to create
the database. One can use a text editor to do so if one knows SQL (standard querying language).
The arrow end of the lines between tables indicate the one side of the relationship, that's to say the owning table. The forked
side indicates the many side of the relationship. Just to the left of the bank_accounts table, on the line, one can see a
short, single vertical line. Being not forked, this indicates a one side, but not the owning side, of a one to one relationship.
The top of the order_details table diagram is in a different colour from the rest. This is not a feature of DbSchema, but take
it in this case to be because order_details is a relationship table. You can see that order_details contains the
number_ordered column (also called a field) in addition to the keys. This is completely legal. In fact, this table may contain more
fields. It depends what you need.
You can download an sql file that you can use to create the example database using PostgreSQL here.
This is it, then. This was only about database design. You now know enough to design a useful database. We haven't touched upon data input forms or reports, both essential components of a database application. But one can't do everything in one short tutorial.
If you find what you learned on this page useful, please use the social media widgets at the bottom and pin, tweet, plus-one or whatever this page.
Use the form below to send any comments.
Use and empty line to separate paragraphs in the "Comment" text area.
Links and html markup are not allowed.