Persisting objects in tables

With a tip of the hat to Stefan Wenig, whose diagrams I lifted from his slides.

The overview in the previous What is re-motion? chapter briefly explains that three methods are popular when storing object instance data in relational database tables:

  • single table inheritance
  • concrete table inheritance
  • class table inheritance

If you are familiar with these approaches, you can safely skip this section. An alternative discussion can be found at the castle project's website

In the following example, we use four classes:

  • The abstract class Person
  • The concrete class Employee, derived from Person
  • The concrete class Customer, also derived from Person
  • The abstract class Order, not derived from Person

Here is the UML diagram:

In the following discussion, we will simply use dashed boxes for signaling table borders.

Mapping with single table inheritance

For single table inheritance, objects instances belonging into the same tree of ancestry go into the same table.

This gives us two tables:

1.) Table Person, with the columns

  • FirstName
  • LastName
  • EmployeeNumber (NULL for objects of type Customer)
  • Wage (NULL for objects of type Customer)
  • CustomerID (NULL for objects of type Employee)

2.) Table Order, with the columns

  • OrderID
  • Date}

Since Order is entirely independent from Person, it gets its own table. This strategy for storing objects has three big advantages:

  • the localization of instances in the table is very fast
  • the reading of instances in the table is very fast
  • the writing of instances in the table is very fast

The big disadvantage can't be seen in the handful of objects in our example, but for a practical number of classes and more bushy trees of inheritance, the number of columns in tables explodes and quickly becomes unmanageable.

Mapping with concrete table inheritance

Concrete table inheritance reduces the number of columns in tables significantly, at the expense of speed and programmer convenience. This diagram shows how the inheritance tree is sliced up for concrete table inheritance:

For concrete table inheritance, we end up with separate tables for each concrete class (hence the name "concrete table inheritance"). In our example, we give each of the Employee- and Customer- branches its own table. Nothing changes for Order, of course. Here is a list of tables and their columns:

The table Employee has these columns:

  • FirstName
  • LastName
  • EmployeeNumber
  • Wage

The table Customer has these columns:

  • FirstName
  • LastName
  • CustomerID

Order is not affected by our move to concrete table inheritance. It's columns remain

  • OrderID
  • Date

Concrete table inheritance does not need redundant columns for objects of a base-class not sporting the columns of the sub-class, so it solves both the problem of not having a NULL for member variables and the problem of exploding numbers of columns. However, please note that there still is some redundancy, because both the Employee and the Customer table require their own FirstName and LastName columns.
Concrete table inheritance trades the column redundancy in single table inheritance for column duplication. Concrete table inheritance does not give you extra columns in your instance that have a purpose only in instances of some other type, but it gives you columns that are duplicated in the tables for other types. In the example above both the Employee and the Customer table require their own FirstName and LastName columns.

Mapping with class table inheritance

With class table inheritance, you can reduce the number of columns even further, albeit at the expense of even more speed and programmer convenience. Here is the diagram that shows that each class gets its own table:

Consequently, we end up with four tables.

Person has the columns

  • FirstName
  • LastName

(and a column with a reference to either an object/row in the Customer or Employee table)

Employee has the columns

  • EmployeeNumber
  • Wage

(and a column with a reference to an object/row in the Person table)
Customer has the column

  • CustomerID

(and a column with a reference to an object/row in the Person table)

Order is not affected in this thought experiment and happy as a clam with its OrderID and Date columns.
As explained in the re-store section of the overview, class table inheritance transforms the "is-a" relationship into a "has-a" relationship. Expressed in UML:

In the left ("IS A") diagram we see that the object someEmployee is an Employee which in turn "is a" (subclass of) Person. It is possible to store the same attributes of an Employee without inheriting from Person, however, by making a separate instance of Person and putting a reference to it into someEmployee. In the right diagram we see that "HAS A"-relationship. This is essentially what class table inheritance does to inheritance behind the scenes: spreading a single "instance" over several instances and establishing references between them. This is very economical in terms of redundancy, but clumsy in terms of speed and programmer convenience. After all, the query for assembling all instance data is complicated by JOINs.
As you see, the three approaches to object-relational mapping must make compromises between minimizing space or minimizing complications. Which approach is actually used in a re-motion application must be decided by a developer after taking into account how the data is used (but be aware that class table inheritance is not supported by re-motion yet). In practice a mixture between class table inheritance and concrete table inheritance is used. Before we turn to discussing how such decisions are made for a given domain, here is a recap of the advantages and disadvantages of the three table inheritance strategies in one handy grid:

table inheritance

single

concrete

class

accessing instances

fast, easy

fast, easy

complicated by required JOIN

number of columns

very high

considerably high

low

spoilers

object members not nullable

accessing all instances, including those of common base-classes, requires UNION; table size, performance

requires JOIN

In order to give you an impression how these approaches can be combined, we look at a simple toy application. The toy application is a typical enterprise application with the following concrete types of domain objects:

  • Employees
  • Premium Customers
  • Standard Customers
  • Order

The concrete classes for those domain objects are derived from the abstract classes Person and Customer classes in fairly obvious ways, and, just for luck, we throw in our mascot class Order in for good measure. Here is the UML diagram, with dashed lines delineating table boundaries:

FIXME ILLU

As you can see, a mixture of single table inheritance, concrete table inheritance and class table inheritance is used. The layout of the given design is not arbitrary, careful analysis of how many objects there are of each class and how they are used must be taken into account. The design is based on the following assumptions:

  • The company will process many more orders than it has customers or employees
  • The company will have many more customers than employees
  • The company will have many more standard customers than premium customers

Observe that the deepest line of ancestry is the one going from Object to StandardCustomer, along Person and Customer. In terms of extra columns, this makes single table inheritance somewhat problematic. On the other hand, overall performance of the system in typical use will be decided by how quickly instances of these classes can be processed. For this reason, we decide that quick access is the most important requirement and put all the member fields of StandardCustomer and all its superclasses into a single table Person:
StandardCustomer columns:

  • CreatedBy
  • CreatedAt
  • FirstName
  • LastName
  • CustomerID
  • CreditCardInfo

In this particular case, the decision of using single table inheritance looks particularly clever, because there are no other subclasses in the same table spawning off that go into the same table. Alright, Employee is a subclass of Person, and PremiumCustomer is a subclass of Customer, but these classes get their own tables, so their columns don't pollute the StandardCustomer columns. In this particular case, StandardCustomer's single table inheritance comes with all the benefits and only modest disadvantages.
As can be seen in the diagram, Employee and PremiumCustomer each get their own tables, but they need references back into the StandardCustomer table, because some of their instance data is stored there. Consequently:

Employee columns:

  • EmployeeNumber
  • Wage
  • (A foreign key into the StandardCustomer table storing some more instance data)

PremiumCustomer columns:

  • KeyAcctMgr
  • (A foreign key into StandardCustomer table storing some more instance data)

Observe that the foreign keys into another table where the instance data shared with the superclass can be found clearly makes this a class table inheritance mapping, but here it comes at a price: For every Employee object and every PremiumCustomer object, not only a row in the Employee, or PremiumCustomer table must be created, but also one in StandardCustomer. This StandardCustomer table comes with an extra {{CreditCardInfo column that is neither part of an Employee object or PremiumCustomer object.
This column will be redundant for Employee and PremiumCustomer instances, but since we don't expect that many of either, we believe it is a good decision to hitch them a ride on the Person table. Assembling the instance data for Employee or PremiumCustomer objects is complicated, of course. Whenever we need instance data for either, we have to join it with the instance data in Person. This, too, is a fair price to pay here, because Employee data is needed only once a month for payroll, and PremiumCustomers are premium because they place huge orders once in a blue moon, giving the company large sales with very little administrative overhead. This means that slow access to instance data won't matter much for overall system performance.
Mapping the Order class is a no-brainer. It has nothing to share with any subclass of Person, so it needs no relation to that table. Since we don't have a separate Object table, we use concrete table inheritance for storing both Object members and Order members in a single table.

Order table columns:

  • CreatedBy
  • CreatedAt
  • OrderID
  • Date

In practice, concrete table inheritance is the most frequently used type. Again, be reminded that re-motion does not support class table inheritance yet. What re-motion does support is combinations of single table inheritance and class table inheritance within its domain.
A very practical feature is support for database views. Views can make any type of * table inheritance look like single table inheritance by merging the data from multiple tables into a single result set.

More on object-relational mapping

O/R-mappers are an interesting topic. The author found a very good introduction to its basic concepts and challenges here

Object-relational mapping is hard to get right, as the article behind the link above points out. The focus for re-motion's O/R-mapping is a good compromise between flexibility and simplicity.