Persisting objects in tables
With a tip of the hat to Stefan Wenig, whose diagrams I lifted from his slides.
The overview in the previous What is re-motion? chapter briefly explains that three methods are popular when storing object instance data in relational database tables:
- single table inheritance
- concrete table inheritance
- class table inheritance
If you are familiar with these approaches, you can safely skip this section. An alternative discussion can be found at the castle project's website
In the following example, we use four classes:
- The abstract class Person
- The concrete class Employee, derived from Person
- The concrete class Customer, also derived from Person
- The abstract class Order, not derived from Person
Here is the UML diagram:
In the following discussion, we will simply use dashed boxes for signaling table borders.
Mapping with single table inheritance
For single table inheritance, objects instances belonging into the same tree of ancestry go into the same table.
This gives us two tables:
1.) Table Person
, with the columns
FirstName
LastName
EmployeeNumber
(NULL for objects of typeCustomer
)Wage
(NULL for objects of typeCustomer
)CustomerID
(NULL for objects of typeEmployee
)
2.) Table Order
, with the columns
OrderID
Date
}
Since Order
is entirely independent from Person
, it gets its own table. This strategy for storing objects has three big advantages:
- the localization of instances in the table is very fast
- the reading of instances in the table is very fast
- the writing of instances in the table is very fast
The big disadvantage can't be seen in the handful of objects in our example, but for a practical number of classes and more bushy trees of inheritance, the number of columns in tables explodes and quickly becomes unmanageable.
Mapping with concrete table inheritance
Concrete table inheritance reduces the number of columns in tables significantly, at the expense of speed and programmer convenience. This diagram shows how the inheritance tree is sliced up for concrete table inheritance:
For concrete table inheritance, we end up with separate tables for each concrete class (hence the name "concrete table inheritance"). In our example, we give each of the Employee
- and Customer
- branches its own table. Nothing changes for Order
, of course. Here is a list of tables and their columns:
The table Employee
has these columns:
FirstName
LastName
EmployeeNumber
Wage
The table Customer
has these columns:
FirstName
LastName
CustomerID
Order
is not affected by our move to concrete table inheritance. It's columns remain
OrderID
Date
Concrete table inheritance does not need redundant columns for objects of a base-class not sporting the columns of the sub-class, so it solves both the problem of not having a NULL
for member variables and the problem of exploding numbers of columns. However, please note that there still is some redundancy, because both the Employee
and the Customer
table require their own FirstName
and LastName
columns.
Concrete table inheritance trades the column redundancy in single table inheritance for column duplication. Concrete table inheritance does not give you extra columns in your instance that have a purpose only in instances of some other type, but it gives you columns that are duplicated in the tables for other types. In the example above both the Employee
and the Customer
table require their own FirstName
and LastName
columns.
Mapping with class table inheritance
With class table inheritance, you can reduce the number of columns even further, albeit at the expense of even more speed and programmer convenience. Here is the diagram that shows that each class gets its own table:
Consequently, we end up with four tables.
Person
has the columns
FirstName
LastName
(and a column with a reference to either an object/row in the Customer
or Employee
table)
Employee
has the columns
EmployeeNumber
Wage
(and a column with a reference to an object/row in the Person
table)
Customer
has the column
CustomerID
(and a column with a reference to an object/row in the Person
table)
Order
is not affected in this thought experiment and happy as a clam with its OrderID
and Date
columns.
As explained in the re-store section of the overview, class table inheritance transforms the "is-a" relationship into a "has-a" relationship. Expressed in UML:
In the left ("IS A") diagram we see that the object someEmployee
is an Employee
which in turn "is a" (subclass of) Person
. It is possible to store the same attributes of an Employee without inheriting from Person
, however, by making a separate instance of Person
and putting a reference to it into someEmployee
. In the right diagram we see that "HAS A"-relationship. This is essentially what class table inheritance does to inheritance behind the scenes: spreading a single "instance" over several instances and establishing references between them. This is very economical in terms of redundancy, but clumsy in terms of speed and programmer convenience. After all, the query for assembling all instance data is complicated by JOINs.
As you see, the three approaches to object-relational mapping must make compromises between minimizing space or minimizing complications. Which approach is actually used in a re-motion application must be decided by a developer after taking into account how the data is used (but be aware that class table inheritance is not supported by re-motion yet). In practice a mixture between class table inheritance and concrete table inheritance is used. Before we turn to discussing how such decisions are made for a given domain, here is a recap of the advantages and disadvantages of the three table inheritance strategies in one handy grid:
table inheritance |
single |
concrete |
class |
---|---|---|---|
accessing instances |
fast, easy |
fast, easy |
complicated by required JOIN |
number of columns |
very high |
considerably high |
low |
spoilers |
object members not nullable |
accessing all instances, including those of common base-classes, requires UNION; table size, performance |
requires JOIN |
In order to give you an impression how these approaches can be combined, we look at a simple toy application. The toy application is a typical enterprise application with the following concrete types of domain objects:
- Employees
- Premium Customers
- Standard Customers
- Order
The concrete classes for those domain objects are derived from the abstract classes Person
and Customer
classes in fairly obvious ways, and, just for luck, we throw in our mascot class Order
in for good measure. Here is the UML diagram, with dashed lines delineating table boundaries:
FIXME ILLU
As you can see, a mixture of single table inheritance, concrete table inheritance and class table inheritance is used. The layout of the given design is not arbitrary, careful analysis of how many objects there are of each class and how they are used must be taken into account. The design is based on the following assumptions:
- The company will process many more orders than it has customers or employees
- The company will have many more customers than employees
- The company will have many more standard customers than premium customers
Observe that the deepest line of ancestry is the one going from Object
to StandardCustomer
, along Person
and Customer
. In terms of extra columns, this makes single table inheritance somewhat problematic. On the other hand, overall performance of the system in typical use will be decided by how quickly instances of these classes can be processed. For this reason, we decide that quick access is the most important requirement and put all the member fields of StandardCustomer
and all its superclasses into a single table Person
:
StandardCustomer
columns:
CreatedBy
CreatedAt
FirstName
LastName
CustomerID
CreditCardInfo
In this particular case, the decision of using single table inheritance looks particularly clever, because there are no other subclasses in the same table spawning off that go into the same table. Alright, Employee
is a subclass of Person
, and PremiumCustomer
is a subclass of Customer
, but these classes get their own tables, so their columns don't pollute the StandardCustomer
columns. In this particular case, StandardCustomer
's single table inheritance comes with all the benefits and only modest disadvantages.
As can be seen in the diagram, Employee
and PremiumCustomer
each get their own tables, but they need references back into the StandardCustomer
table, because some of their instance data is stored there. Consequently:
Employee
columns:
EmployeeNumber
Wage
- (A foreign key into the
StandardCustomer
table storing some more instance data)
PremiumCustomer
columns:
KeyAcctMgr
- (A foreign key into
StandardCustomer
table storing some more instance data)
Observe that the foreign keys into another table where the instance data shared with the superclass can be found clearly makes this a class table inheritance mapping, but here it comes at a price: For every Employee
object and every PremiumCustomer
object, not only a row in the Employee
, or PremiumCustomer
table must be created, but also one in StandardCustomer
. This StandardCustomer table comes with an extra {{CreditCardInfo
column that is neither part of an Employee
object or PremiumCustomer
object.
This column will be redundant for Employee
and PremiumCustomer
instances, but since we don't expect that many of either, we believe it is a good decision to hitch them a ride on the Person
table. Assembling the instance data for Employee
or PremiumCustomer
objects is complicated, of course. Whenever we need instance data for either, we have to join it with the instance data in Person
. This, too, is a fair price to pay here, because Employee
data is needed only once a month for payroll, and PremiumCustomers
are premium because they place huge orders once in a blue moon, giving the company large sales with very little administrative overhead. This means that slow access to instance data won't matter much for overall system performance.
Mapping the Order
class is a no-brainer. It has nothing to share with any subclass of Person
, so it needs no relation to that table. Since we don't have a separate Object
table, we use concrete table inheritance for storing both Object
members and Order
members in a single table.
Order
table columns:
CreatedBy
CreatedAt
OrderID
Date
In practice, concrete table inheritance is the most frequently used type. Again, be reminded that re-motion does not support class table inheritance yet. What re-motion does support is combinations of single table inheritance and class table inheritance within its domain.
A very practical feature is support for database views. Views can make any type of * table inheritance look like single table inheritance by merging the data from multiple tables into a single result set.
More on object-relational mapping
O/R-mappers are an interesting topic. The author found a very good introduction to its basic concepts and challenges here
Object-relational mapping is hard to get right, as the article behind the link above points out. The focus for re-motion's O/R-mapping is a good compromise between flexibility and simplicity.