Many databases are designed only to answer questions about the current state of your information. Most database designs can also provide a quick answer to questions of the form: "What information do I have now about last week ?" However, questions like "What information did I have last week ?" are not always so easy to answer. Depending on your database server, including its configuration and maintenance, you may or may not be able directly restore a database to any previous point in time. However, even with applications to help with log file management and presentation it is not practical to use restored copies of a database or raw log files for historic queries and analysis. Using Valid Time tables is a powerful technique to incorporate the temporal dimension for auditing and other historic queries directly in your data tables.
Microsoft SQL Server 2000 will be the reference for the code examples below. Although Valid-Time tables can be used by many different database servers for many different design goals, our initial focus is on using an audit history table to track all modifications to the Orders table in the Northwind sample database. Out of the box, the Northwind Orders table has this structure:
| Field Name | Size | Type |
|---|---|---|
| OrderID | 4 | Identity |
| CustomerID | 5 | WChar |
| EmployeeID | 4 | Integer |
| OrderDate | 16 | DBTimeStamp |
| RequiredDate | 16 | DBTimeStamp |
| ShippedDate | 16 | DBTimeStamp |
| ShipVia | 4 | Integer |
| Freight | 8 | Currency |
| ShipName | 40 | VarWChar |
| ShipAddress | 60 | VarWChar |
| ShipCity | 15 | VarWChar |
| ShipRegion | 15 | VarWChar |
| ShipPostalCode | 10 | VarWChar |
| ShipCountry | 15 | VarWChar |
The simplest design of the Valid Time table for auditing these Orders will contain all the fields in the base Orders table plus two extra fields. One datetime field is required to indicate when the other information in the record started being valid and another datetime field is required to indicate when this information stopped being valid. For this example the new table audit_Orders and these two extra fields audit_StartDateTime and audit_EndDateTime (the prefix convention ocdb_ in these examples indicates 1 Click DB, however the choice is arbitrary). All other fields in the audit table should match the base table in name, size and datatype. Identity fields in Microsoft SQL Server 2000 must also lose their autoincrement property and be converted back to their base data type. In our example this means the OrderID must be converted to a regular 4 byte integer field with no identity set. Although all the fields in the Orders table can be audited, Memo (text) or binary fields are not fully supported in Microsoft SQL Server 2000 triggers and must be excluded. Regular varchar fields in Microsoft SQL Server 2000 fields can hold up to 8000 characters so this is not a major issue for most transactional processing.
For the purposes of this example, in addition to the required Start and End fields let's add fields for the user, application, and hostname associated with each of these two events. Along with these let's also add a new arbitrary Identity field to aid in ad hoc retrieval. Here then is the structure of the table audit_Orders:
| Field Name | Size | Type |
|---|---|---|
| OrderID | 4 | Integer |
| CustomerID | 5 | WChar |
| EmployeeID | 4 | Integer |
| OrderDate | 16 | DBTimeStamp |
| RequiredDate | 16 | DBTimeStamp |
| ShippedDate | 16 | DBTimeStamp |
| ShipVia | 4 | Integer |
| Freight | 8 | Currency |
| ShipName | 40 | VarWChar |
| ShipAddress | 60 | VarWChar |
| ShipCity | 15 | VarWChar |
| ShipRegion | 15 | VarWChar |
| ShipPostalCode | 10 | VarWChar |
| ShipCountry | 15 | VarWChar |
| audit_RecordID | 4 | Identity |
| audit_StartDateTime | 16 | DBTimeStamp |
| audit_EndDateTime | 16 | DBTimeStamp |
| audit_StartUserName | 255 | VarChar |
| audit_StartAppName | 255 | VarChar |
| audit_StartHostName | 255 | VarChar |
| audit_EndUserName | 255 | VarChar |
| audit_EndAppName | 255 | VarChar |
| audit_EndHostName | 255 | VarChar |
Now that OrderID is no longer an Identity field, the natural primary key for this Valid-Time table would be the original primary key (the 4 byte integer OrderID) plus either the audit_DateTimeStart or the audit_DateTimeEnd field. Although logically this key presents no conceptual problems, in implementation there is a potential conflict if updates occur simultaneously as represented by the underlying data type being used. SQL Server 2000 resolves Datetime fields to a resolution of about 3.33 milliseconds. Although this issue is unlikely to present itself in practice, this condition should be considered at least for its theoretical implications. For our auditing table purposes enforcing any sort of unique constraint on the table is not imperative, so audit_RecordID will be set as the primary key.
We already have data in our Orders table, so lets dump it into the new audit_Orders table indicating the starting point of the audit. This will be a complete point in time snapshot of the data as it existed before we started auditing it. All we need is the following INSERT INTO statement:
INSERT INTO audit_Orders ( OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry, audit_startdatetime, audit_enddatetime, audit_startusername, audit_startappname, audit_starthostname ) SELECT OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry, getdate() AS audit_startdatetime, '9/9/9999' AS audit_enddatetime, user_name() AS audit_startusername, app_name() AS audit_startappname, host_name() AS audit_starthostname FROM Orders
This SQL statement is pretty self-explanatory except perhaps for assigning audit_enddatetime the value of '9/9/9999'. Obviously this is an arbitrary date to choose for marking the data as "currently valid". We could code these end dates as Null or unknown and perhaps satisfy some deeper design aesthetic. However in practice using Null in the design would make constructing optimizable queries on the table more difficult. Since SQL Server datetime fields themselves run after December 31, 9999 we will ignore our own reinforcement of this Y10K problem in favor of enabling audit queries that are easy to construct and optimize.
The final step is to create the trigger required to populate the audit_Orders every time the Orders table is modified. The simplest implementation requires two SQL statements. One Update command to expire the audit_enddatetime field on the the old record and one Insert command to create a new record for the current data.
CREATE TRIGGER "audit_trigger_Orders" ON "Orders" FOR INSERT, UPDATE, DELETE NOT FOR REPLICATION AS DECLARE @TrigTime DateTime set @TrigTime = getDate() UPDATE audit_Orders SET audit_enddatetime = (@TrigTime), audit_endappname = (APP_Name()), audit_endusername = (USER_Name()), audit_endhostname = (HOST_NAME()) FROM deleted,audit_Orders WHERE audit_Orders.OrderID = deleted.OrderID AND audit_enddatetime = '9/9/9999' INSERT INTO audit_Orders ( OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry, audit_startdatetime, audit_enddatetime, audit_startusername, audit_startappname, audit_starthostname ) SELECT OrderID, CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry, @TrigTime, '9/9/9999', user_name(), app_name(), host_name() FROM inserted
Inserted and Deleted are special tables that appear only in SQL Triggers. They refer to pseudo tables containing the New and Old values of the data that initiating the Trigger. When an operation affects multiple rows, each of these be included in the Inserted and Deleted. Note that when deleting rows in your main table, nothing new is inserted or removed from your audit table. The only change is the "audit_enddatetime" field is timestamped to reflect the data as being no longer valid.
After the above trigger is made active, your database will be logging away all changes to Orders in audit_Orders. The question now is how do you get data out of this table. This is where the power of the design really becomes apparent. Showing the current state of the data is a simple select:
SELECT * FROM audit_orders WHERE audit_enddatetime = '9/9/9999'
Display the data held in the main Orders table at 1:45 PM on Tuesday, August 7, 2001 is only slightly more complex:
SELECT * FROM audit_Orders WHERE audit_startdatetime <= '8/7/01 1:45:00 PM' AND audit_enddatetime > '8/7/01 1:45:00 PM'
Keep in mind that this queries are expressed in seconds whereas SQL Server 2000 has a datetime resolution of one three hundredth of a second and usually displays datetime values only to the second. Because of this you will likely not see records that display audit_startdatetime as being '8/7/01 1:45:00 PM' because the real underlying value of the field being displayed is more exactly '8/7/01 1:45:00 PM' plus a few milliseconds.
To list all records that were changed between 1:45 PM on Tuesday, August 7, 2001 and 3:45PM on Tuesday, August 7, 2001 along with their original values is also simple:
SELECT * FROM audit_orders WHERE audit_startdatetime BETWEEN '8/7/01 1:45:00 PM' AND '8/7/01 3:45:00 PM'
In the example Inserts, Updates, and Deletes are not explicitly distinguished. An Insert is represented simply as a record with a new non-temporal key (e.g. OrderID) coming into existence at a given audit_StartTimeDate. An update is a record with a non-temporal key (e.g. OrderID) that is temporally contiguous with another record same non-temporal key (e.g. deleted.OrderID = inserted.OrderID and deleted.audit_EndDateTime = inserted.audit_StartDateTime). A deletion represented simply as a record whose audit_EndDateTime is less than '9/9/9999' (i.e. not currently valid) and has no new record taking its place. A side effect of this is that if you update the primary key, it will be represented in the same manner as if the record had been deleted and instantaneously inserted with a new key value. If you are don't have stable key values, and are uncomfortable with this equivalence, consider adding another two fields to the history table indicating something like StartAction and EndAction.
Valid Time tables can also be used for other data that needs to be tracked through time. However different Triggers are required when users are enabled to change the Start and End Dates as a regular data entry function. There are other implementation challenges associated with enforcing integrity constraints for these scenarios. Although the non-temporal primary key can still be combined with the End Date in non-auditing designs for a suitable natural key, this key will not enforce the idea of no-overlapping time periods that is integral to the design. Rick Snodgrass has written extensively on the use of Valid-Time tables and has an interesting article on the subject of implementing data integrity for these designs Referential Integrity Over Time-Varying Tables on a DB2 installation.
Logging changes to individual field values is generallynot practical for temporal designs that, although there are reasons why the technique offers special advantages for auditing purposes. First off, logging individual field changes does not require a selective update to expire the old record after any change. The database only has to to write a new record at the end of the field change history table indicating each new field. Secondly, when working with tables containing a very large number of fields, of which only a few are ever updated as a group, a full Valid Time audit table can result in more disk usage. But if a base table sees a lot of insert and delete activity and most update operations affect more than just one or two fields, Valid Time tables will probably take up less space than a field level audit table. This is because the key information and time stamps are only written once an operation not once per value. It is also much easier to reconstruct historic record values with Valid Time tables. You can use native data types in Valid Time tables instead of having to coerce all your data into large varchar fields for storage and then back to native data types for queries. You don't have to worry about capturing complex primary keys with Valid Time tables. For all these reasons field logging techniques is often the wrong design choice for many practical applications.