Time After Time (SQL Audits and Historic Views)

Author: David Kawliche

Introduction

Many databases are designed only to answer questions about the current state of your information. Most database designs can also provide a quick answer to questions of the form: "What information do I have now about last week ?" However, questions like "What information did I have last week ?" are not always so easy to answer. Depending on your database server, including its configuration and maintenance, you may or may not be able directly restore a database to any previous point in time. However, even with applications to help with log file management and presentation it is not practical to use restored copies of a database or raw log files for historic queries and analysis. Using Valid Time tables is a powerful technique to incorporate the temporal dimension for auditing and other historic queries directly in your data tables.

Creating the Audit Table

Microsoft SQL Server 2000 will be the reference for the code examples below. Although Valid-Time tables can be used by many different database servers for many different design goals, our initial focus is on using an audit history table to track all modifications to the Orders table in the Northwind sample database. Out of the box, the Northwind Orders table has this structure:

Field Name Size Type
OrderID 4 Identity
CustomerID 5 WChar
EmployeeID 4 Integer
OrderDate 16 DBTimeStamp
RequiredDate 16 DBTimeStamp
ShippedDate 16 DBTimeStamp
ShipVia 4 Integer
Freight 8 Currency
ShipName 40 VarWChar
ShipAddress 60 VarWChar
ShipCity 15 VarWChar
ShipRegion 15 VarWChar
ShipPostalCode 10 VarWChar
ShipCountry 15 VarWChar

The simplest design of the Valid Time table for auditing these Orders will contain all the fields in the base Orders table plus two extra fields. One datetime field is required to indicate when the other information in the record started being valid and another datetime field is required to indicate when this information stopped being valid. For this example the new table audit_Orders and these two extra fields audit_StartDateTime and audit_EndDateTime (the prefix convention ocdb_ in these examples indicates 1 Click DB, however the choice is arbitrary). All other fields in the audit table should match the base table in name, size and datatype. Identity fields in Microsoft SQL Server 2000 must also lose their autoincrement property and be converted back to their base data type. In our example this means the OrderID must be converted to a regular 4 byte integer field with no identity set. Although all the fields in the Orders table can be audited, Memo (text) or binary fields are not fully supported in Microsoft SQL Server 2000 triggers and must be excluded. Regular varchar fields in Microsoft SQL Server 2000 fields can hold up to 8000 characters so this is not a major issue for most transactional processing.

For the purposes of this example, in addition to the required Start and End fields let's add fields for the user, application, and hostname associated with each of these two events. Along with these let's also add a new arbitrary Identity field to aid in ad hoc retrieval. Here then is the structure of the table audit_Orders:

Field Name Size Type
OrderID 4 Integer
CustomerID 5 WChar
EmployeeID 4 Integer
OrderDate 16 DBTimeStamp
RequiredDate 16 DBTimeStamp
ShippedDate 16 DBTimeStamp
ShipVia 4 Integer
Freight 8 Currency
ShipName 40 VarWChar
ShipAddress 60 VarWChar
ShipCity 15 VarWChar
ShipRegion 15 VarWChar
ShipPostalCode 10 VarWChar
ShipCountry 15 VarWChar
audit_RecordID 4 Identity
audit_StartDateTime 16 DBTimeStamp
audit_EndDateTime 16 DBTimeStamp
audit_StartUserName 255 VarChar
audit_StartAppName 255 VarChar
audit_StartHostName 255 VarChar
audit_EndUserName 255 VarChar
audit_EndAppName 255 VarChar
audit_EndHostName 255 VarChar

Now that OrderID is no longer an Identity field, the natural primary key for this Valid-Time table would be the original primary key (the 4 byte integer OrderID) plus either the audit_DateTimeStart or the audit_DateTimeEnd field. Although logically this key presents no conceptual problems, in implementation there is a potential conflict if updates occur simultaneously as represented by the underlying data type being used. SQL Server 2000 resolves Datetime fields to a resolution of about 3.33 milliseconds. Although this issue is unlikely to present itself in practice, this condition should be considered at least for its theoretical implications. For our auditing table purposes enforcing any sort of unique constraint on the table is not imperative, so audit_RecordID will be set as the primary key.

Initial Data Population

We already have data in our Orders table, so lets dump it into the new audit_Orders table indicating the starting point of the audit. This will be a complete point in time snapshot of the data as it existed before we started auditing it. All we need is the following INSERT INTO statement:


INSERT INTO
audit_Orders
(
OrderID,
CustomerID,
EmployeeID,
OrderDate,
RequiredDate,
ShippedDate,
ShipVia,
Freight,
ShipName,
ShipAddress,
ShipCity,
ShipRegion,
ShipPostalCode,
ShipCountry,
audit_startdatetime,
audit_enddatetime,
audit_startusername,
audit_startappname,
audit_starthostname
)
SELECT 
OrderID,
CustomerID,
EmployeeID,
OrderDate,
RequiredDate,
ShippedDate,
ShipVia,
Freight,
ShipName,
ShipAddress,
ShipCity,
ShipRegion,
ShipPostalCode,
ShipCountry,
getdate() AS audit_startdatetime,
'9/9/9999' AS audit_enddatetime,
user_name() AS audit_startusername,
app_name() AS audit_startappname,
host_name() AS audit_starthostname
FROM
Orders
		

This SQL statement is pretty self-explanatory except perhaps for assigning audit_enddatetime the value of '9/9/9999'. Obviously this is an arbitrary date to choose for marking the data as "currently valid". We could code these end dates as Null or unknown and perhaps satisfy some deeper design aesthetic. However in practice using Null in the design would make constructing optimizable queries on the table more difficult. Since SQL Server datetime fields themselves run after December 31, 9999 we will ignore our own reinforcement of this Y10K problem in favor of enabling audit queries that are easy to construct and optimize.

Creating the Audit Trigger

The final step is to create the trigger required to populate the audit_Orders every time the Orders table is modified. The simplest implementation requires two SQL statements. One Update command to expire the audit_enddatetime field on the the old record and one Insert command to create a new record for the current data.

CREATE TRIGGER "audit_trigger_Orders" ON "Orders"
FOR INSERT, UPDATE, DELETE
NOT FOR REPLICATION
AS
DECLARE 
@TrigTime DateTime
set @TrigTime = getDate()

UPDATE 
audit_Orders
SET 
audit_enddatetime = (@TrigTime), 
audit_endappname = (APP_Name()), 
audit_endusername = (USER_Name()), 
audit_endhostname = (HOST_NAME()) 
FROM 
deleted,audit_Orders 
WHERE 
audit_Orders.OrderID = deleted.OrderID 
AND 
audit_enddatetime = '9/9/9999'

INSERT INTO 
audit_Orders (
OrderID,
CustomerID,
EmployeeID,
OrderDate,
RequiredDate,
ShippedDate,
ShipVia,
Freight,
ShipName,
ShipAddress,
ShipCity,
ShipRegion,
ShipPostalCode,
ShipCountry,
audit_startdatetime,
audit_enddatetime,
audit_startusername,
audit_startappname,
audit_starthostname
)
SELECT 
OrderID,
CustomerID,
EmployeeID,
OrderDate,
RequiredDate,
ShippedDate,
ShipVia,
Freight,
ShipName,
ShipAddress,
ShipCity,
ShipRegion,
ShipPostalCode,
ShipCountry,
@TrigTime,
'9/9/9999',
user_name(),
app_name(),
host_name()
FROM
inserted
		

Inserted and Deleted are special tables that appear only in SQL Triggers. They refer to pseudo tables containing the New and Old values of the data that initiating the Trigger. When an operation affects multiple rows, each of these be included in the Inserted and Deleted. Note that when deleting rows in your main table, nothing new is inserted or removed from your audit table. The only change is the "audit_enddatetime" field is timestamped to reflect the data as being no longer valid.

Querying the Data

After the above trigger is made active, your database will be logging away all changes to Orders in audit_Orders. The question now is how do you get data out of this table. This is where the power of the design really becomes apparent. Showing the current state of the data is a simple select:

SELECT
*
FROM
audit_orders
WHERE
audit_enddatetime = '9/9/9999'
		

Display the data held in the main Orders table at 1:45 PM on Tuesday, August 7, 2001 is only slightly more complex:

SELECT 
* 
FROM 
audit_Orders 
WHERE 
audit_startdatetime <= '8/7/01 1:45:00 PM'
AND
audit_enddatetime > '8/7/01 1:45:00 PM'
		

Keep in mind that this queries are expressed in seconds whereas SQL Server 2000 has a datetime resolution of one three hundredth of a second and usually displays datetime values only to the second. Because of this you will likely not see records that display audit_startdatetime as being '8/7/01 1:45:00 PM' because the real underlying value of the field being displayed is more exactly '8/7/01 1:45:00 PM' plus a few milliseconds.

To list all records that were changed between 1:45 PM on Tuesday, August 7, 2001 and 3:45PM on Tuesday, August 7, 2001 along with their original values is also simple:

SELECT 
*
FROM
audit_orders
WHERE
audit_startdatetime 
BETWEEN 
'8/7/01 1:45:00 PM' 
AND 
'8/7/01 3:45:00 PM' 
		

In the example Inserts, Updates, and Deletes are not explicitly distinguished. An Insert is represented simply as a record with a new non-temporal key (e.g. OrderID) coming into existence at a given audit_StartTimeDate. An update is a record with a non-temporal key (e.g. OrderID) that is temporally contiguous with another record same non-temporal key (e.g. deleted.OrderID = inserted.OrderID and deleted.audit_EndDateTime = inserted.audit_StartDateTime). A deletion represented simply as a record whose audit_EndDateTime is less than '9/9/9999' (i.e. not currently valid) and has no new record taking its place. A side effect of this is that if you update the primary key, it will be represented in the same manner as if the record had been deleted and instantaneously inserted with a new key value. If you are don't have stable key values, and are uncomfortable with this equivalence, consider adding another two fields to the history table indicating something like StartAction and EndAction.

Other Applications and Alternatives

Valid Time tables can also be used for other data that needs to be tracked through time. However different Triggers are required when users are enabled to change the Start and End Dates as a regular data entry function. There are other implementation challenges associated with enforcing integrity constraints for these scenarios. Although the non-temporal primary key can still be combined with the End Date in non-auditing designs for a suitable natural key, this key will not enforce the idea of no-overlapping time periods that is integral to the design. Rick Snodgrass has written extensively on the use of Valid-Time tables and has an interesting article on the subject of implementing data integrity for these designs Referential Integrity Over Time-Varying Tables on a DB2 installation.

Logging changes to individual field values is generallynot practical for temporal designs that, although there are reasons why the technique offers special advantages for auditing purposes. First off, logging individual field changes does not require a selective update to expire the old record after any change. The database only has to to write a new record at the end of the field change history table indicating each new field. Secondly, when working with tables containing a very large number of fields, of which only a few are ever updated as a group, a full Valid Time audit table can result in more disk usage. But if a base table sees a lot of insert and delete activity and most update operations affect more than just one or two fields, Valid Time tables will probably take up less space than a field level audit table. This is because the key information and time stamps are only written once an operation not once per value. It is also much easier to reconstruct historic record values with Valid Time tables. You can use native data types in Valid Time tables instead of having to coerce all your data into large varchar fields for storage and then back to native data types for queries. You don't have to worry about capturing complex primary keys with Valid Time tables. For all these reasons field logging techniques is often the wrong design choice for many practical applications.