Quick and Dirty Data Auditing

Many projects, especially as they progress, lack proper requirements, time, or resources, and tracking changes to data is often an afterthought. Rapid changes to the database schema and code occur frequently, and with the trend toward building software with fewer checks and balances (smaller quality assurance staffs, less emphasis from management on allowing proper time for development), errors in saved data become more and more prevalent.

With all of this in mind, the ability to track changes in data is a powerful tool for both development and production issue remediation. However, one of the biggest issues faced is maintaining auditing tables when the database schema changes, and trying to ensure they don’t cause a system failure on their own.

Take for example the following table:
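(The table from the original post isn't reproduced here; as a stand-in, assume a simple customer table along these lines.)

    CREATE TABLE dbo.Customer
    (
        CustomerId   INT IDENTITY(1,1) PRIMARY KEY,
        FirstName    NVARCHAR(100) NOT NULL,
        LastName     NVARCHAR(100) NOT NULL,
        Email        NVARCHAR(255) NULL,
        ModifiedDate DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );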

and with it, a common approach is to create an auditing table like so:
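(Again, the original definition isn't shown; a typical mirror-the-live-table version might be:)

    CREATE TABLE dbo.CustomerAudit
    (
        CustomerId   INT NOT NULL,
        FirstName    NVARCHAR(100) NOT NULL,
        LastName     NVARCHAR(100) NOT NULL,
        Email        NVARCHAR(255) NULL,
        ModifiedDate DATETIME2 NOT NULL,
        AuditDate    DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );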

and a trigger which fires on updates:
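(The trigger itself was also omitted; a common SELECT * style sketch, using the assumed tables above, is exactly the kind that breaks when the two schemas drift apart:)

    CREATE TRIGGER dbo.trgCustomerAudit
    ON dbo.Customer
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Copies the old values into the audit table. SELECT * assumes the
        -- audit table has the same columns, in the same order, plus AuditDate.
        INSERT INTO dbo.CustomerAudit
        SELECT d.*, SYSUTCDATETIME()
        FROM deleted AS d;
    END;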

This scenario has some problems, the biggest being that if the live table gets a new column and the audit table doesn't, the trigger fails and stops the insert. (This is easily fixed in development, not so much in production.) Hopefully, if this is the case, the calling application is verifying that the database update succeeds.

This problem can be solved by listing the columns explicitly,
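(The column-specific version isn't shown either; with the assumed names it would look something like this:)

    ALTER TRIGGER dbo.trgCustomerAudit
    ON dbo.Customer
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Explicit column lists keep the trigger from failing when dbo.Customer
        -- gains a column, but any new column is silently left out of the audit.
        INSERT INTO dbo.CustomerAudit (CustomerId, FirstName, LastName, Email, ModifiedDate)
        SELECT CustomerId, FirstName, LastName, Email, ModifiedDate
        FROM deleted;
    END;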

but this leaves the problem that if columns are added to the live table and either the audit table or the trigger isn't also updated, new fields slip through the cracks and won't be noticed until someone actively checks the audit table for information. Unfortunately, this is a common worst-case scenario, because most of the time you only look in an audit table when you need to track down something that might be wrong.

There is actually a fairly simple solution to this problem: store the updated data as a single text entry. Better still, it's just as easy to store it as XML, which most runtime environments have a convenient way of parsing. To accomplish this, create an auditing table with the appropriate fields:
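(A sketch of such a table, with assumed names, covering the fields described below:)

    CREATE TABLE dbo.AuditEntry
    (
        AuditEntryId       BIGINT IDENTITY(1,1) PRIMARY KEY,
        TableId            INT NOT NULL,              -- sys.tables.object_id of the audited table
        TransactionEntryId UNIQUEIDENTIFIER NOT NULL, -- ties the delete/insert pair of an update together
        Action             CHAR(1) NOT NULL,          -- 'I' = insert, 'D' = delete
        AuditData          XML NOT NULL,              -- the affected rows serialized as XML
        DateCreated        DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );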

The TableId is the sys.tables.object_id, so when a trigger enters data into the table it is easy to tell which data came from where. The Action field says whether the row came from an insert or a delete. (An update is recorded as a delete of the previous data followed by an insert.) The TransactionEntryId ties multiple audit entries together to show which inserts and deletes are the result of a single update.

Next, create the trigger and apply it to the appropriate table or tables.
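(The trigger from the post isn't included; a generic sketch, assuming the dbo.AuditEntry table above, could look like the following. Because it reads nothing but the inserted and deleted pseudo-tables and looks up its own parent table, the body doesn't change from table to table.)

    CREATE TRIGGER dbo.trgCustomer_Audit
    ON dbo.Customer
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- @@PROCID is the trigger's own object_id; its parent is the audited table.
        DECLARE @tableId INT =
            (SELECT parent_object_id FROM sys.objects WHERE object_id = @@PROCID);
        DECLARE @transactionEntryId UNIQUEIDENTIFIER = NEWID();

        -- "Before" image: rows removed by a delete or replaced by an update.
        DECLARE @deletedXml XML =
            (SELECT * FROM deleted FOR XML RAW('row'), ROOT('rows'), TYPE);
        IF @deletedXml IS NOT NULL
            INSERT INTO dbo.AuditEntry (TableId, TransactionEntryId, Action, AuditData)
            VALUES (@tableId, @transactionEntryId, 'D', @deletedXml);

        -- "After" image: rows added by an insert or produced by an update.
        DECLARE @insertedXml XML =
            (SELECT * FROM inserted FOR XML RAW('row'), ROOT('rows'), TYPE);
        IF @insertedXml IS NOT NULL
            INSERT INTO dbo.AuditEntry (TableId, TransactionEntryId, Action, AuditData)
            VALUES (@tableId, @transactionEntryId, 'I', @insertedXml);
    END;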

In truth, this doesn't work for all scenarios. Tables with a high transaction volume, or ones with large amounts of data in each record, may suffer problems, but it works well enough in many situations to be a viable solution even if it isn't a long-term one. (Like any solution, before jumping to conclusions about performance, the change should be tested.) Also, by keeping the trigger contents generic, the same code can be applied in multiple places without modification, reducing the chance of a mistake during setup.

Depending on the scenario, it can be beneficial to enable auditing at will. Ideally it would be as easy as applying the trigger to the table when necessary, but in many organizations that isn't possible without a formal deployment process, and when investigating a problem, auditing usually needs to be added quickly. By adding another table which tells the triggers when to record data, auditing can be turned on simply by adding an entry.
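(A minimal sketch of such a control table, with assumed names; a row in it switches auditing on for the matching table.)

    CREATE TABLE dbo.AuditEnabled
    (
        TableId INT NOT NULL PRIMARY KEY  -- sys.tables.object_id of the table to audit
    );

    -- Turn auditing on for dbo.Customer by adding an entry:
    INSERT INTO dbo.AuditEnabled (TableId)
    VALUES (OBJECT_ID(N'dbo.Customer'));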

and edit the trigger template so it checks that table first:
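(Again a sketch with the assumed names; the only change from the trigger above is the early exit when no entry exists in dbo.AuditEnabled.)

    ALTER TRIGGER dbo.trgCustomer_Audit
    ON dbo.Customer
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @tableId INT =
            (SELECT parent_object_id FROM sys.objects WHERE object_id = @@PROCID);

        -- Record nothing unless auditing has been switched on for this table.
        IF NOT EXISTS (SELECT 1 FROM dbo.AuditEnabled WHERE TableId = @tableId)
            RETURN;

        DECLARE @transactionEntryId UNIQUEIDENTIFIER = NEWID();

        DECLARE @deletedXml XML =
            (SELECT * FROM deleted FOR XML RAW('row'), ROOT('rows'), TYPE);
        IF @deletedXml IS NOT NULL
            INSERT INTO dbo.AuditEntry (TableId, TransactionEntryId, Action, AuditData)
            VALUES (@tableId, @transactionEntryId, 'D', @deletedXml);

        DECLARE @insertedXml XML =
            (SELECT * FROM inserted FOR XML RAW('row'), ROOT('rows'), TYPE);
        IF @insertedXml IS NOT NULL
            INSERT INTO dbo.AuditEntry (TableId, TransactionEntryId, Action, AuditData)
            VALUES (@tableId, @transactionEntryId, 'I', @insertedXml);
    END;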

The drawback to this approach is that the trigger still fires on every data change in the table, but it does avoid the overhead of the second insert statement and keeps the audit tables from growing.

Compile Time Stored Procedures

(code for this project can be found here)

One of the largest problems with interacting with databases is the lack of compile-time assurance that the application code and the SQL code interact correctly. Testing helps find the errors, but as a system grows in size and complexity it becomes both difficult and costly to find every mistake. There are pre-made solutions such as F# Type Providers or Entity Framework, but these present their own challenges, such as requiring the project to be written in F# or accepting the heavy-handedness of Entity Framework.

Fortunately, SQL Server provides enough information about stored procedures in its metadata tables to make it possible to build a from-scratch, compile-time-safe mapping between stored procedures and application code. Combining this with T4 Text Templates, it's possible to automatically generate all the code necessary to handle stored procedure inputs and returns.

The first part is to pull all the stored procedure data from the sys.procedures view.
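(The query from the post isn't reproduced; one along these lines, joining sys.procedures to sys.schemas, returns the pieces described next.)

    SELECT
        p.name      AS ProcedureName,
        p.object_id AS Object_Id,
        s.name      AS SchemaName
    FROM sys.procedures AS p
        INNER JOIN sys.schemas AS s ON s.schema_id = p.schema_id
    WHERE p.name NOT LIKE 'sp[_]%'
    ORDER BY s.name, p.name;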

I remove all the procedures starting with sp_ as this prefix is meant to designate that it is a system procedure for use by SQL Server. The three important pieces from the table are the Procedure Name (p.name), Object Identifier (Object_Id), and the Schema Name (s.name).

Select Sys Procedures

Armed with the object id, it's now possible to pull all the data for a procedure's parameters from the sys.parameters view.
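(A sketch of such a query; @ObjectId stands in for the procedure's object id pulled in the previous step.)

    SELECT
        pa.name         AS ParameterName,
        pa.parameter_id AS ParameterOrder,
        t.name          AS SqlTypeName,
        pa.max_length   AS MaxLength,
        pa.is_output    AS IsOutput
    FROM sys.parameters AS pa
        INNER JOIN sys.types AS t ON t.user_type_id = pa.user_type_id
    WHERE pa.object_id = @ObjectId
    ORDER BY pa.parameter_id;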

With this and the procedure's name, it's possible to construct compile-time-safe code for calling SQL Server stored procedures: simply generate a method which takes in the appropriate parameters and calls the stored procedure. The last step is to generate a lookup table which translates the SQL data types into valid C# types.

Identifying which parameters have a default value and need not be passed to the procedure

Since stored procedure parameters can have default values and therefore don't have to be populated, it would be nice to know which ones are available but not required. This is a little trickier. There is a field, has_default_value, which looks like it would say whether the parameter has a default value and is therefore optional, but there is a catch in the SQL Server metadata: it is only correct for CLR parameters, not T-SQL ones. To actually find out whether a parameter has a default value, there are two options.

  1. Use Sql Management Objects (SMO)
  2. Pull the stored procedure code using something like SELECT OBJECT_DEFINITION (PROCEDURE_OBJECT_ID)

Using SMO to pull the definition is actually relatively simple, assuming you have access to the appropriate DLLs from SQL Server Management Studio.

Code:

If you don’t have access to the SMO DLLs, you can still use something akin to a regular expression (or some other parsing method) to pull the parameter information out of the Object_Definition. As with parsing anything, there are a number of gotchas to watch out for, but it is possible with a little bit of work.

User Defined Tables

Starting in SQL Server 2008 it is possible to pass tables as parameters into stored procedures. There is a view called sys.table_types which lists all of the user-defined table types and has the necessary columns to link each one back to the stored procedures that use it as a parameter, and to retrieve its column definition.
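(A sketch of pulling the user-defined table types and their columns; the aliases and output names are assumptions.)

    SELECT
        tt.name         AS TableTypeName,
        tt.user_type_id AS UserTypeId,
        c.name          AS ColumnName,
        ct.name         AS SqlTypeName,
        c.max_length    AS MaxLength,
        c.is_nullable   AS IsNullable
    FROM sys.table_types AS tt
        INNER JOIN sys.columns AS c  ON c.object_id = tt.type_table_object_id
        INNER JOIN sys.types   AS ct ON ct.user_type_id = c.user_type_id
    ORDER BY tt.name, c.column_id;

    -- A parameter uses one of these types when sys.parameters.user_type_id
    -- matches sys.table_types.user_type_id.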

Stored Procedure Output

To automatically generate C# code for SQL Server output, you'll really need SQL Server 2012 or later for it to be effective. In that version they added the sys.dm_exec_describe_first_result_set_for_object dynamic management function.

This function takes a stored procedure's Object_Id and returns the column information for the first result set it finds.
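(A sketch of calling it for a single procedure; @ObjectId is the procedure's object id from earlier.)

    SELECT
        r.column_ordinal,
        r.name,
        r.system_type_name,
        r.is_nullable
    FROM sys.dm_exec_describe_first_result_set_for_object(@ObjectId, NULL) AS r
    WHERE r.is_hidden = 0
    ORDER BY r.column_ordinal;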

Result Set

The great thing about this function is that not only can you generate classes which automatically update their types and names when a result set changes during development, but it also returns nothing if the stored procedure no longer works because of changes in the database. By running it against all the procedures and mapping their result sets, any procedure that should return data but comes up blank will not work against the current schema and needs to be fixed. It quickly eliminates the difficulty of knowing which procedures will break during program execution.

Putting It All Together

With these queries, all that is really necessary is something that opens a connection to the database, pulls the metadata, and creates the appropriate C# files.

As an example and a quick starter, I’ve created an example solution. The SQL metadata retrieval is in the Classes.tt file, and all the C# code generation is in the StoredProcedures.tt file. The database I used to generate the example code is found here, and uses the Adventure Works Database.

Wait Wait Don’t Tell Me… On Second Thought

Consider the following:
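(The snippet isn't included in the text; based on the description that follows, it was something like this loop with the decrement left out.)

    DECLARE @counter INT = 100;

    WHILE @counter > 0
    BEGIN
        -- ... do some work ...
        -- The decrement (SET @counter = @counter - 1;) is missing,
        -- so the loop never ends.
    END;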

Ideally, the example would start at 100 and loop until the counter hits zero, but the statement to decrement the counter is missing. It won’t ever stop running, and there is no visual cue that something is wrong. The most obvious fix would be to add a PRINT statement.
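(Roughly, that attempt would look like this.)

    WHILE @counter > 0
    BEGIN
        PRINT 'Counter is currently at ' + CAST(@counter AS VARCHAR(10));
        -- ... do some work ...
    END;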

Sadly, this does not solve the issue at hand, as the PRINT statement only displays its output after the script finishes executing. It’s great for knowing what occurred once it’s finished, but it’s no use for telling what is currently happening. Conveniently, there is a solution using the RAISERROR function.

The first parameter is the message to display, and the second and third are the error level and the state. RAISERROR treats anything with an error level from 0-10 as a message and does not affect code execution. The state ranges from -1 to 255, and anything outside of that results in an exception (for the purposes of printing messages, this value is inconsequential). Using RAISERROR with the WITH NOWAIT option outputs the message immediately, showing how the script is executing as it happens.
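(A sketch of the same loop using RAISERROR; the message text is made up.)

    WHILE @counter > 0
    BEGIN
        -- Severity 0-10 is informational; WITH NOWAIT flushes the message immediately.
        RAISERROR('Counter is currently at %d', 10, 1, @counter) WITH NOWAIT;
        -- ... do some work ...
    END;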

informational message

When the error level is raised above 10, Sql Server raises an exception:
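(For example, something along these lines raises an error the client sees as an exception.)

    RAISERROR('Something went wrong in the script.', 16, 1) WITH NOWAIT;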

Error

This works really well when executing a command from within a .NET application, but an issue arises when executing code from an external process like SqlCmd. When using the RAISERROR function with a severity above 10, it doesn’t always cause the process’s exit code (Process.ExitCode) to change to something other than 0, which is what alerts the calling application that something went wrong.

Exit code 0

Why use SqlCmd when .NET has a built-in way of executing code with SqlCommand? The SqlCommand object does not allow the use of the batch separator GO. For applications doing Create/Read/Update/Delete (CRUD) operations, not using the GO statement isn’t a problem, but it is important to applications which perform maintenance on databases. Certain SQL statements, like CREATE PROCEDURE, must be the first item in a batch, meaning chaining multiple statements together in one execution is impossible without using GO.
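(For example, a script like this, with made-up procedure names, only works when the batches are separated by GO, which SqlCommand cannot process.)

    CREATE PROCEDURE dbo.FirstProcedure
    AS
        SELECT 1;
    GO

    CREATE PROCEDURE dbo.SecondProcedure
    AS
        SELECT 2;
    GO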

Fortunately, there is an easy solution to this problem. Change the error level to 20-25 and add the WITH LOG option (values greater than 25 are interpreted as 25). This will close the connection to SQL Server and force applications like SqlCmd to return an exit code other than 0. Unfortunately, to set a severity level between 19 and 25, the account running the script must either be a member of the sysadmin role or have the ALTER TRACE permission.
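(A sketch of what that looks like; the message text is made up.)

    -- Severity 20+ WITH LOG terminates the connection, so SqlCmd exits with a
    -- non-zero code. Requires sysadmin membership or the ALTER TRACE permission.
    RAISERROR('Fatal error - aborting the script.', 20, 1) WITH LOG;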

Exit Code Not Zero

It is possible to look at the output and error messages and attempt to parse them to determine whether an error occurred, but this is error prone and not the standard method for detecting failures. It may work for applications specifically written for this scenario, but applications written to interact with others in the Windows ecosystem look at the exit code to determine process success or failure, making setting it to the correct value vital when interacting with other systems.

You Get an Update, and You Get an Update, and You Get an Update

What’s the difference between
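(The original statements aren't shown; these stand-ins against a hypothetical dbo.WorkQueue table capture the idea.)

    UPDATE dbo.WorkQueue
    SET ReadyToProcess = 1;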

and
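    -- Same hypothetical table, but only rows that actually need the change are touched.
    UPDATE dbo.WorkQueue
    SET ReadyToProcess = 1
    WHERE ReadyToProcess <> 1;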

When they’ve completed, both ensure all the records in the table have the ReadyToProcess value set to 1. Looking strictly at the records in the database, there is no difference. In this example, the end result isn’t what’s important; it’s how the system gets there. The first updates every record regardless of its current value, and the second only makes an update if the value is different from what is being set. The first update statement isn’t telling SQL Server, “Hey, I want the value to be this.” What it’s really telling SQL Server is, “I want you to update the field in this record to this. It may look the same as the previous value, but I still want you to update it.”

Depending on the number of records in the table, and what is affected when the table updates, it might not be much of a problem. The first is less efficient than the second, but it still accomplishes the task. It really starts to be an issue when updating a large number of records, or when something like a trigger is associated with the table. Every time SQL Server updates a record, the associated trigger fires, and this is true even if the new value and the old value are the same. An update is an update no matter what the end value is. If the system logs audit data on every update, the unqualified statement forces the system to do a lot more work for absolutely no gain.