Validation of Data Using Set-Based Sql Operations

I work in data warehousing in banking, where we handle large amounts of data and have to change complex sql statements to fix bugs etc.. Individual queries tend to get bigger and bigger. The danger is that they become so complex that fixing one bug can break other parts of the logic.

It’s important to test changes methodically. If you’re working with co-operative users or testers, they’ll provide examples of the problem that you’re trying to fix. You can test how your changes affect these examples. That’s a partial test though. Ideally, you’ll have a more thorough-going test method that will test data across a whole partition. This should avoid the embarrassment of a long “ping pong” session where you send off a results only to have them bounce back because of new problems caused by the fix, which you rectify, send off again, have returned, etc., etc….

The MINUS (Oracle) and EXCEPT(Sql Server) operators can be a big help here. I won’t try to cover these in detail, but when you make a change you’ll be comparing it with the data set returned by the existing query. The usage is as follows:

SELECT {old data set}
MINUS
SELECT {new data set}

You can start by building a query including one of the key columns to bring back the PROD data, then below it a MINUS statement followed a query to bring the same column from your new data set. This will tell you if you still have all of the rows that you started with. You can also MINUS in the other direction, although you might want to leave the inverted queries until later.

The next stage is to build the same query as above, but adding another key column. You can exclude the values that appeared in the first query. Progressing this way will give you some quick examples
of differences that you can explore to see if they are good differences or new bugs. I’ll provide some code examples in due course.

Another approach is to group by a key column along with COUNT(*). MINUSing the new set will tell you where you have keys that match but with a different number of rows from that in the original.

Will continue this article when I have time.

After

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.