Complex T-Sql Column Expressions? You must try this!

There’s an outstanding video by Kendra Little called “5 T-Sql Features You’re Missing Out On” which you can reach from my Links (Topical) page under “Database > Transact-Sql”. It describes a special way of using the CROSS APPLY statement in T-SQL. This is a technique that can change the way you write Sql, or at least it can in cases where you have an expression that is used in several columns of a SELECT statement. She explains it very well so you should watch the video; I’ve added some examples below to expand a little on the explanation.

I literally can’t wait to try it out on a real project – well that’s not really true of course, I’ve been able to wait for about two days since I saw the video – but it is genuinely very, very interesting.

You may already know some of the other four techniques described (I was quite pleased to score 3.5 points overall, as I was acquainted with several of them and scored a half mark for having used CROSS APPLY, although not in the way Kendra describes); but even if only one of the features is new to you it’ll probably be worth watching, as these are the kind of things you can use day-by-day, not obscure methods you’ll have forgotten by the time you need them.

The examples below are very simplified and also slightly contrived. I hope they make the explanation easy to follow and that you can run them in Sql Server “as is” (provided your permissions are OK). As you can see, we’re returning four columns which all depend on the same calculation, plus the last one where a new value is derived from it. Even in this example, it’s verbose and over-complicated because of the repetition. A worse problem is maintenance: if there’s a change to the base calculation we’ll need to be sure that it changes everywhere – if not, the query will be inconsistent.

SELECT
DATEDIFF(day, modify_date, GETDATE()) As DaysSinceMod
,MIN(DATEDIFF(day, modify_date, GETDATE())) OVER (ORDER BY GETDATE()) As MinDays
,MAX(DATEDIFF(day, modify_date, GETDATE())) OVER (ORDER BY GETDATE()) As MaxDays
,AVG(DATEDIFF(day, modify_date, GETDATE())) OVER (ORDER BY GETDATE()) As AvgDays
,DATEDIFF(day, modify_date, GETDATE())/$7 As WeeksSinceMod
FROM
sys.all_objects ao

We can avoid the repetition with various combinations of CTEs (common table expressions, also known as “with clauses”) and derived tables, a.k.a. inline views. Unfortunately this complicates the structure of the query, which in turn means the execution plan might be affected.

SELECT Calc.DaysSinceMod
,Calc.WeeksSinceMod
,MIN(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As MinDays
,MAX(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As MaxDays
,AVG(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As AvgDays
FROM
(
SELECT
DATEDIFF(day, modify_date, GETDATE()) As DaysSinceMod
/*still one repeated calculation. to eliminate it we'll have to nest the
derived tables or add a WITH clause to the query*/
,DATEDIFF(day, modify_date, GETDATE())/$7 As WeeksSinceMod
FROM
sys.all_objects ao
) Calc

What if we could avoid the repetition without complicating the structure of the query? Well, how about this:

SELECT Calc.DaysSinceMod
,Calc2.WeeksSinceMod
,MIN(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As MinDays
,MAX(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As MaxDays
,AVG(Calc.DaysSinceMod) OVER (ORDER BY GETDATE()) As AvgDays
FROM sys.all_objects ao
CROSS APPLY (SELECT DATEDIFF(day, modify_date, GETDATE()) As DaysSinceMod) Calc
CROSS APPLY (SELECT DaysSinceMod/$7 As WeeksSinceMod) Calc2

Please note that the lower CROSS APPLY clause is aware of the one above; this allows us to derive further values from the original calculation without cluttering up the column definitions by doing the calculation there. What effect will redesigning with CROSS APPLY have on the execution plan? I can’t take a position on that as I haven’t done enough testing. It doesn’t look like it would have much effect though. Note that the CROSS APPLY clause is similar a derived table: it has to have a SELECT, its values are like columns and have to be named, and the clause must be aliased like a derived table. The difference is that it doesn’t have a FROM clause, and without one it’s hard to see how it could have much effect on the execution plan.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.