Category Archives: ETL

Sql Server Filtered Index as a Unique Constraint on a Specific Value

Indexes often double as constraints. Each combination of values must only appear once among the columns of an index defined as UNIQUE. The “filtered” index allows some refinement. Suppose that in one column you won’t allow a specific value to appear more than once, but you don’t mind other values being duplicated.

A good example would be the IS_LATEST_VERSION column in a data warehouse table. Continue reading Sql Server Filtered Index as a Unique Constraint on a Specific Value

Raising Errors in Transact-Sql

There seems to be quite a bit of confusion about the way errors should be raised in newer versions of Transact-Sql. The documentation for RAISERROR says: “New applications should use THROW instead.” It doesn’t say that RAISERROR is deprecated; when you look at the list of deprecated features what you find is that invoking RAISERROR without an error number is now deprecated, but it’s unclear whether the command itself will go. Continue reading Raising Errors in Transact-Sql

OPENROWSET and BULK INSERT (4)

{Previous Article in Series}
Here are some final notes on this topic. I’m going to describe an import I did recently, and hope this may help you decide where OPENROWSET and BULK INSERT could fit into your development activity and what tools you might use.

I was doing an ad hoc load of some data from a small csv file. There was no corresponding table in the database to store this data, and I don’t like doing work for the sake of it, so my first stop was the “Sql Server 2014 Import and Export Data” wizard. I was hoping it would be a simple matter for the wizard to build a table and read the data into it. It crashed with an error which wasn’t very specific. Continue reading OPENROWSET and BULK INSERT (4)

The INSERT statement and Column Defaults in Sql Server

When you INSERT into a table which has a column with a default value defined, and you omit that column from your statement, what will happen? The default will be written. What if you write an explicit Null (and the column is nullable)? In nearly all cases the Null is written.

You may not want this – it’s one of the cases where it might be nice if there were more than one type of Null, as some people advocated in the early days. Then we could distinguish a “known unknown” (© D. Rumsfeld) Continue reading The INSERT statement and Column Defaults in Sql Server

Sql Server Collation Cheat Sheet

This post is not written for an architect or DBA who has to choose a collation and wants an in-depth explanation, but for a developer (perhaps one with an error to fix) who needs a primer on the subject.

Put very simply, collations have to do with languages and with the rules used to compare characters. This is relevant in joins and sorting. You might think that a collation ought to be specified for the query; that in your sql statement you should determine the rules to use for comparing values. You can do that (although you’ll set it at column, not query level), but the columns in the tables have a collation property which is what the query uses by default. This is a convenience as it standardises comparisons and saves you having to specify the rules over and over again. It’s when the two conflict that you have an error which may make it necessary to override one or more default collations. Continue reading Sql Server Collation Cheat Sheet

OPENROWSET and BULK INSERT (3)

{Previous Article in Series}
{Next}
This is the promised post about using a Powershell script to inspect large files. It won’t be possible to open a very large file in Excel of course, and some text readers may struggle under the weight of data. This script is an alternative that runs under most if not all versions of Windows, and doesn’t need an installation or a licence. Continue reading OPENROWSET and BULK INSERT (3)

OPENROWSET and BULK INSERT (2)

{Previous Article in Series}
{Next}

As a data warehouse developer you’ll probably have to load big files. Even if you’re not responsible for loads in PROD, you may well have to import data into the DEV database. There’s often an expectation that this shouldn’t take long, so if you hit any problems you could come under time pressure.

A toolkit and a methodical approach can help you to grind out a predictable result every time. I hope these notes may be some help. Continue reading OPENROWSET and BULK INSERT (2)

OPENROWSET and BULK INSERT (1)

{Next Article in Series}
Having found out a bit more about the way OPENROWSET works, it seems to me that if you are using a format file for an ad hoc INSERT of file data this is the only way to go. The rest of the statement works like standard SQL, so you have better control (compared with BULK INSERT) over which columns are selected and which are written. Continue reading OPENROWSET and BULK INSERT (1)

File Imports and Sql Server Language Settings

I had a file (historical FTSE values) to load into a database for some development work so I started up the SSIS Import and Export Wizard to bring it in. There was no error but after three or four attempts it was still failing to bring any rows in. Although I’ve done a lot of ETL I’ve generally used utilities or import exes, so this seemed like a opportunity to get my head around BULK INSERT and bcp.  Continue reading File Imports and Sql Server Language Settings

Planning the use of Oracle DML Error Logging to make an ETL Batch more reliable

An ETL batch should not fail because of minor source data errors. In data warehousing it is assumed that there will be data quality issues. These should be dealt with. If they are not handled properly the batch may break on flaws which have no importance in reporting terms.

Oracle’s DML Error Logging feature has been around for a while and can prevent batch failures Continue reading Planning the use of Oracle DML Error Logging to make an ETL Batch more reliable