{Previous Article in Series}

As a data warehouse developer you’ll probably have to load big files. Even if you’re not responsible for loads in PROD, you may well have to import data into the DEV database. There’s often an expectation that this shouldn’t take long, so if you hit any problems you could come under time pressure.

A toolkit and a methodical approach can help you to grind out a predictable result every time. I hope these notes may be some help. Running your import code directly in Transact-Sql is a good way of speeding up the find-fix cycle, even if you transfer the code into SSIS when deploying to PROD.

The BULK INSERT command in Transact-Sql isn’t ideal because it writes to a table. You really want to get the reading of the file working first. Dealing with column mappings and data type mismatches is friction that’s best avoided at this stage. If you wrap BULK INSERT with OPENROWSET as described in a previous blog. you can return rows to the screen instead.

You’ll need a format file to precisely control the way you read the data. I prefer the standard file – once you understand it, there doesn’t seem to be much gain from the extra complexity of the XML version. There’s a technical reference here

There are two ways to get a working draft of your format file. One is to hand-code it: you’ll find some notes on that in the last article of this series here; or you can use the bcp utility from the command line. This exe should have be on your machine as part of your Sql Server client tools installation. The command to generate the file is as follows:

bcp [Database.Schema.Table Name} format nul -S {Host\Sql Server} -T -n -f {Format File Path\Name - extension is ".fmt"}

The “-S” argument matches the text in the “Server Name” box when you set the connection for a new query in Sql Server Management Studio. If you aren’t using Windows Authentication or have any other problems check bcp /? for help.

Bear in mind you now have a format file based on the database table; it can’t know what the column delimiters are in the file of course. You need to edit these manually. One point to note is that there doesn’t seem to be any special treatment of text qualifiers, you just treat them as part of the delimiter, which they are of course. In the case of a comma-delimited file I loaded recently the specification was "\",\"" with the last column in the row as "\"\r\n".

Now you can construct a database query like the following:

BULK '{Source File Path/Name}'
,FORMATFILE = '{Format File Path/Name}'
,ERRORFILE = '{Error File Path/Name}'
,MAXERRORS = {Integer}
Bulk Option CHECK_CONSTRAINTS not used as default behaviour of OPENROWSET is to test constraints
) A

Please note that there mustn’t be a file at the error file path/name that you specify or the query will error. Now you can run the query and if it errors you’ll have useful information right in front of you. You’ll also find that two files have been generated, one with a rather cryptic description of the error and location, the other with the rows that caused the error, up to the limit you specified. This file could be useful, but I prefer to compare the errors with neighbouring good rows. For that, you need to be able to scan the file rapidly and view a specific group of rows. Powershell comes in useful here, and in another post I’ll provide the script that I use for this purpose.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.