{Next Article in Series}
Having found out a bit more about the way OPENROWSET works, it seems to me that if you are using a format file for an ad hoc INSERT of file data this is the only way to go. The rest of the statement works like standard SQL, so you have better control (compared with BULK INSERT) over which columns are selected and which are written. You can easily create expressions in the SELECT list. This avoids the complexity of editing the format file. One benefit it doesn’t offer is a quick solution to problems with file header rows; the whole dataset is assembled by the OPENROWSET expression and you can’t apply filters there.
I created a simple tab-delimited file from the following query:
SELECT name, system_type_id, user_type_id FROM sys.types
The format file (tech reference here) was as follows:
12.0
3
1 SQLCHAR 0 256 "\t" 1 name Latin1_General_CI_AS
2 SQLCHAR 0 12 "\t" 2 system_type_id ""
3 SQLCHAR 0 12 "\r\n" 3 user_type_id ""
Here’s the OPENROWSET statement (unlike BULK INSERT, with OPENROWSET you can select the data and check it before writing it to a table):
SELECT *
FROM OPENROWSET
(
BULK '{data file path from database server}'
,FORMATFILE = '{format file path from database server}'
,CODEPAGE = 'RAW'
,MAXERRORS = 1
,ERRORFILE = '{error file path from database server}'
/*
Bulk Option CHECK_CONSTRAINTS not used as default behaviour of OPENROWSET is to test constraints
see http://technet.microsoft.com/en-us/library/ms186247%28v=SQL.105%29.aspx
,CHECK_CONSTRAINTS
*/
) A
You’ll note that we can incorporate the “<bulk options>” into the OPENROWSET statement. That’s why I don’t think there’s any need to use BULK INSERT. The position of the options relative to each other does matter – BULK and FORMATFILE have to be first, as the documentation shows if you read it carefully. If they aren’t you’ll get an error.
The error file path determines where it should be written. If a file exists at that location you’ll get an error. Messages are fairly self-explanatory provided you read them to the end. The MAXERRORS function relates to the number of bad rows that will be ignored. It’s not at the level of sophistication that you get with Oracle’s LOG ERRORS TO, but it may be worth a look. Please note also the code comment relating to the treatment of constraints.