An ETL batch should not fail because of minor source data errors. In data warehousing it is assumed that there will be data quality issues, and these need to be handled; if they are not, the batch may break on flaws which have no importance in reporting terms.
Oracle’s DML Error Logging feature has been around for a while and can prevent batch failures by allowing a SQL statement to complete instead of rolling back. The bad rows are written to a special error log table, complete with details of the error. It may be possible for batch code to fix the data in the error log and write it to the correct location. At a minimum, the log will help in finding out what went wrong.
When thinking about this, it’s important to start from the point of view of the business. Over a period of time, company policy on what happens when a batch task goes red has evolved and been refined. If you as a developer make a change which stops tasks going red, then in practical terms you have changed company policy. Unless you have a mandate to do that, the goal will be to make sure that when there is an error – even though the batch will now continue – everything else happens as it did before. Escalation procedures must take their usual course, but there will now be more information available (see example below) to trace the problem back to its origin. The original task will not go to a failed state any more, so you will need to create a new one which does. This will be a cyclical job that checks all the error logs for new rows and goes red when it finds any.
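A minimal sketch of the check such a cyclical job could run, assuming one error log per target table and a flag column like the IS_PRIOR_BATCH_ERROR column shown in the worked example later in this article:

```sql
--Sketch only: one check per error log table; IS_PRIOR_BATCH_ERROR is the
--flag added in the worked example later (set to 'Y' at start of batch)
SELECT COUNT(*) AS new_errors
FROM   err_log_t1
WHERE  is_prior_batch_error = 'N';
--the monitoring task goes to a failed state when new_errors > 0
```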
This technology makes it possible to improve the management of source data problems. Given that faulty data is now stored in the log tables, reporting these problems to management becomes straightforward. Columns added to the table could allow management to “sign off” problems once they are satisfied that a problem has been resolved and won’t come back.
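As a sketch, sign-off support could be as simple as two extra columns on the error log; the column names here are illustrative, not part of Oracle’s feature:

```sql
--Hypothetical sign-off columns added to the error log table
ALTER TABLE err_log_t1 ADD (
  signed_off_by   VARCHAR2(30),
  signed_off_date DATE
);

--management "signs off" a resolved problem, e.g. all value-too-large errors
UPDATE err_log_t1
SET    signed_off_by   = USER,
       signed_off_date = SYSDATE
WHERE  ora_err_number$ = 12899
AND    signed_off_by IS NULL;
```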
A final point which will help to convince management that this approach is worth adopting is that it is not a “switch” which suddenly becomes active across the whole system. DML error logging is invoked in the SQL statement itself, with the LOG ERRORS clause. This means that you aren’t committed to a “big bang” implementation: there will be no effect on any SQL which doesn’t have the logging clause. Unless you are doing a complete redesign of your batch, it’s hard to see why you would want to adopt a big bang approach and log every SQL statement, since most of them don’t cause any problems. The errors are likely to be concentrated in a few source systems and code fragments; fix those and you’ve dealt with 80% of the problems. You can probably deal with the rest as they crop up.
More Detailed Considerations
Error logging gives you the benefit of a batch which continues to execute while you decide whether the results will be acceptable, or whether you need to stop it, fix the problem and rerun. There may be some debate about this if there are 10 missing rows, or 100, but if 10,000 rows are faulty it’s probably a catastrophic failure. You can set the catastrophic failure level with the REJECT LIMIT clause of the LOG ERRORS clause.
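For instance, a load that tolerates up to 10,000 bad rows before failing might look like this; the table names are made up for illustration:

```sql
--Illustrative load: the statement succeeds unless errors exceed the limit
INSERT INTO target_table (c1, c2)
SELECT c1, c2
FROM   staging_table
LOG ERRORS INTO err_log_target ('NIGHTLY_LOAD')
REJECT LIMIT 10000;
--the 10,001st bad row causes the whole statement to fail and roll back
```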
The cyclical job will go red after the first error and stop cycling, so you will not be notified of any failures after that. If this is an issue, there are various ways of dealing with it. One would be a report which queries all the logs and maps each error back to the originating task. You then have a complete picture of what’s gone wrong.
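A sketch of such a report, assuming each error log has had the tracing columns added as in the worked example later in this article (the second log table is illustrative):

```sql
--Cross-log report: one UNION ALL branch per error log table
SELECT 'ERR_LOG_T1' AS log_table, session_number,
       ora_err_number$, ora_err_mesg$, ora_err_tag$
FROM   err_log_t1
WHERE  is_prior_batch_error = 'N'
UNION ALL
SELECT 'ERR_LOG_T2', session_number,
       ora_err_number$, ora_err_mesg$, ora_err_tag$
FROM   err_log_t2
WHERE  is_prior_batch_error = 'N';
```

The session number in each row maps the error back to the task that logged it, giving the complete picture in one query.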
A longer-term benefit of DML error logging is that you will be able to improve the design of the tables that receive source system data. The usual objection to adding constraints or making columns non-nullable is that a violation will break the process, and reporting will not be available until the data is fixed. With REJECT LIMIT UNLIMITED, violating rows are diverted to the error log instead, so the batch keeps running and constraints can be tightened incrementally.
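A sketch of what that could look like, with illustrative table and column names:

```sql
--Tighten the receiving table: previously too risky to attempt
ALTER TABLE customer_stage MODIFY (customer_id NOT NULL);

--Rows violating the new constraint divert to the log; the load completes
INSERT INTO customer_stage (customer_id, customer_name)
SELECT customer_id, customer_name
FROM   source_extract
LOG ERRORS INTO err_log_customer_stage ('DAILY_LOAD')
REJECT LIMIT UNLIMITED;
```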
Example (no liability accepted; please develop your own code and do not copy from this sample)
DROP TABLE T1;
DROP TABLE ERR_LOG_T1;
--For the following table:
CREATE TABLE SYSTEM.T1
( C1 VARCHAR2(1 BYTE),
  C2 NUMBER ); --C2 assumed numeric to match the example insert below
--The DML error log is created using the DBMS_ERRLOG package as shown:
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG(
    dml_table_name     => 'T1'
   ,err_log_table_name => 'ERR_LOG_T1'
   --err_log_table_owner IN VARCHAR2 := NULL,
   --err_log_table_space IN VARCHAR2 := NULL,
   --skip_unsupported IN BOOLEAN := FALSE
  );
END;
/
--which creates the following DML error log:
CREATE TABLE SYSTEM.ERR_LOG_T1
( ORA_ERR_NUMBER$ NUMBER,
ORA_ERR_MESG$ VARCHAR2(2000 BYTE),
ORA_ERR_ROWID$ UROWID (4000),
ORA_ERR_OPTYP$ VARCHAR2(2 BYTE),
ORA_ERR_TAG$ VARCHAR2(2000 BYTE),
C1 VARCHAR2(4000 BYTE),
C2 VARCHAR2(4000 BYTE)
);
COMMENT ON TABLE SYSTEM.ERR_LOG_T1 IS 'DML Error Logging table for T1';
--The DML error log is a conventional table, so columns can be added to provide tracing information.
--this column updated to 'Y' at beginning of batch so that new rows can be identified--
ALTER TABLE SYSTEM.ERR_LOG_T1
ADD IS_PRIOR_BATCH_ERROR CHAR(1) DEFAULT('N') NOT NULL
CHECK(IS_PRIOR_BATCH_ERROR IN('Y', 'N'));
--capture session number of sql command; will identify task if it logs its session number--
ALTER TABLE SYSTEM.ERR_LOG_T1
ADD SESSION_NUMBER INTEGER DEFAULT(SYS_CONTEXT('USERENV', 'SID')) NOT NULL;
--record date/time information on error; use date/sysdate instead of timestamp if all systems are local--
ALTER TABLE SYSTEM.ERR_LOG_T1
ADD WRITE_DATE_TIME TIMESTAMP(4) WITH LOCAL TIME ZONE DEFAULT(CURRENT_TIMESTAMP) NOT NULL;
--Create a package to demonstrate error logging--
create or replace package package1 as
  procedure writerror;
end package1;
/
create or replace package body package1 as
  procedure writerror as
    --see Oracle documentation on the maximum size of the error tag (2000 bytes)--
    v_tag varchar2(2000);
  begin
    --capture the call stack so the log identifies where the insert came from--
    v_tag := substr(dbms_utility.format_call_stack, 1, 2000);
    insert into T1(C1, C2)
    --force a DML error: 'EE' is too long for C1 VARCHAR2(1), giving ORA-12899--
    select 'EE', 1 from dual
    --log the error; please see Oracle documentation on REJECT LIMIT--
    log errors into err_log_t1 (v_tag) reject limit 10;
  end writerror;
end package1;
/
--Test error logging:
BEGIN
  package1.writerror;
END;
/
select * from err_log_t1;