8
SQL*Loader: Conventional and Direct Path Loads

This chapter describes SQL*Loader's conventional and direct path load methods. The following topics are covered:

For an example of loading with using the direct path load method, see "Case 6: Loading Using the Direct Path Load Method" on page 4-20. The other cases use the conventional path load method.

Note: If you are using Trusted Oracle, see the Trusted Oracle documentation for information about using SQL*Loader in that environment.

Data Loading Methods

SQL*Loader provides two methods for loading data:

Conventional path loads execute SQL INSERT statement(s) to populate table(s) in an Oracle database. Direct path loads eliminate much of the Oracle database overhead by formatting Oracle data blocks and writing the data blocks directly to the database files. The direct load, therefore, does not compete with other users for database resources so it can usually load data at nearly disk speed. Certain considerations, inherent to this method of access to database files, such as restrictions, security and backup implications, are discussed in this chapter.

Conventional Path Loads

Conventional path loads (the default) use the SQL INSERT statement and a bind array buffer to load data into database tables. This method is used by all Oracle tools and applications.

When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL commands are generated, passed to Oracle, and executed.

Oracle looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.

Conventional Path Loading of a Single Partition

By definition, conventional path uses SQL INSERT statements. For conventional path loading of a single partition, SQL*Loader makes use of the partition-extended syntax of the INSERT statement which has the following form:

INSERT into table T partition (P) VALUES ...

The SQL layer of the ORACLE kernel determines if the row being inserted maps to the specified partition. If the row does not map to the partition, the row is rejected, and the loader log file records an appropriate error message.

When to Use a Conventional Path Load

Because the direct path is faster than the conventional path, it is desirable to use the direct path. But there are times when the conventional path is preferred, or when the direct path cannot be used due to restrictions. You should use the conventional path in the following situations:

When accessing an indexed table concurrently with the load, or when applying inserts or updates to a non-indexed table concurrently with the load.

To use the direct path (excepting parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read-write access to any indexes.

When loading data with SQL*Net across heterogeneous platforms.

You cannot load data through the direct path with SQL*Net; unless both systems belong to the same family of computers, and both are using the same character set. Even then, load performance can be significantly impaired by network overhead.

When loading data into a clustered table.

Direct path does not support loading of clustered tables.

When loading a relatively small number of rows into a large indexed table.

On the direct path, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by loading the data with the direct path.

When loading a relatively small number of rows into a large table with referential and column-check integrity constraints.

Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.

When you want to apply SQL functions to data fields.

SQL functions are not available on the direct path. For more information on the SQL functions, see "Applying SQL Operators to Fields" on page 5-82.

Direct Path Loads

Instead of filling a bind array buffer and passing it to Oracle with a SQL INSERT command, a direct path load parses the input data according to the description given in the loader control file, converts the data for each input field to its corresponding Oracle column datatype, and builds a column array structure (an array of <length, data> pairs.) The loader then uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynch I/O.)

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

Figure 8-1 on page 8 - 5 shows how conventional and direct path loads perform database writes.

Figure 8-1: Database Writes on Direct Path and Conventional Path

Direct Path Loading of a Partitioned Table

When loading a partitioned table, the direct path load engine in the Oracle kernel partitions the rows, and maintains indexes (which can also be partitioned.) Note that direct path loading of a partitioned table can be quite resource intensive for tables with many partitions.

Direct Path Loading of a Single Partition

When loading a single partition of a partitioned table, the direct path load engine in the Oracle kernel partitions the rows and rejects any rows which do not map to the partition specified in the loader control file. Local index partitions which correspond to the data partition being loaded are maintained by the loader. Global indexes are not maintained on single partition direct path loads.

While loading a partition of a partitioned table, DML operations on, and direct path loads of other partitions in the table are allowed.

Although direct path loads minimize the database processing, several calls to the Oracle server are made at the beginning and end of the load to initialize and finish the load, respectively. The necessary DML locks are required during load initialization, and released when the load finishes. Also, during the load the following operations occur: index keys are built and put into a sort, space management routines are used to get new extents when needed and to adjust the high-water mark for a data save point. The high-water mark is described in "Data Saves" on page 8-12.

Advantages of Direct Path Loads

The direct path method is faster than the conventional path for the following reasons:

Partial blocks are not used, so no reads are needed to find them and fewer writes are performed.
SQL INSERT commands are not executed by SQL*Loader, and therefore, processing load on the Oracle database is reduced.
The bind-array buffer is not used - formatted database blocks are written directly.
The direct path method calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. The conventional path calls Oracle once for each array of rows to process a SQL INSERT statement.
Direct path load uses multi-block asynchronous I/O on writes to the database files.
Processes using the direct path perform their own write I/O, instead of using Oracle's buffer cache in contention with other Oracle users. Therefore, the direct path does not contend for free buffers in the buffer cache.
The direct path's sorted indexes option allows you to pre-sort the data using high-performance sort routines that are native to your system or installation.
When the table to be loaded is empty, the pre-sorting option eliminates the sort and merge phases of index-building - the index is simply filled in as data arrives.
Protection against instance failure does not require redo log file entries during direct path loads. Therefore, no time is required to log the load in the following cases:
- Oracle is operating in NOARCHIVELOG mode
- the UNRECOVERABLE option of the load is set to TRUE
- the object being loaded has the NOLOG attribute set

See "Instance Recovery and Direct Path Loads" on page 8-13.

When to Use a Direct Path Load

You should use a direct path load in the following situations:

You have a large amount of data to load quickly. A direct path load can quickly load and index large amounts of data. It can also load data into either an empty or non-empty table.
You want to load data in PARALLEL for maximum performance. See "Parallel Data Loading Models" on page 8-26.
You want to load data in a character set that cannot be supported in your current session, or when the conventional conversion to the database character set would cause errors.

Restrictions on Using Direct PATH LOADS

In addition to the general load conditions described in "Conventional Path Load versus Direct Path Load" on page 3-16, the following conditions must be satisfied to use the direct path load method:

Tables are not clustered.
Tables to be loaded do not have any active transactions pending.

To check for this condition, use the SQL*DBA command MONITOR TABLE to find the object ID for the table(s) you want to load. Then use the command MONITOR LOCK to see if there are any locks on the table.

SQL strings are not used in the control file.

Restrictions on Direct Path Loading of a Single Partition

In addition to the above listed restrictions, loading of a single partition has the following restrictions:

The table which the partition is a member of cannot have any global indexes defined on it.
Enabled referential and check constraints on the table which the partition is a member of are not allowed.
Enabled triggers are not allowed.

Integrity Constraints

All integrity constraints are enforced during direct path loads, although not necessarily at the same time. NOT NULL constraints are enforced during the load. Records that fail these constraints are rejected.

UNIQUE constraints are enforced both during and after the load. A record which violates a UNIQUE constraint is not rejected (the record is not available in memory when the constraint violation is detected.)

Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be re-enabled afterwards. If REENABLE is specified, SQL*Loader can re-enable them automatically at the end of the load. When the constraints are re-enabled, the entire table is checked. Any rows that fail this check are reported in the specified error log. See the section in this chapter called "Direct Loads, Integrity Constraints, and Triggers".

Field Defaults on the Direct Path

DEFAULT column specifications defined in the database are not available when loading on the direct path. Fields for which default values are desired must be specified with the DEFAULTIF clause, described on "DEFAULTIF Clause" on page 5-74. If a DEFAULTIF clause is not specified, and the field is NULL, then a NULL value is inserted into the database.

Loading into Synonyms

You can load data into a synonym for a table during a the direct path load, but the synonym must point directly to a table. It cannot be a synonym for a view or a synonym for another synonym.

Exact Version Requirement

A SQL*Loader direct load can only be done for a database of the same version. For example, you cannot do a SQL*Loader Version 7.1.2 direct path load to load into a Oracle Version 7.1.3 database.

Using Direct Path Load

This section explains you how to use SQL*Loader's direct path load.

Setting Up for Direct Path Loads

To prepare the database for direct path loads, you must run the setup script, CATLDR.SQL to create the necessary views. You need only run this script once for each database you plan to do direct loads to. This script can be run during database installation if you know then that you will be doing direct loads.

Specifying a Direct Path Load

To start SQL*Loader in direct load mode, the parameter DIRECT must be set to TRUE on the command line or in the parameter file, if used, in the format:

DIRECT=TRUE

See "Case 6: Loading Using the Direct Path Load Method" on page 4-20 for an example.

Building Indexes

During a direct path load, performance is improved by using temporary storage. After each block is formatted, the new index keys are put to a sort (temporary) segment. The old index and the new keys are merged at load finish time to create the new index. The old index, sort (temporary) segment, and new index segment all require storage until the merge is complete. Then the old index and temporary segment are removed.

Note that, during a conventional path load, every time a row is inserted the index is updated. This method does not require temporary storage space, but it does add processing time.

The SINGLEROW Option

Performance on systems with limited memory can also be improved by using the SINGLEROW option. For more information see "SINGLEROW Option" on page 5-40.

Note: If, during a direct load, you have specified that the data is to be pre-sorted and the existing index is empty, a temporary segment is not required, and no merge occurs-the keys are put directly into the index. See "Maximizing Performance of Direct Path Loads" on page 8-15 for more information.

When multiple indexes are built, the temporary segments corresponding to each index exist simultaneously, in addition to the old indexes. The new keys are then merged with the old indexes, one index at a time. As each new index is created, the old index and the corresponding temporary segment are removed.

Index Storage Requirements

The formula for calculating the amount of space needed for storing the index itself can be found in the chapter "Managing Database Files" of Oracle8 Server Administrator's Guide. Remember that two indexes exist until the load is complete: the old index and the new index.

Temporary Segment Storage Requirements

The amount of temporary segment space needed for storing the new index keys (in bytes) can be estimated using the following formula:

1.3 * key_storage

where:

key_storage = (number_of_rows) *

     ( 10 + sum_of_column_sizes + number_of_columns )

The columns included in this formula are the columns in the index. There is one length byte per column, and 10 bytes per row are used for a ROWID and additional overhead.

The constant 1.3 reflects the average amount of extra space needed for sorting. This value is appropriate for most randomly ordered data. If the data arrives in exactly opposite order, twice the key-storage space is required for sorting, and the value of this constant would be 2.0. That is the worst case.

If the data is fully sorted, only enough space to store the index entries is required, and the value of this constant reduces to 1.0. See "Pre-sorting Data for Faster Indexing" on page 8-16 for more information.

Indexes Left in Index Unusable State

SQL*Loader will leave indexes in Index Unusable state when the data segment being loaded becomes more up-to-date than the index segments that index it.

Any SQL statement that tries to use an index that is in Index Unusable state returns an error. The following conditions cause the direct path option to leave an index or a partition of a partitioned index in Index Unusable state:

SQL*Loader runs out of space for the index, and cannot update the index.
The data is not in the order specified by the SORTED INDEXES clause.
There is an instance failure, or the Oracle shadow process fails while building the index.
There are duplicate keys in a unique index.
Data save points are being used, and the load fails or is terminated via a keyboard interrupt after a data save point occurred.

To determine if an index is in Index Unusable state, you can execute a simple query:

SELECT INDEX_NAME, STATUS

   FROM USER_INDEXES 

   WHERE TABLE_NAME = 'tablename';

To determine if an index partition is in unusable state,

SELECT INDEX_NAME, 

       PARTITION_NAME,

       STATUS FROM USER_IND_PARTITIONS

       WHERE STATUS != 'VALID';

If you are not the owner of the table, then search ALL_INDEXES or DBA_INDEXES instead of USER_INDEXES. For partitioned indexes, search ALL_IND_PARTITIONS and DBA_IND_PARTITIONS instead of USER_IND_PARTITIONS.

Data Saves

You can use data saves to protect against loss of data due to instance failure. All data loaded up to the last data save is protected against instance failure To continue the load after an instance failure, determine how many rows from the input file were processed before the failure, then use the SKIP option to skip those processed rows. If there were any indexes on the table, drop them before continuing the load, then recreate them after the load. See "Recovery" on page 8-13 for more information on media and instance failure.

Note: Indexes are not protected by a data save, because SQL*Loader does not build indexes until after data loading completes. (The only time indexes are built during the load is when pre-sorted data is loaded into an empty table - but these indexes are also unprotected.)

Using the ROWS Parameter

The parameter ROWS determines when data saves occur during a direct path load. The value you specify for ROWS is the number of rows you want SQL*Loader to read from the input file before saving inserts in the database.

The number of rows you specify for a data save is an approximate number. Direct loads always act on full data buffers that match the format of Oracle database blocks. So, the actual number of data rows saved is rounded up to a multiple of the number of rows in a database block.

SQL*Loader always reads the number of rows needed to fill a database block. Discarded and rejected records are then removed, and the remaining records are inserted into the database. So the actual number of rows inserted before a save is the value you specify, rounded up to the number of rows in a database block, minus the number of discarded and rejected records.

A data save is an expensive operation. The value for ROWS should be set high enough so that a data save occurs once every 15 minutes or longer. The intent is to provide an upper bound on the amount of work which is lost when an instance failure occurs during a long running direct path load. Setting the value of ROWS to a small number will have an adverse affect on performance.

Data Save Versus Commit

In a conventional load, ROWS is the number of rows to read before a commit. A direct load data save is similar to a conventional load commit, but it is not identical. The similarities are:

Data save will make the rows visible to other users
Rows cannot be rolled back after a data save

The major difference is that the indexes will be unusable (in Index Unusable state) until the load completes.

Recovery

SQL *Loader provides full support for data recovery when using the direct path option. There are two main types of recovery:

Media Recovery

is recovering from the loss of a database file. You must operate in ARCHIVELOG mode to recover after a file has been lost.

Instance Recovery

is recovering from a system failure in which in-memory data was changed (but not written to disk) before the failure occurred. Oracle can always recover from instance failures, even if redo log files are not archived.

See Oracle8 Server Administrator's Guide for more information about recovery.

Instance Recovery and Direct Path Loads

Because SQL*Loader writes directly to the database files, all rows inserted up to the last data save will automatically be present in the database files if the instance is restarted. Changes do not need to be recorded in the redo log file to make instance recovery possible.

If an instance failure occurs, the indexes being built may be left in Index Unusable state. Indexes which are Unusable must be re-built before using the table or partition. See "Indexes Left in Index Unusable State" on page 8-11 for more information on how to determine if an index has been left in Index Unusable state.

Media Recovery and Direct Path Loads

If redo log file archiving is enabled (you are operating in ARCHIVELOG mode), SQL*Loader logs loaded data when using the direct path, making media recovery possible. If redo log archiving is not enabled (you are operating in NOARCHIVELOG mode), then media recovery is not possible.

To recover a database file that was lost while it was being loaded, use the same method that you use to recover data loaded with the conventional path:

Restore the most recent backup of the affected database file.
Recover the tablespace using the RECOVER command. (See Oracle8 Server Administrator's Guide for more information on the RECOVER command.)

Loading LONG Data Fields

Data that is longer than SQL*Loader's maximum buffer size can be loaded on the direct path with either the PIECED option or by specifying the number of READBUFFERS. This section describes those two options.

Loading Data as PIECED

The data can be loaded in sections with the pieced option if it is the last column of the logical record. The syntax for this specification is given "High-Level Syntax Diagrams" on page 5-5.

Declaring a column as PIECED informs the direct path loader that the field may be processed in pieces, one buffer at once.

The following restrictions apply when declaring a column as PIECED:

This option is only valid on the direct path.
Only one field per table may be PIECED.
The PIECED field must be the last field in the logical record.
The PIECED field may not be used in any WHEN, NULLIF, or DEFAULTIF clauses.
The PIECED field's region in the logical record must not overlap with any other field's region.
The PIECED corresponding database column may not be part of the index.
It may not be possible to load a rejected record from the bad file if it contains a PIECED field.

For example, a PIECED filed could span 3 records. SQL*Loader loads the piece from the first record and then reuses the buffer for the second buffer. After loading the second piece, the buffer is reused for the third record. If an error is then discovered, only the third record is placed in the bad file because the first two records no longer exist in the buffer. As a result, the record in the bad file would not be valid.

Using the READBUFFERS Keyword

For data that is not divided into separate sections, or not in the last column, READBUFFERS can be specified. With READBUFFERS a buffer transfer area can be allocated that is large enough to hold the entire logical record at one time.

READBUFFERS specifies the number of buffers to use during a direct path load. (A LONG can span multiple buffers.) The default value is four buffers. If the number of read buffers is too small, the following error results:

ORA-02374 ... No more slots for read buffer queue

Note: Do not specify a value for READBUFFERS unless it becomes necessary, as indicated by ORA-2374. Values of READBUFFERS that are larger than necessary do not enhance performance. Instead, higher values unnecessarily increase system overhead.

Maximizing Performance of Direct Path Loads

You can control the time and temporary storage used during direct path loads.

To minimize time:

Pre-allocate storage space.
Pre-sort the data.
Perform infrequent data saves.
Disable archiving of redo log files.

To minimize space:

When sorting data before the load, sort data on the index that requires the most temporary storage space.
Avoid index maintenance during the load.

Pre-allocating Storage for Faster Loading

SQL*Loader automatically adds extents to the table if necessary, but this process takes time. For faster loads into a new table, allocate the required extents when the table is created.

To calculate the space required by a table, see the chapter "Managing Database Files" in Oracle8 Server Administrator's Guide. Then use the INITIAL or MINEXTENTS clause in the SQL command CREATE TABLE to allocate the required space.

Another approach is to size extents large enough so that extent allocation is infrequent.

Pre-sorting Data for Faster Indexing

You can improve the performance of direct path loads by pre-sorting your data on indexed columns. Pre-sorting minimizes temporary storage requirements during the load. Pre-sorting also allows you to take advantage of high-performance sorting routines that are optimized for your operating system or application.

If the data is pre-sorted and the existing index is not empty, then pre-sorting minimizes the amount of temporary segment space needed for the new keys. The sort routine appends each new key to the key list. Instead of requiring extra space for sorting, only space for the keys is needed. To calculate the amount of storage needed, use a sort factor of 1.0 instead of 1.3. For more information on estimating storage requirements, see "Temporary Segment Storage Requirements" on page 8-10.

If pre-sorting is specified and the existing index is empty, then maximum efficiency is achieved. The sort routines are completely bypassed, with the merge phase of index creation. The new keys are simply inserted into the index. Instead of having a temporary segment and new index existing simultaneously with the empty, old index, only the new index exists. So, temporary storage is not required, and time is saved.

SORTED INDEXES Statement

The SORTED INDEXES statement identifies the indexes on which the data is presorted. This statement is allowed only for direct path loads. See Chapter 5, "SQL*Loader Control File Reference" for the syntax, and see"Case 6: Loading Using the Direct Path Load Method" on page 4-20 for an illustration.

Generally, you specify only one index in the SORTED INDEXES statement because data that is sorted for one index is not usually in the right order for another index. When the data is in the same order for multiple indexes, however, all of the indexes can be specified at once.

All indexes listed in the SORTED INDEXES statement must be created before you start the direct path load.

Unsorted Data

If you specify an index in the SORTED INDEXES statement, and the data is not sorted for that index, then the index is left in Index Unusable state at the end of the load. The data is present, but any attempt to use the index results in an error. Any index which is left in Index Unusable state must be re-built after the load.

Multiple Column Indexes

If you specify a multiple-column index in the SORTED INDEXES statement, the data should be sorted so that it is ordered first on the first column in the index, next on the second column in the index, and so on.

For example, if the first column of the index is city, and the second column is last name; then the data should be ordered by name within each city, as in the following list:

Albuquerque      Adams

Albuquerque      Hartstein

Albuquerque      Klein

...         ...

Boston           Andrews

Boston           Bobrowski

Boston           Heigham

...              ...

Choosing the Best Sort Order

For the best overall performance of direct path loads, you should presort the data based on the index that requires the most temporary segment space. For example, if the primary key is one numeric column, and the secondary key consists of three text columns, then you can minimize both sort time and storage requirements by pre-sorting on the secondary key.

To determine the index that requires the most storage space, use the following procedure:

For each index, add up the widths of all columns in that index.
For a single-table load, pick the index with the largest overall width.
For each table in a multiple table load, identify the index with the largest, overall width for each table. If the same number of rows are to be loaded into each table, then again pick the index with the largest overall width. Usually, the same number of rows are loaded into each table.
If a different number of rows are to be loaded into the indexed tables in a multiple table load, then multiply the width of each index identified in step 3 by the number of rows that are to be loaded into that index. Multiply the number of rows to be loaded into each index by the width of that index and pick the index with the largest result.

Infrequent Data Saves

Frequent data saves resulting from a small ROWS value adversely affect the performance of a direct path load. Because direct path loads can be many times faster than conventional loads, the value of ROWS should be considerably higher for a direct load than it would be for a conventional load.

During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety. It is a good idea to determine the average time to load a row by loading a few thousand rows. Then you can use that value to select a good value for ROWS.

For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 10 minutes of work after an interruption, then set ROWS to be 200,000 (20,000 rows/minute * 10 minutes).

Minimizing Use of the Redo Log

One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is UNRECOVERABLE, or you can set the NOLOG attribute of the objects being loaded. This section discusses all methods.

Disable Archiving

If media recovery is disabled, direct path loads do not generate full image redo.

Specifying UNRECOVERABLE

Use UNRECOVERABLE to save time and space in the redo log file. An UNRECOVERABLE load does not record loaded data in the redo log file, instead, it generates invalidation redo. Note that UNRECOVERABLE applies to all objects loaded during the load session (both data and index segments.)

Therefore, media recovery is disabled for the loaded table, although database changes by other users may continue to be logged.

Note: Because the data load is not logged, you may want to make a backup of the data after loading.

If media recovery becomes necessary on data that was loaded with the UNRECOVERABLE phrase, the data blocks that were loaded are marked as logically corrupted.

To recover the data, drop and re-create the data. It is a good idea to do backups immediately after the load to preserve the otherwise unrecoverable data.

By default, a direct path load is RECOVERABLE. See "Data Definition Language (DDL) Syntax" on page 5-4 for information on RECOVERABLE and UNRECOVERABLE.

NOLOG Attribute

If a data or index segment has the NOLOG attribute set, then full image redo logging is disabled for that segment (invalidation redo is generated.) Use of the NOLOG attribute allows a finer degree of control over the objects which are not logged.

Avoiding Index Maintenance

For both the conventional path and the direct path, SQL*Loader maintains all existing indexes for a table.

Index maintenance can be avoided by using one of the following methods:

Drop the indexes prior to the beginning of the load.
Mark selected indexes or index partitions as Index Unusable prior to the beginning of the load and use the SKIP_UNUSABLE_INDEXES option.
Use the SKIP_INDEX_MAINTENANCE option (direct path only, use with caution!)

Avoiding index maintenance saves temporary storage while using the direct load method. Avoiding index maintenance minimizes the amount of space required during the load, for the following reasons:

You can build indexes one at a time, reducing the amount of sort (temporary) segment space that would otherwise be needed for each index.
Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.

Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table. But if relatively few rows are added to a large table, then the time required to re-sort the indexes may be excessive. In such cases, it is usually better to make use of the conventional path, or use the SINGLEROW option.

Direct Loads, Integrity Constraints, and Triggers

With the conventional path, arrays of rows are inserted with standard SQL INSERT statements - integrity constraints and insert triggers are automatically applied. But when loading data with the direct path, some integrity constraints and all database triggers are disabled. This section discusses the implications of using direct path loads with respect to these features.

Integrity Constraints

During a direct path load, some integrity constraints are automatically disabled. Others are not. For a description of the constraints, see the chapter "Maintaining Data Integrity" of Oracle8 Server Application Developer's Guide.

Enabled Constraints

The constraints that remain in force are:

not null
unique
primary keys (unique-constraints on not-null columns)

Not Null constraints are checked at column array build time. Any row that violates this constraint is rejected. Unique constraints are verified when indexes are rebuilt at the end of the load. The index will be left in Index Unusable state if a violation is detected. See "Indexes Left in Index Unusable State" on page 8-11.

Disabled Constraints

The following constraints are disabled:

check constraints
referential constraints (foreign keys)

Reenable Constraints

When the load completes, the integrity constraints will be re-enabled automatically if the REENABLE clause is specified. The syntax for this clause is as follows:

The optional keyword DISABLED_CONSTRAINTS is provided for readability. If the EXCEPTIONS clause is included, the table must already exist and, you must be able to insert into it. This table contains the ROWIDs of all rows that violated one of the integrity constraints. It also contains the name of the constraint that was violated. See Oracle8 Server SQL Reference for instructions on how to create an exceptions table.

If the REENABLE clause is not used, then the constraints must be re-enabled manually. All rows in the table are verified then. If Oracle finds any errors in the new data, error messages are produced. The names of violated constraints and the ROWIDs of the bad data are placed in an exceptions table, if one is specified. See ENABLE in Oracle8 Server SQL Reference.

The SQL*Loader log file describes the constraints that were disabled, the ones that were re-enabled and what error, if any, prevented re-enabling of each constraint. It also contains the name of the exceptions table specified for each loaded table.

Attention: As long as bad data remains in the table, the integrity constraint cannot be successfully re-enabled.

Suggestion: Because referential integrity must be reverified for the entire table, performance may be improved by using the conventional path, instead of the direct path, when a small number of rows are to be loaded into a very large table.

Database Insert Triggers

Table insert triggers are also disabled when a direct path load begins. After the rows are loaded and indexes rebuilt, any triggers that were disabled are automatically re-enabled. The log file lists all triggers that were disabled for the load. There should not be any errors re-enabling triggers.

Unlike integrity constraints, insert triggers are not reapplied to the whole table when they are enabled. As a result, insert triggers do not fire for any rows loaded on the direct path. When using the direct path, the application must ensure that any behavior associated with insert triggers is carried out for the new rows.

Replacing Insert Triggers with Integrity Constraints

Applications commonly use insert triggers to implement integrity constraints. Most of these application insert triggers are simple enough that they can be replaced with Oracle's automatic integrity constraints.

When Automatic Constraints Cannot Be Used

Sometimes an insert trigger cannot be replaced with Oracle's automatic integrity constraints. For example, if an integrity check is implemented with a table lookup in an insert trigger, then automatic check constraints cannot be used, because the automatic constraints can only reference constants and columns in the current row. This section describes two methods for duplicating the effects of such a trigger.

Preparation

Before either method can be used, the table must be prepared. Use the following general guidelines to prepare the table:

Before the load, add a one-character column to the table that marks rows as "old data" or "new data".
Let the value of null for this column signify "old data", because null columns do not take up space.
When loading, flag all loaded rows as "new data" with SQL*Loader's CONSTANT clause.

After following this procedure, all newly loaded rows are identified, making it possible to operate on the new data without affecting the old rows.

Using An Update Trigger

Generally, you can use a database update trigger to duplicate the effects of an insert trigger. This method is the simplest. It can be used whenever the insert trigger does not raise any exceptions.

Create an update trigger that duplicates the effects of the insert trigger.

Copy the trigger. Change all occurrences of "new.column_name" to "old.column_name".

Replace the current update trigger, if it exists, with the new one
Update the table, changing the "new data" flag to null, thereby firing the update trigger
Restore the original update trigger, if there was one

Note: Depending on the behavior of the trigger, it may be necessary to have exclusive update access to the table during this operation, so that other users do not inadvertently apply the trigger to rows they modify.

Duplicating the Effects of Exception Conditions

If the insert trigger can raise an exception, then more work is required to duplicate its effects. Raising an exception would prevent the row from being inserted into the table. To duplicate that effect with an update trigger, it is necessary to mark the loaded row for deletion.

The "new data" column cannot be used for a delete flag, because an update trigger cannot modify the column(s) that caused it to fire. So another column must be added to the table. This column marks the row for deletion. A null value means the row is valid. Whenever the insert trigger would raise an exception, the update trigger can mark the row as invalid by setting a flag in the additional column.

Summary: When an insert trigger can raise an exception condition, its effects can be duplicated by an update trigger, provided:

two columns (which are usually null) are added to the table
the table can be updated exclusively (if necessary)

Using a Stored Procedure

The following procedure always works, but it is more complex to implement. It can be used when the insert trigger raises exceptions. It does not require a second additional column; and, because it does not replace the update trigger, and it can be used without exclusive access to the table.

Create a stored procedure that duplicates the effects of the insert trigger. Follow the general outline given below. (For implementation details, see PL/SQL User's Guide and Reference for more information about cursor management.)
- declare a cursor for the table, selecting all the new rows
- open it and fetch rows, one at a time, in a processing loop
- perform the operations contained in the insert trigger
- if the operations succeed, change the "new data" flag to null
- if the operations fail, change the "new data" flag to "bad data"
Execute the stored procedure using an administration tool such as Server Manager.
After running the procedure, check the table for any rows marked "bad data".
Update or remove the bad rows.
Re-enable the insert trigger.

Permanently Disabled Triggers & Constraints

SQL*Loader needs to acquire several locks on the table to be loaded to disable triggers and constraints. If a competing process is enabling triggers or constraints at the same time that SQL*Loader is trying to disable them for that table, then SQL*Loader may not be able to acquire exclusive access to the table.

SQL*Loader attempts to handle this situation as gracefully as possible. It attempts to re-enable disabled triggers and constraints before exiting. However, the same table-locking problem that made it impossible for SQL*Loader to continue may also have made it impossible for SQL*Loader to finish enabling triggers and constraints. In such cases, triggers and constraints will remain permanently disabled until they are manually enabled.

Although such a situation is unlikely, it is possible. The best way to prevent it is to make sure that no applications are running that could enable triggers or constraints for the table, while the direct load is in progress.

If a direct load is aborted due to failure to acquire the proper locks, carefully check the log. It will show every trigger and constraint that was disabled, and each attempt to re-enable them. Any triggers or constraints that were not re-enabled by SQL*Loader should be manually enabled with the ENABLE clause described in Oracle8 Server SQL Reference.

Alternative: Concurrent Conventional Path Loads

If triggers or integrity constraints pose a problem, but you want faster loading, you should consider using concurrent conventional path loads. That is, use multiple load sessions executing concurrently on a multiple-CPU system. Split the input datafiles into separate files on logical record boundaries, and then load each such input datafile with a conventional path load session. The resulting load has the following attributes:

It is faster than a single conventional load on a multiple-CPU system, but probably not as fast as a direct load.
Triggers fire, integrity constraints are applied to the loaded rows, and indexes are maintained via the standard DML execution logic.

Parallel Data Loading Models

This section discusses three basic models of concurrency which can be used to minimize the elapsed time required for data loading:

concurrent conventional path loads
inter-segment concurrency with direct path load method
intra-segment concurrency with direct path load method

Concurrent Conventional Path Loads

Using multiple conventional path load sessions executing concurrently is discussed in the previous section. This technique can be used to load the same or different objects concurrently with no restrictions.

Inter-Segment Concurrency with Direct Path

Inter-segment concurrency can be used for concurrent loading of different objects. This technique can be applied for concurrent direct path loading of different tables, or to concurrent direct path loading of different partitions of the same table.

When direct path loading a single partition, the following items should be considered:

local indexes can be maintained by the load.
global indexes cannot be maintained by the load
referential integrity and check constraints must be disabled
triggers must be disabled
the input data should be partitioned (otherwise many records will be rejected which adversely affects performance.)

Intra-Segment Concurrency with Direct Path

SQL*Loader permits multiple, concurrent sessions to perform a direct path load into the same table, or into the same partition of a partitioned table. Multiple SQL*Loader sessions improve the performance of a direct path load given the available resources on your system.

This method of data loading is enabled by setting both the DIRECT and the PARALLEL option to TRUE, and is often referred to as a "parallel direct path load."

It is important to realize that parallelism is user managed, setting the PARALLEL option to TRUE only allows multiple concurrent direct path load sessions.

Restrictions on Parallel Direct Path Loads

The following restrictions are enforced on parallel direct path loads:

neither local or global indexes can be maintained by the load
referential integrity and check constraints must be disabled
triggers must be disabled
Rows can only be appended. REPLACE, TRUNCATE, and INSERT cannot be used (this is due to the individual loads not being coordinated.) If you must truncate a table before a parallel load, you must do it manually.

If a parallel direct path load is being applied to a single partition, it is best that the data is pre-partitioned (otherwise the overhead of record rejection due to a partition mismatch slows down the load.)

Initiating Multiple SQL*Loader Sessions

Each SQL*Loader session takes a different datafile as input. In all sessions executing a direct load on the same table, you must set PARALLEL to TRUE. The syntax is:

PARALLEL can be specified on the command line or in a parameter file. It can also be specified in the control file with the OPTIONS clause.

For example, to invoke three SQL*Loader direct path load sessions on the same table, you would execute the following commands at the operating system prompt:

SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE

SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE

SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE

The previous commands must be executed in separate sessions, or if permitted on your operating system, as separate background jobs. Note the use of multiple control files. This allows you to be flexible in specifying the files to use for the direct path load (see the example of one of the control files below).

Note: Indexes are not maintained during a parallel load. Any indexes must be (re)created or rebuilt manually after the load completes. You can use the parallel index creation or parallel index rebuild feature to speed the building of large indexes after a parallel load.

When you perform a PARALLEL load, SQL*Loader creates temporary segments for each concurrent session and then merges the segments upon completion. The segment created from the merge is then added to the existing segment in the database above the segment's high water mark. The last extent used of each segment for each loader session is trimmed of any free space before being combined with the other extents of the SQL*Loader session.

Options Keywords for Parallel Direct Path Loads

When using parallel direct path loads, options are available for specifying attributes of the temporary segment to be allocated by the loader.

Specifying Temporary Segments

It is recommended that each concurrent direct path load session use files located on different disks to allow for the maximum I/O throughput. Using the FILE keyword of the OPTIONS clause you can specify the filename of any valid datafile in the tablespace of the object (table or partition) being loaded. The following example illustrates a portion of one of the control files used for the SQL*Loader sessions in the previous example:

LOAD DATA

INFILE 'load1.dat'

INSERT INTO TABLE emp

OPTIONS(FILE='/dat/data1.dat')

(empno POSITION(01:04) INTEGER EXTERNAL NULLIF empno=BLANKS

...

You can specify the database file from which the temporary segments are allocated with the FILE keyword in the OPTIONS clause for each object (table or partition) in the control file. You can also specify the FILE parameter on the command line of each concurrent SQL*Loader session, but then it will globally apply to all objects being loaded with that session.

Using the FILE Keyword

The FILE keyword in Oracle8 has the following restrictions for direct path parallel loads:

For non-partitioned tables:
the specified file must be in the tablespace of the table being loaded
For partitioned tables, single partition load:
the specified file must be in the tablespace of the partition being loaded
For partitioned tables, full table load:
the specified file must be in the tablespace of all partitions being loaded that is, all partitions must be in the same tablespace.

Using the STORAGE Keyword

The STORAGE keyword can be used to specify the storage attributes of the temporary segment(s) allocated for a parallel direct path load. If the STORAGE keyword is not used, the storage attributes of the segment containing the object (table, partition) being loaded are used.

OPTIONS				(STORAGE=(MINEXTENTS n1 MAXEXTENTS n2 INITIAL n3[K|M]

					NEXT n4[K|M] PCTINCREASE n5)

For example, the following STORAGE clause could be used:

OPTIONS (STORAGE=(INITIAL 100M NEXT 100M PCTINCREASE 0))

The STORAGE keyword can only be used in the control file, and not on the command line. Use of the STORAGE keyword to specify anything other than PCTINCREASE of 0, and INITIAL or NEXT values is strongly discouraged (and may be silently ignored in the future.)

Enabling Constraints After a Parallel Direct Path Load

Constraints and triggers must be enabled manually after all data loading is complete.

General Performance Improvement Hints

This section gives a few guidelines which can help to improve the performance of a load. If you must use a certain feature to load your data, by all means do so. But if you have control over the format of the data to be loaded, here are a few hints which can be used to improve load performance:

Make logical record processing efficient:
- use one-to-one mapping of physical records to logical records (avoid continueif, concatenate)
- make it easy for the software to figure out physical record boundaries. Use the file processing option string "FIX nnn" or "VAR". If you use the default (stream mode) on most platforms (e.g. UNIX, NT) the loader has to scan each physical record for the record terminator (newline character.)
Make field setting efficient. Field setting is the process of mapping "fields" in the datafile to their corresponding columns in the table being loaded. The mapping function is controlled by the description of the fields in the control file. Field setting (along with data conversion) is the biggest consumer of CPU cycles for most loads.
- avoid delimited fields; use positional fields. If you use delimited fields, the loader must scan the input data to find the delimiters. If you use positional fields, field setting becomes simple pointer arithmetic (very fast!)
- Don't trim whitespace if you don't need to (use PRESERVE BLANKS.)
Make conversions efficient. There are several conversions that the loader does for you, character set conversion and datatype conversions. Of course, the quickest conversion is no conversion.
- Avoid character set conversions if you can. The loader supports four character sets: a) client character set (NLS_LANG of the client sqlldr process); b) datafile character set (usually the same as the client character set, but can be different); c) server character set; and d) server national character set. Performance is optimized if all character sets are the same. For direct path loads, it is best if the datafile character set and the server character set are the same. If the character sets are the same, character set conversion buffers are not allocated.
- Use single byte character sets if you can.
Use direct path loads.
Use "sorted indexes" clause.
Avoid unnecessary NULLIF and DEFAULTIF clauses. Each clause must be evaluated on each column which has a clause associated with it for EVERY row loaded.
Use parallel direct path loads and parallel index create when you can.

Media Recovery	is recovering from the loss of a database file. You must operate in ARCHIVELOG mode to recover after a file has been lost.
Instance Recovery	is recovering from a system failure in which in-memory data was changed (but not written to disk) before the failure occurred. Oracle can always recover from instance failures, even if redo log files are not archived.

8 SQL*Loader: Conventional and Direct Path Loads

Data Loading Methods

Conventional Path Loads

Conventional Path Loading of a Single Partition

When to Use a Conventional Path Load

Direct Path Loads

Figure 8-1: Database Writes on Direct Path and Conventional Path

Direct Path Loading of a Partitioned Table

Direct Path Loading of a Single Partition

Advantages of Direct Path Loads

When to Use a Direct Path Load

Restrictions on Using Direct PATH LOADS

Restrictions on Direct Path Loading of a Single Partition

Integrity Constraints

Field Defaults on the Direct Path

Loading into Synonyms

Exact Version Requirement

Using Direct Path Load

Setting Up for Direct Path Loads

Specifying a Direct Path Load

Building Indexes

The SINGLEROW Option

Index Storage Requirements

Temporary Segment Storage Requirements

Indexes Left in Index Unusable State

Data Saves

Using the ROWS Parameter

Data Save Versus Commit

Recovery

Instance Recovery and Direct Path Loads

Media Recovery and Direct Path Loads

Loading LONG Data Fields

Loading Data as PIECED

Using the READBUFFERS Keyword

Maximizing Performance of Direct Path Loads

Pre-allocating Storage for Faster Loading

Pre-sorting Data for Faster Indexing

SORTED INDEXES Statement

Unsorted Data

Multiple Column Indexes

Choosing the Best Sort Order

Infrequent Data Saves

Minimizing Use of the Redo Log

Disable Archiving

Specifying UNRECOVERABLE

NOLOG Attribute

Avoiding Index Maintenance

Direct Loads, Integrity Constraints, and Triggers

Integrity Constraints

Enabled Constraints

Disabled Constraints

Reenable Constraints

Database Insert Triggers

Replacing Insert Triggers with Integrity Constraints

When Automatic Constraints Cannot Be Used

Preparation

Using An Update Trigger

Duplicating the Effects of Exception Conditions

Using a Stored Procedure

Permanently Disabled Triggers & Constraints

Alternative: Concurrent Conventional Path Loads

Parallel Data Loading Models

Concurrent Conventional Path Loads

Inter-Segment Concurrency with Direct Path

Intra-Segment Concurrency with Direct Path

Restrictions on Parallel Direct Path Loads

Initiating Multiple SQL*Loader Sessions

Options Keywords for Parallel Direct Path Loads

Specifying Temporary Segments

Using the FILE Keyword

Using the STORAGE Keyword

Enabling Constraints After a Parallel Direct Path Load

General Performance Improvement Hints

8
SQL*Loader: Conventional and Direct Path Loads