Oracle ConText Cartridge Administrator's Guide
Release 2.0

A54628_01


8
Tuning ConText

This chapter contains information to consider when tuning the performance of ConText.

The topics discussed in this chapter are Indexing Tuning and Query Tuning.

Indexing Tuning

ConText indexing rates will vary as a function of host system speed, available memory, degree of parallelism, and the data composition (e.g., the number of unique tokens in the document set) for the column.

However, the following areas can be addressed to help improve indexing performance and ensure that ConText indexes are successfully created for text columns:

Temporary Segments and SORT_AREA_SIZE

When the Oracle indexes are created on the ConText index for a text table, temporary segments are used to sort the data in the indexes. The temporary segment that is used is the temporary segment for the CTXSYS user.

It is important to ensure that the temporary tablespace for CTXSYS is large enough to perform the sort. In general, the size of the temporary tablespace should be at least 25% of the size of the data being indexed.

However, the size required also depends on the value specified for the SORT_AREA_SIZE initialization parameter. SORT_AREA_SIZE determines how much memory is used to perform the index sort; the larger the sort area, the more of the sort completes in memory and the less temporary tablespace is needed.

See Also:

For more information about initialization parameters, see Oracle8 Server Administrator's Guide.  
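As a quick check of the sizing guidelines above, the temporary tablespace assigned to CTXSYS and the current sort area setting can be queried from the standard data dictionary. This is a sketch: the tablespace name and datafile path in the resize statement are assumptions; substitute your own.

```sql
-- Which temporary tablespace does CTXSYS use?
SELECT temporary_tablespace
  FROM dba_users
 WHERE username = 'CTXSYS';

-- What is the current SORT_AREA_SIZE setting (in bytes)?
SELECT value
  FROM v$parameter
 WHERE name = 'sort_area_size';

-- If the temporary tablespace is too small for the sort,
-- extend it (the file name and size are examples only)
ALTER TABLESPACE temp
  ADD DATAFILE '/u01/oradata/temp02.dbf' SIZE 100M;
```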

Memory Allocation

When defining the engine preferences used for ConText indexing, you define the amount of memory allocated for indexing. The more indexing memory specified, the less the index is fragmented and the fewer writes to the database are required to flush the index from the memory cache to the index table.

The amount of memory allocated for indexing should be a function of the amount of real and virtual memory on the machine on which the ConText server is running.

To calculate this amount, determine the amount of real memory left over after accounting for all other processes, including ConText servers, running on the machine. The remaining real memory should be allocated in the policies used for indexing.

If you are running multiple ConText servers on a single machine, divide the available memory equally among the servers so that each server has sufficient memory for parallel indexing.

Note:

If your real memory usage on the servers exceeds the real memory of the machine, you will experience thrashing due to processes being swapped out of real memory and into virtual memory on disk.

This will cause significant performance degradation and should be avoided at all costs. In other words, always err on the conservative side when allocating indexing memory.  

See Also:

For more information about allocating indexing memory for ConText servers, see "Creating an Engine Preference" in Chapter 6, "Setting Up and Managing Text".  

Batch Versus Immediate DML

DML may occur either in batch mode or in immediate mode. Batch mode reindexes documents to reflect DML changes only when a synchronization command is explicitly issued.

Immediate DML initiates immediate reindexing so that changes to documents are reflected in the index in real-time.

Batch mode is generally preferred because the index will be less fragmented. This is because a larger number of reindexing requests for documents is processed by a single server in one pass, resulting in less fragmentation of the index. However, immediate DML is advantageous if updates to the text table must be reflected in the index in real time.

Note:

Fragmentation of the index may be fixed by running index optimization regularly.  

See Also:

For more information, see "DML" and "Index Optimization" in Chapter 4, "Text Concepts".  

Parallel Processing

When multiple CPUs are available for indexing, you can run multiple ConText servers for DDL or DML operations to take advantage of these CPUs. Note, however, that indexing is typically memory-bound, so the advantages of parallel indexing may be reduced if running multiple servers on a single machine leaves less memory allocated to each server.

On the other hand, if each ConText server runs on a separate workstation in a networked environment, significant performance gains can be realized by running indexing in parallel. The advantage of running multiple servers is gained during the inversion of the document set in memory. However, when the inverted index is flushed from the memory buffers to the index table, you may encounter contention because multiple servers will be writing to the token table.

This can be alleviated by increasing the INITRANS parameter for the token table (DR_nnnnn_I1Tn) to be the same as the number of ConText servers indexing in parallel.
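For example, if four ConText servers are indexing in parallel, the token table could be altered as follows. This is a sketch: DR_00123_I1T1 is a hypothetical token table name; substitute the actual token table for your index.

```sql
-- Give each of the four parallel ConText servers its own
-- transaction slot in every block of the token table
ALTER TABLE DR_00123_I1T1 INITRANS 4;
```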

See Also:

For more information about setting the INITRANS parameter for the token table, see the section on Engine Tiles in "Tiles, Tile Attributes, and Attribute Values: Indexing" in Chapter 10, "ConText Data Dictionary".

For more information about parallel processing, see Oracle8 Server Administrator's Guide.  

Index Optimization

Index optimization may be performed either in-place or using two tables. In-place compaction involves reading the inverted index from the text table, compacting it in memory, and then flushing the buffer back to the same table. Two-table compaction involves reading the inverted index from the text table, compacting it in memory, and then flushing the buffer back to a new table. When the process is completed, the old table is dropped and the new table becomes the token table for the ConText index.

Two-table compaction is much faster because only reads are performed on the source table and only inserts are performed on the destination table. There are no updates, and more important, no Oracle index updates when two-table compaction is used.

The advantage of in-place compaction is that it requires much less space. Note that two-table compaction results in an approximate replication of the index table; the destination table will be smaller depending on the amount of compaction performed. Specifically, the more fragmented the index, the smaller the destination table relative to the source table.

Something to consider when choosing the method of compaction is the amount of DML that has actually been performed on the table since the last optimization. If a large amount of DML has been performed, optimization is likely to reduce the token table significantly, at the cost of a large number of reads and writes to perform the compaction.

Two-table compaction is preferred for this scenario. However, if the index has not been significantly fragmented, in-place compaction should perform sufficiently well and requires less tablespace.

Network Considerations

Indexing is typically constrained by memory on the workstation and memory allocated to the server for indexing. The ConText server takes full advantage of the array interface to the Oracle database to reduce the number of required network round trips during indexing. Specifically, when documents are retrieved from the database, they are fetched in batches for indexing.

Additionally, the index is flushed back to the database only after the indexing memory has been filled. This flush also utilizes array inserts into the database, reducing the number of network hits.

Note:

If indexing memory is small, buffer flushes to the database will be more frequent, and network performance will become more of an issue.

Query Tuning

This section discusses some ConText administration considerations for tuning ConText queries:

Result Tables in Two-Step Queries

When a two-step query is run, the results are written to a user-specified hitlist result table which then can be either queried directly for text-only queries or joined with the text table for mixed queries (i.e., queries that involve textual conditions and structured conditions).

The hitlist result table may be shared by all users or may be specific to the user running the query. Performance is better if each user has a unique hitlist table, because the query on the hitlist to get the result set does not require a filter condition when the hitlist is not shared. Additionally, an unshared hitlist table can be truncated rather than deleted from. Truncating is much faster than deleting from a table and has the added benefit of generating almost no redo.
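As a sketch, assuming a per-user hitlist table named CTX_TEMP and a shared table SHARED_HITS (both hypothetical names), cleanup between queries differs as follows:

```sql
-- Unshared hitlist: fast cleanup, almost no redo generated
TRUNCATE TABLE ctx_temp;

-- Shared hitlist: cleanup (and every read) must carry a filter
-- identifying the query or user (the conid column is illustrative)
DELETE FROM shared_hits
 WHERE conid = 42;
```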

Finally, the hitlist table should be in a datafile that is on a raw partition. This will result in faster writes and reads from the table.

Matching Servers to Numbers of Users

As the number of users increases, the number of query servers must also be increased. This is essential to ensure that there are enough servers running to retrieve queries from the Text Request Queue before the Query pipe fills, thereby maintaining high rates of query throughput.

If you find that throughput drops as the number of users increases, start up more query servers, provided that the servers are not competing for CPU cycles. If servers are bottlenecking due to insufficient CPU cycles, consider spreading the query servers across multiple machines.

Note:

Network considerations are important during queries, because fast response time is generally critical, and the time to perform network round trips will become a significant percentage of the elapsed query time.

If you are forced to use multiple machines for your query servers, make sure that the network connection between the query servers and the database is fast.




Copyright © 1997 Oracle Corporation.

All Rights Reserved.
