Oracle8i interMedia Text Reference
Release 8.1.5

A67843-01

11
Executables

This chapter discusses the executables provided with interMedia Text. The following topics are discussed in this chapter:

ctxsrv

ctxload

Knowledge Base Extension Compiler (ctxkbtc)

ctxsrv

You use the ctxsrv server daemon for background DML processing. You can start it from the command line or with the interMedia Text Manager administration tool.

This server synchronizes the index with ALTER INDEX at regular intervals.


Note:

ctxsrv can only be executed by the Oracle user, CTXSYS.  


Syntax

ctxsrv [-user ctxsys/passwd[@sqlnet_address]]
       [-personality M]
       [-logfile log_name]
       [-sqltrace]

-user

Specify the username and password for the Oracle user CTXSYS.

The username and password can be immediately followed by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you need not specify a value for sqlnet_address to connect to the database.


Note:

If you do not specify -user on the ctxsrv command line, you are prompted to enter the required information in the format 'CTXSYS/password', where password is the password for CTXSYS.

This is useful if you wish to mask the CTXSYS password from other users of the machine on which the server is running.  


-personality

Specify the personality mask for the server started by ctxsrv. The only possible value is M, which is also the default.

-logfile

Specify the name of a log file to which the server writes all session information and errors.

-sqltrace

Enables the server to write to a trace file in the directory specified by the USER_DUMP_DEST initialization parameter.

See Also:

For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle8 Administrator's Guide.  

Examples

The following example starts a server and writes all server messages to a file named ctx.log:

ctxsrv -user ctxsys/ctxsys -personality M -log ctx.log &

The following example starts a server and writes all server messages to a file named ctx.log. Because -user is not specified, the server prompts you to enter a user:

ctxsrv -log ctx.log

...
Copyright (c) Oracle Corporation 1979, 1998.  All rights reserved.
...
Enter user:

At the prompt, enter 'CTXSYS/password', where password is the password assigned to the CTXSYS user.


Unix Users:

In this example, the process is not run in the background.

In environments where you can run processes in the background, if you do not specify -user in the ctxsrv command-line, you must run the server process in the foreground or pass a value for -user to ctxsrv from an operating system file.

For example:

ctxsrv -log ctx.log < pword.txt

The file must contain a single line consisting of the following text: 'CTXSYS/password'

If you pass a value to ctxsrv from a file, ctxsrv does not prompt you to enter a user.  
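
The file-based approach can be sketched as follows. This is a minimal sketch rather than a supported procedure: CTX_PASSWORD and PWORD_FILE are hypothetical variables introduced for illustration, "changeme" is a placeholder, and the sketch assumes that the quotation marks shown around 'CTXSYS/password' are not part of the line itself. The file is created so that only its owner can read it, which keeps the password out of the process list and away from other users of the machine.

```shell
# Hypothetical variables: CTX_PASSWORD holds the CTXSYS password,
# PWORD_FILE names the credentials file (pword.txt as in the example above).
CTX_PASSWORD="${CTX_PASSWORD:-changeme}"
PWORD_FILE="${PWORD_FILE:-./pword.txt}"

# Write the single required line; the umask keeps a new file private.
( umask 077; printf '%s\n' "CTXSYS/${CTX_PASSWORD}" > "$PWORD_FILE" )
# Tighten permissions in case the file already existed.
chmod 600 "$PWORD_FILE"

# Start the server in the background only if ctxsrv is on the PATH.
if command -v ctxsrv >/dev/null 2>&1; then
    ctxsrv -log ctx.log < "$PWORD_FILE" &
    echo "ctxsrv started with PID $!"
else
    echo "ctxsrv not found; would run: ctxsrv -log ctx.log < $PWORD_FILE"
fi
```

Because the server reads the credentials from standard input, nothing is prompted for and the process can run in the background.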


Notes

Viewing Pending Updates

Pending index updates are stored in the DML queue. To view this queue, you can use the CTX_PENDING or CTX_USER_PENDING views.

You can also use the interMedia Text Manager administration tool, which is part of the Oracle Enterprise Manager.

Viewing DML Errors

You can view DML errors with the CTX_INDEX_ERRORS or CTX_USER_INDEX_ERRORS views.
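
As an illustration, the following sketch prints a small SQL*Plus script that queries the two CTX_USER_ views and runs it only when a connection is configured. CONNECT is a hypothetical variable holding user/password[@sqlnet_address]; SELECT * is used so that no column names are assumed. Note that a password passed on the sqlplus command line may be visible to other users of the machine.

```shell
# Print a script that lists pending updates and indexing errors.
queue_report_sql() {
    cat <<'EOF'
SELECT * FROM ctx_user_pending;
SELECT * FROM ctx_user_index_errors;
EXIT
EOF
}

# CONNECT is hypothetical; without it (or without sqlplus) this is a dry run.
if command -v sqlplus >/dev/null 2>&1 && [ -n "${CONNECT:-}" ]; then
    queue_report_sql | sqlplus -s "$CONNECT"
else
    queue_report_sql
fi
```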

Index Fragmentation

With background DML, ctxsrv constantly polls the DML queue for changes, so new additions are indexed automatically and quickly. However, background DML also tends to process documents in smaller batches, which increases index fragmentation.

By contrast, when you synchronize the index manually with ALTER INDEX, the batches are usually larger, so there is less index fragmentation.
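
A manual synchronization could be scripted along the following lines. This is a sketch under assumptions: the REBUILD PARAMETERS('sync') form should be checked against the ALTER INDEX reference for your release, myindex is a placeholder index name, and CONNECT and INDEX_NAME are hypothetical variables. Scheduling such a script at longer intervals is one way to trade indexing latency for larger batches.

```shell
# Build the statement that synchronizes one index (assumed syntax; verify
# against the ALTER INDEX reference before use).
sync_index_sql() {
    printf "ALTER INDEX %s REBUILD PARAMETERS('sync');\n" "$1"
}

INDEX_NAME="${INDEX_NAME:-myindex}"

# CONNECT is hypothetical; without it (or without sqlplus) this is a dry run.
if command -v sqlplus >/dev/null 2>&1 && [ -n "${CONNECT:-}" ]; then
    { sync_index_sql "$INDEX_NAME"; echo "EXIT"; } | sqlplus -s "$CONNECT"
else
    sync_index_sql "$INDEX_NAME"
fi
```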

Shutting Down the Server

You can shut down ctxsrv with CTX_ADM.SHUTDOWN.

Related Topics

ALTER INDEX

CTX_ADM.SHUTDOWN in Chapter 6.

The following views in Appendix H, "Views": CTX_PENDING, CTX_USER_PENDING, CTX_INDEX_ERRORS, and CTX_USER_INDEX_ERRORS.

interMedia Text Manager

For more information on starting servers with the administration tool, see the online help for the interMedia Text Manager. This administration tool is a Java application integrated with the Oracle Enterprise Manager.

ctxload

You use ctxload to perform the following operations: thesaurus importing and exporting, text loading, and document updating/exporting.

Thesaurus Importing and Exporting

Use ctxload to load a thesaurus from an import file into the interMedia Text thesaurus tables.

An import file is an ASCII flat file that contains entries for synonyms, broader terms, narrower terms, or related terms which can be used to expand queries.

ctxload can also be used to export a thesaurus by dumping the contents of the thesaurus into a user-specified operating-system file.

See Also:

For examples of import files for thesaurus importing, see "Structure of ctxload Thesaurus Import File" in Appendix D.  

Text Loading

You can use ctxload to load text from a load file into a LONG or LONG RAW column in a table.


Suggestion:

If the target table does not contain a LONG or LONG RAW column or you do not want to load text into a LONG or LONG RAW column, you can use SQL*Loader to populate the table with text.

For more information on loading with SQL*Loader, see "SQL*Loader Example" in Appendix D.  


A load file is an ASCII flat file that contains the plain text, as well as any structured data (title, author, date, and so on), for documents to be stored in a text table. In place of the text for each document, the load file can instead store a pointer to a separate file that holds the actual text (formatted or plain) of the document.


Note:

The ctxload utility does not support load files that contain both embedded text and file pointers. You must use one method or the other when creating a load file.  


The ctxload utility creates one row in the table for each document identified by a header in the load file.

See Also:

For examples of load files for text loading, see "Structure of ctxload Text Load File" in Appendix D.  

Document Updating/Exporting

The ctxload utility supports updating database columns from operating system files and exporting database columns to files, specifically LONG RAW, LONG, BLOB and CLOB columns.


Note:

The updating and exporting of data is performed in sections so that the update/fetch buffer does not need a large amount of memory (up to 2 gigabytes).

As a result, a minimum of 16 kilobytes of memory is required for document update/export.  


ctxload Syntax

ctxload -user username[/password][@sqlnet_address]
        -name object_name
        -file file_name
       [-pk primary_key]  
       [-export]
       [-update]
       [-thes]
       [-thescase y|n]
       [-thesdump]
       [-separate]
       [-longsize n]
       [-date date_mask]
       [-log file_name]
       [-trace]
       [-commitafter n]

Mandatory Arguments

-user

Specify the username and password of the user running ctxload.

The username and password can be followed immediately by @sqlnet_address to permit logon to remote databases. The value for sqlnet_address is a database connect string. If the TWO_TASK environment variable is set to a remote database, you do not have to specify a value for sqlnet_address to connect to the database.

-name object_name

When you use ctxload to export/import a thesaurus, use object_name to specify the name of the thesaurus to be exported/imported.

You use object_name to identify the thesaurus in queries that use thesaurus operators.


Note:

The thesaurus name must be unique. If the name specified for the thesaurus is identical to that of an existing thesaurus, ctxload returns an error and does not overwrite the existing thesaurus.  


When you use ctxload to update/export a text field, use object_name to specify the index associated with the text column.

-file file_name

When ctxload is used to import a thesaurus, use file_name to specify the name of the import file which contains the thesaurus entries.

When ctxload is used to export a thesaurus, use file_name to specify the name of the export file created by ctxload.


Note:

If the name specified for the thesaurus dump file is identical to an existing file, ctxload overwrites the existing file.  


When ctxload is used to update a single row in a text column, use file_name to specify the file that stores the text to be inserted into the text column. You identify the destination row with -pk.

When ctxload is used to export a single row in a text column, use file_name to specify the file to which the text is exported. You identify the source row with -pk.

See Also:

For more information about the structure of ctxload import files, see Appendix D, "Loading Examples".  

Optional Arguments

-pk

Specify the primary key value of the row to be updated or exported.

When the primary key is compound, you must enclose the values within double quotes and separate the keys with a comma.

-export

Exports the contents of a single cell in a database table into the operating system file specified by -file. ctxload exports the LONG, LONG RAW, CLOB or BLOB column in the row specified by -pk.

When you use -export, you must specify a primary key with -pk.

-update

Updates the contents of a single cell in a database table with the contents of the operating system file specified by -file. ctxload updates the LONG, LONG RAW, CLOB or BLOB column for the row specified by -pk.

When you use -update, you must specify a primary key with -pk.

-thes

Import a thesaurus. Specify the source file with the -file argument. You specify the name of the thesaurus to be imported with -name.

-thescase y | n

Specify y to create a case-sensitive thesaurus with the name specified by -name and populate the thesaurus with entries from the thesaurus import file specified by -file. If -thescase is 'y' (the thesaurus is case-sensitive), ctxload enters the terms in the thesaurus exactly as they appear in the import file.

The default for -thescase is 'n' (case-insensitive thesaurus).


Note:

-thescase is valid for use with only the -thes argument.  


-thesdump

Export a thesaurus. Specify the name of the thesaurus to be exported with the -name argument. Specify the destination file with the -file argument.

-separate

For text loading, include this parameter to specify that the text of each document in the load file is a pointer to a separate text file. This instructs ctxload to load the contents of each text file in the LONG or LONG RAW column for the specified row.

-longsize n

For text loading, specify the maximum number of kilobytes to load into the LONG or LONG RAW column.

The minimum value is 1 (that is, 1 KB) and the maximum value is machine dependent.


Note:

You must enter the value for longsize as a number only. Do not include a 'K' or 'k' to indicate kilobytes.  


-date

Specify the TO_CHAR date format for any date columns loaded using ctxload.

See Also:

For more information about the available date format models, see Oracle8i SQL Reference.  

-log

Specify the name of the log file to which ctxload writes any National Language Support (NLS) messages generated during processing. If you do not specify a log file name, the messages appear on the standard output.

-trace

Enables SQL statement tracing using 'ALTER SESSION SET SQL_TRACE TRUE'. This command captures all processed SQL statements in a trace file, which can be used for debugging. The location of the trace file is operating-system dependent and can be modified using the USER_DUMP_DEST initialization parameter.

See Also:

For more information about SQL trace and the USER_DUMP_DEST initialization parameter, see Oracle8 Administrator's Guide.  

-commitafter n

Specify the number of rows (documents) that are inserted into the table before a commit is issued to the database. The default is 1.

Examples

This section provides examples for some of the operations that ctxload can perform.

See Also:

For more document loading examples, see Appendix D, "Loading Examples".  

Thesaurus Import Example

The following example imports a thesaurus named tech_doc from an import file named tech_thesaurus.txt:

ctxload -user jsmith/123abc -thes -name tech_doc -file tech_thesaurus.txt 

Thesaurus Export Example

The following example dumps the contents of a thesaurus named tech_doc into a file named tech_thesaurus.out:

ctxload -user jsmith/123abc -thesdump -name tech_doc -file tech_thesaurus.out 

Exporting a Single Text Field

The following example exports a single text field identified by the primary key value of 1 to the file myfile. The index myindex identifies the text column.

ctxload -user scott/tiger -export -name myindex -file myfile -pk 1 

To export a single text field identified by a compound primary key, you must enclose the primary key values in double quotes and separate the values with commas as follows:

ctxload -user scott/tiger -export -name myindex -file myfile -pk "Oracle,1" 

Updating a Single Text Field

The following example updates a single text field identified by primary key value of 1 with the contents of myfile. The index myindex identifies the text column.

ctxload -user scott/tiger -update -name myindex -file myfile -pk 1 

To update a single text field identified by a compound primary key, you must enclose the primary key values in double quotes and separate the values with commas as follows:

ctxload -user scott/tiger -update -name myindex -file myfile -pk "Oracle,1" 
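
Because -export handles one row per invocation, exporting several rows means calling ctxload once per primary key. The loop below is a sketch of that idea, not part of ctxload itself: export_rows, run, DRY_RUN and the doc_<key>.txt file naming are all hypothetical, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
# Default to a dry run; set DRY_RUN=0 to execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it during a dry run.
run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

# Usage: export_rows connect_string index_name pk [pk ...]
# Each row is written to its own file, named after the primary key.
export_rows() {
    connect=$1
    index=$2
    shift 2
    for pk in "$@"; do
        run ctxload -user "$connect" -export -name "$index" \
            -file "doc_${pk}.txt" -pk "$pk"
    done
}

export_rows scott/tiger myindex 1 2 3
```

Quoting "$pk" passes a compound key such as "Oracle,1" to ctxload as a single argument.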

Knowledge Base Extension Compiler (ctxkbtc)

The ctxkbtc compiler takes one or more specified thesauri and compiles them with the interMedia Text knowledge base to create an extended knowledge base. The extended information can be application-specific terms and relationships.

The extended knowledge base overrides any terms and relationships in the knowledge base where there is overlap. The extended knowledge base is accessed during tasks that use the knowledge base, such as theme indexing, processing ABOUT queries in English, and extracting document themes with document services.

See Also:

For more information about the knowledge base packaged with interMedia Text, see Appendix J, "Knowledge Base - Category Hierarchy".

For more information about the ABOUT operator, see ABOUT operator in Chapter 4.

For more information about document services, see Chapter 8, "CTX_DOC Package".  

Syntax

ctxkbtc -user uname/passwd
       [-name thesname1 [thesname2 ... thesname16]]
       [-revert]
       [-verbose]
       [-log filename]

-user

Specify the username and password for the administrator creating an extended knowledge base.

-name

Specify the name(s) of the thesauri (up to 16) to be compiled with the knowledge base to create the extended knowledge base. The thesauri you specify must already be loaded with ctxload.

-revert

Reverts the extended knowledge base to the default knowledge base provided by interMedia Text.

-verbose

Displays all warnings and messages, including non-NLS messages, to the standard output.

-log

Specify the log file for storing all messages. When you specify a log file, no messages are reported to the standard output.

Usage Notes

Knowledge base extension cannot be performed when theme indexing is being performed.

In addition, any SQL sessions that are using interMedia Text functions must be exited and reopened to make use of the extended knowledge base.

There can be only one user extension per installation. Since a user extension affects all users at the installation, only administrators or terminology managers should extend the knowledge base.

Running ctxkbtc twice removes the previous extension.

Before being compiled, each thesaurus must be loaded into interMedia Text as a case-sensitive thesaurus, using the -thescase y option of ctxload.
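
Taken together, these notes suggest a two-step workflow: load the thesaurus case-sensitively, then compile it. The sketch below strings the two commands together using only arguments described in this chapter. The helper names (run, extend_kb), the DRY_RUN switch, the log file name kb_extend.log, and the sample connect string are hypothetical; the commands are only printed unless DRY_RUN is set to 0.

```shell
# Default to a dry run; set DRY_RUN=0 to execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it during a dry run.
run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

# Usage: extend_kb connect_string thesaurus_name import_file
# The compile step runs only if the load step succeeds.
extend_kb() {
    connect=$1
    thes=$2
    import_file=$3
    run ctxload -user "$connect" -thes -thescase y \
        -name "$thes" -file "$import_file" &&
    run ctxkbtc -user "$connect" -name "$thes" -log kb_extend.log
}

extend_kb ctxsys/ctxsys tech_doc tech_thesaurus.txt
```

After a successful compile, existing SQL sessions still need to be exited and reopened before they see the extended knowledge base.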

Constraints on Thesaurus Terms

Terms are case sensitive. If a thesaurus has a term in uppercase, for example, the same term present in lowercase form in a document will not be recognized.

The maximum length of a term is 80 characters.

Disambiguated homographs are not supported.

Constraints on Thesaurus Relations

The following constraints apply to thesaurus relations:

Linking New Terms to Existing Terms

Where appropriate, Oracle recommends linking new terms to one of the categories in the knowledge base for best results in theme proving.

See Also:

Appendix J, "Knowledge Base - Category Hierarchy"  

For example, if a hierarchy of medical terms is added, the existing category health and medicine can be made a broader term for the new terms. If new terms are kept completely disjoint from existing categories, fewer themes from new terms will be proven. The result is poorer precision and recall with ABOUT queries, as well as poor quality of gists and theme highlighting.

Order of Precedence for Multiple Thesauri

When multiple thesauri are to be compiled, precedence is determined by the order in which thesauri are listed in the arguments to the compiler (most preferred first). A user thesaurus always has precedence over the built-in KB.
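
Since the order of the names carries meaning and -name accepts at most 16 thesauri, a wrapper script may want to check the list before calling the compiler. The function below is an illustrative sketch; kbtc_name_args and the thesaurus names medical_terms and general_terms are hypothetical.

```shell
# Print the -name argument list for ctxkbtc, most preferred thesaurus first.
# Fails if fewer than 1 or more than 16 thesaurus names are given.
kbtc_name_args() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 16 ]; then
        echo "expected 1 to 16 thesaurus names, got $#" >&2
        return 1
    fi
    printf '%s\n' "-name $*"
}

# medical_terms takes precedence over general_terms where they overlap.
kbtc_name_args medical_terms general_terms
```

The printed arguments can then be appended to a command such as ctxkbtc -user uname/passwd, keeping the same order.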

Size Limits

The following table lists the size limits associated with creating and compiling an extended knowledge base:

Description of Parameter                                                           Limit
---------------------------------------------------------------------------------  ---------
Number of RTs (from + to) per term                                                 32
Number of terms per single hierarchy (all narrower terms for a given top term)     64000
Number of new terms in an extended knowledge base                                  1 million
Number of separate thesauri that can be compiled into a user extension to the KB   16




Copyright © 1999 Oracle Corporation.

All Rights Reserved.
