Oracle8i interMedia Text Migration
Release 8.1.5

A67845-01


5
Indexing

This chapter discusses the changes to the Text indexing process that might affect your applications.

About the Text Index

In pre-8.1.5, the index is created with the CTX_DDL package by first creating a policy and then using the policy to create the index.

In 8.1.5, a Text index is created as a special type of Oracle extensible index using standard SQL. This means that an interMedia Text 8.1.5 index operates like an Oracle index. It has a name by which it is referenced, and policies do not exist.

See Also:

For more information about creating a Text index, see "Procedure for Creating Index" in this chapter.  

Merged Word and Theme Index (English only)

In 8.1.5, a single text index can contain both theme and word information. This is different from pre-8.1.5, where you needed a theme index in addition to a text index to issue theme queries.

By default in English, interMedia Text indexes theme information with word information. You can optionally enable and disable theme indexing with your lexer preference.

See Also:

To learn more about indexing theme information, see "Creating Preferences" in this chapter.  
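
The following sketch shows one way to disable theme indexing through a lexer preference. It assumes the BASIC_LEXER object and its INDEX_THEMES attribute as described in the Oracle8i interMedia Text Reference; the preference name my_lexer_nothemes is hypothetical.

```sql
begin
  -- my_lexer_nothemes is a hypothetical preference name.
  ctx_ddl.create_preference('my_lexer_nothemes', 'BASIC_LEXER');
  -- Assumed attribute: INDEX_THEMES set to NO indexes word information only.
  ctx_ddl.set_attribute('my_lexer_nothemes', 'INDEX_THEMES', 'NO');
end;
/
```

The preference takes effect only when you name it in the parameter string of CREATE INDEX.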

Columns with Multiple Indexes

In pre-8.1.5, the system allows you to create more than one index on a text column. This is useful when you want a text column to have both a text index and a theme index.

In 8.1.5, a column can have no more than a single domain index attached to it, which is in keeping with Oracle standards. However, a single Text index can contain theme information in addition to word information.

Indexing Views

In pre-8.1.5, you can create a ConText index on a view. This might be useful when you need to index documents whose content is pieced together from different tables.

However, Oracle SQL standards do not support creating indexes on views. Therefore, in 8.1.5, if you need to index documents whose contents are stored in different tables, you can create a data storage preference using the USER_DATASTORE object, which is new for 8.1.5. With this object, you define a procedure that synthesizes documents at index time.

See Also:

To learn more about USER_DATASTORE, see Oracle8i interMedia Text Reference.  
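
The following sketch illustrates how a USER_DATASTORE preference might replace an index on a view. Several points are assumptions to verify against the Oracle8i interMedia Text Reference: that the procedure takes a ROWID and a CLOB as shown, that it must be owned by CTXSYS, and that the attribute is named PROCEDURE. The tables doc_headers and doc_bodies, their columns, and the names build_doc and my_user_ds are hypothetical.

```sql
-- Run as CTXSYS (assumed ownership requirement for the datastore procedure).
-- doc_headers and doc_bodies are hypothetical tables joined on doc_id;
-- qualify them with their owning schema and grant CTXSYS access as needed.
create or replace procedure build_doc(rid in rowid, tlob in out nocopy clob) is
  v_text varchar2(4000);
begin
  -- Piece one document together from two tables for the row being indexed.
  -- The combined text is assumed to fit in 4000 bytes.
  select h.title || ' ' || b.body_text
    into v_text
    from doc_headers h, doc_bodies b
   where h.rowid = rid
     and b.doc_id = h.doc_id;

  -- Append the synthesized text to the CLOB handed in by the caller.
  dbms_lob.writeappend(tlob, length(v_text), v_text);
end;
/

begin
  ctx_ddl.create_preference('my_user_ds', 'USER_DATASTORE');
  ctx_ddl.set_attribute('my_user_ds', 'PROCEDURE', 'build_doc');
end;
/
```

In this sketch the index would be created on a column of doc_headers with 'datastore my_user_ds' in the parameter string, so that the rowid passed to the procedure identifies a doc_headers row.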

Procedure for Creating Index

Pre-8.1.5

The pre-8.1.5 procedure for creating an index is:

  1. Determine indexing preferences.

  2. Create index preferences.

  3. Create an index policy.

  4. Call the CTX_DDL.CREATE_INDEX procedure, specifying the policy.

8.1.5

The 8.1.5 process for creating an index is simpler because the system supplies defaults:

By default, the system expects your documents to be stored in a text column. Once this requirement is satisfied, you can create a text index using the CREATE INDEX SQL command as an extensible index of type ConText, without explicitly specifying any preferences.

See Also:

For more information about the out-of-box defaults, see Oracle8i interMedia Text Reference.  
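
With the defaults in place, index creation can reduce to a single statement. The following minimal sketch reuses the mytable and news names from the Create Index Example in this chapter and assumes the documents are stored directly in the news column.

```sql
-- No parameter string is given, so the system defaults apply
-- to every preference class.
create index newsindex on mytable(news) indextype is ctxsys.context;
```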

The 8.1.5 procedure for creating an index is:

  1. Optionally, determine your custom indexing preferences if not using defaults. In this step, you determine the following preferences:

    Preference Class   Description
    Datastore          How are your documents stored?
    Filter             How can the documents be converted to plaintext?
    Lexer              What language is being indexed?
    Wordlist           How should stem and fuzzy queries be expanded?
    Storage            How should the index data be stored?
    Stop List          What words or themes are not to be indexed?
    Section Group      How are document sections defined?

    See Also:

    For more information about the preference objects available in the 8.1.5 release, see Oracle8i interMedia Text Reference.  

  2. Optionally, create your own custom preferences. See "Creating Preferences" in this chapter.

  3. Create the Text index with the SQL command CREATE INDEX, naming your index and optionally specifying preferences. See "Creating an Index" in this chapter.

Creating Preferences

In 8.1.5, the syntax for the CTX_DDL.CREATE_PREFERENCE and CTX_DDL.SET_ATTRIBUTE procedures has changed. In addition, the order in which you call these procedures has changed.

In 8.1.5, you create the preference first and then set its attributes, which is the reverse of the pre-8.1.5 order.

See Also:

For a complete list of preference objects and their associated attributes, and the syntax for the CTX_DDL.CREATE_PREFERENCE and CTX_DDL.SET_ATTRIBUTE procedures, see the Oracle8i interMedia Text Reference.  

Example: Specifying File Data Storage

The following example creates a custom data storage preference called mypref that tells the system that the files to be indexed are stored in the operating system. The example then uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute of mypref to the directory /docs.

begin
ctx_ddl.create_preference('mypref', 'FILE_DATASTORE');
ctx_ddl.set_attribute('mypref', 'PATH', '/docs'); 
end;

See Also:

For more information about data storage, see Oracle8i interMedia Text Reference.  
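
A preference such as mypref has no effect until an index names it. The following sketch shows how the preference might be passed in the parameter string of CREATE INDEX. The table mydocs, its filename column, and the index name docsindex are hypothetical; each row is assumed to hold the name of a file located under /docs.

```sql
-- mydocs(filename) is a hypothetical table whose column stores file names
-- that are resolved against the PATH attribute set on mypref.
create index docsindex on mydocs(filename) indextype is ctxsys.context
  parameters('datastore mypref');
```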

Creating an Index

Pre-8.1.5

In pre-8.1.5, you create an index using CTX_DDL.CREATE_INDEX and name a policy.

8.1.5

In 8.1.5, you create the Text index as a type of extensible index using the CREATE INDEX SQL command. You name the index and optionally specify the preferences such as lexer and filter in the parameter string.

See Also:

To learn more about the CREATE INDEX command syntax, see the Oracle8i interMedia Text Reference.  

Create Index Example

The following example creates a Text index called newsindex on the news column in mytable. The index is created with the lexer preference called my_lexer and the stoplist called my_stop. Default attributes are used for the unspecified preferences.

create index newsindex on mytable(news) indextype is ctxsys.context 
  parameters('lexer my_lexer stoplist my_stop');

Dropping a Preference

Pre-8.1.5

In pre-8.1.5, you drop preferences using CTX_DDL.DROP_PREFERENCE, and you can do so only after all policies that reference them have been deleted from the data dictionary.

8.1.5

In 8.1.5, you drop index preferences with the same procedure CTX_DDL.DROP_PREFERENCE. Because preferences exist separately from the index and because policies do not exist in 8.1.5, you need not drop your index before you drop a preference.

Dropping a preference does not affect the index that is using the dropped preference.

See Also:

To learn more about the syntax for the CTX_DDL.DROP_PREFERENCE procedure, see the Oracle8i interMedia Text Reference.  

Example

The following code drops the preference my_lexer.

begin
ctx_ddl.drop_preference('my_lexer');
end;

Dropping an Index

Pre-8.1.5

In pre-8.1.5, you drop an index using CTX_DDL.DROP_INDEX.

8.1.5

In 8.1.5, you drop an index using the DROP INDEX command in SQL.

For example, to drop an index called newsindex, issue the following SQL command:

drop index newsindex; 

If Oracle cannot determine the state of the index, for example as a result of an indexing crash, you cannot drop the index as described above. Instead use:

drop index newsindex force;

See Also:

To learn more about the DROP INDEX command syntax, see the Oracle8i interMedia Text Reference.  

Resuming Failed Index

Pre-8.1.5

In pre-8.1.5, when an indexing operation fails (creation or optimization), you can resume the operation using CTX_DDL.RESUME_FAILED_INDEX.

8.1.5

In interMedia Text 8.1.5, you resume a failed index operation using the ALTER INDEX command.

See Also:

To learn more about the ALTER INDEX command syntax, see the Oracle8i interMedia Text Reference.  

Example

The following command resumes the indexing operation on newsindex with 2 megabytes of memory:

ALTER INDEX newsindex rebuild parameters('resume memory 2M');

Rebuilding an Index

You can rebuild a valid index using ALTER INDEX. You might rebuild an index when you want to index with a new preference.

See Also:

To learn more about the ALTER INDEX command syntax for rebuilding an index, see the Oracle8i interMedia Text Reference.  

Example

The following command rebuilds the index, replacing the lexer preference with my_lexer.

ALTER INDEX newsindex rebuild parameters('replace lexer my_lexer');

Optimizing an Index

Pre-8.1.5

In pre-8.1.5, to optimize an index, you use CTX_DDL.OPTIMIZE_INDEX and specify one of five different optimizing methods.

8.1.5

In 8.1.5, to optimize an index, you use the ALTER INDEX command in SQL with the REBUILD parameter. You can optimize the index in either fast or full mode.

See Also:

To learn more about optimizing the index with ALTER INDEX, see the Oracle8i interMedia Text Reference.  
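
The following sketch shows what the two optimization modes might look like, following the pattern of the other ALTER INDEX examples in this chapter. The optimize, fast, full, and maxtime keywords and the interpretation of maxtime as minutes are assumptions to check against the Oracle8i interMedia Text Reference.

```sql
-- Fast mode (assumed keywords: optimize fast).
alter index newsindex rebuild parameters('optimize fast');

-- Full mode, assumed here to accept a time limit in minutes through maxtime.
alter index newsindex rebuild parameters('optimize full maxtime 60');
```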

Updating the Index - Background DML

As in pre-8.1.5, the 8.1.5 Text index is updated automatically whenever there is an insert, delete, or update to the base table. A ctxsrv server must be running. This is known as background DML processing.

The following example starts a server and writes all server messages to a file named ctx.log:

ctxsrv -user ctxsys/ctxsys -personality M -log ctx.log &

See Also:

To learn more about background DML with ctxsrv, see the specification for ctxsrv in the Oracle8i interMedia Text Reference.  

Updating the Index - Batch DML

Pre-8.1.5

In pre-8.1.5, you synchronize the index using CTX_DML.SYNC. In addition, a ConText M server must be running.

8.1.5

You can update your index in batch mode by executing the ALTER INDEX command with the sync parameter. When you synchronize the index in batch mode, Oracle processes pending updates and inserts stored in the DML queue.

Because synchronizing an index in batch works on batches of inserts, updates and deletes, batch DML usually results in less index fragmentation than synchronizing the index immediately by running the ctxsrv daemon.


Note:

No background ctxsrv server is required to synchronize an index in batch. If the ctxsrv daemon is running, it synchronizes the index immediately.  


See Also:

To learn more about the ALTER INDEX command syntax, see the Oracle8i interMedia Text Reference.  

Example

The following example synchronizes the index with a runtime memory of 2 megabytes:

ALTER INDEX newsindex rebuild PARAMETERS('sync memory 2M'); 

Stoplists and Stopwords

Pre-8.1.5

In pre-8.1.5 a stoplist consisted of words that are not to be indexed. You recorded these words by calling CTX_DDL.SET_ATTRIBUTE for each stopword and then by creating a stoplist preference with CTX_DDL.CREATE_PREFERENCE.

Default stoplists in most of the supported languages are available. You manually set the stoplist for your language.

8.1.5

Default Stoplist

By default, the system sets the default stoplist according to the language you specify in your database setup. There is no need to create or set stoplists, unless you want to customize the list.

Stopthemes and Stopclasses

In addition to defining your own stopwords in 8.1.5, you can define stopthemes, which are themes that are not to be indexed. This is available for English only.

You can also specify that numbers are not to be indexed. A class of alphanumeric characters, such as numbers, that is not to be indexed is a stopclass.

You record your own stopwords, stopthemes, and stopclasses by creating a single stoplist, to which you add the stopwords, stopthemes, and stopclasses. You specify the stoplist in the paramstring for CREATE INDEX.

New Procedures

In 8.1.5, new procedures in the CTX_DDL package manage stoplists, stopwords, stopthemes, and stopclasses.

See Also:

To learn more about using these commands, see the Oracle8i interMedia Text Reference.  
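
The following sketch builds a custom stoplist of the kind named in the Create Index Example in this chapter. It assumes the CTX_DDL procedures CREATE_STOPLIST, ADD_STOPWORD, ADD_STOPTHEME, and ADD_STOPCLASS with the argument order shown, and the NUMBERS stopclass; the stopword and stoptheme values are illustrative only.

```sql
begin
  -- Create an empty stoplist, then add entries to it one at a time.
  ctx_ddl.create_stoplist('my_stop');
  ctx_ddl.add_stopword('my_stop', 'because');    -- word not to be indexed
  ctx_ddl.add_stoptheme('my_stop', 'banking');   -- theme not to be indexed (English only)
  ctx_ddl.add_stopclass('my_stop', 'NUMBERS');   -- do not index numbers
end;
/
```

The stoplist is then named in the parameter string, for example parameters('stoplist my_stop').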

Document Sections

Defining document sections before you index enables you to query within the sections using the WITHIN operator. You define sections as part of a section group.

Pre-8.1.5

In pre-8.1.5, you create a section group and specify it in the Wordlist preference. You can create only user-defined zone sections and sentence and paragraph sections.

8.1.5

Section Groups

In 8.1.5, you create a section group and specify it in the paramstring for CREATE INDEX. To create a section group, use CTX_DDL.CREATE_SECTION_GROUP.

See Also:

To learn more about using CTX_DDL.CREATE_SECTION_GROUP, see its specification in the Oracle8i interMedia Text Reference.  

Within a section group, you can create three types of sections:

Zone Sections

Zone sections (known as user-defined sections in pre-8.1.5) are sections delimited by start and end tags. For instance, the <B> and </B> tags in HTML mark a range of words that are to be rendered in boldface.

Zone sections can be nested within one another, can overlap, and can occur more than once in a document.

You create zone sections as part of a section group with CTX_DDL.ADD_ZONE_SECTION.

See Also:

To learn more about using CTX_DDL.ADD_ZONE_SECTION, see its specification in the Oracle8i interMedia Text Reference.  
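
The following sketch defines a zone section for the <B> tag, indexes with it, and queries within it. It assumes the HTML_SECTION_GROUP group type and the argument order (group name, section name, tag) for CTX_DDL.ADD_ZONE_SECTION; the names htmgroup and bold are hypothetical, and mytable(news) is reused from the Create Index Example.

```sql
begin
  ctx_ddl.create_section_group('htmgroup', 'HTML_SECTION_GROUP');
  -- The section name 'bold' is mapped to the <B>...</B> tag pair.
  ctx_ddl.add_zone_section('htmgroup', 'bold', 'B');
end;
/

-- The section group is named in the parameter string at index creation.
create index newsindex on mytable(news) indextype is ctxsys.context
  parameters('section group htmgroup');

-- Count documents in which 'oracle' appears inside a bold section.
select count(*) from mytable where contains(news, 'oracle WITHIN bold') > 0;
```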

Field Sections

Field sections are new for 8.1.5. Field sections are delimited by start and end tags. By default, the text within a field section is indexed as a sub-document separate from the rest of the document.

Unlike zone sections, field sections cannot nest or overlap. As such, field sections are best suited for non-repeating, non-overlapping sections such as TITLE and AUTHOR sections in news type documents.

Because of how field sections are indexed, WITHIN queries on field sections are usually faster than WITHIN queries on zone sections.

You create a field section as part of a section group using the CTX_DDL.ADD_FIELD_SECTION procedure.

See Also:

To learn more about using CTX_DDL.ADD_FIELD_SECTION, see its specification in the Oracle8i interMedia Text Reference.  
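
The following sketch defines TITLE and AUTHOR field sections for news type documents. It assumes the BASIC_SECTION_GROUP group type and the argument order (group name, section name, tag) for CTX_DDL.ADD_FIELD_SECTION; the group name newsgroup and the tags are hypothetical.

```sql
begin
  ctx_ddl.create_section_group('newsgroup', 'BASIC_SECTION_GROUP');
  -- Each field section maps a section name to a non-repeating tag.
  ctx_ddl.add_field_section('newsgroup', 'title', 'TITLE');
  ctx_ddl.add_field_section('newsgroup', 'author', 'AUTHOR');
end;
/

-- After creating the index with parameters('section group newsgroup'),
-- count documents whose title section contains 'oracle'.
select count(*) from mytable where contains(news, 'oracle WITHIN title') > 0;
```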

Special Sections

In 8.1.5, special sections are the same as paragraph and sentence sections in pre-8.1.5.

To create sentence and paragraph sections, use the CTX_DDL.ADD_SPECIAL_SECTION procedure.

See Also:

To learn more about using CTX_DDL.ADD_SPECIAL_SECTION, see its specification in the Oracle8i interMedia Text Reference.  
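
The following sketch enables sentence and paragraph searching. It assumes that CTX_DDL.ADD_SPECIAL_SECTION takes a group name and one of the section names SENTENCE or PARAGRAPH; the group name specgroup and the query terms are hypothetical.

```sql
begin
  ctx_ddl.create_section_group('specgroup', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_special_section('specgroup', 'SENTENCE');
  ctx_ddl.add_special_section('specgroup', 'PARAGRAPH');
end;
/

-- After creating the index with parameters('section group specgroup'),
-- count documents in which both terms occur in the same sentence.
select count(*) from mytable
 where contains(news, '(oracle AND index) WITHIN SENTENCE') > 0;
```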




Copyright © 1999 Oracle Corporation.

All Rights Reserved.
