Oracle ConText Cartridge Administrator's Guide
Release 2.0

A54628_01


10
ConText Data Dictionary

This chapter provides reference information for the ConText data dictionary objects provided with ConText.

The topics discussed in this chapter are:

Tiles, Tile Attributes, and Attribute Values: Indexing

The following section lists all of the Tiles which can be used to create indexing preferences for use in policies. The section also lists the attributes and attribute values for each indexing Tile. In addition, a brief description of the Tile attributes and examples are provided.

The indexing Tiles are grouped alphabetically by preference category:

Preference Category   Tiles
Data Store Category   DIRECT
                      MASTER DETAIL
                      OSFILE
                      URL
Filter Category       BLASTER FILTER
                      FILTER NOP
                      HTML FILTER
                      USER FILTER
Lexer Category        BASIC LEXER
                      CHINESE V-GRAM LEXER
                      JAPANESE V-GRAM LEXER
                      KOREAN LEXER
                      THEME LEXER
Engine Category       GENERIC ENGINE
                      ENGINE NOP
Wordlist Category     GENERIC WORD LIST
Stoplist Category     GENERIC STOP LIST

Note:

The Compressor category is not listed because data compression is not supported in this release of ConText. Predefined NULL Compressor Tiles and preferences are used as defaults in any policies created.  

Data Store Category

The Data Store category contains the following Tiles:

Tile            Attributes   Attribute Values
DIRECT          ** none **   N/A
MASTER DETAIL   BINARY       0 (plain text)
                             1 (binary text)
OSFILE          PATH         path1:path2:...:pathn
URL             TIMEOUT      seconds (0 to 3600, default 30)
                MAXTHREADS   number of threads (0 to 1024, default 8)
                MAXURLS      buffer length in bytes (1 to 4294967295, default 256)
                URLSIZE      URL length (32 to 65535, default 256)
                MAXDOCSIZE   document size (256 to 4294967295, default 2000000)
                HTTP_PROXY   host name
                NO_PROXY     string (up to 16 strings, separated by commas)

MASTER DETAIL Tile Attribute(s)

The binary attribute specifies whether the text in a master detail table is in plain text format (0) or binary format (1).

Plain text uses a newline character to mark the end of each line. In contrast, binary format does not use newline characters to mark the end of a line.
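
As a minimal sketch, a preference for a master detail table that holds plain text might be created as follows. The calls follow the pattern of the other examples in this chapter. The preference name MD_PLAIN and its description are hypothetical, and passing the attribute value as the string '0' is an assumption.

```sql
begin
  -- BINARY = 0: text in the master detail table is plain text
  ctx_ddl.set_attribute ('BINARY', '0');
  -- MD_PLAIN is a hypothetical preference name
  ctx_ddl.create_preference ('MD_PLAIN', 'Plain text in a master detail table', 'MASTER DETAIL');
end;
```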

OSFILE Tile Attribute(s)

The path attribute specifies the location of text files that are stored externally in a file system.

Multiple paths can be specified for the path attribute, with each path separated by a colon (:). File names are stored in the text column in the text table. If the path attribute is not used to specify a path for external files, ConText requires the path to be included in the file names stored in the text column.

Note:

If text is stored in external files rather than in a database, the files must be accessible from the host machine on which the ConText server is running.

This can be accomplished by storing the files in the file system for the host machine or by mounting the file system where the files are stored to the host machine.  
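
The following sketch shows how more than one directory might be supplied for the path attribute, assuming a UNIX-based environment. The directories and the preference name MULTI_PATH are hypothetical. Only the colon-separated form of the value is taken from the description above.

```sql
begin
  -- Two hypothetical directories, separated by a colon
  ctx_ddl.set_attribute ('PATH', '/private/mydocs:/private/archive');
  -- MULTI_PATH is a hypothetical preference name
  ctx_ddl.create_preference ('MULTI_PATH', 'Documents in two directories', 'OSFILE');
end;
```

With a preference of this kind, the file names stored in the text column would not need to include a directory path.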

URL Tile Attribute(s)

The timeout attribute specifies the length of time, in seconds, that a network operation such as 'connect' or 'read' waits before timing out and returning a timeout error to the application. The valid range for timeout is 0 to 3600 and the default is 30.

Note:

Since timeout is at the network operation level, the total timeout may be longer than the time specified for timeout.  

The maxthreads attribute specifies the maximum number of threads that can be running at the same time. The valid range for maxthreads is 1 to 1024 and the default is 8.

Note:

The upper range of maxthreads corresponds to the number of file descriptors that the operating system can process at one time. If the number of files the operating system can process at one time is less than the value set, an invalid socket error may be returned.  

The maxurls attribute specifies the maximum number of rows that the internal buffer can hold for HTML documents (rows) retrieved from the text table. The valid range for maxurls is 1 to 4294967295 and the default is 256.

The urlsize attribute specifies the maximum length, in bytes, that the URL data store supports for URLs stored in the database. If a URL is over the maximum set, an error is returned. The valid range for urlsize is 32 to 65535 and the default is 256.

The maxdocsize attribute specifies the maximum size, in bytes, that the URL data store supports for accessing HTML documents whose URLs are stored in the database. The valid range for maxdocsize is 1 to 4294967295 and the default is 2000000 (2 MB).

The http_proxy attribute specifies the fully-qualified name of the host machine that serves as the proxy (gateway) for the machine on which ConText is installed.

The no_proxy attribute specifies the strings (up to sixteen, separated by commas) which, when encountered in a host name, cause the URL data store to ignore the machine as a proxy machine.

For example, if the string 'us.oracle.com, uk.oracle.com' is entered for no_proxy, any machines that contain either of these domains in their host names are ignored as proxy machines.
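
As an illustration, several URL Tile attributes might be set together in one preference. This is a sketch only: the preference name MY_URL, the proxy host name, and the specific timeout and thread values are hypothetical, and passing numeric values as strings is an assumption. Attributes that are not set would be expected to keep the defaults listed in the table above.

```sql
begin
  -- Wait up to 60 seconds for each network operation (valid range 0 to 3600)
  ctx_ddl.set_attribute ('TIMEOUT', '60');
  -- Allow up to 16 threads to run at the same time
  ctx_ddl.set_attribute ('MAXTHREADS', '16');
  -- Hypothetical fully-qualified proxy host name
  ctx_ddl.set_attribute ('HTTP_PROXY', 'proxyhost.us.oracle.com');
  -- Strings from the no_proxy example above
  ctx_ddl.set_attribute ('NO_PROXY', 'us.oracle.com, uk.oracle.com');
  -- MY_URL is a hypothetical preference name
  ctx_ddl.create_preference ('MY_URL', 'URL data store with proxy settings', 'URL');
end;
```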

Data Store Example

The following example creates a preference named doc_pref for the OSFILE Tile:

begin
  -- Directory that holds the external text files
  ctx_ddl.set_attribute ('PATH', '/private/mydocs');
  -- Arguments: preference name, description, Tile name
  ctx_ddl.create_preference ('DOC_PREF', 'Path for my documents', 'OSFILE');
end;

Note:

This example illustrates usage of OSFILE for documents stored in a UNIX-based environment.

The directory path syntax may be different for other environments.  

Filter Category

The Filter category contains the following Tiles:

Tile             Attributes        Attribute Values
BLASTER FILTER   EXECUTABLE        format id (number), filter executable, sequence (number)
                 FORMAT            0 or 999 (No filter -- plain/ASCII text)
                                   1 or 4 (Word Perfect for Windows 5.x; Word Perfect for DOS 5.0, 5.1)
                                   2 (MS Word for DOS 5.0, 5.5)
                                   5 (Word Perfect for Windows 6.x; Word Perfect for DOS 6.0)
                                   6 (MS Word for Mac 3, 4, 5.x)
                                   7 (MS Word for Windows 2)
                                   8 (AMIPRO for Windows 1, 2, 3)
                                   9 (Lotus 1-2-3 for Windows 2, 3, 4, 5; Lotus 1-2-3 for DOS 4, 5)
                                   11 (MS Word for Windows 6.x, 7.0)
                                   13 (Xerox XIF for UNIX 5, 6)
                                   997 (Autorecognize)
FILTER NOP       ** none **        N/A
HTML FILTER      CODE_CONVERSION   0 (disabled)
                                   1 (enabled)
USER FILTER      COMMAND           filter executable

BLASTER FILTER Tile Attribute(s)

The format attribute specifies the internal filter used for filtering text stored in a text column.

The executable attribute specifies the external filters that are used to filter text stored in a mixed-format text column. It has three values that must be specified: the format ID (number), the filter executable, and the sequence (number).

HTML FILTER Tile Attribute(s)

The code_conversion attribute specifies whether code conversion is enabled for documents which contain Japanese ASCII text with HTML tags.

Code conversion is required for Japanese HTML documents if the documents use more than one of the three character sets supported for HTML text in Japanese. If code conversion is enabled, all Japanese HTML documents are converted to a single, common character set before indexing.

The default for code_conversion is 0 (disabled).

Note:

For mixed-format columns that use Autorecognize (BLASTER Tile, format attribute = 997) or use external filters (BLASTER Tile, executable attribute) for all formats except HTML, code conversion is always enabled.  
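
A minimal sketch of a Filter preference that enables code conversion for a column containing only Japanese HTML documents is shown below. The preference name HTML_JA is hypothetical, and passing the value as the string '1' is an assumption.

```sql
begin
  -- CODE_CONVERSION = 1: convert Japanese HTML documents to a single,
  -- common character set before indexing
  ctx_ddl.set_attribute ('CODE_CONVERSION', '1');
  -- HTML_JA is a hypothetical preference name
  ctx_ddl.create_preference ('HTML_JA', 'Japanese HTML with code conversion', 'HTML FILTER');
end;
```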

USER FILTER Tile Attribute(s)

The command attribute specifies the executable for the single external filter used to filter all text stored in a column. If more than one document format is stored in the column, the external filter must recognize and handle all such formats.
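
For illustration, a preference for the USER FILTER Tile might look like the following sketch. The executable name myfilter and the preference name MY_FILTER are hypothetical. The location in which ConText expects to find the executable is not covered here.

```sql
begin
  -- myfilter is a hypothetical external filter executable that must
  -- handle every document format stored in the column
  ctx_ddl.set_attribute ('COMMAND', 'myfilter');
  -- MY_FILTER is a hypothetical preference name
  ctx_ddl.create_preference ('MY_FILTER', 'Single external filter', 'USER FILTER');
end;
```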

Filter Example

The following example creates a preference named word6 for the BLASTER FILTER Tile:

begin
  ctx_ddl.set_attribute ('FORMAT', '11');
  ctx_ddl.create_preference ('WORD6', 'Microsoft Word docs', 'BLASTER FILTER');
end;

Lexer Category

The Lexer category contains the following Tiles:

Tile                    Attributes       Attribute Values
BASIC LEXER             PUNCTUATIONS     characters (string)
                        PRINTJOINS       characters (string)
                        SKIPJOINS        characters (string)
                        NUMJOIN          characters (string)
                        NUMGROUP         characters (string)
                        CONTINUATION     characters (string)
                        BASE_LETTER      0 (disabled)
                                         1 (enabled)
CHINESE V-GRAM LEXER    HANZI_INDEXING   1
                                         2
JAPANESE V-GRAM LEXER   KANJI_INDEXING   1
                                         2
KOREAN LEXER            ** none **       N/A
THEME LEXER             ** none **       N/A

Note:

The character strings for each BASIC LEXER Tile attribute can contain multiple characters. Each character in the string serves as a punctuation, join, or continuation character.

For example, if the string '.?!' is specified for the punctuations attribute, each individual character ('.', '?', '!') in the string is treated by ConText as a sentence delimiter during indexing and queries.  

BASIC LEXER Tile Attribute(s)

punctuations specifies the characters that indicate the end of a sentence.

printjoins specifies the characters that join words together when they appear between the words with no blank spaces. Words that contain printjoin characters are stored in the text index exactly as they appear in the text.

For example, if a hyphen '-' is defined as a printjoin character, the word pseudo-intellectual is stored in the text index as pseudo-intellectual.

skipjoins specifies the characters that join words together, but the characters are not stored in the text index.

For example, if a hyphen '-' is defined as a skipjoin character, the word pseudo-intellectual is stored in the text index as pseudointellectual.

Note:

printjoins and skipjoins are mutually exclusive. The same characters cannot be specified for both attributes.  
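
The following sketch sets both attributes in one preference while keeping the two character sets disjoint, as the note requires. The preference name DOC_JOINS and the choice of characters are hypothetical.

```sql
begin
  -- Underscore and slash are kept in the indexed token
  ctx_ddl.set_attribute ('PRINTJOINS', '_/');
  -- Hyphen joins words but is dropped from the indexed token,
  -- so pseudo-intellectual is indexed as pseudointellectual
  ctx_ddl.set_attribute ('SKIPJOINS', '-');
  -- DOC_JOINS is a hypothetical preference name
  ctx_ddl.create_preference ('DOC_JOINS', 'Disjoint printjoins and skipjoins', 'BASIC LEXER');
end;
```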

numjoin specifies the characters that, when they appear in a string of digits, cause ConText to index the string of digits as a single unit or word.

For example, a period '.' may be defined as a numjoin character because it often serves as a decimal point when it appears in a string of digits.

numgroup specifies the characters that, when they appear in a string of digits, indicate that the digits are groupings within a larger single unit.

For example, a comma ',' may be defined as a numgroup character because it often indicates a grouping of thousands when it appears in a string of digits.

Note:

The default values for numjoin and numgroup are determined by the NLS initialization parameters that are specified for the database.

In general, a value does not need to be specified for either numjoin or numgroup when creating a Lexer preference for the BASIC LEXER Tile.  

continuation specifies the characters that indicate a word continues on the next line and should be indexed as a single token. The most common continuation characters are a hyphen '-' and a backslash '\'.

base_letter specifies whether characters that have diacritical marks (umlauts, cedillas, acute accents, etc.) are converted to their base form for text indexing and text queries.
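
As a sketch, base-letter conversion might be enabled together with the common continuation characters as follows. The preference name BASE_LEX is hypothetical, and passing the base_letter value as the string '1' is an assumption.

```sql
begin
  -- Hyphen and backslash mark a word that continues on the next line
  ctx_ddl.set_attribute ('CONTINUATION', '-\');
  -- BASE_LETTER = 1: convert characters with diacritical marks to their base form
  ctx_ddl.set_attribute ('BASE_LETTER', '1');
  -- BASE_LEX is a hypothetical preference name
  ctx_ddl.create_preference ('BASE_LEX', 'Base letters and continuation', 'BASIC LEXER');
end;
```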

CHINESE V-GRAM LEXER Tile Attribute(s)

The hanzi_indexing attribute specifies the length of the character groups used for pattern matching while indexing.

A value of 1 for hanzi_indexing indicates that the Chinese lexer examines each character individually to determine token boundaries.

A value of 2 for hanzi_indexing indicates that the lexer examines characters in pairs to determine token boundaries.

The default is 2.
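
A minimal sketch of a Chinese lexer preference that overrides the default and examines characters individually is shown below. The preference name CHINESE_SINGLE is hypothetical, and passing the value as the string '1' is an assumption. A preference for the JAPANESE V-GRAM LEXER Tile would presumably follow the same pattern with the kanji_indexing attribute.

```sql
begin
  -- HANZI_INDEXING = 1: examine each character individually
  ctx_ddl.set_attribute ('HANZI_INDEXING', '1');
  -- CHINESE_SINGLE is a hypothetical preference name
  ctx_ddl.create_preference ('CHINESE_SINGLE', 'Single-character matching', 'CHINESE V-GRAM LEXER');
end;
```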

JAPANESE V-GRAM LEXER Tile Attribute(s)

The kanji_indexing attribute specifies the length of the character groups used for pattern matching while indexing.

A value of 1 for kanji_indexing indicates that the Japanese lexer examines each character individually to determine token boundaries.

A value of 2 for kanji_indexing indicates that the lexer examines pairs of characters to determine token boundaries.

The default is 2.

Lexer Example

The following example creates a preference named doc_link for the BASIC LEXER Tile:

begin
  ctx_ddl.set_attribute     ('PRINTJOINS', '-*/');
  ctx_ddl.create_preference ('DOC_LINK', 'Dash, star, slash', 'BASIC LEXER' );
end;

Engine Category

The Engine category contains the following Tiles:

Tile             Attributes                                     Attribute Values
GENERIC ENGINE   INDEX_MEMORY                                   memory in bytes (integer)
                 OPTIMIZE_DEFAULT                               default ConText index optimization method
                 I1T_TABLESPACE, I1T_STORAGE, I1T_OTHER_PARMS   tablespace, STORAGE clause, and other table creation parameters for token table
                 I1I_TABLESPACE, I1I_STORAGE, I1I_OTHER_PARMS   tablespace, STORAGE clause, and other index creation parameters for index on token table
                 KTB_TABLESPACE, KTB_STORAGE, KTB_OTHER_PARMS   tablespace, STORAGE clause, and other table creation parameters for mapping table
                 KID_TABLESPACE, KID_STORAGE, KID_OTHER_PARMS   tablespace, STORAGE clause, and other index creation parameters for indexes on mapping table
                 KIK_TABLESPACE, KIK_STORAGE, KIK_OTHER_PARMS
                 LST_TABLESPACE, LST_STORAGE, LST_OTHER_PARMS   tablespace, STORAGE clause, and other table creation parameters for control table
                 LIX_TABLESPACE, LIX_STORAGE, LIX_OTHER_PARMS   tablespace, STORAGE clause, and other index creation parameters for index on control table
                 SQR_TABLESPACE, SQR_STORAGE, SQR_OTHER_PARMS   tablespace, STORAGE clause, and other table creation parameters for SQE results table
                 SRI_TABLESPACE, SRI_STORAGE, SRI_OTHER_PARMS   tablespace, STORAGE clause, and other index creation parameters for index on SQE results table
                 SQE_TABLESPACE, SQE_STORAGE, SQE_OTHER_PARMS   tablespace, STORAGE clause, and other table creation parameters for SQE definition table (NOT USED)
                 SEI_TABLESPACE, SEI_STORAGE, SEI_OTHER_PARMS   tablespace, STORAGE clause, and other index creation parameters for index on SQE definition table (NOT USED)
ENGINE NOP       ** none **                                     N/A

GENERIC ENGINE Tile Attribute(s)

index_memory specifies the amount of memory, in bytes, allocated for indexing.

Note:

When specifying a value for index_memory in a preference, specify as much real (not virtual) memory as is available on the machine which is running the ConText server that will be creating indexes.

For parallel indexing, the memory specified should be the amount of available memory divided evenly among the number of ConText servers that will perform the indexing in parallel.  
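
The division described in the note can be sketched as follows. The memory figure, the number of servers, and the preference name PARALLEL_ENGINE are hypothetical. With these values, each server is given 20000000 bytes.

```sql
declare
  -- Hypothetical figures: real memory available for indexing and the
  -- number of ConText servers that will index in parallel
  total_memory  number := 60000000;
  server_count  number := 3;
begin
  -- Divide the available memory evenly among the servers
  ctx_ddl.set_attribute ('INDEX_MEMORY', trunc(total_memory / server_count));
  -- PARALLEL_ENGINE is a hypothetical preference name
  ctx_ddl.create_preference ('PARALLEL_ENGINE', 'Index memory per server', 'GENERIC ENGINE');
end;
```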

optimize_default specifies the type of optimization used when CTX_DDL.OPTIMIZE_INDEX is called without an optimization type. If no value is specified for optimize_default, the default is DEFRAGMENT_TO_TWO_TABLE.

i1t_tablespace, ktb_tablespace, and lst_tablespace specify the tablespaces used for the ConText index tables created during indexing.

sqr_tablespace specifies the tablespace used for the stored query expression result (SQR) table that is created, but not populated, during indexing. The SQR table for a policy stores the results of stored query expressions for the policy.

i1i_tablespace, kid_tablespace, kik_tablespace, and lix_tablespace specify the tablespaces used for the Oracle indexes generated for each ConText index table during indexing.

sri_tablespace specifies the tablespace used for the Oracle index generated for each SQR table.

Note:

For each TABLESPACE attribute that is not specified when creating an Engine preference, the text table owner's default tablespace is used for storing the ConText index objects (tables and indexes).  

i1t_storage, ktb_storage, and lst_storage specify the STORAGE clauses used to create the ConText index tables during ConText indexing.

sqr_storage specifies the STORAGE clause used to create the stored query expression result (SQR) table during ConText indexing.

i1i_storage, kid_storage, kik_storage, and lix_storage specify the STORAGE clauses used to create the Oracle indexes for each ConText index table.

sri_storage specifies the STORAGE clause used to create the Oracle index for each SQR table.

i1t_other_parms, ktb_other_parms, and lst_other_parms specify any additional parameters used to create the ConText index tables during ConText indexing.

sqr_other_parms specifies any additional parameters used to create the stored query expression result (SQR) table during ConText indexing.

i1i_other_parms, kid_other_parms, kik_other_parms, and lix_other_parms specify any additional parameters used to create the Oracle indexes for each ConText index table.

sri_other_parms specifies any additional parameters used to create the Oracle index for each SQR table.

Note:

In particular, the other_parms attributes are used to specify a value for the PARALLEL clause in the CREATE TABLE/INDEX command. The PARALLEL clause determines the degree of parallelism used by the Oracle8 parallel query option for operations such as generating Oracle indexes.  

sqe/sei_tablespace, sqe/sei_storage, and sqe/sei_other_parms are not used by ConText because SQE tables and their accompanying Oracle indexes are not used for storing SQE definitions (all SQE definitions are stored in a system table owned by CTXSYS). As a result, values are not required for these attributes.

See Also:

For descriptions of the tables and indexes that constitute a ConText index, see Appendix C, "ConText Index Tables and Indexes".

For more information about the storage clauses and other parameters that can be specified for a database table/index, see the CREATE TABLE and CREATE INDEX commands in Oracle8 Server SQL Reference.

For more information about the parallel query option in Oracle8, see Oracle8 Server Tuning.

For more information about SQEs, see Oracle8 ConText Cartridge Application Developer's Guide.  

Engine Example

The following example creates a preference named doc_engine for the GENERIC ENGINE Tile:

begin
  ctx_ddl.set_attribute ('INDEX_MEMORY',   30000000 );
  ctx_ddl.set_attribute ('I1T_TABLESPACE', 'DOCUMENTS' );
  ctx_ddl.set_attribute ('I1T_STORAGE',' initial 10M next 2M
                         maxextents 10');
  ctx_ddl.set_attribute ('I1T_OTHER_PARMS',' pctfree 20');
  ctx_ddl.set_attribute ('I1I_OTHER_PARMS',' parallel 2');
  ctx_ddl.create_preference ('DOC_ENGINE', 'Test case',
                             'GENERIC ENGINE' );
end;

Wordlist Category

The Wordlist category contains the following Tiles:

Tile                Attributes         Attribute Values
GENERIC WORD LIST   STCLAUSE           STORAGE clause for Soundex wordlist table
                    INSTCLAUSE         STORAGE clause for index on Soundex wordlist table
                    SOUNDEX_AT_INDEX   0 (disabled)
                                       1 (enabled)
                    STEMMER            1 (English)
                                       2 (English -- derivational)
                                       3 (Dutch)
                                       4 (French)
                                       5 (German)
                                       6 (Italian)
                                       7 (Spanish)
                    FUZZY_MATCH        1 (English and other Western European languages)
                                       2 (Japanese)
                                       3 (Korean)
                                       4 (Chinese)

GENERIC WORD LIST Tile Attribute(s)

The stclause attribute specifies the STORAGE clause used to create the Soundex wordlist table during ConText indexing. The Soundex wordlist table is only created if Soundex is enabled through the soundex_at_index attribute.

The instclause attribute specifies the STORAGE clause used to create the Oracle index for the Soundex wordlist table.

The soundex_at_index attribute specifies whether ConText generates Soundex word mappings and stores them in the Soundex wordlist table during text indexing. If Soundex word mappings are not generated and stored in the wordlist table during indexing, queries that use Soundex are not expanded.

The stemmer attribute specifies the stemmer used for word stemming in text queries. For all the supported languages, the stemmers return standard inflected forms of a word, such as the plural form (e.g. department --> departments).

For English, an additional stemmer is provided which returns standard inflected forms and derived forms (e.g. department --> departments, departmentalize).

The default for stemmer is 1 (inflectional English).

The fuzzy_match attribute specifies which fuzzy matching routines are used for the column. Fuzzy matching is currently supported for English, Japanese, and, to a lesser extent, the Western European languages.

The default for fuzzy_match is 1.

Note:

The fuzzy_match attribute values for Chinese and Korean are dummy attribute values that prevent the English and Japanese fuzzy matching routines from being used on Chinese and Korean text.  
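
As a sketch, a Wordlist preference that selects the derivational English stemmer and leaves fuzzy matching at its default might look like the following. The preference name ENGLISH_DERIV is hypothetical, passing the values as strings is an assumption, and the Tile name is written as it appears in the table for this category.

```sql
begin
  -- STEMMER = 2: English stemmer that returns inflected and derived forms
  ctx_ddl.set_attribute ('STEMMER', '2');
  -- FUZZY_MATCH = 1: English and other Western European languages (the default)
  ctx_ddl.set_attribute ('FUZZY_MATCH', '1');
  -- ENGLISH_DERIV is a hypothetical preference name
  ctx_ddl.create_preference ('ENGLISH_DERIV', 'Derivational English stemming', 'GENERIC WORD LIST');
end;
```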

See Also:

For more information about the expansion methods supported by ConText, see "WordList Category" in Chapter 5, "Understanding the ConText Data Dictionary".

For more information about expansion methods in queries, see Oracle8 ConText Cartridge Application Developer's Guide.  

Wordlist Example

The following example creates a preference named soundex_yes for the GENERIC WORD LIST Tile:

begin
  ctx_ddl.set_attribute('SOUNDEX_AT_INDEX', '1');
  ctx_ddl.create_preference('SOUNDEX_YES',
                            'Will build the soundex mapping during indexing',
                            'GENERIC WORD LIST');
end;

Stoplist Category

The Stoplist category contains the following Tiles:

Tile                Attributes   Attribute Values
GENERIC STOP LIST   STOP_WORD    word (string), sequence (number)

GENERIC STOP LIST Tile Attribute(s)

The stop_word attribute has two values that must be specified: the stop word (a string) and its sequence (a number).

Sequence is a value from 1 to 4095 and is used in a text index to record the stop words that precede and follow an indexed term. ConText records up to eight preceding stop words and eight following stop words for each indexed term. This enables text queries for phrases which contain stop words.

For example, consider the sentence "he is at the top of the class" where at, the, top, and of are stop words. The sequences for each of the stop words are recorded as part of the text index entry for the term class, which allows users to include stop words in a query (e.g. 'top of the class').

Stoplist Example

The following example creates a preference named mini_stop_list for the GENERIC STOP LIST Tile:

begin
  ctx_ddl.set_attribute     ('STOP_WORD', 'A',   1);
  ctx_ddl.set_attribute     ('STOP_WORD', 'AND', 2);
  ctx_ddl.set_attribute     ('STOP_WORD', 'THE', 3);
  ctx_ddl.create_preference ('MINI_STOP_LIST', 'Small', 'GENERIC STOP LIST' );
end;

Tiles, Tile Attributes, and Attribute Values: Text Loading

The following section lists all of the Tiles which can be used to create text loading preferences for use in sources. The section also lists the attributes and attribute values for each text loading Tile. In addition, a brief description of the Tile attributes and examples are provided.

The text loading Tiles are grouped alphabetically by preference category:

Preference Category   Tiles
Reader Category       DIRECTORY READER
Engine Category       GENERIC LOADER
Translator Category   NULL TRANSLATOR
                      USER TRANSLATOR

Reader Category

The Reader category contains the following Tiles:

Tile               Attributes    Attribute Values
DIRECTORY READER   DIRECTORIES   pathname for the directory where text loading files are located

DIRECTORY READER Tile Attribute(s)

The directories attribute specifies the full pathname for the directory that the ConText server with the Loader personality scans when looking for new files to load into a column in a table or view.

The structure of the value for pathname will vary depending on the directory naming conventions used by your operating system.
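
The following sketch shows how a Reader preference might be created for a UNIX-based environment. It assumes that text loading preferences are created with the same ctx_ddl calls used for the indexing preferences in this chapter. The directory and the preference name DOC_READER are hypothetical.

```sql
begin
  -- Hypothetical directory scanned for new files to load
  ctx_ddl.set_attribute ('DIRECTORIES', '/private/loaddocs');
  -- DOC_READER is a hypothetical preference name
  ctx_ddl.create_preference ('DOC_READER', 'Directory for text loading', 'DIRECTORY READER');
end;
```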

Engine Category

The Engine (Text Loading) category contains the following Tiles:

Tile             Attributes   Attribute Values
GENERIC LOADER   ** none **   N/A

The GENERIC LOADER Tile does not have any attributes. In general, preferences do not need to be created for the Engine category, since the GENERIC LOADER Tile does not have attributes that can be set by the user.

Translator Category

The Translator category contains the following Tiles:

Tile              Attributes   Attribute Values
NULL TRANSLATOR   SEPARATE     N/A
USER TRANSLATOR   COMMAND      translator executable

NULL TRANSLATOR Tile Attribute(s)

The separate attribute specifies that the load files do not contain the actual text of the documents to be loaded, but, rather, contain pointers to separate files where the text of the documents is stored.

See Also:

For more information about how the separate option works for loading text, see "ctxload Utility" in Chapter 9, "Executables and Utilities".  

USER TRANSLATOR Tile Attribute(s)

The command attribute specifies the name of the executable used to translate a load file into the format required by ctxload.

Note:

The specified translator executable must be stored in the appropriate directory in the Oracle home directory.

For example, in a UNIX-based environment, all translator executables must be stored in $ORACLE_HOME/ctx/bin.

In a Windows NT environment, the translator executables must be stored in ORACLE_HOME\BIN.

For more information about the required location of executable files, see the Oracle8 installation documentation for your operating system.  

Predefined and Default Preferences: Indexing

ConText provides the following predefined indexing preferences, grouped according to preference category:

Preference Category   Predefined Preferences     Default
Data Store Category   DEFAULT_DIRECT_DATASTORE   ***
                      DEFAULT_OSFILE
                      DEFAULT_URL
                      MD_BINARY
                      MD_TEXT
Filter Category       AUTOB                      ***
                      HTML_FILTER
                      WW6B
Lexer Category        DEFAULT_LEXER              ***
                      KOREAN
                      VGRAM_CHINESE_1
                      VGRAM_CHINESE_2
                      VGRAM_JAPANESE_1
                      VGRAM_JAPANESE_2
Engine Category       DEFAULT_INDEX              ***
                      THEME_LEXER
Wordlist Category     KOREAN_WORDLIST
                      NO_SOUNDEX                 ***
                      SOUNDEX
                      VGRAM_CHINESE_WORDLIST
                      VGRAM_CHINESE_WORDLIST
Stoplist Category     DEFAULT_STOPLIST           ***
                      NO_STOPLIST

Data Store Category

The following section provides descriptions of the predefined preferences for the Data Store category.

Note:

DEFAULT_DIRECT_DATASTORE is the default preference for the Data Store preference category.  

DEFAULT_DIRECT_DATASTORE

The DEFAULT_DIRECT_DATASTORE preference calls the DIRECT Tile which is used to indicate that text is stored directly in the text column of a text table.

DEFAULT_DIRECT_DATASTORE does not use any Tile attributes because the DIRECT Tile does not have attributes.

DEFAULT_OSFILE

The DEFAULT_OSFILE preference calls the OSFILE Tile which is used to indicate that text is stored as files in a file system.

DEFAULT_OSFILE uses the PATH Tile attribute and a hard-coded set of dummy directory paths to indicate the directories in which the text files are located.

The hard-coded paths, which are delimited by colons in the attribute value, are /oracle/data, /oracle/data2, and /oracle/data3.

Note:

The DEFAULT_OSFILE preference requires modification to reflect the actual paths for your text files before the preference can be used in a policy.  

DEFAULT_URL

The DEFAULT_URL preference calls the URL Tile which is used to indicate that text is stored as URLs.

DEFAULT_URL uses all of the attribute defaults for the URL Tile.

MD_BINARY

The MD_BINARY preference calls the MASTER DETAIL Tile which is used to indicate text is stored in a master detail table.

MD_BINARY uses the BINARY Tile attribute and a value of YES to indicate that the text in the table is stored in binary format.

MD_TEXT

The MD_TEXT preference calls the MASTER DETAIL Tile which is used to indicate text is stored in a master detail table.

MD_TEXT uses the Tile attribute BINARY and a value of NO to indicate that the text in the table is stored as ASCII text.

Filter Category

The following section provides descriptions of the predefined preferences for the Filter category.

Note:

DEFAULT_NULL_FILTER is the default preference for the Filter preference category.  

AUTOB

The AUTOB preference calls the BLASTER FILTER Tile which specifies an internal filter used to extract text from formatted documents in a text column.

AUTOB uses the FORMAT Tile attribute and a value of 997 to indicate that ConText uses the autorecognize filter to extract text. It can be used to filter text in a column that contains the following document formats:

Document Format              Version
AmiPro for Windows           1, 2, 3
ASCII                        N/A
HTML                         1, 2, 3
Lotus 123 for DOS            4, 5
Lotus 123 for Windows        2, 3, 4, 5
Microsoft Word for Windows   2, 6.x
Microsoft Word for DOS       5.0, 5.5
Microsoft Word for MAC       3, 4, 5.x
Word Perfect for Windows     5.x, 6.x
WordPerfect for DOS          5.0, 5.1, 6.0
Xerox XIF for UNIX           5, 6

DEFAULT_NULL_FILTER

The DEFAULT_NULL_FILTER preference calls the FILTER NOP Tile which indicates that the text column in a text table contains plain, unformatted (ASCII) text and does not require filtering for indexing and highlighting.

DEFAULT_NULL_FILTER does not use any Tile attributes because the FILTER NOP Tile does not have attributes.

HTML_FILTER

The HTML_FILTER preference calls the HTML FILTER Tile and can be used to filter documents in a column that contains only HTML-formatted documents.

WW6B

The WW6B preference calls the BLASTER FILTER Tile and specifies that the Microsoft Word for Windows 6 internal filter is used to extract text from Word for Windows 6 documents in a text column.

WW6B uses the format Tile attribute and a value of 11 to indicate ConText uses the Word for Windows 6 filter to extract text. It can be used in a column that contains only Word for Windows 6-formatted documents.

Lexer Category

The following section provides descriptions of the predefined preferences for the Lexer category.

Note:

DEFAULT_LEXER is the default preference for the Lexer preference category.  

DEFAULT_LEXER

The predefined DEFAULT_LEXER preference calls the BASIC LEXER Tile, which indicates the lexer settings used to identify word and sentence boundaries for text indexing and text queries.

DEFAULT_LEXER uses the following Tile attributes and values to indicate the lexer settings:

Attribute      Values
punctuations   . ? !
printjoins     NULL (indicates no characters defined as printjoins for the BASIC LEXER; instead, printjoins determined by NLS initialization parameters)
skipjoins      NULL (indicates no characters defined as skipjoins for the BASIC LEXER; instead, skipjoins determined by NLS initialization parameters)
continuation   - \

KOREAN

The KOREAN preference calls the KOREAN LEXER Tile and can be used for parsing Korean text. It has no attributes.

VGRAM_CHINESE_1 and VGRAM_CHINESE_2

The VGRAM_CHINESE preferences call the CHINESE V-GRAM LEXER Tile which indicates the preferences can be used for parsing Chinese text.

The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Chinese text (hanzi_indexing attribute).

VGRAM_JAPANESE_1 and VGRAM_JAPANESE_2

The VGRAM_JAPANESE preferences call the JAPANESE V-GRAM LEXER Tile which indicates the preferences can be used for parsing Japanese text.

The 1 or 2 indicates that the preference uses either method 1 or 2 for identifying tokens in Japanese text (kanji_indexing attribute).

THEME_LEXER

The predefined THEME_LEXER preference calls the THEME LEXER Tile, which indicates the preference can be used in a column policy to create theme indexes for a column.

The THEME_LEXER preference does not set any attributes because the THEME LEXER Tile does not have any attributes.

Engine Category

The following section provides descriptions of the predefined preferences for the Engine category.

DEFAULT_INDEX

The DEFAULT_INDEX preference calls the GENERIC ENGINE Tile which is used to specify the amount of memory reserved for indexing.

DEFAULT_INDEX uses the index_memory attribute and specifies the amount of memory allocated for indexing: 12582912 bytes (12 MB).

Wordlist Category

The following section provides descriptions of the predefined preferences for the Wordlist category.

Note:

NO_SOUNDEX is the default preference for the Wordlist preference category.  

NO_SOUNDEX

The NO_SOUNDEX preference contains the GENERIC WORD LIST Tile which specifies whether Soundex word mappings are generated during text indexing. Soundex can be used in text queries to expand the query to include words that sound similar to the query terms.

NO_SOUNDEX uses the soundex_at_index Tile attribute and a value of 0 to indicate that ConText does not generate Soundex word mappings during text indexing.

SOUNDEX

The SOUNDEX preference contains the GENERIC WORD LIST Tile which specifies whether Soundex word mappings are generated during text indexing. Soundex can be used in text queries to expand the query to include words that sound similar to the query terms.

SOUNDEX uses the soundex_at_index Tile attribute and a value of 1 to indicate that ConText generates Soundex word mappings during text indexing.
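As background on what a Soundex word mapping looks like, the following Python sketch implements the widely used American Soundex encoding, in which similar-sounding words receive the same four-character code. This is an assumption for illustration: the chapter does not state which Soundex variant ConText uses, and the function name is hypothetical.

```python
# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, then truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))
```

Two words map to each other under Soundex when their codes are equal, which is the kind of expansion a query using Soundex relies on.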

Stoplist Category

The following section provides descriptions of the predefined preferences for the Stoplist category.

Note:

DEFAULT_STOPLIST is the default preference for the Stoplist preference category.  

DEFAULT_STOPLIST

The DEFAULT_STOPLIST preference specifies a list of stop words for the GENERIC STOP LIST Tile.

The preference sets the stop_word attribute once for each of the following stop words:

STOPWORD   SEQ   STOPWORD   SEQ   STOPWORD   SEQ   STOPWORD   SEQ  

A   3   COULD   70   MR   18   SUCH   69  
ABOUT   34   FOR   8   MRS   20   THAN   43  
AFTER   63   FROM   17   MS   21   THAT   9  
ALL   62   HAD   51   MZ   19   THE   7  
ALSO   50   HAS   29   NO   71   THEIR   47  
AN   27   HAVE   32   NOT   61   THERE   67  
ANY   76   HE   24   ONLY   72   THEY   37  
AND   5   HER   45   OF   1   THIS   35  
ARE   28   HIS   44   ON   12   TO   2  
AS   14   IF   58   ONE   40   WAS   26  
AT   13   IN   4   OR   33   WE   57  
BE   23   INC   48   OTHER   54   WERE   52  
BECAUSE   66   INTO   75   OUT   59   WHEN   65  
BEEN   49   IS   10   OVER   64   WHICH   36  
BUT   30   IT   11   S   6   WHO   42  
BY   16   ITS   22   SO   73   WILL   31  
CAN   68   LAST   56   SAYS   41   WITH   15  
CO   60   MORE   38   SHE   25   WOULD   39  
CORP   53   MOST   74   SOME   55   UP   46  

NO_STOPLIST

The NO_STOPLIST preference contains the GENERIC STOP LIST Tile and specifies that no list of stop words is used during text indexing. All words that ConText encounters are stored in the text index.

NO_STOPLIST sets no stop_word attributes, which indicates that no stop words are used during indexing.
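The difference between DEFAULT_STOPLIST and NO_STOPLIST can be sketched in Python as a filter over tokens. The sketch is illustrative only: it uses a subset of the DEFAULT_STOPLIST words listed above, the function name is hypothetical, and case-insensitive matching is an assumption rather than documented ConText behavior.

```python
# A subset of the stop words defined by DEFAULT_STOPLIST.
DEFAULT_STOP_WORDS = {
    "A", "ABOUT", "AND", "ARE", "AS", "AT", "BE", "BY", "FOR", "FROM", "IN",
    "IS", "IT", "OF", "ON", "OR", "THAT", "THE", "THIS", "TO", "WAS", "WITH",
}

# NO_STOPLIST defines no stop words at all.
NO_STOP_WORDS = set()


def indexable_tokens(text, stop_words):
    """Return the whitespace-separated tokens that are not stop words."""
    return [w for w in text.split() if w.upper() not in stop_words]


sentence = "the index is stored in the database"
print(indexable_tokens(sentence, DEFAULT_STOP_WORDS))
print(indexable_tokens(sentence, NO_STOP_WORDS))
```

With the empty set, every token is kept, which mirrors the statement that all words are stored in the text index under NO_STOPLIST.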

Predefined and Default Preferences: Text Loading

ConText provides the following predefined text loading preferences, one for each of the three preference categories used for sources:

Preference Category   Predefined Preferences   Default  

Reader Category   DEFAULT_READER   ***  

Engine Category   DEFAULT_LOADER   ***  

Translator Category   DEFAULT_TRANSLATOR   ***  

Reader Category

The following section provides descriptions of the predefined preferences for the Reader category.

DEFAULT_READER

The DEFAULT_READER preference uses the DIRECTORY READER Tile, for which a dummy (placeholder) directory is set.

Note:

Because it is unknown which directory contains the files to be loaded and path names are operating-system specific, this preference is provided as a default only and should not be used when creating a source.

Before creating a source, you should create your own Reader preference that specifies the directory where your files to be loaded are located.  

Engine Category

The following section provides descriptions of the predefined preferences for the Text Loading Engine category.

DEFAULT_LOADER

The DEFAULT_LOADER preference uses the GENERIC LOADER Tile, which indicates the preference can be used to load text from files in an operating system directory.

Translator Category

The following section provides descriptions of the predefined preferences for the Translator category.

DEFAULT_TRANSLATOR

The DEFAULT_TRANSLATOR preference uses the NULL TRANSLATOR Tile, which indicates no translation is performed on the files to be loaded, because the files are in the format required by ctxload.

Template Policies

The following section provides a brief description of the template policies provided with ConText.

The template policies are owned by CTXSYS. A template policy can be specified as the source policy for a policy during creation.

ConText provides the following template policies:

DEFAULT_POLICY

The DEFAULT_POLICY policy can be used to create a policy which uses all of the default preferences:

Default Preferences   Characteristics  

DEFAULT_DIRECT_DATASTORE   Text stored in database  

DEFAULT_NULL_FILTER   No filter (text stored in plain, ASCII format)  

DEFAULT_LEXER   Basic lexer (standard punctuation and continuation characters, no printjoin or skipjoin characters)  

DEFAULT_INDEX   Indexing memory = 12582912 bytes, default storage/other clauses for ConText index tables and indexes  

NO_SOUNDEX   No Soundex word mappings stored during text indexing  

DEFAULT_STOPLIST   Stoplist is active, default list of stop words  

Note:

DEFAULT_POLICY is the default for source_policy in CREATE_POLICY and CREATE_TEMPLATE_POLICY in the CTX_DDL package.  

TEMPLATE_AUTOB

The TEMPLATE_AUTOB policy can be used to create a policy for a text column that contains documents in mixed formats. The autorecognize Blaster filter is used to automatically identify the format of each document in a column and, if the format is supported by ConText, extract the text of the document for indexing.

TEMPLATE_AUTOB uses the AUTOB predefined preference and all the remaining default preferences.

TEMPLATE_DIRECT

The TEMPLATE_DIRECT policy can be used to create a policy for indexing basic text stored in a text column.

It uses all the default preferences.

TEMPLATE_LONGTEXT_STOPLIST_OFF

The TEMPLATE_LONGTEXT_STOPLIST_OFF policy can be used to create a policy that does not use a stopword list during indexing.

It uses the NO_STOPLIST predefined preference and all the remaining default preferences.

TEMPLATE_LONGTEXT_STOPLIST_ON

The TEMPLATE_LONGTEXT_STOPLIST_ON policy can be used to create a policy that uses a stopword list during indexing.

It uses the DEFAULT_STOPLIST predefined preference and all the remaining default preferences.

TEMPLATE_MD

The TEMPLATE_MD policy can be used to create a policy for indexing plain text stored in the detail column in a master-detail table.

It uses the MD_TEXT predefined preference and all the remaining default preferences.

TEMPLATE_MD_BIN

The TEMPLATE_MD_BIN policy can be used to create a policy for indexing binary text stored in the detail column in a master-detail table.

It uses the MD_BINARY predefined preference and all the remaining default preferences.

TEMPLATE_WW6B

The TEMPLATE_WW6B policy can be used to create a policy for indexing text formatted for Microsoft Word for Windows 6.

It uses the WW6B predefined preference and all the remaining default preferences.
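Each template policy described above amounts to the default preferences with at most one substitution. The Python sketch below models that relationship as a dictionary merge. It is not ConText code: the function name is hypothetical, and the category assigned to each preference (for example, treating AUTOB and WW6B as Filter preferences and MD_TEXT and MD_BINARY as Data Store preferences) is an assumption inferred from the preference names and descriptions.

```python
# Default preference for each category, as listed for DEFAULT_POLICY.
# The category keys are assumptions inferred from the preference names.
DEFAULT_POLICY = {
    "Data Store": "DEFAULT_DIRECT_DATASTORE",
    "Filter": "DEFAULT_NULL_FILTER",
    "Lexer": "DEFAULT_LEXER",
    "Engine": "DEFAULT_INDEX",
    "Wordlist": "NO_SOUNDEX",
    "Stoplist": "DEFAULT_STOPLIST",
}

# The single preference each template policy substitutes, if any.
TEMPLATE_OVERRIDES = {
    "TEMPLATE_AUTOB": {"Filter": "AUTOB"},
    "TEMPLATE_DIRECT": {},
    "TEMPLATE_LONGTEXT_STOPLIST_OFF": {"Stoplist": "NO_STOPLIST"},
    "TEMPLATE_LONGTEXT_STOPLIST_ON": {"Stoplist": "DEFAULT_STOPLIST"},
    "TEMPLATE_MD": {"Data Store": "MD_TEXT"},
    "TEMPLATE_MD_BIN": {"Data Store": "MD_BINARY"},
    "TEMPLATE_WW6B": {"Filter": "WW6B"},
}


def resolve_policy(template_name):
    """Return the full preference set implied by a template policy."""
    preferences = dict(DEFAULT_POLICY)             # start from the defaults
    preferences.update(TEMPLATE_OVERRIDES[template_name])
    return preferences


print(resolve_policy("TEMPLATE_MD_BIN"))
```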

Supported Formats for Mixed-Format Columns

The following section lists all of the formats that ConText supports for columns that use external filters for processing documents in more than one format.

For each format, the format ID is also listed. This is the value that must be specified when creating a Filter preference using the BLASTER FILTER Tile with the executable attribute.

Note:

To index documents in any of these formats using external filters, the external filter must exist and the executable for the filter must be specified in a Filter preference using the executable attribute.  

See Also:

For more information about using format IDs in Filter preferences, see "Creating Filter Preferences" in Chapter 6, "Setting Up and Managing Text".  

Document Format   Format ID  

AmiPro 1.x - 3.1   19  
AmiPro Graphics SDW Samna Draw   62  
ASCII   90  
AT&T Crystal Writer   46  
AutoCAD (DXF, DXB)   53  
CEOwrite 3.0   78  
Computer Graphics Metafile (CGM)   79  
CorelDraw 2.x and 3.x   59  
CTOS DEF   75  
DBase IV 1.0; DBase III, III +   37  
DCA/FFT - Final Form Text   27  
DCA/RFT - Revisable Form Text   0  
Digital DX   15  
Digital WPS-PLUS   47  
EBCDIC   89  
Enable 1.1, 2.0, 2.15   11  
Encapsulated PostScript Preview; Encapsulated PostScript Bitmap   66  
First Choice 3.0 Data Base   13  
FrameMaker (MIF) 3.0; FrameMaker (MIF) 3.0 Win   42  
Framework III, 1.0, 1.1   22  
FullWrite Professional 1.0x   31  
GIF (Graphics Interchange Format)   51  
Harvard Graphics   87  
HP Graphics Language (HPGL)   83  
HTML Level 1, 2, 3   91  
IBM Writing Assistant 1.0   16  
IGES   52  
Interleaf 5.2; Interleaf 5.2 - 6.0   32  
JPEG (Joint Photographic Experts Group)   58  
Legacy 1.x, 2.0   41  
Lotus 123 4.x; Lotus 123 3.0; Lotus 123 1A, 2.0, 2.1   20  
Lotus Freelance   85  
Lotus Manuscript 2.0, 2.1   26  
Lotus PIC   67  
Macintosh Paint   88  
Microsoft Windows Paint 2.x   70  
Macintosh QuickDraw (PICT)   64  
MacWrite 4.5 - 5.0   29  
MacWrite II 1.0 - 1.1   30  
Mass 11, Version 8.0 - 8.33   36  
MastSoft Graphics (MSG)   49  
Micrografx Designer (DRW)   60  
MS Access 2.0   39  
MS Excel 5.0 - 6.0; MS Excel 4.0; MS Excel 3.0; MS Excel 2.1   21  
MS Powerpoint for Windows 2, 3, 4   84  
MS RTF; MS RTF (ANSI Char Set)   17  
MS Word for DOS 6.0; MS Word for DOS 5.0, 5.5; MS Word for DOS 4.0; MS Word for DOS 3.0, 3.1   8  
MS Word for Mac 5.0, 5.1; MS Word for Mac 4.0; MS Word for Mac 3.0   28  
MS Word for Windows 2.0; MS Word for Windows 1.x   18  
MS Word for Windows 6.0; MS Word for Mac 6.0   68  
MS Works for Windows 3.0   69  
MS Write for Windows 3.x   7  
MultiMate 4; MultiMate Advantage II; MultiMate Advantage I; MultiMate 3.3   6  
Navy DIF (GSA)   35  
OfficePower 7; OfficePower 6   44  
OfficeWriter 6.0 - 6.2; OfficeWriter 5.0; OfficeWriter 4.0   9  
OS/2 Bitmap; Windows Bitmap (BMP); Windows RLE   63  
Paradox 3.5, 4.0   38  
PC Paintbrush (PCX)   71  
PDF (Adobe Acrobat)   57  
PeachText 5000 2.1.2   82  
PFS:First Choice 3.0; PFS:First Choice 2.0; PFS:First Choice 1.0; PFS:WRITE Ver C; Professional Write 2.0 - 2.2; Professional Write 1.0   12  
Quattro Pro DOS; Quattro Pro Windows   45  
Q&A 4.0; Q&A Write 1.x, Q&A 3.0   10  
Rapid File 1.0   23  
RGIP   61  
Samna Word IV & IV + 1.0, 2.0   25  
Sun Raster Graphics   65  
TIFF (Tagged Image File Format)   50  
Uniplex V7 - V8   77  
Volkswriter 3, 4   74  
Wang PC, Version 3   24  
Wang WITA   55  
Windows Clipboard   72  
Windows ICON   73  
Windows Metafile (WMF)   48  
WiziDraw   86  
WiziWord   56  
Word For Word Intermediate Communications format (COM)   34  
WordPerfect for Windows 6.1; WordPerfect for Windows 6.0; WordPerfect 6.0   1  
WordPerfect 5.1 (Mail Merge)   2  
WordPerfect for Windows 5.x; WordPerfect 5.1; WordPerfect 5.0   3  
WordPerfect Graphics 1 (WPG)   4  
WordPerfect Graphics 2 (WPG)   5  
WordPerfect 4.2; WordPerfect 4.1   80  
WordPerfect Mac 1.0   81  
WordPerfect Mac 3.0; WordPerfect Mac 2.1; WordPerfect Mac 2.0   33  
WordStar 5.0, 5.5, 6.0, 7.0   40  
WordStar 2000, Rel 3.0   14  
WriteNow 3.0   54  
Xerox - XIF 5.0, 6.0   43  
XYWrite IV; XyWrite III Plus   76  
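When preparing Filter preferences for several formats, it can help to keep the format IDs in a small lookup table instead of retyping them. The Python sketch below shows one way to do that for a handful of the formats listed above. The table is deliberately partial and the function names are hypothetical; it does not call ConText.

```python
# A partial copy of the format ID table above (document format -> format ID).
FORMAT_IDS = {
    "ASCII": 90,
    "EBCDIC": 89,
    "HTML Level 1, 2, 3": 91,
    "PDF (Adobe Acrobat)": 57,
    "MS RTF; MS RTF (ANSI Char Set)": 17,
    "WordStar 2000, Rel 3.0": 14,
}


def format_id(name):
    """Look up a format ID by document format name, ignoring case."""
    wanted = name.strip().lower()
    for fmt, fmt_id in FORMAT_IDS.items():
        if fmt.lower() == wanted:
            return fmt_id
    raise KeyError("format not in this partial table: " + name)


def format_name(fmt_id):
    """Look up the document format name for a format ID."""
    for fmt, known_id in FORMAT_IDS.items():
        if known_id == fmt_id:
            return fmt
    raise KeyError("format ID not in this partial table: " + str(fmt_id))


print(format_id("pdf (adobe acrobat)"), format_name(91))
```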




Copyright © 1997 Oracle Corporation.

All Rights Reserved.
