Oracle8(TM) ConText(R) Cartridge Application Developer's Guide
Release 2.0

A54630-01

4
Understanding Query Expressions

This chapter explains how to use ConText to create query expressions to find relevant text in documents.

About Query Expressions

A query expression defines the search criteria for retrieving documents using ConText. A query expression consists of query terms (words and phrases) and other components such as operators and special characters which allow users to specify exactly which documents are retrieved by ConText.

A query expression can also call stored query expressions (SQEs) to return stored query results or call PL/SQL functions to return values used in the query.

When a query is executed using any of the methods supported by ConText, one of the arguments included in the query is a query expression. ConText then returns a list of all the documents that satisfy the search criteria, as well as scores that measure the relevance of the document to the search criteria.

Query Terms

Query terms can consist of words and phrases. Query terms can also contain stopwords.

Words and Phrases

The words in a query expression are the individual tokens on which the query expression operators perform an action. If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase and the entire string is searched for during a query.

Stopwords

Stopwords are common words, such as and, the, of, and to, that are not considered significant query terms by themselves because they occur so often in text. However, stopwords can provide useful search information when combined with more significant terms.

For example, a query for documents containing the single phrase "peanut butter and jelly" returns different results than a query for documents containing the two separate terms "peanut butter" and "jelly".

When you define a policy for a column, ConText lets you identify a list of stopwords. When stopwords are encountered in the documents in the column, they are not included as indexed terms in the text index; however, they are recorded.

As a result, stopwords cannot be searched for explicitly in text queries, but can be included as part of a phrase in a query expression.

Attention:

Query expressions made up of only stopwords will result in errors. Ensure that at least one of the terms in your query expression is not a stopword.  
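The rule above can be checked in an application before a query is submitted. The following Python sketch is only an illustration: the stoplist contents, the whitespace tokenization, and the function name has_searchable_term are assumptions, not part of ConText.

```python
# Hypothetical stoplist built from the stopwords named in this chapter.
STOPLIST = {"and", "the", "of", "to"}

def has_searchable_term(phrase: str, stoplist=STOPLIST) -> bool:
    """Return True if at least one word in the phrase is not a stopword."""
    # Assumption: words are separated by blank spaces and compared in lowercase.
    return any(word not in stoplist for word in phrase.lower().split())

print(has_searchable_term("peanut butter and jelly"))
print(has_searchable_term("the of"))
```

A phrase for which the function returns False consists only of stopwords and would be expected to produce an error if submitted as a query.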

Stoplists can be created in any language supported by ConText. ConText provides a default stoplist in English.

Note:

Stopwords do not have an effect on the theme indexes generated by ConText for your English-language documents.  

Query Expression Components

In addition to query terms, a query expression may contain any or all of the following components:

Component                          Purpose
Operators                          Define the relationships between the terms in a query expression and specify the output returned by the query. The different types of operators are: logical, ranking, result set, proximity, expansion, and thesaurus.
Wildcard Characters                Expand query terms using pattern matching
Grouping Characters                Group terms and operators in a query expression
Stored Query Expressions (SQEs)    Return the results of a query that has been executed and the results stored in an SQE table
PL/SQL Functions                   Execute a function and use the results in a query expression

Base-Letter Queries

For languages that use an 8-bit character set, such as French and Spanish, ConText gives you the option of converting characters to their base-letter representation before text indexing. This means that words with accents, umlauts, and so on are converted to their base-letter representation before their tokens are placed in the text index.

When you specify a text index that has used base-letter conversion in a query, ConText converts the term in the query expression to match the base-letter representation before the query is processed. In addition, all expansion and stopword checking for the query is performed on the base-letter terms.

Note: The terms in a thesaural query are not converted to base-letter representation before look-up in the thesaurus. The base-letter conversion takes place after the thesaurus look-up and is performed on all the terms returned by the thesaurus.

For more information about creating an index that supports base-letter conversion, see Oracle8 ConText Cartridge Administrator's Guide.  

Query Expression Examples

The following example of a one-step query returns all articles that contain the word wine in the TEXTTAB.TEXTCOL column. The query expression consists only of the query term wine, surrounded by single quotes.

	SELECT articles FROM texttab
	WHERE CONTAINS(textcol, 'wine') > 0;

The following example of a one-step query returns all articles that contain the phrase wine and roses in the TEXTTAB.TEXTCOL column. The query expression consists of the query phrase wine and roses, enclosed in braces so that the word and is read as part of the phrase rather than as the AND operator, and surrounded by single quotes.

	SELECT articles FROM texttab
	WHERE CONTAINS(textcol, '{wine and roses}') > 0;

See Also:

For more information about the CONTAINS function used in one-step queries, see CONTAINS in Chapter 9.  

Logical Operators

Logical operators combine the terms in a query expression. All single words and phrases may be combined with logical operators. When query terms are combined, the number of spaces around the logical operator is not significant.

Logical operators link query terms together to produce scores that are based on the relationship of the terms to each other. The logical operators combine the scores of their operands up to a maximum value of 100. Operands can be any query terms, as well as other operators.

Operator       Syntax               Description
AND            term1&term2          Returns documents that contain term1 and term2. Returns the minimum score of its operands. All query terms must occur; lower score taken.
               term1 and term2
OR             term1|term2          Returns documents that contain term1 or term2. Returns the maximum score of its operands. At least one term must exist; higher score taken.
               term1 or term2
NOT            term1~term2          Returns documents that contain term1 and not term2.
               term1 not term2
EQUIVALENCE    term1=term2          Specifies that term2 is an acceptable substitution for term1.
               term1 equiv term2

AND Operator

Use the AND operator to search for documents that contain at least one occurrence of each of the query terms. For example, the first expression below returns all the documents that contain the terms batman, robin, and penguin; the second uses the word form of the operator to return the documents that contain both jupiter and saturn:

	'batman & robin & penguin'
	'jupiter and saturn'

In an AND query, the score returned is the score of the lowest query term. In the first example above, if the individual scores for the terms batman, robin, and penguin are 10, 20, and 30 within a document, the document scores 10.

OR Operator

Use the OR operator to search for documents that contain at least one occurrence of any of the query terms. For example, to obtain the documents that contain the term cats or the term dogs, use one of the following:

	'cats | dogs'
	'cats OR dogs'

In an OR query, the score returned is the score for the highest query term. In the example above, if the scores for cats and dogs are 30 and 40 within a document, the document scores 40.

NOT Operator

Use the NOT operator to search for documents that contain one query term and not another.

For example, to obtain the documents that contain the term animals but not dogs, use the following expression:

	'animals ~ dogs'

Similarly, to obtain the documents that contain the term transportation but not automobiles or trains, use the following expression:

	'transportation not (automobiles or trains)'

Note:

The NOT operator does not affect the scoring produced by the other logical operators.  

Equivalence Operator

Use the equivalence operator to specify an acceptable substitution for a word in a search. For example, if you want all the documents that contain the phrase alsatians are big dogs or labradors are big dogs, you can write:

	'labradors=alsatians are big dogs'

ConText processes the above query faster and more efficiently than the same query written with the accumulate operator. For example, you could write the above query less efficiently and less concisely as follows:

	'labradors are big dogs, alsatians are big dogs'

The savings you gain in using the equivalence operator over the accumulate operator are most significant when you have more than one equivalence operator in the query expression. For example, the following query

	'labradors=alsatians are big canines=dogs'

is a more efficient, more concise form of:

	'labradors are big dogs,
	alsatians are big dogs,
	alsatians are big canines,
	labradors are big canines'

Precedence of Equivalence Operator

The equivalence operator has higher precedence than all other operators except the unary operators (fuzzy, soundex, stem, and PL/SQL function calls).

Scoring for Logical Operators

ConText calculates a relevance score for each document that meets the selection criteria specified in the query expression. With logical operators, generally every occurrence of a query term in a document counts as 10 towards the total score, which can be 100 or less.

The following table compares how the scores are calculated for the AND (&) and the OR ( | ) operators for four different documents. The first column describes the text of the documents and the second and third columns list the scores for the queries bat & cave and bat | cave. In the second column, ConText calculates the score for the AND operator by summing the scores for each query term and returning the lower sum. In the third column, ConText calculates the score for the OR operator by summing the scores for each query term and returning the higher sum.

TEXT                           Query Expression
                               bat & cave                 bat | cave
bat                            (document not returned)    10
cave                           (document not returned)    10
bat cave bat cave cave cave    20                         40
bat cave cave                  10                         20
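The scoring comparison above can be reproduced with a short Python sketch. It assumes the general rule stated in this section (each occurrence of a term counts 10, up to a maximum of 100) and simple whitespace tokenization; the function names are illustrative only and are not ConText functions.

```python
from collections import Counter

def term_score(text: str, term: str) -> int:
    """Assumed rule: 10 points per occurrence of the term, capped at 100."""
    occurrences = Counter(text.lower().split())[term.lower()]
    return min(10 * occurrences, 100)

def and_score(text, terms):
    """AND: every term must occur; the lowest term score is returned."""
    scores = [term_score(text, t) for t in terms]
    return min(scores) if all(scores) else None  # None = document not returned

def or_score(text, terms):
    """OR: at least one term must occur; the highest term score is returned."""
    scores = [term_score(text, t) for t in terms]
    return max(scores) if any(scores) else None

documents = ["bat", "cave", "bat cave bat cave cave cave", "bat cave cave"]
for doc in documents:
    print(doc, "|", and_score(doc, ["bat", "cave"]), "|", or_score(doc, ["bat", "cave"]))
```

Each printed line corresponds to one row of the table, with None standing for a document that is not returned.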

Score-Changing Operators

Score-changing operators behave like logical operators in that they return documents given the terms you specify. However, these operators affect document scores differently and, as such, can be used to change a document's rank in a hitlist with respect to a query term. The following table describes these operators:

Operator      Syntax               Description
ACCUMULATE    term1,term2          Returns documents that contain term1 or term2. Calculates score by adding the score of each operand. Similar to OR, except that the returned score is the sum of all scores.
              term1 accum term2
MINUS         term1-term2          Returns documents that contain term1. Calculates score by subtracting occurrences of term2 from occurrences of term1.
              term1 minus term2
NEAR          term1;term2          Returns documents that contain term1 and term2. Calculates score based on how close term1 is to term2; a score of 100 means terms are adjacent to one another.
              term1 near term2
WEIGHT        term*n               Returns documents that contain term. Calculates score by multiplying the raw score of term by n, where n is a number from 0.1 to 10.

Accumulate Operator

Use the accumulate operator to search for documents that contain at least one occurrence of any of the query terms, where the documents that contain the most frequent occurrences of the query terms are given the highest score.

For example, to search for documents that contain either term Brazil or soccer and to have the highest scores attached to the documents that contain the most occurrences of these words, you can issue:

	'soccer,Brazil'
	

Accumulate is similar to OR, in the sense that a document satisfies the query expression if any of the terms occur in the document; however, the scoring is different. OR returns a score based only on the query term that occurs most frequently in a document. Accumulate combines the scores for all the query terms that occur in a document. Thus documents that contain the most query terms are ranked the highest.
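The difference between OR and accumulate scoring can be sketched in Python as follows. The per-term scores are hypothetical, and capping the accumulated score at 100 is an assumption based on the maximum score mentioned for logical operators.

```python
def or_score(term_scores):
    """OR keeps only the highest-scoring term."""
    return max(term_scores.values())

def accumulate_score(term_scores, cap=100):
    """ACCUMULATE adds the term scores (the cap of 100 is an assumption)."""
    return min(sum(term_scores.values()), cap)

# Hypothetical per-term scores for two documents.
doc_one = {"soccer": 40, "Brazil": 0}
doc_two = {"soccer": 30, "Brazil": 30}

for name, scores in (("doc_one", doc_one), ("doc_two", doc_two)):
    print(name, "OR:", or_score(scores), "ACCUMULATE:", accumulate_score(scores))
```

With OR, doc_one ranks higher because its single best term scores more; with accumulate, doc_two ranks higher because both query terms contribute to its score.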

MINUS Operator

Use the MINUS operator to search for documents that contain a query term when you want the presence of a second query term to lower the rank of the document.

The minus operator is useful for lowering the score of documents that contain "noise". For example, suppose a query on the term cars always returned high scoring documents about Ford cars. You can lower the scoring of the Ford documents by using the expression:

	'cars - Ford'

In essence, this expression returns the documents that contain the term cars. However, the score returned for a document is the number of occurrences of cars minus the number of occurrences of Ford. When a returned document does not contain Ford, the occurrence of the term Ford is counted as zero.
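The calculation described above can be sketched in Python. This is an illustration under stated assumptions: the score is computed directly from occurrence counts as the text describes, words are split on blank spaces, and negative results are left unclamped because the text does not say how ConText treats them.

```python
def minus_score(text: str, term: str, noise_term: str):
    """Occurrences of term minus occurrences of noise_term."""
    words = text.lower().split()
    hits = words.count(term.lower())
    if hits == 0:
        return None  # the document does not contain the query term
    # A document without the noise term simply subtracts zero.
    return hits - words.count(noise_term.lower())

print(minus_score("cars cars cars Ford Ford", "cars", "Ford"))
print(minus_score("cars cars", "cars", "Ford"))
```

In this sketch the second document, which never mentions Ford, scores higher than the first even though it contains fewer occurrences of cars.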

Near Operator

Words or phrases that occur close together are considered to be more closely associated than those that are farther apart. The proximity operator calculates a score based on how close words are to each other rather than on how often the word or phrase appears in the document.

The score for a document is the highest score out of all the query terms that occur in proximity to each other. A score of 100 is returned when the query terms are adjacent. When the terms are not adjacent, ConText returns a score based on the following formula:

	100 - (number of words between the two query terms)

When there are more than 100 words separating the terms, ConText returns 1.

For example, if the query expression is 'ice;cream', the phrase I love ice cream would score 100, while the phrase ice is colder than cream would score 97. If both phrases occurred in a document, ConText retrieves the document and scores it as 100.
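The proximity formula can be sketched in Python as follows. The sketch assumes single-word query terms, whitespace tokenization, and a floor of 1 for any gap where the formula would otherwise drop below 1; the function name near_score is hypothetical.

```python
def near_score(text: str, term1: str, term2: str):
    """Score two distinct single-word terms by the smallest gap between them."""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    if not pos1 or not pos2:
        return None  # both terms must occur in the document
    # Number of words between the closest pair of occurrences.
    gap = min(abs(i - j) - 1 for i in pos1 for j in pos2)
    return max(1, 100 - gap)

print(near_score("I love ice cream", "ice", "cream"))
print(near_score("ice is colder than cream", "ice", "cream"))
```

Because the closest pair of occurrences determines the gap, a document that contains both phrases receives the score of the adjacent pair.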

Weight Operator

In expressions that contain more than one query term, use the weight operator to adjust the relative scoring of the query terms. You can reduce the score of a query term by using the weight operator with a number less than 1 (the minimum is 0.1); you can increase the score of a query term by using the weight operator with a number greater than 1, up to a maximum of 10.

The weight operator is useful in accumulate, OR, or AND queries when the expression has more than one query term. With no weighting on individual terms, the score cannot tell you which of the query terms occurs the most. If you are interested in documents that contain a particular query term more than another term, the overall ranking tells you nothing about which documents pertain to the term that you are most interested in.

For example, suppose you have a collection of sports articles. You are interested in the articles about soccer, in particular Brazilian soccer. It turns out that a regular query on soccer, Brazil returns many high ranking articles on US soccer. To raise the ranking of the articles on Brazilian soccer, you can issue the following query:

	'soccer, Brazil*3'

Table 4-1 illustrates how the weight operator can change the ranking of three hypothetical documents A, B, and C, which all contain information about soccer. The columns in the table show the total score of four different query expressions on the three documents.

Table 4-1
Document    soccer    Brazil    soccer,Brazil    soccer,Brazil*3
A           20        10        30               50
B           10        30        40               100
C           50        10        60               80

The score in the third column containing the query soccer, Brazil is the sum of the scores in the first two columns. The score in the fourth column containing the query soccer,Brazil*3 is the sum of the score of the first column soccer plus three times the score of the second, Brazil.

With the initial query of soccer,Brazil, the documents are ranked in the order C B A. With the query of soccer,Brazil*3, the documents are ranked B C A, which is the preferred ranking.
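The effect of the weight operator on ranking can be checked with a small Python sketch. It uses the soccer and Brazil scores from Table 4-1 and the rule stated above (the accumulated score is the sum of each term score multiplied by its weight); the function names are illustrative only.

```python
# Per-term scores for documents A, B, and C, taken from Table 4-1.
scores = {
    "A": {"soccer": 20, "Brazil": 10},
    "B": {"soccer": 10, "Brazil": 30},
    "C": {"soccer": 50, "Brazil": 10},
}

def accumulate(term_scores, weights=None):
    """Sum the term scores, multiplying each by its weight (default 1)."""
    weights = weights or {}
    return sum(score * weights.get(term, 1) for term, score in term_scores.items())

def ranking(weights=None):
    """Return document names ordered from highest to lowest score."""
    return sorted(scores, key=lambda doc: accumulate(scores[doc], weights), reverse=True)

print(ranking())               # soccer,Brazil
print(ranking({"Brazil": 3}))  # soccer,Brazil*3
```

The first call prints the order C, B, A and the second prints B, C, A, matching the rankings described above.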

Result-Set Operators

Use the result-set operators to control what documents are returned from a query result set. The operands for these operators are expressions, which can be an individual query term or a logical combination of query terms that use other operators.

Because these operators manipulate a result set, they cannot be embedded within each other; they must be placed at the outermost level of the query expression.

Result set operators are typically used to exclude noise from the hitlist (irrelevant documents) and to retrieve documents out of a hitlist more efficiently. There are three result set operators:

Operator      Syntax            Description
THRESHOLD     expression>n      Returns only those documents in the result set that score above the threshold n.
              term>n            Within an expression, selects documents that contain the query term with score of at least n.
MAX           expression:n      Returns the first n highest scoring documents. For example, :20 means to return the top 20 documents in the hitlist. The value n must be an integer between 1 and 65535.
FIRST/NEXT    expression#m-n    Returns the specified number of documents as ordered in the hitlist range m to n.

Threshold Operator

You can use the threshold operator in two ways:

Expression level

Use the expression level threshold operator to eliminate documents in the result set that score below a threshold number. For example, to search for documents that refer to relational databases and return only documents that score greater than 75, use the following expression:

	'relational databases > 75'

When you combine the threshold operator with an OR, ConText compares the higher score with the threshold. This is because OR returns the score of the highest scoring query term. For example, to search for documents that contain at least five occurrences of the word lion or five occurrences of the word tiger, you can write:

	'lion | tiger > 50'

When you combine the threshold operator with an AND, ConText compares the lower score with the threshold. This is because AND returns the score of the lowest scoring query term. For example, to search for documents that contain both lion and tiger with at least five occurrences of each term, you can write:

	'lion & tiger > 50'

Query Term Level

Use the query term threshold operator in a query expression to select a document based on how a term scores in the document. For example, to select documents that contain at least three occurrences of lion and at least one occurrence of tiger, use:

	'(lion > 30) and tiger'

Max Operator

Use the max operator to retrieve a given number of the highest scoring documents. For example, to obtain the twenty highest scoring documents that contain the word dance, you can write:

	'dance:20'

The max operator is particularly useful to prevent writing a large number of records to the hitlist table, which could result in performance degradation.

Note:

The max operator cannot be used with the CTX_QUERY.COUNT_HITS function or with in-memory queries.  

First/Next Operator

Use the first/next operator to return a specified range of documents from the hitlist.

Note:

In a first/next query, the order of the returned documents is not based on score or textkey. ConText returns the documents based on the order in which it encounters the documents in the queried text column.  

For example, to return the first 10 documents encountered by ConText that contain the term dog, use the following expression:

	'dog#1-10'

You could then return the next 10 documents using the following expression:

	'dog#11-20'

The first/next operator can be used to create an application interface in which query results (rows in the hitlist) are returned incrementally. Because the query results are returned incrementally, query response is generally faster. The application can display the hitlists in a more manageable size, and control can be returned to the user faster.

Note:

The first/next operator cannot be used with the CTX_QUERY.COUNT_HITS function or with in-memory queries.  

Combined First/Next and Max Queries

You can use the first/next operator to extract chunks of a sorted hitlist returned by the max operator. For example, if you use the max operator to return only the highest scoring 50 documents that contain the term cat, you can extract the first 10 documents from the 50 as follows:

	'cat:50#1-10'

Note:

Placing the max operator inside the first/next operator in this way is the only instance in which you can embed the max operator in a query expression.  
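An application that returns query results incrementally needs to build one first/next expression per page. The following Python sketch shows one way to generate such expressions, optionally combined with the max operator; the function page_expression and its parameters are hypothetical and only assemble the query string.

```python
def page_expression(term: str, page: int, page_size: int = 10, max_docs=None) -> str:
    """Build a first/next expression for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    first = (page - 1) * page_size + 1
    last = page * page_size
    base = term
    if max_docs is not None:
        # The max operator accepts an integer between 1 and 65535.
        if not 1 <= max_docs <= 65535:
            raise ValueError("max_docs must be between 1 and 65535")
        base = f"{term}:{max_docs}"
    return f"{base}#{first}-{last}"

print(page_expression("dog", 1))
print(page_expression("dog", 2))
print(page_expression("cat", 1, max_docs=50))
```

The three calls produce the expressions dog#1-10, dog#11-20, and cat:50#1-10 used in the examples above.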

Expansion Operators

The expansion operators expand a query expression to include variants of the query term supplied by the user. There are three kinds of expansion operators:

Operator    Syntax    Description
STEM        $term     Expands a query to include all terms having the same stem or root word as the specified term.
SOUNDEX     !term     Expands a query to include all terms that sound the same as the specified term (English-language text only).
FUZZY       ?term     Expands a query to include all terms with similar spellings as the specified term (English-language text only).

The expansion operators are unary operators. They may be used in combination with each other and with any other operators described in this chapter. In addition, searches can be broadened by performing an expansion on an expansion.

The methods used by the expansion operators to perform stemming, fuzzy matching, and soundex matching for a text column are determined by the Wordlist preference in the policy for the column.

See Also:

For more information about setting up preferences and policies, see Oracle8 ConText Cartridge Administrator's Guide.  

Stem Expansions

Use the STEM ($) operator to search for terms that have the same linguistic root as the query term. For example:

Input           Expands To
$scream         scream screaming screamed
$distinguish    distinguish distinguished distinguishes
$guitars        guitars guitar
$commit         commit committed
$cat            cat cats
$sing           sang sung sing

The ConText stemmer, licensed from Xerox Corporation's XSoft Division, supports the following languages: English, French, Spanish, Italian, German, and Dutch.

Note:

If STEM returns a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.  

Soundex Expansions

The soundex (!) operator enables searches on words that have similar sounds; that is, words that sound like other words. This function allows comparison of words that are spelled differently, but sound alike in English.

Soundex in ConText uses the same logic as the soundex function in SQL to search for words that have a similar sound. It returns all words in a text column that have the same soundex value.

The following example illustrates the results that could be returned for a one-step query that uses SOUNDEX:

	SELECT ID, COMMENT FROM EMP_RESUME
	WHERE CONTAINS (COMMENT, '!SMYTHE') > 0;

	ID COMMENT
	-- ------------
	23 Smith is a hard worker who...

Note:

SOUNDEX works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages.

For more information about the SOUNDEX function in SQL, see Oracle8 Server SQL Reference.  
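To see why a query for !SMYTHE can match Smith, it helps to compute the soundex values of both words. The following Python sketch implements the classic soundex coding on which the SQL function is based; it is an illustration only, not the code ConText or Oracle runs, and it handles the letters A to Z only.

```python
# Digit assigned to each consonant group in the classic soundex scheme.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    """Return the four-character soundex value of a word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                   # the first letter is kept as is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        code = CODES.get(ch, "")          # vowels and Y have no code
        if code and code != previous:
            result += code
        previous = code                   # a vowel resets the previous code
    return (result + "000")[:4]           # pad with zeros, keep four characters

print(soundex("SMYTHE"), soundex("Smith"))
```

Both words produce the same value, which is why the soundex expansion treats them as matches.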

Fuzzy Expansions

Fuzzy (?) expansions generate words that are spelled similarly. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in the documents in the database.

Unlike the stem expansion, the number of words generated by a fuzzy search depends on what is in the text index; results can vary significantly according to the contents of the database index.

For example:

Input      Expands To
?cat       cat cats calc case
?feline    feline defined filtering
?apply     apply apple applied April
?read      lead real

Note:

Fuzzy works best for languages that use a 7-bit character set, such as English. It can be used, with lesser effectiveness, for languages that use an 8-bit character set, such as many Western European languages. Also, the Japanese lexer provides limited fuzzy matching.

In addition, if fuzzy returns a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.  

Penetration in Expansion Operators

Penetration allows complex query expansions to be expressed in short, concise notation. Penetration is a system of notation for query expressions and does not affect the meaning of the expansion operators or the order in which operations are performed; it is a tool to help you generate unambiguous queries using the expansion operators.

Penetration applies the expansion operators to each term within an explicit expression (that is, an expression delimited by parentheses or braces). Any expansion operator outside an expression delimited by parentheses ( ) or braces { } is applied to each word or phrase inside the expression.

For example:

Query Before Penetration    Query After Penetration
?(dog, cat, mouse)          ?dog , ?cat , ?mouse
?(dog,!(cat & mouse))       ?dog , (!?cat & !?mouse)
?((cat=feline) meows)       (?cat = ?feline) ?meows

In the first example, a fuzzy expansion is performed on each term.

In the second example, a fuzzy expansion is performed on each term and a soundex expansion is performed only on the terms cat and mouse because cat and mouse are enclosed in a separate set of parentheses.

In the third example, a fuzzy expansion is performed on each term, including both equivalence terms.

Note:

Expansion operators do not penetrate expressions delimited by brackets [ ].  
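The way operators are pushed down onto individual terms can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it handles parentheses only (not braces or brackets), treats every run of word characters as a term, and keeps the original parentheses and spacing, so its output is equivalent to, but not formatted exactly like, the expanded queries shown above. The function name penetrate is hypothetical.

```python
import re

EXPANSION_OPS = "?!$"

def penetrate(expr: str) -> str:
    """Attach the expansion operators in force to every term of the expression."""
    out = []
    stack = [""]   # operators in force at each parenthesis depth
    pending = ""   # operators read but not yet attached to anything
    for tok in re.findall(r"[?!$]|[()]|\w+|[^\w?!$()]+", expr):
        if tok in EXPANSION_OPS:
            pending += tok
        elif tok == "(":
            # Operators written before "(" apply to everything inside it.
            stack.append(pending + stack[-1])
            pending = ""
            out.append(tok)
        elif tok == ")":
            stack.pop()
            out.append(tok)
        elif tok[0].isalnum() or tok[0] == "_":
            out.append(pending + stack[-1] + tok)
            pending = ""
        else:
            out.append(tok)   # other operators and spacing pass through
    return "".join(out)

print(penetrate("?(dog, cat, mouse)"))
print(penetrate("?(dog,!(cat & mouse))"))
print(penetrate("?((cat=feline) meows)"))
```

In the second call, cat and mouse receive both the soundex and fuzzy operators while dog receives only the fuzzy operator, as in the table.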

Base-letter Support

If you have base-letter conversion specified for a text column and the query expression contains a SOUNDEX or FUZZY operator, ConText operates on the base-letter form of the query.

The STEM operator does not support base-letter conversion.

Thesaurus Operators

The thesaurus operators expand a query for a single term (word or phrase) using a thesaurus that defines relationships between the user-specified term and other semantically related terms.

There are ten kinds of thesaurus operators, corresponding to the ten types of relationships that can be defined in an ISO2788 standard thesaurus.

Operator              Syntax                      Description
SYNONYM               SYN(term[,thes])            Expands a query to include all the terms defined in the thesaurus as synonyms for the specified word.
PREFERRED             PT(term[,thes])             Replaces the specified word in a query with the term defined in the thesaurus as the preferred term for the specified word.
RELATED               RT(term[,thes])             Expands a query to include all the terms defined in the thesaurus as related terms for the specified word.
TOP                   TT(term[,thes])             Replaces the specified word in a query with the term defined in the thesaurus as the top term in the standard hierarchy (BT, NT) for the specified word.
BROADER               BT(term[,level[,thes]])     Expands a query to include the term defined in the thesaurus as a broader term for the specified word.
NARROWER              NT(term[,level[,thes]])     Expands a query to include all the lower level terms defined in the thesaurus as narrower terms for the specified word.
BROADER GENERIC       BTG(term[,level[,thes]])    Expands a query to include all the terms defined in the thesaurus as broader generic terms for the specified word.
NARROWER GENERIC      NTG(term[,level[,thes]])    Expands a query to include all the lower level terms defined in the thesaurus as narrower generic terms for the specified word.
BROADER PARTITIVE     BTP(term[,level[,thes]])    Expands a query to include all the terms defined in the thesaurus as broader partitive terms for the specified word.
NARROWER PARTITIVE    NTP(term[,level[,thes]])    Expands a query to include all the lower level terms defined in the thesaurus as narrower partitive terms for the specified word.

Internally, ConText processes the expansion by enclosing each individual term returned by the expansion in braces and then combining the terms with the ACCUMULATE operator.

For example, if bird, birdy, and avian are all synonyms:

SYN(bird) is expanded to {bird},{avian},{birdy}.

If a term in a thesaural query does not have corresponding entries in the specified thesaurus, no expansion is produced and the term itself is used in the query.
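The expansion described above can be sketched in Python. The thesaurus here is a plain dictionary invented for the example, and the function expand_syn is hypothetical; the sketch only shows how the expanded query string is assembled.

```python
def expand_syn(term: str, thesaurus: dict) -> str:
    """Build the accumulate expression for a synonym expansion."""
    synonyms = thesaurus.get(term)
    if not synonyms:
        return term  # no entry: the term itself is used in the query
    terms = [term] + [s for s in synonyms if s != term]
    # Each term is enclosed in braces and the terms are joined with the
    # accumulate operator (,).
    return ",".join("{" + t + "}" for t in terms)

# Hypothetical thesaurus content matching the bird example.
thesaurus = {"bird": ["avian", "birdy"]}

print(expand_syn("bird", thesaurus))
print(expand_syn("fish", thesaurus))
```

The first call yields the same expansion as the bird example, and the second returns fish unchanged because the thesaurus has no entry for it.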

See Also:

For more information about thesaural relationships and creating thesauri, see Oracle8 ConText Cartridge Administrator's Guide.  

Limitations

The thesaurus operators can be used in conjunction with all the other query expression operators and special characters supported by ConText, with the exception of the near operator.

The maximum length of the expanded query is 32000 characters.

Thesaural operations cannot be nested. For example, the following query is not allowed:

	SYN(BT(bird))

Thesaurus Arguments

The thesaurus operators are implemented in ConText as PL/SQL functions, and, as such, have arguments that must be specified with the operator. All of the notational conventions and usage rules for PL/SQL apply to the thesaurus operators.

The thesaurus operators have the following arguments:

term

Specify the operand for the thesaurus operator. You must specify a term when using any of the thesaurus operators. For preferred term (PT) and top term (TT) queries, term is replaced by the preferred term/top term defined for the term in the specified thesaurus; however, if no PT or TT entries are defined for the term, the term is not replaced and is used in the query.

For all other thesaural queries, term is expanded to include the synonymous, related, broader, or narrower terms defined for the term in the specified thesaurus.

level

Specify the number of levels traversed in the thesaurus hierarchy to return the broader (BT, BTG, BTP) or narrower (NT, NTG, NTP) term for the specified term. For example, a level of 1 in a BT query returns only the broader term, if one exists, for the specified term. A level of 2 returns the broader term for the specified term, as well as the broader term, if one exists, for the broader term.

The level argument is optional and has a default value of one (1). Zero or negative values for the level argument return only the original query term.
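The effect of the level argument on a broader term query can be sketched in Python. The hierarchy below is invented for the example, and the function broader_terms is hypothetical; it models a single branch in which each term has at most one broader term.

```python
def broader_terms(term: str, broader: dict, level: int = 1) -> list:
    """Return the term followed by up to `level` broader terms."""
    result = [term]
    current = term
    # Zero or negative levels leave only the original term.
    for _ in range(max(level, 0)):
        parent = broader.get(current)
        if parent is None:
            break  # top of the hierarchy reached
        result.append(parent)
        current = parent
    return result

# Hypothetical hierarchy: each key maps to its broader term.
hierarchy = {"tutorial": "lesson", "lesson": "education"}

print(broader_terms("tutorial", hierarchy))      # level defaults to 1
print(broader_terms("tutorial", hierarchy, 2))
print(broader_terms("tutorial", hierarchy, 0))
```

With the default level only the immediate broader term is added; a level of 2 also adds the broader term of that broader term.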

thes

Specify the name of the thesaurus used to return the expansions for the specified term. The thes argument is optional and has a default value of DEFAULT. As a result, a thesaurus named DEFAULT must exist in the thesaurus tables before using any of the thesaurus operators.

Synonym Operator

Use the SYNONYM operator (SYN) to expand a query to include all the terms that have been defined in a thesaurus as synonyms for a specified term.

The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or any of the synonyms defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'SYN(tutorial)')...

Compound Phrases in Synonym Operator

Expansion of compound phrases for a term in a synonym query are returned as AND conjunctives.

For example, the compound phrase temperature + measurement + instruments is defined in a thesaurus as a synonym for the term thermometer. In a synonym query for thermometer, the query is expanded to:

	{thermometer},({temperature}&{measurement}&{instruments})

Note:

In a thesaurus, compound phrases can only be defined in synonym relationships for a term.  

Preferred Term Operator

Use the PREFERRED TERM operator (PT) to replace a term in a query with the preferred term that has been defined in a thesaurus for the term.

For example, the term building has a preferred term of construction in a thesaurus. A PT query for building returns all documents that contain the word construction. Documents that contain the word building are not returned.

Related Term Operator

Use the RELATED TERM operator (RT) to expand a query to include all the terms that have been defined in a thesaurus as related terms for a specified term.

For example, suppose the term dog has a related term of wolf in a thesaurus. An RT query for dog returns all documents that contain the word dog or the word wolf.

Broader Term Operator

Use the broader term operators (BT, BTG, BTP) to expand a query to include the term that has been defined in a thesaurus as the broader or higher level term for a specified term. They can also expand the query to include the broader term for the broader term and the broader term for that broader term, and so on up through the thesaurus hierarchy.

Note:

The hierarchy can contain three separate branches, represented by the three broader term operators. In a broader term query, the specified operator only searches up the designated branch of the hierarchy.  

The following excerpt illustrates a one-step query which returns all documents that contain the term tutorial or the BT term defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'BT(tutorial)')...
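The effect of the level argument on a broader term query can be sketched as a walk up the hierarchy. The example below is a simplified model, assuming a single hierarchy branch held in a dictionary; the terms course and education are hypothetical and are not taken from any supplied thesaurus.

```python
def broader_terms(term, broader, level=1):
    """Return `term` followed by its broader terms, up to `level` steps up."""
    result = [term]
    frontier = [term]
    # Zero or negative levels return only the original term.
    for _ in range(max(level, 0)):
        next_frontier = []
        for current in frontier:
            for parent in broader.get(current, []):
                if parent not in result:
                    result.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return result


# Hypothetical hierarchy: tutorial -> course -> education
broader = {"tutorial": ["course"], "course": ["education"]}
print(broader_terms("tutorial", broader))           # one level up (default)
print(broader_terms("tutorial", broader, level=2))  # two levels up
```

The same walk, run over a dictionary of narrower terms, models the narrower term operators.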

Narrower Term Operator

Use the narrower term operators (NT, NTG, NTP) to expand a query to include all the terms that have been defined in a thesaurus as the narrower or lower level terms for a specified term. They can also expand the query to include all of the narrower terms for each narrower term, and so on down through the thesaurus hierarchy.

Note:

The hierarchy can contain three separate branches, represented by the three narrower term operators. During a narrower term query, the specified operator only searches down the designated branch of the hierarchy.  

The following excerpt illustrates a one-step query that returns all documents that contain either the term tutorial or any of the NT terms defined for tutorial in the DEFAULT thesaurus:

	...CONTAINS(textcol, 'NT(tutorial)')...

Broader and Narrower Term Operator on Homographs

If a homograph (a word or phrase with multiple meanings, but the same spelling) appears in two or more nodes in the same hierarchy branch of a thesaurus, a qualifier is required for each occurrence of the term in the branch.

If the qualifier is not specified for a homograph in a broader or narrower term query, the query expands to include all of the broader/narrower terms for the homograph.

For example, if machine is a broader term for crane (building equipment) and bird is a broader term for crane (waterfowl):

BT(crane) expands to {crane},{machine},{bird}

If the qualifier for a homograph is specified in a broader or narrower term query, only the broader/narrower terms for the qualified homograph are returned.

Using the previous example:

BT(crane{(waterfowl)}) expands to {crane},{bird}

Note:

When specifying a qualifier in a broader or narrower term query, the qualifier and its notation (parentheses) must be escaped, as is shown in this example.  
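The crane example can be reproduced with a small model in which each thesaurus entry is keyed by a term and its qualifier. This is a sketch under that assumption; the function name expand_bt and the dictionary layout are illustrative and say nothing about how ConText stores homographs.

```python
def expand_bt(term, broader, qualifier=None):
    """Expand BT(term) one level; `broader` is keyed by (term, qualifier)."""
    parts = [term]
    for (name, qual), parents in broader.items():
        # Without a qualifier every homograph matches; with one, only
        # the entry that carries that qualifier is used.
        if name == term and (qualifier is None or qual == qualifier):
            parts.extend(p for p in parents if p not in parts)
    return ",".join("{%s}" % p for p in parts)


broader = {
    ("crane", "building equipment"): ["machine"],
    ("crane", "waterfowl"): ["bird"],
}
print(expand_bt("crane", broader))
print(expand_bt("crane", broader, qualifier="waterfowl"))
```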

Top Term Operator

Use the TOP TERM operator (TT) to replace a term in a query with the top term that has been defined for the term in the standard hierarchy (BT, NT) in a thesaurus. Top terms in the generic (BTG, NTG) and partitive (BTP, NTP) hierarchies are not returned.

For example, the term tutorial has a top term of learning systems in the standard hierarchy of a thesaurus. A TT query for tutorial returns all documents that contain the phrase learning systems. Documents that contain the word tutorial are not returned.

Base-letter Support for Thesaural Queries

When ConText processes a query on a base-letter index and the expression contains a thesaurus operator, ConText looks up the query term in the thesaurus without converting the query to base-letter. The expansions obtained from the thesaurus are converted to base-letter and looked up subsequently within the index according to query rules.

This look-up sequence lets base-letter queries work regardless of whether the thesaurus is in base-letter form. However, if the keys in the thesaurus are in base-letter form, they do not match query terms that are not in base-letter form. When you have a base-letter thesaurus, you must therefore specify the base-letter form of the term in the query.
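The look-up order described in this section can be illustrated with a short model: the term is matched against the thesaurus exactly as typed, and only the resulting expansions are converted to base-letter form. The French terms, the dictionary layout, and the helper names are assumptions chosen for the example; diacritics are stripped here with Python's unicodedata module, which is not necessarily how ConText performs the conversion.

```python
import unicodedata


def base_letter(word):
    """Strip diacritics, for example turning an accented e into a plain e."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def thesaurus_lookup(term, thesaurus):
    """Look up `term` unconverted, then convert the expansions."""
    expansions = [term] + thesaurus.get(term, [])
    return [base_letter(t) for t in expansions]


# Hypothetical thesaurus whose keys are NOT in base-letter form.
thesaurus = {"\u00e9cole": ["coll\u00e8ge"]}
print(thesaurus_lookup("\u00e9cole", thesaurus))  # key matches, expansion found
print(thesaurus_lookup("ecole", thesaurus))       # key does not match
```

The second call finds no expansion because the typed term and the thesaurus key differ, which is the mismatch the text warns about (shown here in the opposite direction, with an accented thesaurus and an unaccented query term).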

Operator Precedence

Query expressions are evaluated according to the precedence of their operators. Operators with higher precedence are applied first. Operators of equal precedence are applied in order of their appearance in the expression from left to right.

Within query expressions, operators have the following order of evaluation from highest precedence to lowest:

Operator             Symbol
Near                 ;
Stem                 $
Fuzzy                ?
Soundex              !
Equivalence          =
Control operators    > * :
MINUS                -
AND                  &
OR                   |
Accumulate           ,

Precedence Examples

Query Expression               Order of Evaluation
w1 | w2 & w3                   (w1) | (w2 & w3)
w1 & w2 | w3                   (w1 & w2) | w3
?w1 , w2 | w3 & w4             (?w1) , (w2 | (w3 & w4))
w1 * 5 > 35 : 8                (w1 * 5) > 35 : 8
?abc = def ghi & jkl = mno     ((?abc = def) ghi) & (jkl=mno)

In the first example, because AND has a higher precedence than OR, the query returns all documents that contain w1 and all documents that contain both w2 and w3.

In the second example, the query returns all documents that contain both w1 and w2 and all documents that contain w3.

In the third example, the fuzzy operator is first applied to w1, then the AND operator is applied to arguments w3 and w4, then the OR operator is applied to term w2 and the results of the AND operation, and finally, the score from the fuzzy operation on w1 is added to the score from the OR operation.

In the fourth example, term w1 is weighted by a factor of 5; results are only returned if the score is greater than 35; and only the 8 highest scoring documents that meet the search criteria will be returned.
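The grouping in the first three examples can be reproduced with a small precedence-climbing parser. The sketch below models only the four binary operators MINUS, AND, OR, and ACCUMULATE; terms that carry a prefix such as ? are treated as single operands, adjacent words are merged into a phrase, and the input is assumed to be well formed. It illustrates the precedence rules and is not ConText's query parser.

```python
import re

# Binary operators modelled here; a larger number binds more tightly.
PRECEDENCE = {"-": 4, "&": 3, "|": 2, ",": 1}


def tokenize(expr):
    """Split into operators and terms; adjacent terms form one phrase."""
    tokens = []
    for tok in re.findall(r"[-&|,]|[^\s\-&|,]+", expr):
        if tokens and tok not in PRECEDENCE and tokens[-1] not in PRECEDENCE:
            tokens[-1] += " " + tok
        else:
            tokens.append(tok)
    return tokens


def parse(tokens, min_prec=1):
    """Precedence climbing: return a term or a tuple (op, left, right)."""
    left = tokens.pop(0)
    while tokens and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Equal-precedence operators associate left to right, so the
        # right operand may contain only tighter-binding operators.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


def show(tree):
    """Render the parse tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return "(%s %s %s)" % (show(left), op, show(right))


for query in ("w1 | w2 & w3", "w1 & w2 | w3", "?w1 , w2 | w3 & w4"):
    print(query, "->", show(parse(tokenize(query))))
```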

Altering Precedence

Precedence is altered by grouping characters, which are described under Grouping Characters in this chapter.

Wildcard Characters

Wildcard characters can be used in query expressions to expand word searches into pattern searches. The wildcard characters are:

Wildcard Character   Description

%   The percent wildcard specifies that any characters can appear in multiple positions represented by the wildcard.

_   The underscore wildcard specifies a single position in which any character can occur.

For example, the following abbreviated one-step query finds all terms beginning with the pattern scal in a column named text:

	...contains(TEXT, 'scal%') > 0

Note:

To expand the wildcard query, ConText uses the word list for the text column and rewrites the query with these terms. When your wildcard query expands to a number of terms greater than the maximum allowed in a query, ConText returns an error.

In addition, if a wildcard expression translates to a stopword, the stopword is not included in the query or highlighted by CTX_QUERY.HIGHLIGHT.  
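The expansion step described in the note can be sketched as pattern matching against a word list. In this model the percent wildcard is assumed to match zero or more characters (as in SQL LIKE), the limit MAX_TERMS is a hypothetical value chosen to keep the example small, and a plain ValueError stands in for the error that ConText returns.

```python
import re

MAX_TERMS = 3  # hypothetical limit, not ConText's actual maximum


def expand_wildcard(pattern, word_list, stopwords=(), max_terms=MAX_TERMS):
    """Expand a pattern containing % and _ against a list of indexed words."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # Stopwords are dropped from the expansion.
    matches = [
        w for w in word_list
        if re.fullmatch(regex, w) and w not in stopwords
    ]
    if len(matches) > max_terms:
        raise ValueError("wildcard expands to too many terms")
    return matches


word_list = ["scale", "scalar", "scalp", "school", "scan"]
print(expand_wildcard("scal%", word_list))
print(expand_wildcard("sca_", word_list))
```

A pattern such as s% matches all five words in this list and therefore raises the error in this model, which is the situation the note describes.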

Grouping Characters

The grouping characters control operator precedence by grouping query terms and operators in a query expression. The grouping characters are parentheses ( ) and brackets [ ].

The beginning of a group of terms and operators is indicated by an open character from one of the sets of grouping characters. The ending of a group is indicated by the occurrence of the appropriate close character for the open character that started the group. Between the two characters, other groups may occur.

For example, the open parenthesis indicates the beginning of a group. The first close parenthesis encountered is the end of the group. Any open parentheses encountered before the close parenthesis indicate nested groups.

Brackets perform the same function as parentheses, but they also prevent expansion operators from penetrating the group.

Stored Query Expressions

You can store the results of a query expression as a stored query expression (SQE) and then call the SQE later in a query expression to return the stored results. To call a stored query expression, use the SQE operator.

Operator   Syntax   Description

Stored Query Expression   SQE(SQE_name)   Returns the stored result of an SQE.

The advantage of calling an SQE in a query expression, rather than specifying query terms, is that the results are typically returned faster, since ConText does not have to query the text table directly.

In addition, SQEs can be used to perform iterative queries, in which an initial query is refined using one or more additional queries.

Using Stored Query Expressions

The process for using stored query expressions is:

  1. Call CTX_QUERY.STORE_SQE to store the results for the text column or policy. With STORE_SQE, you specify a name for the SQE, a policy (which identifies the text column for the SQE), a query expression, and whether the SQE is a session or system SQE.
  2. Call the stored query expression in the query expression of a text (or theme) query. ConText returns the results of the SQE in the same way it returns the results of a regular query. If the results of the SQE are out-of-date, ConText automatically re-evaluates the SQE before returning the results.
    Note:

    Because ConText must first determine if the results are out-of-date with respect to the document index, many changes to the index through inserting, deleting, and updating documents will slow down the retrieval of the stored query expression results.  

Administration of stored query expressions can be performed using the REFRESH_SQE, REMOVE_SQE, and PURGE_SQE procedures in the CTX_QUERY PL/SQL package.

Example

To create a session SQE named PROG_LANG, use CTX_QUERY.STORE_SQE as follows:

	exec ctx_query.store_sqe('emp_resumes', 'prog_lang', 'cobol', 'session');

This SQE queries the text column for the EMP_RESUMES policy (in this case, EMP.RESUMES) and returns all documents that contain the term cobol. It stores the results in the SQE table for the policy.

PROG_LANG can then be called within a query expression as follows:

	select score, docid from emp
	where contains(resume, 'sqe(prog_lang)')>0
	order by score;

Session and System SQEs

When you initially create an SQE using CTX_QUERY.STORE_SQE, you can specify whether the SQE is for the current session or for all sessions (system SQE).

You can use session SQEs only in the current session. These SQEs are stored only for the duration of the session. When a session is terminated, all session SQEs created during the session are deleted from the SQE tables. If you want to use a session SQE in another session, you must recreate the SQE.

System SQEs can be used in all sessions, including concurrent sessions. When a session is terminated, system SQEs created during the session are not deleted from the SQE tables and can be used in future sessions.

Re-evaluation of Stored Query Expressions

If the text column referenced by a stored query expression has been modified since the stored query expression was created, the stored query expression results may be out-of-date. Before returning the results of a stored query expression in a query expression, ConText verifies that the results are current. If they are not current, ConText automatically evaluates the differences and updates the results.

ConText also verifies that any stored query expressions nested within a stored query expression have up-to-date results.

Note:

ConText does not verify whether PL/SQL functions in stored query expressions have been updated. If a PL/SQL function in a stored query expression has been updated, the stored query expression must be manually re-evaluated.  

Result lists in stored query expression tables may get fragmented by consecutive re-evaluations. You can resolve fragmentation by calling CTX_QUERY.REFRESH_SQE.

Iterative Queries

Iterative queries are queries built on other queries to refine or add to the result set of the original query. Once you define a stored query expression, you can add additional search criteria in two ways: by extending the expression in the CONTAINS procedure or by nesting stored query expressions.

Extending the Expression in the CONTAINS Procedure

Sometimes you might want to add a condition to a stored query expression to re-define your search criteria. You can do so by extending the query with additional operators when you call CTX_QUERY.CONTAINS. When you extend stored queries in this way, the response time is usually faster than an equivalent query without the SQE operator.

For example, you find that wildcard queries take a long time to process. You therefore define a wildcard query as a stored query expression, Q1, to return all documents indexed under policy pol that have words beginning with the letter z:

ctx_query.store_sqe('pol', 'Q1', 'z%', 'session');

You then extend the query by adding an OR condition. You ask for all documents indexed under policy pol that contain words beginning with the letter z or that contain the word cat:

ctx_query.contains('pol', 'SQE(Q1) | cat', 'ctx_temp');

Internally, ConText must still use the text index to find those documents that might have the word cat but not z%; however, the response time is generally much faster than the following equivalent query:

ctx_query.contains('pol', 'z% | cat', 'ctx_temp');

Nesting Stored Query Expressions

You can use stored query expressions to define other stored query expressions. This is useful when you want to refine the result set returned from a stored query expression.

For example, you define the stored query expression, Q1 as follows:

ctx_query.store_sqe('pol', 'Q1', 'lions | tigers', 'session');

You then want to reduce this hitlist by adding another condition, so you define Q2 as follows:

ctx_query.store_sqe('pol', 'Q2', 'SQE(Q1) and zoos', 'session');

You then execute Q2 as follows:

ctx_query.contains('pol', 'SQE(Q2)', 'ctx_temp');

This query searches for all documents that contain the term lions or the term tigers, and that also contain the term zoos. It is generally faster than the following equivalent query:

ctx_query.contains('pol', '(lions | tigers) and zoos', 'ctx_temp');

SQE Tables

Each stored query expression is stored in two tables: a central or system table owned by CTXSYS and a text index table attached to the policy for which the stored query expression was created.

The table owned by CTXSYS is an internal table which stores the stored query expression definitions for all the stored query expressions that have been created for all existing policies. It cannot be accessed directly, but can be viewed through two views, CTX_SQES (users with CTXADMIN role) and CTX_USER_SQES (users with CTXAPP and CTXADMIN roles).

The table used to store the results of a stored query expression for a text column (the SQR table) is one of the tables created automatically when the column is indexed; however, the SQR table is only populated when a stored query expression is created and updated when a stored query expression is re-evaluated.

The tablespace, storage clause, and other parameters used to create the SQR table are specified by the Engine preference in the policy for the text column of the stored query expression.

Note:

Similar to the other ConText index tables, the SQR table is an internal table that is accessed only by ConText when a stored query expression is processed in a query.

For more information about policies, preferences, text indexing, and the structure of the stored query expression tables and views, see Oracle8 ConText Cartridge Administrator's Guide.  

Using Operators in Stored Query Expressions

You can use all query expression operators in stored query expressions, with the following exceptions:

Stored query expressions also support all of the special characters and other components that can be used in a query expression, including PL/SQL functions and other stored query expressions.

PL/SQL in Query Expressions

In a query expression, you can call a PL/SQL function that returns a value. The syntax for the PL/SQL operator is as follows:

	@owner_name.fname(arg1, arg2, ..., argn)
	execute owner_name.fname()
	exec owner_name.fname()

Executes fname() where fname() returns a value. Return values that are not of type VARCHAR2 are cast into strings when possible. If fname() does not return a value, an exception is raised.

In keeping with the notational conventions for PL/SQL, the arguments for the function must be enclosed in parentheses. All characters (including grouping characters) included within the parentheses of the PL/SQL function are not considered as query operands or operators. Any characters outside the PL/SQL parentheses are processed according to the grouping rules used by ConText.  

Example

Calling a PL/SQL function within a query is useful when you need to convert words to alternate forms. For example, you can call a function that takes acronyms and returns the expanded string.

Suppose you, as user ctxuser, create a function named CONVERT that takes an acronym as input and returns the fully expanded version of the acronym. Then, to obtain all documents that contain either IBM or International Business Machines, your query expression is the following:

'execute ctxuser.convert(IBM), IBM'

Likewise, you can call a PL/SQL function that translates words. For example, you can call a function french that converts an English word to its French equivalent. You can then search on the French word for cat by issuing the following query:

'@ctxuser.french(cat)'

Escaping Reserved Words and Characters

Reserved words are words that have special meaning to query expressions (and, or, accum, execute) and cannot be included in query expressions as query terms without explicitly identifying them with braces { }. Words and characters enclosed in braces are referred to as escape sequences.

Likewise, any symbols used to represent operators (& | ,) and any other query expression components are considered characters that must be escaped if used as literals in query expressions.

The following is a list of ConText reserved words and characters that must be escaped to be searched on:

Operator/Character         Reserved Word    Equivalent Reserved Character
And                        AND              &
Or                         OR               |
Accumulate                 ACCUM            ,
Minus                      MINUS            -
Near                       (none)           ;
Stem                       (none)           $
Soundex                    (none)           !
Fuzzy                      (none)           ?
Threshold                  (none)           >
Weight                     (none)           *
First/Next                 (none)           #
Max                        (none)           :
Wildcard (multiple)        (none)           %
Wildcard (single)          (none)           _
Grouping (parentheses)     (none)           ( )
Grouping (brackets)        (none)           [ ]
Escape                     (none)           { }
PL/SQL call                EXECUTE, EXEC    @
Stored Query Expression    SQE              (none)
Synonym                    SYN              (none)
Preferred                  PT               (none)
Related                    RT               (none)
Top                        TT               (none)
Broader                    BT               (none)
Narrower                   NT               (none)
Broader Generic            BTG              (none)
Narrower Generic           NTG              (none)
Broader Partitive          BTP              (none)
Narrower Partitive         NTP              (none)

Example

In the following examples, an escape sequence is necessary because each expression contains a query expression operator or reserved symbol:

'{AT&T}'

'{high-voltage}'

Note:

It is not necessary to have more than one escape sequence in a query, because everything within a set of braces is considered part of the escape sequence. For example, the query Peter{,} Paul{,} {and} Mary is identical to {Peter, Paul, and Mary}.  

Querying the Literal Escape Character

The open brace { signals the beginning of the escape sequence, and the close brace } indicates the end. Everything between the open brace and the close brace is part of the escape sequence (including any open brace characters). To include a literal close brace character in the escape sequence, use }}.
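An application that builds query expressions from user input can apply these escaping rules with a small helper. The function below is a sketch; its name, escape_term, is an assumption for illustration and it is not part of any ConText package.

```python
def escape_term(text):
    """Wrap `text` in braces so that it is treated as a literal term."""
    # A close brace inside the sequence is written as }}.
    # An open brace needs no special treatment.
    return "{" + text.replace("}", "}}") + "}"


print(escape_term("AT&T"))
print(escape_term("high-voltage"))
print(escape_term("a}b"))
```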

Querying with Special Characters

ConText indexes text by identifying tokens (words). For English and most European languages, it assumes that blank spaces delimit tokens. At index time, ConText must also know how to interpret punctuation characters and characters that occur within words and numbers. Such special characters must be defined in the BASIC LEXER preference. They are described as follows:

Type of Character   Description

Punctuations   Characters that delimit the end of sentences, such as the period '.' and question mark '?', and characters that occur next to words and numbers, such as the comma ',' and the dollar sign '$'. These characters are not indexed.

Continuation   Characters that indicate a word continues on the next line, such as the hyphen '-'. These characters are not indexed.

Printjoins   Characters that join words together, such as the hyphen '-'. The joined words are indexed as one token that includes the printjoin character.

Skipjoins   Characters that join words together, such as the hyphen '-'. The joined words are indexed as one token without the skipjoin character.

Numjoin   Characters that occur in numbers, such as the decimal point '.'. These characters are indexed.

Numgroup   Characters that group digits within a number, such as the comma ','. These characters are indexed.

In the BASIC LEXER preference, ConText defines a default set of characters for each group.

The way you query on tokens that contain these characters depends on how ConText indexes the tokens containing these characters. This is because ConText tokenizes words at query time the same way it tokenizes words at index time. To query on words or numbers that contain special characters, you must know how these words are represented in the index.

See Also:

For more information about defining special characters for the BASIC LEXER preference, see Oracle8 ConText Cartridge Administrator's Guide.  

Querying with Punctuation and Continuation Characters

Punctuation and continuation characters are not indexed with the words they occur next to or with, and thus are ignored by ConText at query time. The following table shows how ConText strips punctuation characters at query time:

Query                              Equivalent Query
'John swims fast. Sharks eat.'     'John swims fast sharks eat'
'John swims. Fast sharks eat.'     'John swims fast sharks eat'
'{John swims, fast sharks eat}'    'John swims fast sharks eat'
'{SHAZAM!}'                        'SHAZAM'
'{$250}'                           '250'
'{#101}'                           '101'
'{phone#}'                         'phone'

Suggestion:

Because ConText strips punctuation characters at query time, leaving them out of the query expression and using the equivalent query might be a better approach, especially when the characters are reserved as in the last five examples.  

Querying with Printjoins and Skipjoins

Printjoins and skipjoins are characters such as hyphens that join words together.

When you define a character as a printjoin, such as a hyphen, you specify that the words on either side of the hyphen are to be indexed with the hyphen. For example, sister-in-law is indexed as the token sister-in-law.

When you define a character as a skipjoin, such as a hyphen, you specify that the two words on either side of the hyphen are to be indexed as one token without the hyphen. For example, sister-in-law is indexed as sisterinlaw.

To query on words that contain a join character, you must know if the character is defined as a skipjoin or printjoin in the BASIC LEXER preference.

For example, if the hyphen character is defined as a printjoin, you must write your query with the hyphen, since the indexed token contains the hyphen. Thus, to query on all the documents that contain the term sister-in-law, you must write your query as follows with the hyphen:

'{sister-in-law}'

Note:

The '-' character must be escaped, or else ConText interprets it as the MINUS operator.  

However, if the hyphen character is defined as a skipjoin, you must write your query without the hyphen. Thus, to query on all documents that contain sister-in-law, you must write your query as:

'sisterinlaw'

Provided the hyphen is defined as a skipjoin, this query returns all documents that contain sisterinlaw or sister-in-law.
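The difference between the two kinds of join character can be seen in a toy tokenizer. The sketch below is a simplified model, not the BASIC LEXER: it handles only ASCII letters and digits, and it assumes that any character that is neither a printjoin nor a skipjoin separates tokens.

```python
import re


def index_tokens(text, printjoins="", skipjoins=""):
    """Tokenize `text` following the printjoin and skipjoin rules."""
    # Skipjoin characters are dropped, gluing the neighbouring words together.
    for ch in skipjoins:
        text = text.replace(ch, "")
    # Printjoin characters count as part of a token; any other character
    # that is not a letter or digit separates tokens (an assumption).
    token_chars = "A-Za-z0-9" + re.escape(printjoins)
    return re.findall("[%s]+" % token_chars, text)


print(index_tokens("my sister-in-law", printjoins="-"))
print(index_tokens("my sister-in-law", skipjoins="-"))
```

Because query terms are tokenized the same way as indexed text, running a query term through the same function shows which token it has to match.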

Querying with Numjoins and Numgroups

Numjoin and numgroup characters are characters that can appear in numbers, such as the decimal point and the comma.

Numjoin

A numjoin is a character that occurs once in a string of digits, such as a decimal point, and gets indexed with the number. (ConText defines the decimal point as a default numjoin character for the BASIC LEXER preference.) For example, the number 3.14 is indexed as 3.14. Thus, to query on 3.14 with the decimal point defined as a numjoin character, you write:

'3.14'

When you define the numjoin character to be NULL, ConText indexes 3.14 as the two separate numbers 3 and 14.

Note:

When a period follows a number such as at the end of a sentence, ConText knows to index the number without the decimal point. For example, the number fourteen in the following sentence gets indexed as 14 without the period:

The score was San Francisco 21, Dallas 14.  

Numgroup

A numgroup is a character such as a comma that groups digits together in a number. Numgroup characters get indexed with the number. (ConText defines the comma as a default numgroup character for the BASIC LEXER preference.) For example, the number 6,344,555 gets indexed as 6,344,555.

To query on a number that contains numgroup characters, you must write the query with the numgroup character. For example, to query on 6,344,555, you write:

'{6,344,555}'

Note that the comma must be escaped.

Note:

When you have the comma defined as a numgroup character, you must query on numbers using the comma. That is, a query on {1,000} does not return documents that contain 1000 without the comma. A better query uses the equivalence operator:

'{1,000}=1000'  

When you define the numgroup character as NULL, numbers such as 1,000 get indexed as 1 and 000.
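The numjoin and numgroup rules can be modelled with one regular expression. This is a simplified sketch: it treats both characters as joiners that are kept only when digits follow, and it does not enforce that a numjoin occurs only once in a number. An empty string stands in for a NULL setting.

```python
import re


def number_tokens(text, numjoin=".", numgroup=","):
    """Extract number tokens, keeping join characters inside numbers."""
    joiners = re.escape(numjoin + numgroup)
    if joiners:
        # A joiner is kept only when digits follow it, so a sentence-final
        # period or a list comma is not absorbed into the number.
        pattern = r"\d+(?:[%s]\d+)*" % joiners
    else:
        pattern = r"\d+"
    return re.findall(pattern, text)


print(number_tokens("3.14"))
print(number_tokens("3.14", numjoin=""))
print(number_tokens("1,000", numgroup=""))
print(number_tokens("The score was San Francisco 21, Dallas 14."))
```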




Copyright © 1997 Oracle Corporation.

All Rights Reserved.
