Oracle8(TM) ConText(R) Cartridge Application Developer's Guide
Release 2.0

A54630-01

5
Theme Queries

This chapter describes how to perform theme queries. It covers creating a theme index, document signatures, using theme queries, and using operators with theme queries.

Creating a Theme Index

Theme queries are issued against a set of documents, typically stored in a text column. Before you can execute a theme query on a set of documents, you must first create a theme index. To do so, specify the THEME_LEXER as the lexer preference when you create the policy for the text column. For example:

execute ctx_ddl.create_policy('THEME_POLICY', -
   'table1.text', lexer_pref => 'CTXSYS.THEME_LEXER');
Note:

ConText supports theme indexing and queries for English language documents only.

You can perform theme indexing and theme queries only with a ConText server started with the ctxsrv executable. Theme indexing and querying will not work with ctxsrvx.  

See Also:

For more information about creating theme indexes, see Oracle8 ConText Cartridge Administrator's Guide.  

Document Signatures

When you create a theme index for a set of documents in a text column, ConText creates a document signature for each document. A document signature is a collection of the main concepts or themes in the document. ConText can store up to 16 themes per document.

Each theme in the document signature has a theme vector associated with it that defines the theme as part of a hierarchy. For example, if two themes in a document are computer software and telephones, ConText might generate the corresponding theme vectors with the following theme tokens and weights:

Theme Vector 1                    Weight
science and technology               40
computer industry                    40
computer software                    40

Theme Vector 2                    Weight
science and technology               30
communications                       30
telecommunications industry          30
telephones                           30
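To make the structure concrete, the following Python sketch models a document signature as a list of theme vectors, each holding its tokens (broadest first) and a single weight. It is a toy model, not ConText's internal representation: the class names, the rule that the last token names the theme, and the choice to reject signatures with more than 16 themes are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_THEMES = 16  # ConText stores at most 16 themes per document


@dataclass(frozen=True)
class ThemeVector:
    """One theme: tokens ordered from broadest to most specific, one weight."""
    tokens: tuple[str, ...]
    weight: int

    @property
    def theme(self) -> str:
        # Assumption for this model: the most specific token names the theme.
        return self.tokens[-1]


class DocumentSignature:
    """Toy container for a document's theme vectors."""

    def __init__(self, vectors: list[ThemeVector]):
        # This model simply rejects oversized signatures; how ConText picks
        # which themes to keep is not described in this chapter.
        if len(vectors) > MAX_THEMES:
            raise ValueError(f"at most {MAX_THEMES} themes per document")
        self.vectors = list(vectors)

    def themes(self) -> list[str]:
        return [v.theme for v in self.vectors]


signature = DocumentSignature([
    ThemeVector(("science and technology", "computer industry",
                 "computer software"), 40),
    ThemeVector(("science and technology", "communications",
                 "telecommunications industry", "telephones"), 30),
])

print(signature.themes())  # ['computer software', 'telephones']
```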

Theme Token Names

When ConText interprets a document to create the theme index, it derives theme token names from the standard names and categories in the knowledge catalog. A theme token in the index represents a concept that may appear in the document exactly as the token, as an alternate form of the word, or as a semantically related concept. For example, the canonical form Oracle Corporation might represent both Oracle and Oracle Corp in the document.

See Also:

For more information about the knowledge catalog, see "Knowledge Catalog" in Chapter 7.  

Theme Weight

The theme weight is a measure of the strength of a theme relative to the other themes in a document. Weights are associated with theme vectors, and thus theme tokens within the same theme vector have the same weight.

For example, the tokens telephones and communications in Theme Vector 2 have the same weight of 30. When you issue a theme query, ConText uses theme weights to score hits.
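Because the weight belongs to the vector rather than to individual tokens, looking up a token's weight means finding the vectors that contain it. In this hypothetical sketch (SIGNATURE and token_weights are illustrative names, not ConText objects), a broad token such as science and technology appears in both vectors of the earlier example and therefore carries both weights.

```python
# Each theme vector: (tokens from broad to specific, weight shared by all of them).
SIGNATURE = [
    (("science and technology", "computer industry", "computer software"), 40),
    (("science and technology", "communications",
      "telecommunications industry", "telephones"), 30),
]


def token_weights(signature, token):
    """Return the weight of every theme vector that contains `token`."""
    return [weight for tokens, weight in signature if token in tokens]


print(token_weights(SIGNATURE, "telephones"))              # [30]
print(token_weights(SIGNATURE, "communications"))          # [30]
print(token_weights(SIGNATURE, "science and technology"))  # [40, 30]
```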

Using Theme Queries

To execute a theme query, you specify a query string, which can be a sentence or a phrase, with or without operators. ConText interprets the query and creates a normalized form of it that can be matched against document signatures. ConText returns a list of the documents that satisfy the query, along with a score indicating how relevant each document is to the query.

Two-Step Query

To execute a theme query with the CTX_QUERY.CONTAINS procedure, you must specify a policy that has a theme lexer associated with it.

For example, the following call issues a theme query on computer software and writes the hitlist to the result table CTX_TEMP:

execute ctx_query.contains('THEME_POLICY', 'computer software', 'CTX_TEMP');

In this example, ConText generates theme vectors for the query computer software and attempts to match them against the document signatures in the theme index.

When a match is found, ConText uses the weight of the matched theme to compute a score that reflects how relevant the match is to the query; the higher the score, the more relevant the hit. ConText returns the matched document as part of the hitlist.

For example, if you issue a theme query with a token of computer software, ConText might return a match on a document that has a theme vector as follows:

science and technology    40
computer industry         40
computer software         40

Likewise, if you issue a query for the token science and technology, ConText also returns this document. However, a query on a broad term like science and technology is likely to return a larger, less focused hitlist.
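The matching and scoring described above can be approximated with the following toy Python model. It is a simplification, not ConText's scoring algorithm: it assumes a document matches when any of its theme vectors contains the query token, and it uses the largest matching weight directly as the score. THEME_INDEX and theme_query are hypothetical names.

```python
# Toy theme index: document id -> list of (theme vector tokens, weight).
THEME_INDEX = {
    1: [(("science and technology", "computer industry",
          "computer software"), 40)],
    2: [(("science and technology", "communications",
          "telecommunications industry", "telephones"), 30)],
}


def theme_query(index, token):
    """Return (doc_id, score) hits, highest score first.

    Simplification: a document matches if any of its theme vectors contains
    the query token, and its score is the largest weight among those vectors.
    """
    hits = []
    for doc_id, vectors in index.items():
        weights = [w for tokens, w in vectors if token in tokens]
        if weights:
            hits.append((doc_id, max(weights)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(theme_query(THEME_INDEX, "computer software"))       # [(1, 40)]
print(theme_query(THEME_INDEX, "science and technology"))  # [(1, 40), (2, 30)]
```

The second query shows why broad terms produce larger hitlists: science and technology sits at the top of both hierarchies, so every document in this small index matches.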

One-step Query

You can execute theme queries using the one-step method in SQL*Plus. ConText matches document signatures, scores hits, and returns documents in the same way as in a two-step query.

For example, to execute a theme query on computer software:

SELECT * FROM TEXTAB
WHERE CONTAINS (text, 'computer software') > 0;

Multiple Policies

For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause using the pol_hint parameter. You might create two policies for a column when you want to perform both theme and text queries on the column.

For example, if the column text had a regular text policy and a theme policy THEMEPOL associated with it, you would do a theme query as follows:

SELECT ID, SCORE(0) FROM TEXTAB
WHERE CONTAINS (text, 'computer software', 0, 'THEMEPOL') > 0;

When you specify a policy in the CONTAINS function, as in this example, you must also supply a value for the LABEL parameter, in this case 0. This is the same label that SCORE(0) refers to in the select list.

See Also:

For more information about using the pol_hint parameter in the CONTAINS function, see the specification for CONTAINS in Chapter 9.  

Case-sensitivity

Unlike regular text queries, theme queries are case-sensitive. For example, a query on the common noun turkey, a type of bird, does not produce a hit on the proper noun Turkey, a country.
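The following toy Python comparison illustrates the difference. It assumes, for illustration only, that theme tokens are compared as exact strings while regular text queries ignore case; the small dictionary and the document ids in it are hypothetical and stand in for both kinds of index.

```python
# Toy index: token -> ids of documents containing it (ids are hypothetical).
INDEX = {
    "turkey": [101],  # the bird
    "Turkey": [202],  # the country
}


def theme_hits(query):
    """Theme query model: the token must match exactly, including case."""
    return sorted(INDEX.get(query, []))


def text_hits(query):
    """Regular text query model: case is ignored."""
    return sorted(doc for token, docs in INDEX.items()
                  if token.lower() == query.lower() for doc in docs)


print(theme_hits("turkey"))  # [101]
print(text_hits("turkey"))   # [101, 202]
```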

Ambiguous Queries

An ambiguous word or phrase is one that is vague or carries very little information. If your query contains an ambiguous term, ConText returns an error. Examples of ambiguous query terms include the word images and the phrase good times.

Using Operators with Theme Queries

In theme queries, the following operators have the same semantics as with regular text queries:

Examples

Some valid query strings using operators are as follows:

contains(text, 'telephones & {computer industry}') > 0
contains(text, 'telephones*3 & {computer software}*.5 > 50') > 0

Thesaurus Operators

In a theme query, the thesaurus operators (synonym, broader term, narrower term, and so on) work the same way as in a regular text query, provided a thesaurus has been created or loaded.

Grouping Characters

In theme query expressions, the grouping characters (), [] have the same semantics as with a regular text query.

Wildcard Characters

In theme query expressions, the wildcard characters (%, _) work the same way as in regular text queries.

Note:

Wildcard characters can make a theme query ambiguous. For example, a theme query on %court% might return documents that have a theme of court of law or tennis court.
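To see how a wildcard pattern fans out across theme tokens, the sketch below translates % (any sequence of characters) and _ (exactly one character) into a Python regular expression and applies it to a hypothetical list of theme tokens. It illustrates the wildcard semantics only; it is not how ConText expands wildcards internally, and wildcard_to_regex and expand are illustrative names.

```python
import re

# Hypothetical theme tokens present in an index.
THEME_TOKENS = ["court of law", "tennis court", "telephones"]


def wildcard_to_regex(pattern):
    """Translate % (any run of characters) and _ (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.DOTALL)


def expand(pattern, tokens):
    """Return every token that the whole pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [t for t in tokens if regex.fullmatch(t)]


print(expand("%court%", THEME_TOKENS))  # ['court of law', 'tennis court']
```

Both court themes match, which is exactly the ambiguity the note warns about: the pattern cannot tell a court of law from a tennis court.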

Unsupported Operators

ConText does not support the following query expression operators with theme queries:




Copyright © 1997 Oracle Corporation. All Rights Reserved.
