Oracle8 ConText Cartridge QuickStart
Release 2.0
A54627_01


3
Using Theme Queries and the Linguistic Services

This chapter provides a quick description of the tasks that must be performed to enable theme queries for ConText, as well as to generate linguistic output for use in an application. It also provides examples of theme queries and queries using linguistic output.

Note:

Theme queries and the Linguistic Services are only available for English-language text.  

The following topics are covered in this chapter:

Theme Query Task Map

QuickStart Tasks for Theme Queries

Perform the following tasks to set up a text column in a table, index the column, and perform theme queries on the column:

Linguistic Services Task Map

QuickStart Tasks for Linguistic Services

Perform the following tasks to generate linguistic output for the documents in the column and to query the generated output:

Startup and Hot Upgrade

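The first two setup tasks for theme queries and the Linguistic Services are starting the ConText servers and performing a hot upgrade of the text columns. Both tasks are described below.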

Start ConText Servers

Similar to text indexing and queries, theme indexing is performed by ConText servers with the DDL (D) personality and theme queries are processed by ConText servers with the Query (Q) personality.

However, to enable a ConText server to generate linguistic output through the Linguistic Services, the Linguistic (L) personality must be specified for the server.

Note:

For generating theme indexes, performing theme queries, and/or requesting the Linguistic Services, do not use the ctxsrvx executable to start ConText server processes.  

The following command starts a ConText server with the appropriate personalities for creating a theme index and performing theme queries:

        $ ctxsrv -user ctxsys/ctxsys -personality DQ -log ctx.log &

The following command starts a ConText server with the appropriate personality for generating linguistic output:

        $ ctxsrv -user ctxsys/ctxsys -personality L -log ctx.log &

Suggestion:

Once linguistic output has been generated for a column, if the text in the column is not going to change, the Linguistic personality is not needed and the server can be shut down.  

See Also:

Oracle8 ConText Cartridge Administrator's Guide  

Perform Hot Upgrade of Columns

The procedure for upgrading a column for theme indexing is the same as the procedure for text indexing, except that the lexer used in the policy for theme indexing is not the same as the lexer (BASIC LEXER) used in the text indexing policy.

To create a theme indexing policy, the Theme Lexer is specified for the policy, using the predefined Lexer preference, THEME_LEXER.

The following example illustrates using a PL/SQL block to create a column policy named ctx_thidx for theme indexing:

        begin
          ctx_ddl.create_policy('ctx_thidx',
                                'ctxdev.docs.text',
                                lexer_pref => 'THEME_LEXER');
        end;

Note:

The only difference between this theme indexing policy and the text indexing policy created in "Perform Hot Upgrade of Columns" in Chapter 2, besides the policy names, is the specification of THEME_LEXER as the Lexer preference for the theme indexing policy.

In addition, the table and column name for both policies are the same, which results in the column having two policies and, subsequently, two indexes: one for text queries and one for theme queries.  

See Also:

Oracle8 ConText Cartridge Administrator's Guide  

Theme Queries

Theme queries search the text column(s) of the queried table(s) for specified themes and return all rows (i.e. documents) that have the specified themes. A theme represents a major topic or developed subject in a document.

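Before you can perform theme queries, you must create a theme index for each text column and, for two-step queries only, create a result table. Both tasks are described in the sections that follow.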

Once these tasks have been performed, you can perform theme queries for your documents. Some of the issues related to theme queries are discussed in "Theme Query Examples" in this chapter.

See Also:

Oracle8 ConText Cartridge Administrator's Guide  

Create Theme Indexes for Text Columns

To create a theme index for a column, call the CTX_DDL.CREATE_INDEX stored procedure and specify the theme indexing policy for the column.

For example:

        exec ctx_ddl.create_index('ctx_thidx')

In this example, CREATE_INDEX is called in SQL*Plus to create a theme index for the text column (ctxdev.docs.text) in the ctx_thidx policy.

After a theme index is created for a column, ConText servers with the Query personality can process theme queries for the column.

Create Result Tables (Two-Step Queries Only)

The structure of the result table for a theme query and the method for creating the table are identical to those of the result table used in a text query. You can reuse the same result table for two-step theme queries or create a separate one.

Note:

The two-step theme query example in "Theme Query Task Map" uses a different result table than the result table used in the example for two-step text queries.  

Theme Query Examples

The methods for performing theme queries are identical to the three text query methods presented in "Text Queries" in Chapter 2, with the exception that theme queries are case-sensitive and the scoring methods are different.

"Theme Query Task Map" illustrates how to perform theme queries using all three of the supported query methods. In the examples, the theme Oracle is queried.

See Also:

Oracle8 ConText Cartridge Application Developer's Guide  

Case-Sensitivity

Unlike text queries, theme queries are case-sensitive. Theme queries for places and names will return different results than theme queries for the same words or phrases expressed as common nouns.

For example, a theme query for the term Oracle returns those documents in which ConText determined Oracle Corporation (the expansion of Oracle) to be a major theme. In contrast, a text query for the term Oracle returns all documents that contain occurrences of either Oracle or oracle, regardless of how the term is used in the document.

Scoring

Similar to text queries, the documents returned by theme queries have a score. While scores in a theme query indicate the relevance of the selected documents to the query, each score is based on the weight of the queried theme in the document, rather than the number of occurrences of the theme.

Theme weights are generated during theme indexing. Theme weight measures the importance of a document theme relative to the other themes in the document.

Columns with Multiple Indexes

If a column has two or more indexes, as is common when a column has both a text index and a theme index, you must specify the name of the policy for the appropriate index as an argument in the CONTAINS function of a one-step query.

For example, if the QuickStart tasks for both text and theme queries have been performed, the column ctxdev.docs.text has both a text indexing policy (ctx_docs) and a theme indexing policy (ctx_thidx) and indexes for both.

The one-step theme query example in "Theme Query Task Map" uses ctx_thidx to identify the index to be searched.
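As a hedged sketch of what such a one-step query might look like: the argument order shown for CONTAINS (column, query, label, policy name) and the use of SCORE with a matching label are assumptions recalled from the ConText 2.0 query syntax, not taken from this chapter, and the title column of docs is borrowed from the linguistic query examples later in this chapter. Check the exact signature in the Application Developer's Guide before relying on it.

```sql
-- One-step theme query against ctxdev.docs.text (sketch only).
-- Assumption: the fourth CONTAINS argument names the policy whose index
-- should be searched; 0 is the label that SCORE(0) refers back to.
select score(0), title
  from docs
 where contains(text, 'Oracle', 0, 'ctx_thidx') > 0
 order by score(0) desc;
```

Without the policy name, ConText would have no way to tell whether the text index or the theme index on the column is meant.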

Linguistic Services

The setup tasks required for using the Linguistic Services to generate output for a document are creating the linguistic output tables and generating the linguistic output for the documents.

Once these tasks have been performed, you can query the output tables for themes and Gists. "Example Queries for Linguistic Output" in this chapter provides examples of the types of queries you can perform.

Create Linguistic Output Tables

The Linguistic Services generate two types of output for a document: themes and Gists.

The output is stored in tables specified by the user when requesting the Linguistic Services. The linguistic output tables can have any name; however, they must have the following structure:

        create table ctx_themes (cid number, pk varchar2(64),
                                 theme varchar2(256), weight number);

        create table ctx_gist (cid number, pk varchar2(64),
                               pov varchar2(256), gist long);

In these examples, two tables (ctx_themes and ctx_gist) are created for storing linguistic output. The pk column in each table stores primary keys (textkeys) for each document. The cid column in each table stores policy IDs.

Note:

The output from the Linguistic Services may require modification before it can be used in an application. As such, the output tables usually serve only to temporarily store the output until the output can be modified and moved to an application table.  
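One way this hand-off might look is sketched below. The application table doc_themes and the weight cutoff of 10 are hypothetical and chosen only for illustration; ctx_themes is the output table created above.

```sql
-- Hypothetical application table; the name and columns are illustrative only.
create table doc_themes (
  doc_pk  varchar2(64),
  theme   varchar2(256),
  weight  number
);

-- Copy the themes worth keeping (the cutoff of 10 is an arbitrary example).
insert into doc_themes (doc_pk, theme, weight)
  select pk, theme, weight
    from ctx_themes
   where weight >= 10;

-- Empty the staging table so the next batch of requests starts clean.
delete from ctx_themes;

commit;
```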

See Also:

Oracle8 Server SQL Reference, Oracle8 ConText Cartridge Application Developer's Guide  

Generate Linguistic Output for Documents

To request linguistic output for a document, call the REQUEST_THEMES and/or REQUEST_GIST procedures in the CTX_LING PL/SQL package. Then call the SUBMIT function in CTX_LING to submit the requests to the Services Queue.

The requests in the Services Queue are picked up and processed by the first available ConText servers with the Linguistic personality.

After linguistic output is generated, you can query for lists of themes in documents and use Gists to view summarized versions of your documents.

Note:

REQUEST_THEMES and REQUEST_GIST must be called once for each document for which you want to generate the respective type of linguistic output.

In addition, you must specify a policy name for REQUEST_THEMES and REQUEST_GIST. The specified policy can be either a text indexing policy or a theme indexing policy; however, if you are generating both themes and Gists for a document, you should use the same policy.  

For example:

        exec ctx_ling.request_themes('ctx_thidx',1,'ctx_themes')
        exec ctx_ling.request_gist('ctx_thidx',1,'ctx_gist')
        variable handle number
        exec :handle := ctx_ling.submit
        print handle

In this example, theme and Gist output are requested for a document with pk=1 in the text column for the ctx_thidx theme indexing policy. The pk value corresponds to the primary key for the document. The ctx_themes and ctx_gist tables are specified as the output tables.

Then, the SUBMIT function is called. SUBMIT submits the two separate requests as a single batch request to the Services Queue and returns a handle for the request.

Note:

In this example, the ctx_thidx policy is used to call REQUEST_THEMES and REQUEST_GIST; however, you could also use the ctx_docs example text indexing policy discussed in this manual to generate the same results.  
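Because each document needs its own request, a PL/SQL loop is a natural way to cover a whole column. The block below is a sketch only: it assumes the documents live in ctxdev.docs with a primary key column named pk (the name used in the query examples later in this chapter), that the ctx_themes and ctx_gist output tables already exist, and that queuing many requests before a single SUBMIT call is acceptable for your volume of documents.

```sql
set serveroutput on

declare
  handle number;
begin
  -- Queue one theme request and one Gist request per document.
  for doc in (select pk from ctxdev.docs) loop
    ctx_ling.request_themes('ctx_thidx', doc.pk, 'ctx_themes');
    ctx_ling.request_gist('ctx_thidx', doc.pk, 'ctx_gist');
  end loop;

  -- Submit everything queued so far as one batch and report the handle.
  handle := ctx_ling.submit;
  dbms_output.put_line('Request handle: ' || handle);
end;
/
```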

See Also:

Oracle8 ConText Cartridge Application Developer's Guide  

Example Queries for Linguistic Output

Theme and Gist information is stored as structured data in the linguistic output tables and can be used to present a specialized view of a document.

For example, you may want to display the themes for all the documents in a column or for a single document. You may also want to display the point-of-view (POV) Gist for all documents with a specific theme or the generic Gist for a specific document.

Note:

The Linguistic personality is required only for generating linguistic output for documents. Because linguistic queries are essentially queries for structured data, neither the Linguistic nor the Query personalities are required for querying the linguistic output tables.  

Because linguistic information is stored in tables separate from the base text table, you probably will want to join the text table and the linguistic output tables to return detailed information for the document hits.

The following examples illustrate queries for themes and Gists:

        select theme, weight, title from ctx_themes, docs
        where docs.author='Smith'
        and ctx_themes.pk=docs.pk
        order by weight desc;

        select title, gist from ctx_gist, docs
        where ctx_gist.pk=docs.pk
        and pov = 'computer software industry';

In the first example, the query returns theme, weight, and title for each document in docs that has Smith as the author.

In the second example, the query returns the title and POV Gist for each document that has computer software industry as one of its points-of-view (themes) in the ctx_gist table.
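The single-document case mentioned earlier follows the same pattern. As a sketch, assuming the document of interest has the primary key 1 (the value used when the linguistic requests were submitted), its themes could be listed from most to least important:

```sql
-- pk is varchar2(64) in the output table, so the key is compared as a string.
select theme, weight
  from ctx_themes
 where pk = '1'
 order by weight desc;
```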

Note:

ConText stores themes/POVs as plural nouns or noun phrases. When you perform queries using linguistic output, the query phrase must be entered exactly as it is stored in the output table or the query will likely return no results.

A list of available words or phrases for queries can be obtained by selecting all unique themes from the themes/Gist output table.  
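For instance, assuming the ctx_themes and ctx_gist tables created earlier in this chapter, the stored phrases could be listed as follows:

```sql
-- Every distinct theme recorded for any document.
select distinct theme from ctx_themes order by theme;

-- Every distinct point-of-view value stored with a Gist.
select distinct pov from ctx_gist order by pov;
```

Copying a phrase from this output into the where clause avoids mismatches in spelling, case, or pluralization.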

See Also:

Oracle8 ConText Cartridge Application Developer's Guide  




Copyright © 1997 Oracle Corporation.
All Rights Reserved.
