Oracle8i Data Cartridge Developer's Guide
Release 8.1.5

A68002-01

Library

Product

Contents

Index

Prev Next

7
Building Domain Indexes

This chapter describes extensible indexing, including:

Introduction to Extensible Indexing

What is extensible indexing? Why is it important to you as a cartridge developer? How should you go about implementing it?

To answer these questions we first need to understand the modes of indexing provided by the Oracle, which in turn requires that we first consider the role of indexing in information management systems.

What is Indexing?

The impetus to index data arises because of the need to locate specific information and then to retrieve it as efficiently as possible. If you could keep the entire dataset in main memory (equivalent to a person memorizing a book), there would be no need for indexing. Since this is not possible, and since disk access times are much slower than main memory access times, you are forced to wrestle with the art of indexing.

If you think of the form of indexing with which we are most familiar -- the index at the back of a technical book -- you will note that every index token has three characteristics which refer to the item being indexed:

The upshot is that you can retrieve the information much quicker than if you had to page through the entire book (equivalent to sequential scanning of a file)! However, note that while indexing speeds up retrieval, it slows down inserts because you have to update the index.

Index Structures

An index can be any structure which can be used to represent information that can be used to efficiently evaluate a query.

The Relationship between Logical and Physical Structures

There is no single structure that is optimal for all applications.

In each case, you will want to organize the data in a different index structure since different queries require that information be indexed in different ways. As we will discuss below, a Hash structure is best suited for determining exact match, whereas a B-tree is much better suited for range queries.

Moreover, these are not the only kind of queries. What if you want to discover whether Power Station A or B can best service Quadrant 3, or to determine the overlapping coverage zones derived from different distributions of power stations? In these cases, you will want to create operators (inRangeOf, servesArea, etc.) that meet your specific requirements. Unfortunately, you cannot do this by means of either the Hash or B-tree indextypes.

The Need for Index Structures that Encompass Unstructured Data

The limitation of Hash and B-tree indextypes is important because one criterion that distinguishes cartridges from other database applications is that data often incorporates many different kinds of information. While database systems are accomplished in processing scalar values, they cannot encompass the domain-specific data of interest to cartridge developers. Information in these contexts may be made up of text, images, audio, video -- and combinations of these that comprise domain-specific datatypes.

One way to resolve this problem is to create an index that serves as an intermediate structure. This is a logical extension of the basic idea underlying software-based indexing, namely that pointers refer to data (records, pages, files). In this scheme, keywords used to index video may be stored as an index. Going one step further, an intermediate structure may itself be indexed, as you might index abstracts (capsule text descriptions) of films.The advantage of this approach is that it may be easier to construct an index based on textual description of film than it is to index video footage. Employing this strategy you can scan the index without ever referring to the primary data (the film).

Unfortunately, intermediate structures in which text or scalars are used to represent unstructured data cannot satisfy all requirements. For one thing, they are always slower than direct indexing of the data because they introduce a level of indirection. More importantly, if the task is to analyze the density of bone in x-rays, or to categorize primate gestures, or to record the radio emissions of stars, there is no efficient substitute for direct indexing of unstructured data.

Examples of Indextypes

B-tree

While there is no single indextype that can satisfy all needs, the B-tree indextype comes closest to meeting the requirement. Here we describe the Knuth variation in which the index consists of two parts: a sequence set that provides fast sequential access to the data, and an index set that provides direct access to the sequence set.

Figure 7-1 B-tree Indextype Structure


While the nodes of a B-tree will generally not contain the same number of data values, and will usually contain a certain amount of unused space, the B-tree algorithm ensures that it remains balanced (the leaf nodes will all be at the same level).

Hash

Hashing gives fast direct access to a specific stored record based on a given field value. Each record is placed at a location whose address is computed as some function of some field of that record. The same function is used both at the time of insertion and retrieval.

The problem with hashing is that the physical ordering of records has little if any relation to their logical ordering. Also, there may be large unused areas on the disk.

Figure 7-2 Hash Indextype Structure


k-d tree

Our sample scenario integrates geographic data with other kinds of data. Insofar as we are interested in points that can be defined with two dimensions (latitude and longitude), such as geographic location of power stations, we can use a variation on the k-d tree known as the 2-d tree.

In this structure, each node is a datatype with fields for information, the two co-ordinates, a left-link and a right-link which can point to two children.

Figure 7-3 2-d Indextype Structure


The structure allows for range queries. That is, if the user specifies a point (xx, xx) and a distance, the query will return the set of all points within the specified distance of the point.

2-d trees are very easy to implement. However,the fact that a 2-d tree containing k nodes may have a height of k means that insertion and querying may be complex.

Point Quadtree

Figure 7-4 Point Quadtree Indextype Structure


The point quadtree is also used to represent point data in a two dimensional spaces. But these structures divide regions into four parts while 2-d trees divide regions into two. The fields of the record type for this node are comprised of an attribute for information, two co-ordinates, and four compass points (NW, SW, NE, SE) that can therefore point to four children.

Like 2-d trees, point quadtrees are very easy to implement. Also like 2-d trees, the fact that a point quadtree containing k nodes may have a height of k means that insertion and querying may be complex. Each comparison requires comparisons on at least two co-ordinates. However, in practice the lengths from root to leaf tend to be shorter in point quadtrees.

Why is Extensible Indexing Necessary?

The fact is that Oracle provides a limited number of indextypes, so that if (for instance) you wish to utilize either the k-d tree or the point quadtree, you will have to implement this yourself. As you consider your need to access your data, you need to keep in mind the following restrictions that pertain to the standard indextypes:

An Inability to Index Unstructured Data

Oracle's standard modes of indexing do not permit indexing a column that contains LONG or LOB values.

An Inability to Index Attributes of Column Objects

You may not be able to index a column object using Oracle's standard indexing schemes or the elements of a collection type.

An Inability to Index Values Derived from Domain-specific Operations

Oracle object types may be compared using either a map function or an order function. If the object utilizes a map function, then you can define a functional index that can be used implicitly to evaluate relational predicates. However, if an order function is used, you will not be able to use this to construct an index.

Further, you cannot utilize functions in predicates in which the range of the parameters is infinite. With Oracle8i functional indexes allow you to include a function in a predicate, provided you can precompute the function values for all the rows. Typically the index would store the rowid and the functional value. Queries that apply relational operators to values based on derived values utilize the index.

However, you can use functional indexes only if the function is so designed that there are a finite number of input combinations. Put another way: you cannot use functional indexes in cases in which the input parameters do not have a limited cardinality.

The Extensible Indexing API

This SQL-based interface allows you to define domain-specific operators and indexing schemes, and integrate these into the Oracle8i server.

Oracle8i provides a set of pre-defined operators which include arithmetic operators (+, -, *, /), comparison operators (=, >, <) and logical operators (NOT, AND, OR). These operators take as input one or more arguments (or operands) and return a result. They are represented by special characters (+) or keywords (AND).

Like built-in operators, user-defined operators (e.g., Contains) take a set of operands as input and return a result. The implementation of the operator is provided by the user. After a user has defined a new operator, it can be used in SQL statements like any other built-in operator.

For instance, suppose you define a new operator Contains, which takes as input a text document and a keyword, and returns 1 if the document contains the specified keyword. You can then write an SQL query as:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle and UNIX')=1;

Oracle8i uses indexes to efficiently evaluate some built-in operators. For example, a B-tree index can be used to evaluate the comparison operators =, > and <. Similarly, user-defined domain indexes can be used to efficiently evaluate user-defined operators.

Typical database management systems support a few types of access methods (B+Trees, Hash Index) on some set of data types (numbers, strings, etc.). In recent years, databases are more and more being used to store different types of data, such as text, spatial, image, video and audio. In these complex domains, there is a need for indexing complex data types and also specialized indexing techniques. For instance, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all possible kinds of complex data and indexing. The solution is to provide an extensible server which allows the user to define new index types.

The framework to develop new index types is based on the concept of cooperative indexing where an application and the Oracle server cooperate to build and maintain indexes for data types such as text, spatial and On-line-Analytical Processing (OLAP). The application software, in the form of a cartridge, is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The index structure itself can either be stored in an Oracle database as an Index-Organized Table, etc., or externally as a file.

The extensible indexing framework consists of the following components:

To illustrate the role of each of these components, let us consider a text domain application. Suppose a new indextype TextIndexType be defined as part of the text cartridge. It contains routines for managing and accessing the text index. The text index is an inverted index storing the occurrence list for each token in each of the text documents. The text cartridge also defines the Contains operator for performing content-based search on textual data. It provides both a functional implementation (a simple number function) and an index implementation (using the text index) for the Contains operator.

Now, let Employees be an employee table with a resume column containing textual data.

CREATE TABLE Employees 
(name VARCHAR(128), id INTEGER, resume VARCHAR2(1024));

A domain index can be created on resume column as follows:

CREATE INDEX ResumeTextIndex ON Employees(resume)
INDEXTYPE IS TextIndexType;

The Oracle server invokes the routine corresponding to the create method in the TextIndexType, which results in the creation of an index-organized table to store the occurrence list of all tokens in the resumes (essentially, the inverted index data). The inverted index modeled by ResumeTextIndex is automatically maintained by invoking routines defined in TextIndexType, whenever an Employees row is inserted, updated, or deleted.

Content-based search on the resume column can be performed as follows:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle and UNIX')=1;

Index-based implementation of the Contains operator can take advantage of the previously built inverted index. Specifically, the Oracle server can invoke routines specified in TextIndexType to search the domain index for identifying candidate rows, and then do further processing such as filtering, selection, and fetching of rows. Note that the above query can also be evaluated using the non-index implementation of the Contains operator, if the Oracle server chooses to not use the index defined on resume column. In such a case, the filtering of rows will be done by applying the non-index implementation on each resume instance of the table.

In summary, the extensible indexing interface will

This interface will enable a domain index to operate essentially the same way as any other Oracle Server index, the primary difference being that the Oracle Server will invoke application code specified as part of the indextype to create, drop, truncate, modify, and search a domain index.

It should be noted that an index designer may choose to store the index data in files, rather than in index-organized tables. The SQL interface for extensible indexing makes no restrictions on the location of the index data, only that the application adhere to the protocol for index definition, maintenance and search operations.

Concepts: Extensible Indexing

This section describes the key concepts of the Extensible Indexing Framework.

Overview

For simple data types such as integers and small strings, all aspects of indexing can be easily handled by the database system. This is not the case for documents, images, video clips and other complex data types that require content-based retrieval (CBR). The essential reason is that complex data types have application specific formats, indexing requirements, and selection predicates. For example, there are many different document encodings (e.g., ODA, SGML, plain text) and information retrieval (IR) techniques (e.g., keyword, full-text boolean, similarity, probabilistic, and so on). To effectively accommodate the large and growing number of complex data objects, the database system must support application specific indexing. The approach that we employ to satisfy this requirement is termed extensible indexing.

With Extensible indexing,

In effect, the application controls the structure and semantic content of the domain index. The database system interacts with the application to build, maintain, and employ the domain index. It is highly desirable for the database to handle the physical storage of domain indexes. In the following discussion, we implicitly make the assumption that the index is stored in an index-organized table. Note however, that the extensible indexing paradigm does not impose this requirement. The index could be stored in one or more external files.

To illustrate the notion of extensible indexing, we consider a textual database application with IR functionality. For such applications, document indexing involves parsing the text and inserting the words, or tokens, into an inverted index. Such index entries typically have the following logical form

(token, <docid, data>)

where token is the key, docid is a unique identifier (e.g., object identification) for the related document, and data is a segment containing IR specific quantities. For example, a probabilistic IR scheme could have a data segment with token frequency and occurrence list attributes. The occurrence list identifies all locations within the related document where the token appears. Assuming an IR scheme such as this, each index entry would be of the form:

(token, <docid, frequency, occlist> ..)

The following sample index entry for the token Archimedes illustrates the associated logical content.

(Archimedes, <5, 3, [7 62 225]>, <26, 2, [33, 49]>, ...);

In this sample index entry, the token "Archimedes" appears in document 5 at 3 locations(7, 62, and 225), and in document 26 at 2 locations(33 and 49). Note that the index would contain one entry for every document with the word "Archimedes".

IR applications can use domain indexes to locate documents that satisfy some given selection criteria. After consulting the index, the documents of interest are retrieved with the related docid values. It should be noted that the occurrence lists are required for queries that contain proximity expressions (e.g., the phrase "Oracle Corporation").

When the database system handles the physical storage of domain indexes, applications must be able to:

In the following section, we illustrate the extensible indexing framework by building a text domain index.

Example: A Text Indextype

This section presents an example of adding a text indexing scheme to Oracle RDBMS using the extensible indexing framework. It describes:

Text Indextype Designer

'The sequence of steps required to define the Text Indextype are:

The text cartridge intends to support an operator Contains, that takes as parameters a text value and a key and returns a number value indicating whether the text contained the key. The functional implementation of this operator is a regular function defined as:

CREATE FUNCTION TextContains(Text IN VARCHAR2, Key IN VARCHAR2)
RETURN NUMBER AS
BEGIN
.......
END TextContains;

End User of Text Indextype

Suppose that the text indextype presented in the previous section has been defined in the system. You can define text indexes on text columns and use the associated Contains operator to query text data.

Consider the Employees table defined as follows:

CREATE TABLE Employees
(name VARCHAR2(64), id INTEGER, resume VARCHAR2(2000));

A text domain index can be built on the resume column as follows:

CREATE INDEX ResumeIndex ON Employees(resume) INDEXTYPE IS TextIndexType;

The text data in the resume column can be queried as:

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') =1;

The query execution will use the text index on resume to efficiently evaluate the Contains predicate.

The following sections describe the concepts of indextypes, domain indexes and operators in greater detail.

Indextypes

The purpose of an indextype is to enable efficient search and retrieval functions for complex domains such as text, spatial, image, and OLAP using external software. An indextype is analogous to the sorted or bit-mapped indextype that are supplied internally within the Oracle Server. The essential difference is that the implementation for an indextype is provided by application software, as opposed to the Oracle Server internal routines.

Interface

A set of routine specifications. It does not refer to a separate schema object but rather a logical set of documented method specifications.

ODCIIndex Interface

The set of index definition, maintenance and scan routine specifications.

The interface specifies all the routines which have to be implemented by the index designer. The routines are implemented as type methods.

Creating Indextypes

After the type implementing the ODCIIndex interface has been defined, a new indextype can be created by specifying the list of operators supported by the indextype and referring to the type that implements the index interface.

Using the information retrieval example, the DDL statement for defining the new indextype TextIndexType which supports the Contains operator and whose implementation is provided by the type TextIndexMethods (implemented in the previous section) is as follows:

CREATE INDEXTYPE TextIndexType
     
FOR Contains (VARCHAR2, VARCHAR2)
USING TextIndexMethods;

In addition to the ODCIIndex interface routines, the implementation type must always implement the ODCIGetInterfaces routine. This function returns the list of names of the interfaces implemented by the type. The routine is invoked by Oracle when CREATE INDEXTYPE is executed. In Oracle8i there is only one set of extensible indexing interface routines called SYS.ODCIINDEX1. Thus, the ODCIGetInterfaces routine must return 'SYS'.'ODCIINDEX1' as one of the implemented interfaces.

Dropping Indextypes

A corresponding DROP statement is supported to remove the definition of an indextype. For our example, this statement would be of the following form:

DROP INDEXTYPE TextIndexType;

The default DROP behavior is DROP RESTRICT semantics, that is, if one or more domain indexes exist that uses the indextype then the DROP operation is disallowed. User can override the default behavior with the FORCE option, which drops the indextype and marks dependent domain indexes (if any) invalid. For more details on object dependencies and drop semantics see "Object Dependencies, Drop Semantics, and Validation".

ODCI Index Interface

The ODCIIndex (Oracle Data Cartridge Interface Index) interface consists of the following classes of methods:

Index Definition Methods

Index definition methods allow specification of CREATE, ALTER, DROP, and TRUNCATE behaviors.

ODCIIndexCreate

The ODCIIndexCreate procedure is called when a CREATE INDEX statement is issued that references the indextype. Upon invocation, any physical parameters specified as part of the CREATE INDEX... PARAMETERS (...) statement are passed in along with the description of the index.

A typical action of this procedure is to create tables/files to store index data. Further, if the base table is not empty, this routine should build the index for the existing data in the indexed columns.

ODCIIndexAlter

The ODCIIndexAlter procedure is invoked when a domain index is altered using an ALTER INDEX statement. The description of the domain index to be altered is passed in along with any specified physical parameters.

In addition, this procedure is allowed to handle ALTER with REBUILD option, which supports rebuilding of domain index. The precise behavior in these two cases is defined by the person who implements indextype.

The ODCIIndexAlter routine is also invoked when a domain index is renamed using the ALTER INDEX ... RENAME command.

ODCIIndexTruncate

The ODCIIndexTruncate procedure is called when a TRUNCATE statement is issued against a table that contains a column or OBJECT type attribute indexed by the indextype. After this procedure executes, the domain index should be empty.

ODCIIndexDrop

The ODCIIndexDrop procedure is invoked when a domain index is destroyed using a DROP INDEX statement.

Index Maintenance Methods

Index maintenance methods allow specification of index INSERT, UPDATE, and DELETE behaviors.

ODCIIndexInsert

The ODCIIndexInsert procedure in the indextype is called when a record is inserted in a table that contains columns or OBJECT attributes indexed by the indextype. The new values in the indexed columns are passed in as arguments along with the corresponding row identifier.

ODCIIndexDelete

The ODCIIndexDelete procedure in the indextype is called when a record is deleted from a table that contains columns or OBJECT attributes indexed by the indextype. The old values in the indexed columns are passed in as arguments along with the corresponding row identifier.

ODCIIndexUpdate

The ODCIIndexUpdate procedure in the indextype is called when a record is updated in a table that contains columns or OBJECT attributes indexed by the indextype. The old and new values in the indexed columns are passed in as arguments along with the row identifier.

Index Scan Methods

Index scan methods allow specification of an index-based implementation for evaluating predicates containing operators.

An index scan is specified through three routines, ODCIIndexStart, ODCIIndexFetch, and ODCIIndexClose, which can perform initialization, fetch rows (essentially row identifiers) satisfying the predicate, and clean-up once all rows satisfying the predicate are returned.

ODCIIndexStart

ODCIIndexStart() is invoked to initialize any data structures and start an index scan. The index related information and the operator related information are passed in as arguments.

A typical action performed when ODCIIndexStart() is invoked is to parse and execute SQL statements that query the tables storing the index data. It could also generate some set of result rows to be returned later when ODCIIndexFetch() is invoked (see below).

Since the index and operator related information are passed in as arguments to ODCIIndexStart() and not to the other index scan routines (ODCIIndexFetch() and ODCIIndexClose()), any information needed in the later routines must be saved. This is referred to as the state that has to be shared among the index scan routines. There are two ways of doing this:

In both cases, Oracle RDBMS will pass the SELF value to subsequent ODCIIndexFetch() and ODCIIndexClose() calls which can then use the to access the relevant context information.

There are two modes of evaluating the operator predicate to return the result set of rows.

The choice of evaluating modes as well as what gets saved is left to the index designer. In either case, the Oracle RDBMS simply executes the ODCIIndexStart() routine as part of processing query containing operators which returns the context as an output SELF value.The returned value is passed back to subsequent ODCIIndexFetch() and ODCIIndexClose() calls.

ODCIIndexFetch

ODCIIndexFetch() returns the "next" row identifier of the row that satisfies the operator predicate.The operator predicate is specified in terms of the operator expression (name and arguments) and a lower and upper bound on the operator return values. Thus, a ODCIIndexFetch() call returns the row identifier of the rows for which the operator return value falls within the specified bounds. A NULL is returned to indicate end of index scan. The fetch method supports returning a batch of rows in each call. The state returned by ODCIIndexStart() or a previous call to ODCIIndexFetch() is passed in as an argument.

ODCIIndexClose

ODCIIndexClose() is invoked when the cursor is closed or reused. In this call the Indextype can perform any clean-ups, etc. The current state is passed in as an argument.

Index Metadata Method

The ODCIIndexGetMetadata routine, if it is implemented, is called by the export utility to write implementation-specific metadata into the export dump file. This metadata might be policy information, version information, per-user settings, and so on, which are not stored in the system catalogs. The metadata is written to the dump files as anonymous PL/SQL blocks that get executed at import time immediately prior to the creation of the associated index.

This method on the ODCIIndex interface is required in version 8.1.3 and must be implemented by all domain index implementation types. If ODCIIndexGetMetadata is not found, export will abort the creation of the index. However, for the final release of 8.1, this method will be optional if no implementation-specific metadata is required.

Transaction Semantics during Index Method Execution

The index interface routines (with the exception of index definition methods, namely, ODCIIndexCreate(), ODCIIndexAter(), ODCIIndexTruncate(), ODCIIndexDrop()) are invoked under the same transaction that triggered these actions. Thus, the changes made by these routines are atomic and are committed or aborted based on the parent transaction. To achieve this, there are certain restrictions on the nature of the actions that can be performed in the different indextype routines.

For example, if an INSERT statement caused the ODCIIndexInsert() routine to be invoked, ODCIIndexInsert() runs under the same transaction as INSERT. The ODCIIndexInsert() routine can execute any number of DML statements (for example, insert into index-organized tables). If the original transaction aborts, all the changes made by the indextype routines are rolled back.

However, if the indextype routines cause changes external to the database (like writing to external files), transaction semantics are not assured.

Transaction Semantics for Index Definition Routines

The index definition routines do not have any restrictions on the nature of actions within them. Consider ODCIIndexCreate() to understand this difference. A typical set of actions to be performed in ODCIIndexCreate() could be:

  1. Create an index-organized table

  2. Insert data into the index-organized table

  3. Create a secondary index on a column of the index-organized table

To allow ODCIIndexCreate() to execute an arbitrary sequence of DDL and DML statements, we consider each statement to be an independent operation. Consequently, the changes made by ODCIIndexCreate() are not guaranteed to be atomic. The same is true for other index-definition routines.

Consistency Semantics during Index Method Execution

The index maintenance (and scan routines) execute with the same snapshot as the top level SQL statement performing the DML (or query) operation. This enables the index data processed by the index method to be consistent with the data in the base tables.

Privileges During Index Method Execution

Indextype routines always execute as the owner of the index. To support this, the index access driver will dynamically change user mode to index owner before invoking the indextype routines.

For certain operations, indextype routines may require to store information in tables owned by indextype designer. Indextype implementation must code those actions in a separate routine which will be executed using definer's privileges. For more information on syntax, see CREATE TYPE in the Oracle8i SQL Reference.

Domain Index

This section describes the domain index operations and how metadata associated with the domain index can be obtained.

Domain Index Operations

Creating a Domain Index

A domain index can be created on a column of a table just like a B-tree index. However, an indextype must be explicitly specified. For example:

CREATE INDEX ResumeTextIndex ON Employees(resume)
INDEXTYPE IS TextIndexType
PARAMETERS (':Language English :Ignore the a an');

The INDEXTYPE clause specifies the indextype to be used. The PARAMETERS clause identifies any parameters for the domain index, specified as a string. This string is passed uninterpreted to the ODCIIndexCreate routine for creating the domain index. In the above example, the parameters string identifies the language of the text document (thus identifying the lexical analyzer to use) and the list of stop words which are to be ignored while creating the text index.

Altering a Domain Index

A domain index can be altered using ALTER INDEX statement. For example:

ALTER INDEX ResumeTextIndex PARAMETERS (':Ignore on');

The parameter string is passed uninterpreted to ODCIIndexAlter() routine, which takes appropriate actions to alter the domain index. In the above example, additional stop words to ignore in the text index are specified.

The ALTER statement can be used to rename a domain index.

ALTER INDEX ResumeTextIndex RENAME TO ResumeTIdx;

The ODCIIndexAlter() routine is invoked, which takes appropriate actions to rename the domain index.

In addition, the ALTER statement can be used to rebuild a domain index.

ALTER INDEX ResumeTextIndex REBUILD PARAMETERS (':Ignore of');

The same ODCIIndexAlter() routine is called but with additional information about the ALTER option.

Truncating Domain Index

There is no explicit statement for truncating a domain index. However, when the corresponding table is truncated the truncate procedure specified as part of the indextype is invoked. For example:

TRUNCATE TABLE Employees;

will result in truncating ResumeTextIndex by calling ODCIIndexTruncate() routine.

Dropping a Domain Index

To drop an instance of a domain index, the DROP INDEX statement is used. For our example, this statement would be of the form:

DROP INDEX ResumeTextIndex;

This results in calling the ODCIIndexDrop() routine and passing information about the index.

Domain Index Meta Data

For B-tree indexes, users can query the USER_INDEXES view to get index information. To provide similar support for domain indexes, indextype designers can add any domain-specific metadata in the following manner:

Export/Import of Domain Indexes

Like B-tree and bitmap indexes, domain indexes are exported and subsequently imported when their base tables are exported. However, domain indexes can have implementation-specific metadata associated with them that are not stored in the system catalogs. For example, a text domain index can have associated policy information, a list of irrelevant words, and so on. Export/Import provides a mechanism to opaquely move this metadata from the source platform to target platform.

To move the domain index metadata, the indextype needs to implement the ODCIIndexGetMetadata interface routine (see the reference chapters for details). This interface routine gets invoked when a domain index is being exported. The domain index information is passed in as a parameter. It can return any number of anonymous PL/SQL blocks that are written into the dump file and executed on import. If present, these anonymous PL/SQL blocks are executed immediately before the creation of the associated domain index.

Note that the ODCIIndexGetMetadata is an optional interface routine. It is needed only if the domain index has extra metadata to be moved.

Operators

A user-defined operator is a top-level schema object. It is identified by a name which is in the same namespace as tables, views, types and stand-alone functions.

Operator Bindings

An operator binding identifies the operator with a unique signature (via argument data types), and allows associating a function that provides an implementation for the operator. This enables Oracle to execute the function when the operator is invoked. Multiple operator bindings can be defined as long as they differ in their signatures.

Thus, any operator has an associated set of one or more bindings. Each of this binding can be evaluated using an user-defined function which could be one of

An operator created in a schema can be evaluated using functions defined in the same or different schemas. The operator bindings can be specified at the time of creating the operator. It is ensured that the signatures of the bindings are unique.

Creating operators

An operator can be created by specifying the operator name and its bindings.

For example, an operator Contains can be created in the Ordsys schema with two bindings and the corresponding functions that providing the implementation in Text and Spatial domains.

CREATE OPERATOR Ordsys.Contains
BINDING
(VARCHAR2, VARCHAR2) RETURN NUMBER USING text.contains, 
(Spatial.Geo, Spatial.Geo) RETURN NUMBER USING Spatial.contains;


Note:

Although the return data type is specified as part of operator binding declaration, it is not considered to determine the uniqueness of the binding, However, the specified function must have the same argument and return datatypes as the operator binding.  


Dropping operators

An existing operator and all its bindings can be dropped using the DROP OPERATOR statement as follows:

DROP OPERATOR Contains;

The default DROP behavior is DROP RESTRICT semantics. Namely, if there are any dependent indextypes for any of the operator bindings, then the DROP operation is disallowed.

However, users can override the default behavior by using the FORCE option. For example,

DROP OPERATOR Contains FORCE;

drops operator Contains and all its binding marks any dependent indextype objects (if any) invalid.

Operator Invocation

Operator Usage

User-defined operators can be invoked anywhere built-in operators can be used. i.e. wherever expressions can occur. For example, user-defined operators can be used in the following:

Operator Execution

When an operator is invoked, the evaluation of the operator is transformed to the execution of one of the functions bound to it. This transformation is based on the datatypes of the arguments to the operator. If none of the functions bound to the operator satisfy the signature with which the operator is invoked, an error is raised. There might be some implicit type conversions present during the transformation process.

Examples

Consider the operator created with the following statement:

CREATE OPERATOR Ordsys.Contains
BINDING 
(VARCHAR2, VARCHAR2) RETURN NUMBER 
USING text.contains, 
(spatial.geo, spatial.geo) RETURN NUMBER 
USING spatial.contains;

Consider the operator Contains being used in the following SQL statements:

SELECT * FROM Employee
WHERE Contains(resume, 'Oracle')=1 AND Contains(location, :bay_area)=1;

The invocation of the operator Contains(resume, 'Oracle') is transformed into the execution of the function text.contains(resume, 'Oracle') since the signature of the function matches the datatypes of the operator arguments. Similarly, the invocation of the operator Contains(location, :bay_area) is transformed into the execution of the function spatial.contains(location, :bay_area).

The following statement would raise an error since none of the operator bindings satisfy the argument datatypes:

SELECT * FROM Employee
WHERE Contains(salary, 10000)=1;

Operator Privileges

There are system privileges for operator schema objects. They are:

See the Oracle8i SQL Reference for details.

To use a user-defined operator in an expression, you must own the operator or have EXECUTE privilege on it.

Operators and Indextypes

An operator can be optionally supported by one or more user-defined indextypes. An indextype can "support" one or more operators. This means that a domain index of this indextype can be used in efficiently evaluating these operators. For example, B-tree indexes can be used to evaluate the relational operators like =, < and >. Operators can also be bound to regular functions. For example, an operator Equal can be bound to a function eq(number, number) that compares two numbers. The DDL for this would be:

CREATE OPERATOR Equal 
BINDING(NUMBER, NUMBER) RETURN NUMBER USING eq;

Thus, an indextype designer should first design the set of operators to be supported by the indextype. For each of these operators, a functional implementation should be provided.

The list of operators supported by an indextype are specified when the indextype schema object is created (as described above). The evaluation of operators using indextype is different for operators occurring in WHERE clause compared to operators occurring elsewhere in a SQL statement. Below we consider index-based evaluation of the operators in both these cases.

Operators in WHERE clause

The operators appearing in the WHERE clause can be evaluated efficiently by performing an index scan using the scan methods provided as part of indextype implementation. This involves recognizing operator predicates of a certain form, selection of a domain index, setting up of appropriate index scan, and finally, execution of index scan methods. Let's consider each one of these steps in detail.

Operator Predicates

Indextype "supports" efficient evaluation of those operator predicates, which can be represented by a range of lower and upper bounds on the operator return values. Specifically, predicates of the form

op(...) relop <value expression>, where relop in {<, <=, =, >=,>}

op(...) LIKE <value_expression>

are possible candidates for index scan based evaluation.

Use of the operators in any expression, for example

op(...) + 2 = 3

precludes index-scan based evaluation.

Predicates of the form,

op() is NULL
 

will not be evaluated using index scan. It will always be evaluated using the functional implementation.

Finally, any other operator predicates which can internally be converted into one of the above forms by Oracle can also make use of the index scan based evaluation.

Operator Resolution

The index scan based evaluation of the operator is a possible candidate for predicate evaluation only if the operator occurring in the predicate (as described above) operates on a column or OBJECT attribute indexed using an indextype. The final decision to choose between the indexed implementation and the functional implementation is made by the optimizer. The optimizer takes into account the selectivity and cost while generating the query execution plan.

As an example, consider the query

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') = 1;

The optimizer can choose to use a domain index in evaluating the Contains operator if

If any of the above conditions do not hold, a complete scan of the Employees table is performed and the functional implementation of Contains is applied as a post-filter. If the above conditions are met, the optimizer uses selectivity and cost functions to compare the cost of index-based evaluation with the full table scan and appropriately generates the execution plan.

Consider a slightly different query,

SELECT * FROM Employees WHERE Contains(resume, 'Oracle') =1 AND id =100;

In this query, the Employees table could be accessed through an index on the id column or one on the resume column. The optimizer estimates the costs of the two plans and picks the cheaper one, which could be to use the index on id and apply the Contains operator on the resulting rows. In this case, the functional implementation of Contains() is used and the domain index is not used.

Index scan Setup

If a domain index is selected for the evaluation of an operator predicate, an index scan is set-up. The index scan is performed by the scan methods (ODCIIndexStart(), ODCIIndexFetch(), ODCIIndexClose()) specified as part of the corresponding indextype implementation. The ODCIIndexStart() method is invoked with the operator related information including name and arguments and the lower and upper bounds describing the predicate. After the ODCIIndexStart() call, a series of fetches are performed to obtain row identifiers of rows satisfying the predicate, and finally the ODCIIndexClose() is called when the SQL cursor is destroyed.

Execution Model for Index Scan Methods

The index scan routines must be implemented with an understanding of how the routines' invocations are ordered and how multiple sets of invocations can be interleaved.

As an example, consider the query

SELECT * FROM Emp1, Emp2 WHERE 
Contains(Emp1.resume, 'Oracle') =1 AND Contains(Emp2.resume, 'Unix') =1
AND Emp1.id = Emp2.id;

If the optimizer decides to use the domain indexes on the resume columns of both tables, the indextype routines may be invoked in the following sequence:

start(ctx1, ...); /* corr. to Contains(Emp1.resume, 'Oracle') */
start(ctx2, ...); /* corr. to Contains(Emp2.resume, 'Unix');
fetch(ctx1, ...);
fetch(ctx2, ...);
fetch(ctx1, ...);
...
close(ctx1);
close(ctx2);

Thus, the same indextype routine may be invoked but for different instances of operators. At any time, many operators are being evaluated through the same indextype routines. In case of routines that do not need to maintain any state across calls i.e. all the information is obtained through its parameters (like the create routine), this is not a problem. However, in case of routines needing to maintain state across calls (like the fetch routine which needs to know which row to return next), the state should be maintained in the SELF parameter that is passed in to each call. The SELF parameter (which is an instance of the implementation type) can be used to store either the entire state (if it is not too big) or a handle to the cursor-duration memory that stores the state.

Operators elseWHERE

Using Functional Implementation

Operators occurring in expressions other than in the WHERE clause are evaluated using the functional implementation. For example,

SELECT Contains(resume, 'Oracle') FROM Employee;

would be executed by scanning the Employee table and invoking the functional implementation for Contains on each instance of resume. The function is invoked by passing it the actual value of the resume (text data) in the current row. Note that this function would not make use of any domain indexes that may have been built on the resume column.

However, it's possible to have a functional implementation for an operator that makes use of a domain index. The following sections discuss how functions that use domain indexes can be written and how they are invoked by the system.

Creating Index-based Functional Implementation

For many domain-specific operators, such as Contains, the functional implementation can work in two ways:

  1. If the operator is operating on a column (or OBJECT attribute) that has a domain index of a particular indextype, the function can evaluate the operator by looking at the index data rather than the actual argument value.

    For example, when Contains(resume, 'Oracle') is invoked on a particular row of the Employee table, it is easier for the function to look up the text domain index defined on the resume column and evaluate the operator based on the row identifier for the row containing the resume - rather than work on the resume text data argument.

  2. If the operator is operating on a column that does not have an appropriate domain index defined on it or if the operator is invoked with literal values (non-columns), the functional implementation evaluates the operator based on only the argument values. This is the default behavior for all operator bindings.

To achieve both the behaviors of (1) and (2) above, the functional implementation is provided using a regular function which has three additional arguments - in addition to all the original arguments to the operator. The additional arguments are:

For example, the index-based functional implementation for the Contains operator is provided by the following function.

CREATE FUNCTION TextContains (Text IN VARCHAR2, Key IN VARCHAR2,
indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER)
RETURN NUMBER AS
BEGIN
.......
END TextContains;

The Contains operator is bound to the above functional implementation as follows:

CREATE OPERATOR Contains
BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER 
WITH INDEX CONTEXT, SCAN CONTEXT TextIndexMethods
USING TextContains;

The WITH INDEX CONTEXT clause specifies that the functional implementation can make use of any applicable domain indexes. The SCAN CONTEXT specifies the datatype of the scan context argument. It must be the same as the implementation type of the relevant indextype that supports this operator.

Operator Resolution

Oracle will invoke the functional implementation for the operator if the operator appears elseWHERE i.e. anywhere other than the WHERE clause. If the functional implementation is index-based (i.e. defined to use an indextype), the additional index information will be passed in as arguments only if the operator's first argument is a column (or OBJECT attribute) with a domain index of the appropriate indextype defined on it.

For example, in the query

SELECT Contains(resume, 'Oracle & Unix') FROM Employees;

the Operator Contains will be evaluated using the index-based functional implementation by passing the index information about the domain index on resume column instead of the resume data.

Operator Execution

To execute the index-based functional implementation, Oracle RDBMS will set-up the arguments in the following manner:

When index information is passed in, the implementation can compute the operator value by doing a domain index lookup using the row identifier as key. The index metadata is used to identify the index structures associated with the domain index. The scan context is typically used to share state with the subsequent invocations of the same operator.

Ancillary Data

Apart from filtering rows, the operator occurring in WHERE clause might need to support returning ancillary data. The ancillary data is modeled as an operator (or multiple operators) with a single literal number argument. It has a functional implementation that has access to state generated by the index-scan based implementation of the primary operator occurring in the WHERE clause.

For example, in the following query,

SELECT Score(1) FROM Employees 
WHERE Contains(resume, 'OCI & UNIX', 1) =1;

Contains is the primary operator which can be evaluated using an index-scan which in addition to determining the rows that satisfy the predicate, also computes a score value for each row. The functional implementation for Score operator simply accesses the state generated by the index-scan to obtain score for a given row identified by its row identifier. The literal argument 1 associates the ancillary operator Score to the corresponding primary operator Contains which generates the ancillary data.

In summary, ancillary data is modeled as independent operator(s), which is invoked by the user with a single number argument that ties it with the corresponding primary operator. Its functional implementation makes use of either the domain index or the state generated by the primary operator occurring in WHERE clause. The functional implementation is invoked with extra arguments: the index context containing the domain index information and the scan context which provides access to the state generated by the primary operator. The following sections discuss how operators modeling ancillary data are defined and invoked.

Creating Operator Binding that Computes Ancillary Data

An indextype designer needs to specify that an operator binding computes ancillary data. Such a binding is referred to as a primary binding. For example, a primary binding for Contains can be defined as follows:

CREATE OPERATOR Contains
BINDING (VARCHAR2, VARCHAR2) RETURN NUMBER
WITH INDEX CONTEXT, SCAN TextIndexMethods COMPUTE ANCILLARY DATA
USING TextContains;

The above definition registers two bindings for Contains, namely:

However, the indextype designer needs to define a single functional implementation:

TextContains(VARCHAR2, VARCHAR2, ODCIIndexCtx, TextIndexMethods, NUMBER).

Creating Operator Binding that Models Ancillary Data

An indextype designer has to implement the functional implementation for ancillary data operators in a manner similar to the index-based functional implementation. As discussed earlier, the function takes extra arguments. After the function is defined, the indextype designer can bind it to the operator with an additional ANCILLARY TO attribute, which indicates that the functional implementation needs to share state with the primary operator binding. The binding that is used for modeling ancillary data is referred to as the ancillary operator binding.

For example, let TextScore() function contain code to evaluate the Score ancillary operator.

CREATE FUNCTION TextScore (Text IN VARCHAR2, Key IN VARCHAR2,
indexctx IN ODCIIndexCtx, scanctx IN OUT TextIndexMethods, scanflg IN NUMBER)
RETURN NUMBER AS
BEGIN
.......
END TextScore;

An ancillary operator binding can be created as follows:

CREATE OPERATOR Score
BINDING (NUMBER) RETURN NUMBER
ANCILLARY TO Contains(VARCHAR2, VARCHAR2) 
USING TextScore;

Operator Resolution

The operators corresponding to ancillary data are invoked by the user with a single number argument.


Note:

The number argument must be a literal in both the ancillary operation and the primary operator invocation. This is required so that the operator association can be done at the query compilation time.  


The corresponding primary operator invocation in the query is determined by matching it with the number passed in as the last argument to the primary operator. After the matching primary operator invocation is found (it is an error to find zero or more than one matching primary operator invocation):

For example, consider the query

SELECT Score(1) FROM Employees
WHERE Contains(resume, ' Oracle & Unix', 1) =1;

The invocation of Score is determined to be ancillary to Contains based on the number argument "1" and the functional implementation for Score gets the following operands: (resume, 'Oracle&Unix', indexctx, scanctx, scanflg) where scanctx is shared with the invocation of Contains.

Operator Execution

The execution would involve using an index scan to process the Contains operator. For each of the rows returned by the fetch() call of the index scan, the functional implementation of Score is invoked by passing it the ODCIIndexCtx argument, which contains the index information, row identifier, and a handle to the index scan state. The functional implementation can use the handle to the index scan state to compute the score.

Object Dependencies, Drop Semantics, and Validation

Dependencies

The dependencies between various objects are as follows:

Thus, the order in which these objects must be created, or their definitions exported for future Import are:

Drop Semantics

The drop behavior for an object is as follows:

The table below shows the default and explicit drop options supported for operators and indextypes. The other schema objects are included for completeness and the corresponding drop behavior already available in Oracle8i.

Schema Object   Default Drop Behavior   Explicit Options Supported  

Function  

FORCE  

None  

Package  

FORCE  

None  

Object Types  

RESTRICT  

FORCE  

Operator  

RESTRICT  

FORCE  

Indextype  

RESTRICT  

FORCE  

Object Validation

Invalid object are automatically revalidated whenever the object is subsequently referenced.

Privileges




Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index