8
Tuning Distributed Queries

Oracle supports transparent distributed queries to access data from multiple databases. It also provides many other distributed features, such as transparent distributed transactions and a transparent, fully automatic two-phase commit. This chapter explains how the Oracle8 optimizer decomposes SQL statements, and how this affects performance of distributed queries. The chapter provides guidelines on how to influence the optimizer and avoid performance bottlenecks.

Topics include:

Remote and Distributed Queries

If a SQL statement references one or more remote tables, the optimizer first determines whether all remote tables are located at the same site. If all tables are located at the same remote site, Oracle sends the entire query to the remote site for execution. The remote site sends the resulting rows back to the local site. This is called a remote SQL statement. If the tables are located at more than one site, the optimizer decomposes the query into separate SQL statements to access each of the remote tables. This is called a distributed SQL statement. The site where the query is executed, called the "driving site," is normally the local site.

This section describes:

Remote Data Dictionary Information

If a SQL statement references multiple tables, then the optimizer must determine which columns belong to which tables before it can decompose the SQL statement. For example, for this query:

   SELECT DNAME, ENAME 
     FROM DEPT, EMP@REMOTE 
    WHERE DEPT.DEPTNO = EMP.DEPTNO

The optimizer must first determine that the DNAME column belongs to the DEPT table and the ENAME column to the EMP table. Once the optimizer has the data dictionary information of all remote tables, it can build the decomposed SQL statements.

Column and table names in decomposed SQL statements appear between double quotes. You must enclose in double quotes any column and table names that contain special characters, reserved words, or spaces.

This mechanism also replaces an asterisk (*) in the select list with the actual column names. For example:

   SELECT * FROM DEPT@REMOTE;

Results in the decomposed SQL statement

   SELECT A1."DEPTNO", A1."DNAME", A1."LOC" FROM "DEPT" A1;

Note:
For simplicity, double quotes are not used in the remainder of this chapter.

Remote SQL Statements

If the entire SQL statement is sent to the remote database, the optimizer uses table aliases A1, A2, and so on, for all tables and columns in the query, in order to avoid possible naming conflicts. For example:

   SELECT DNAME, ENAME 
     FROM DEPT@REMOTE, EMP@REMOTE 
   WHERE DEPT.DEPTNO = EMP.DEPTNO;

is sent to the remote database as:

   SELECT A2.DNAME, A1.ENAME 
     FROM DEPT A2, EMP A1 
   WHERE A1.DEPTNO = A2.DEPTNO;

Distributed SQL Statements

When a query accesses data on one or more databases, one site "drives" the execution of the query. This is known as the "driving site"; it is here that the data is joined, grouped and ordered. By default, the local Oracle server is the driving site. A hint called DRIVING_SITE enables you to manually specify the driving site.

The decomposition of SQL statements is important because it determines the number of records or even tables that must be sent through the network. A knowledge of how the optimizer decomposes SQL statements can help you achieve optimum performance for distributed queries.

If a SQL statement references one or more remote tables, the optimizer must decompose the SQL statement into separate queries to be executed on the different databases. For example:

   SELECT DNAME, ENAME 
   FROM DEPT, EMP@REMOTE 
   WHERE DEPT.DEPTNO = EMP.DEPTNO;

Might be decomposed into

   SELECT DEPTNO, DNAME FROM DEPT;

Which is executed locally, and

   SELECT DEPTNO, ENAME FROM EMP;

which is sent to the remote database. The data from both tables is joined locally. All this is done automatically and transparently for the user or application.

In some cases, however, it might be better to send the local table to the remote database and join the two tables on the remote database. This can be achieved either by creating a view, or by using the DRIVING_SITE hint. If you decide to create a view on the remote database, a database link from the remote database to the local database is also needed.

For example (on the remote database):

   CREATE VIEW DEPT_EMP AS 
   SELECT DNAME, ENAME 
   FROM DEPT@LOCAL, EMP 
   WHERE DEPT.DEPTNO = EMP.DEPTNO;

Then select from the remote view instead of the local and remote tables

   SELECT * FROM DEPT_EMP@REMOTE;

Now the local DEPT table is sent through the network to the remote database, joined on the remote database with the EMP table, and the result is sent back to the local database.

Rule-Based Optimization

Rule-based optimization does not have information about indexes for remote tables. It never, therefore, generates a nested loops join between a local table and a remote table with the local table as the outer table in the join. It uses either a nested loops join with the remote table as the outer table or a sort merge join, depending on the indexes available for the local table.

Cost-Based Optimization

Cost-based optimization can consider more execution plans than rule-based optimization. Cost-based optimization knows whether indexes on remote tables are available, and in which cases it would make sense to use them. Cost-based optimization considers index access of the remote tables as well as full table scans, whereas rule-based optimization considers only full table scans.

The particular execution plan and table access that cost-based optimization chooses depends on the table and index statistics. For example, with:

   SELECT DNAME, ENAME 
     FROM DEPT, EMP@REMOTE 
    WHERE DEPT.DEPTNO = EMP.DEPTNO

The optimizer might choose the local DEPT table as the driving table and access the remote EMP table using an index, so the decomposed SQL statement becomes:

   SELECT ENAME FROM EMP WHERE DEPTNO = :1

This decomposed SQL statement is used for a nested loops operation.

Using Views

If tables are on more than one remote site, it can be more effective to create a view than to use the DRIVING_SITE hint. If not all tables are on the same remote database, the optimizer accesses each remote table separately. For example:

   SELECT D.DNAME, E1.ENAME, E2.JOB 
     FROM DEPT D, EMP@REMOTE E1, EMP@REMOTE E2 
    WHERE D.DEPTNO = E1.DEPTNO 
      AND E1.MGR = E2.EMPNO;

Results in the decomposed SQL statements

   SELECT EMPNO, ENAME FROM EMP;

And

   SELECT ENAME, MGR, DEPTNO FROM EMP;

If you want to join the two EMP tables remotely, you can create a view to accomplish this. Create a view with the join of the remote tables on the remote database. For example (on the remote database):

   CREATE VIEW EMPS AS 
   SELECT E1.DEPTNO, E1.ENAME, E2.JOB 
     FROM EMP E1, EMP E2 
    WHERE E1.MGR = E2.EMPNO;

And now select from the remote view instead of the remote tables:

   SELECT D.DNAME, E.ENAME, E.JOB 
     FROM DEPT D, EMPS@REMOTE E 
   WHERE D.DEPTNO = E.DEPTNO;

This results in the decomposed SQL statement

   SELECT DEPTNO, ENAME, JOB FROM EMPS;

Using Hints

In a distributed query, all hints are supported for local tables. For remote tables, however, you can use only join order and join operation hints. (Hints for access methods, parallel hints, and so on, have no effect.) For remote mapped queries, all hints are supported.

EXPLAIN PLAN and SQL Decomposition

EXPLAIN PLAN gives information not only about the overall execution plan of SQL statements, but also about the way in which the optimizer decomposes SQL statements. EXPLAIN PLAN stores information in the PLAN_TABLE table. If remote tables are used in a SQL statement, the OPERATION column will contain the value REMOTE to indicate that a remote table is referenced, and the OTHER column will contain the decomposed SQL statement that will be sent to the remote database. For example:

   EXPLAIN PLAN FOR SELECT DNAME FROM DEPT@REMOTE 
   SELECT OPERATION, OTHER FROM PLAN_TABLE 

   OPERATION OTHER 
   --------- -------------------------------------
   REMOTE    SELECT A1."DNAME" FROM "DEPT" A1

Note the table alias and the double quotes around the column and table names.

See Also:
Chapter 13, "Using EXPLAIN PLAN".

Partition Views

You can use partition views to coalesce tables that have the same structure, but that also contain different partitions of data. This is useful for a distributed database where each partition resides on a database and the data in each partition has common geographical properties.

When a query is executed on such a partition view, and the query contains a predicate that contains the result set to a subset of the view's partitions, the optimizer chooses a plan which skips partitions that are not needed for the query. This partition elimination takes place at run time, when the execution plan references all partitions.

Rules for Use

This section describes the circumstances under which a UNION ALL view enables the optimizer to skip partitions. The Oracle server that contains the partition view must conform to the following rules:

The PARTITION_VIEW_ENABLED initialization parameter is set to TRUE

The cost-based optimizer is used

Note:
To use the cost-based optimizer you must analyze all tables used in the UNION ALL views. Alternatively, you can use a hint or set the parameter OPTIMIZER_MODE to ALL_ROWS or FIRST_ROW. To set OPTIMIZER_MODE or PARTITION_VIEW_ENABLED you can also use the ALTER SESSION statement.

Within a UNION ALL view there are multiple select statements, and each of these is called a "branch". A UNION ALL view is a partition view if each select statement it defines conforms to the following rules:

The branch has exactly one table in the FROM clause.
The branch contains a WHERE clause that defines the subset of data from the partition that is contained in the view.
None of the following are used within the branch: WHERE clause with subquery, group by, aggregate functions, distinct, rownum, connect by/start with.
The SELECT list of each branch is *, or explicit expansion of "*".
The column names and column datatypes for all branches in the UNION ALL view are exactly the same.
All tables used in the branch must have indexes (if any) on the same columns and number of columns.

Partition elimination is based on column transitivity with constant predicates. The WHERE clause used in the query that accesses the partition view is pushed down to the WHERE clause of each of the branches in the UNION ALL view definition. Consider the following example:

   SELECT * FROM EMP_VIEW WHERE deptno=30;

When the view EMP_VIEW is defined as:

   SELECT * FROM EMP@d10 WHERE deptno=10 
     UNION ALL 
   SELECT * FROM EMP@d20 WHERE deptno=20 
     UNION ALL 
   SELECT * FROM EMP@d30 WHERE deptno=30 
     UNION ALL 
   SELECT * FROM EMP@d40 WHERE deptno=40

The "WHERE deptno=30" predicate used in the query is pushed down to the queries in the UNION ALL view. For a WHERE clause such as "WHERE deptno=10 and deptno=30", the optimizer applies transitivity rules to generate an extra predicate of "10=30". This extra predicate is always false, thus the table (EMP@d10) need not be accessed.

Transitivity applies to predicates which conform to the following rules:

The predicates in the WHERE clause for each branch are of the form:
```
RELATION AND RELATION ... 
```
where relation is of the form
```
COLUMN_NAME RELOP CONSTANT_EXPRESSION 
```
and relop is one of =, !=, >, >=, <, <=

Note:
BETWEEN ... AND is allowed by these rules, but IN is not.
At least one predicate in the query referencing the view exists in the same form.

EXPLAIN PLAN Output

To confirm that the system recognizes a partition view, check the EXPLAIN PLAN output. The following operations will appear in the OPERATIONS column of the EXPLAIN PLAN output, if a query was executed on a partition view:

VIEW

This entry should include the optimizer cost in the COST column.

UNION-ALL

This entry should specify PARTITION in the OPTION column.

FILTER

When an operation is a child of the UNION-ALL operation, FILTER indicates that a constant predicate was generated that will always be FALSE. The partition will be eliminated.

If PARTITION does not appear in the option column of the UNION-ALL operation, the partition view was not recognized, and no partitions were eliminated. Make sure that the UNION ALL view adheres to the rules as defined in "Rules for Use" .

Partition View Example

The following example shows a partition view CUSTOMER that is partitioned into two partitions. The EAST database contains the East Coast customers, and the WEST database contains the customers from the West Coast.

The WEST database contains the following table CUSTOMER_WEST:

   CREATE TABLE CUSTOMER_WEST 
   ( cust_no   NUMBER CONSTRAINT CUSTOMER_WEST_PK PRIMARY KEY, 
     cname     VARCHAR2(10), 
     location  VARCHAR2(10) 
    );

The EAST database contains the database CUSTOMER_EAST:

   CREATE TABLE CUSTOMER_EAST 
   ( cust_no   NUMBER CONSTRAINT CUSTOMER_EAST_PK PRIMARY KEY, 
     cname     VARCHAR2(10), 
     location  VARCHAR2(10) 
    );

The following partition view is created at the EAST database (you could create a similar view at the WEST database):

   CREATE VIEW customer AS 
     
   SELECT * FROM CUSTOMER_EAST 
   WHERE location='EAST'  
   UNION ALL 
   SELECT * FROM CUSTOMER_WEST@WEST 
   WHERE location='WEST';

If you execute the following statement, notice that the CUSTOMER_WEST table in the WEST database is not accessed:

EXPLAIN PLAN FOR SELECT * FROM CUSTOMER WHERE location='EAST';

Note:
The EAST database still needs column name and column datatype information for the CUSTOMER_WEST table, therefore it still needs a connection to the WEST database. In addition that the cost-based optimizer must be used. You could do this, for example, by issuing the statement ALTER SESSION SET OPTIMIZER_MODE=ALL_ROWS.

As shown in the EXPLAIN PLAN output, the optimizer recognizes that the CUSTOMER_WEST partition need not be accessed:

SELECT LPAD(' ',LEVEL*3-3)||OPERATION OPERATION,COST,OPTIONS,
OBJECT_NODE, OTHER 
FROM PLAN_TABLE 
CONNECT BY PARENT_ID = PRIOR ID 
START WITH PARENT_ID IS NULL 
    

OPERATION                 COST OPTIONS    OBJECT_NOD OTHER 
------------------------- ---- ---------- ---------- ------------------------- 
SELECT STATEMENT             1 
   VIEW                      1 
      UNION-ALL                PARTITION 
         TABLE ACCESS        1 FULL 
         FILTER 
            REMOTE           1            WEST.WORLD SELECT "CUST_NO","CNAME", 
                                                     "LOCATION" FROM "CUSTOMER 
                                                     _WEST" "CUSTOMER_WEST" WH 
                                                     ERE "LOCATION"='EAST' AND 
                                                      "LOCATION"='WEST'

Distributed Query Restrictions

Distributed queries within the same version of Oracle have these restrictions:

Cost-based optimization should be used for distributed queries. Rule-based optimization does not generate nested loop joins between remote and local tables when the tables are joined with equijoins.
In cost-based optimization, no more than 20 indexes per remote table are considered when generating query plans. The order of the indexes varies; if the 20-index limitation is exceeded, random variation in query plans may result.
Reverse indexes on remote tables are not visible to the optimizer. This can prevent nested-loop joins from being used for remote tables if there is an equijoin using a column with only a reverse index.
Cost-based optimization cannot recognize that a remote object is partitioned. Thus the optimizer may generate less than optimal plans for remote partitioned objects, particularly when partition pruning would have been possible, had the object been local.
Remote views are not merged and the optimizer has no statistics for them. It is best to replicate all mergeable views at all sites to obtain good query plans. (See the following exception.)
Neither cost-based nor rule-based optimization can execute joins remotely. All joins are executed at the driving site. This can affect performance for CREATE TABLE ... AS SELECT if all the tables in the select list are remote. In this case you should create a view for the SELECT statement at the remote site.

Transparent Gateways

The Transparent Gateways are used to access data from other data sources (relational databases, hierarchical databases, file systems, and so on). Transparent Gateways provide a means to transparently access data from a non-Oracle system, just as if it were another Oracle database.

Optimizing Heterogeneous Distributed SQL Statements

When a SQL statement accesses data from non-Oracle systems, it is said to be a heterogeneous distributed SQL statement. To optimize heterogeneous distributed SQL statements, follow the same guidelines as for optimizing distributed SQL statements that access Oracle databases only. However, you must take into consideration that the non-Oracle system usually does not support all the functions and operators that Oracle8 supports. The Transparent Gateways therefore tell Oracle (at connect time) which functions and operators they do support. If the other data source does not support a function or operator, Oracle will perform that function or operator. In this case Oracle obtains the data from the other data source and applies the function or operator locally. This affects the way in which the SQL statements are decomposed and can affect performance, especially if Oracle is not on the same machine as the other data source.

Gateways and Partition Views

You can use partition views with Oracle Transparent Gateways version 8 or higher. Make sure you adhere to the rules that are defined in "Rules for Use". In particular:

The cost-based optimizer must be used, by using hints or setting the parameter OPTIMIZER_MODE to ALL_ROWS or FIRST_ROWS.
Indexes used for each partition must be the same. Please consult your gateway specific installation and users guide to find out whether the gateway will send index information of the non-Oracle system to the Oracle Server. If the gateway will send index information to the optimizer, make sure that each partition uses the same number of indexes, and that you have indexed the same columns. If the gateway does not send index information, the Oracle optimizer will not be aware of the indexes on partitions. Indexes are therefore considered to be the same for each partition in the non-Oracle system. If one partition resides on an Oracle server, you cannot have an index defined on that partition.
The column names and column datatypes for all branches in the UNION ALL view must be the same. Non-Oracle system datatypes are mapped onto Oracle datatypes. Make sure that the datatypes of each partition that resides in the different non-Oracle systems all map to the same Oracle datatype. To see how datatypes are mapped onto Oracle datatypes, you can execute a DESCRIBE command in SQL*Plus or Server Manager.

Summary: Optimizing Performance of Distributed Queries

You can improve performance of distributed queries in several ways:

Choose the best SQL statement

In many cases there are several SQL statements which can achieve the same result. If all tables are on the same database, the difference in performance between these SQL statements might be minimal; but if the tables are located on different databases, the difference in performance might be more significant.
Use cost-based optimization

Cost-based optimization can use indexes on remote tables, considers more execution plans than rule-based optimization, and generally gives better results. With cost-based optimization performance of distributed queries is generally satisfactory. Only in rare occasions is it necessary to change SQL statements, create views, or use procedural code.
Use views

In some situations, views can be used to improve performance of distributed queries; for example:
- To join several remote tables on the remote database
- To send a different table through the network
- Use procedural code
In some rare occasions it can be more efficient to replace a distributed query by procedural code, such as a PL/SQL procedure or a precompiler program. This option is mentioned here only for completeness, not because it is often needed.

VIEW	This entry should include the optimizer cost in the COST column.
UNION-ALL	This entry should specify PARTITION in the OPTION column.
FILTER	When an operation is a child of the UNION-ALL operation, FILTER indicates that a constant predicate was generated that will always be FALSE. The partition will be eliminated.

8 Tuning Distributed Queries