Oracle8i Administrator's Guide
Release 8.1.5

A67772-01

Library

Product

Contents

Index

Prev Next

18
Managing Hash Clusters

This chapter describes how to manage hash clusters, and includes the following topics:

See Also: Before attempting tasks described in this chapter, familiarize yourself with the concepts in Chapter 12, "Guidelines for Managing Schema Objects".

Guidelines for Managing Hash Clusters

This section describes guidelines to consider before attempting to manage hash clusters, and includes the following topics:

Storing a table in a hash cluster is an optional way to improve the performance of data retrieval. A hash cluster provides an alternative to a non-clustered table with an index or an index cluster. With an indexed table or index cluster, Oracle locates the rows in a table using key values that Oracle stores in a separate index. To use hashing, you create a hash cluster and load tables into it. Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function.

Oracle uses a hash function to generate a distribution of numeric values, called hash values, which are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, Oracle applies the hash function to the row's cluster key value; the resulting hash value corresponds to a data block in the cluster, which Oracle then reads or writes on behalf of the issued statement.

To find or store a row in an indexed table or cluster, a minimum of two (there are usually more) I/Os must be performed:

In contrast, Oracle uses a hash function to locate a row in a hash cluster; no I/O is required. As a result, a minimum of one I/O operation is necessary to read or write a row in a hash cluster.

See Also: For more information about hash clusters, see Oracle8i Concepts.

Advantages of Hashing

If you opt to use indexing rather than hashing, consider whether to store a table individually or as part of a cluster.

Hashing is most advantageous when you have the following conditions:

Disadvantages of Hashing

Hashing is not advantageous in the following situations:

See Also: For more information about creating hash clusters and specifying hash functions see the Oracle8i SQL Reference.

For information about hash functions and specifying user-defined hash functions, see Oracle8i Concepts.

Even if you decide to use hashing, a table can still have separate indexes on any columns, including the cluster key. See the Oracle8i Application Developer's Guide - Fundamentals for additional recommendations.

Estimate Size Required by Hash Clusters and Set Storage Parameters

As with index clusters, it is important to estimate the storage required for the data in a hash cluster.

Oracle guarantees that the initial allocation of space is sufficient to store the hash table according to the settings SIZE and HASHKEYS. If settings for the storage parameters INITIAL, NEXT, and MINEXTENTS do not account for the hash table size, incremental (additional) extents are allocated until at least SIZE*HASHKEYS is reached. For example, assume that the data block size is 2K, the available data space per block is approximately 1900 bytes (data block size minus overhead), and that the STORAGE and HASH parameters are specified in the CREATE CLUSTER command as follows:

STORAGE (INITIAL 100K
    NEXT 150K
    MINEXTENTS 1
    PCTINCREASE 0)
SIZE 1500
HASHKEYS 100

In this example, only one hash key can be assigned per data block. Therefore, the initial space required for the hash cluster is at least 100*2K or 200K. The settings for the storage parameters do not account for this requirement. Therefore, an initial extent of 100K and a second extent of 150K are allocated to the hash cluster.

Alternatively, assume the HASH parameters are specified as follows:

SIZE 500 HASHKEYS 100

In this case, three hash keys are assigned to each data block. Therefore, the initial space required for the hash cluster is at least 34*2K or 68K. The initial settings for the storage parameters are sufficient for this requirement (an initial extent of 100K is allocated to the hash cluster).

Creating Hash Clusters

After creating a hash cluster, you can create tables in the cluster. A hash cluster is created using the SQL command CREATE CLUSTER. The following statement creates a cluster named TRIAL_CLUSTER that stores the TRIAL table, clustered by the TRIALNO column:

CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))
    PCTUSED 80
    PCTFREE 5
    TABLESPACE users
    STORAGE (INITIAL 250K     NEXT 50K
      MINEXTENTS 1     MAXEXTENTS 3
      PCTINCREASE 0)
    HASH IS trialno HASHKEYS 150;

CREATE TABLE trial (
    trialno         NUMBER(5,0) PRIMARY KEY,
    ...)
    CLUSTER trial_cluster (trialno);

The following sections explain setting the parameters of the CREATE CLUSTER command specific to hash clusters.

See Also: For additional information about creating tables in a cluster, guidelines for setting other parameters of the CREATE CLUSTER command, and the privileges required to create a hash cluster, see "Creating Clusters".

Creating Single Table Hash Clusters

You can also create a single table hash cluster, which provides fast access to rows in a table; however, this table must be the only table in the hash cluster. Essentially, there must be a one-to-one mapping between hash keys and data rows. The following statement creates a single-table hash cluster named PEANUT with the cluster key VARIETY:

CREATE CLUSTER peanut (variety NUMBER)
   SIZE 512 SINGLE TABLE HASHKEYS 500;

Oracle rounds the HASHKEY value up to the nearest prime number, so this cluster has


Note:

The single table option is valid only for hash clusters. HASHKEYS must also be specified.  


a maximum of 503 hash key values, each of size 512 bytes:

See Also: For more information about the CREATE CLUSTER statement, see the Oracle8i SQL Reference.

Controlling Space Use Within a Hash Cluster

When creating a hash cluster, it is important to choose the cluster key correctly and set the HASH IS, SIZE, and HASHKEYS parameters so that performance and space use are optimal. The following guidelines describe how to set these parameters.

Choosing the Key

Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the EMP table in a hash cluster. If queries often select rows by employee number, the EMPNO column should be the cluster key.Iif queries often select rows by department number, the DEPTNO column should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table.

The key of a hash cluster, like that of an index cluster, can be a single column or a composite key (multiple column key). A hash cluster with a composite key must use Oracle's internal hash function.

Setting HASH IS

Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype, and contains uniformly distributed integers. If the above conditions apply, you can distribute rows in the cluster so that each unique cluster key value hashes, with no collisions, to a unique hash value. If these conditions do not apply, omit this option so that you use the internal hash function.

Setting SIZE

SIZE should be set to the average amount of space required to hold all rows for any given hash key. Therefore, to properly determine SIZE, you must be aware of the characteristics of your data:

Overestimating the value of SIZE increases the amount of unused space in the cluster. If space efficiency is more important than the performance of data retrieval, disregard the above adjustments and use the estimated value for SIZE.

Setting HASHKEYS

For maximum distribution of rows in a hash cluster, Oracle rounds the HASHKEYS value up to the nearest prime number.

Controlling Space in Hash Clusters: Examples

The following examples show how to correctly choose the cluster key and set the HASH IS, SIZE, and HASHKEYS parameters. For all examples, assume that the data block size is 2K and that on average, 1950 bytes of each block is available data space (block size minus overhead).

Example 1  

You decide to load the EMP table into a hash cluster. Most queries retrieve employee records by their employee number. You estimate that the maximum number of rows in the EMP table at any given time is 10000 and that the average row size is 55 bytes.  

 

In this case, EMPNO should be the cluster key. Since this column contains integers that are unique, the internal hash function can be bypassed. SIZE can be set to the average row size, 55 bytes; note that 34 hash keys are assigned per data block. HASHKEYS can be set to the number of rows in the table, 10000, rounded up to the next highest prime number, 10007:  

 

CREATE CLUSTER emp_cluster (empno 
NUMBER)
. . .
SIZE 55
HASH IS empno HASHKEYS 10007;
 

Example 2  

Conditions similar to the previous example exist. In this case, however, rows are usually retrieved by department number. At most, there are 1000 departments with an average of 10 employees per department. Note that department numbers increment by 10 (0, 10, 20, 30, . . . ).  

 

In this case, DEPTNO should be the cluster key. Since this column contains integers that are uniformly distributed, the internal hash function can be bypassed. A pre-estimated SIZE (the average amount of space required to hold all rows per department) is 55 bytes * 10, or 550 bytes. Using this value for SIZE, only three hash keys can be assigned per data block. If you expect some collisions and want maximum performance of data retrieval, slightly alter your estimated SIZE to prevent collisions from requiring overflow blocks. By adjusting SIZE by 12%, to 620 bytes (see previous section about setting SIZE for clarification), only three hash keys are assigned per data block, leaving more space for rows from expected collisions.  

 

HASHKEYS can be set to the number of unique department numbers, 1000, rounded up to the next highest prime number, 1009:  

 

CREATE CLUSTER emp_cluster (deptno NUMBER)
. . .
SIZE 620
HASH IS deptno HASHKEYS 1009;
 

Altering Hash Clusters

You can alter a hash cluster with the SQL ALTER CLUSTER statement:

ALTER CLUSTER emp_dept . . . ;

The implications for altering a hash cluster are identical to those for altering an index cluster. However, note that the SIZE, HASHKEYS, and HASH IS parameters cannot be specified in an ALTER CLUSTER statement. You must re-create the cluster to change these parameters and then copy the data from the original cluster.

See Also: For more information about altering an index cluster, see ""Altering Clusters" .

Dropping Hash Clusters

You can drop a hash cluster using the SQL DROP CLUSTER statement:

DROP CLUSTER emp_dept;

A table in a hash cluster is dropped using the SQL DROP TABLE statement. The implications of dropping hash clusters and tables in hash clusters are the same for index clusters.

See Also: For more information about dropping clusters, see "Dropping Clusters".




Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index