What
is a cluster table?
|
More Resources by
Google: |
|
|
|
|
Provided
By:
John Kazerooni
A cluster is a schema object that contains one or more tables that all have one or more
columns in common. Rows of one or more tables that share the same value in these common columns are
physically stored together within the database.
Generally,
you should only cluster tables that are frequently joined on the cluster key columns in SQL
statements. Clustering multiple tables improves the performance of joins, but it is likely to reduce
the performance of full table scans, INSERT statements, and UPDATE statements that modify cluster
key values. Before clustering, consider its
Cluster Keys
The
columns defined by the CREATE CLUSTER command make up the cluster key. These cluster columns must
correspond in both datatype and size to columns in each of the clustered tables, although they need
not correspond in name.
Types
of Clusters
A
cluster can be either an indexed cluster or a hash cluster.
Indexed Clusters
In
an indexed cluster, Oracle stores together rows having the same cluster key value. Each distinct
cluster key value is stored only once in each data block, regardless of the number of tables and
rows in which it occurs. This saves disk space and improves performance for many operations.
You
may want to use indexed clusters in the following cases:
·
Your queries retrieve rows
over a range of cluster key values.
·
Your clustered tables may
grow unpredictably.
After
you create an indexed cluster, you must create an index on the cluster key before you can issue any
data manipulation language (DML) statements against a table in the cluster. This index is called the
cluster index.
A
cluster index provides quick access to rows within a cluster based on the cluster key. If you issue
a SQL statement that searches for a row in the cluster based on its cluster key value, Oracle
searches the cluster index for the cluster key value and then locates the row in the cluster based
on its ROWID.
Hash Clusters
In
a hash cluster, Oracle stores together rows that have the same hash key value. The hash value for a
row is the value returned by the cluster's hash function. When you create a hash cluster, you can
either specify a hash function or use the Oracle internal hash function. Hash values are not
actually stored in the cluster, although cluster key values are stored for every row in the cluster.
You
may want to use hash clusters in the following cases:
·
Your queries retrieve rows
based on equality conditions involving all cluster key columns.
·
Your clustered tables are
static or you can determine the maximum number of rows and the maximum amount of space required by
the cluster when you create the cluster.
The
hash function provides access to rows in the table based on the cluster key value. If you issue a
SQL statement that locates a row in the cluster based on its cluster key value, Oracle applies the
hash function to the given cluster key value and uses the resulting hash value to locate the
matching rows. Because multiple cluster key values can map to the same hash value, Oracle must also
check the row's cluster key value. This process often results in less I/O than the process for the
indexed cluster, because the index search is not required.
Oracle's
internal hash function returns values ranging from 0 to the value of HASHKEYS - 1. If you specify a
column with the HASH IS clause, the column values need not fall into this range. Oracle divides the
column value by the HASHKEYS value and uses the remainder as the hash value. The hash value for null
is HASHKEYS - 1. Oracle also rounds the HASHKEYS value up to the nearest prime number to obtain the
actual number of hash values. This rounding reduces the likelihood of hash collisions, or multiple
cluster key values having the same hash value.
You
cannot create a cluster index for a hash cluster, and you need not create an index on a hash cluster
key.
If
you cannot fit all rows for one hash value into a data block, do not use hash clusters. Performance
is very poor in this circumstance because an insert or update of a row in a hash cluster with a size
exceeding the data block size fills the block and performs row chaining to contain the rest of the
row.
Cluster
Size
Oracle
uses the value of the SIZE parameter to determine the space reserved for rows corresponding to one
cluster key value or one hash value. This space then determines the maximum number of cluster or
hash values stored in a data block. If the SIZE value is not a divisor of the data block size,
Oracle uses the next largest divisor. If the SIZE value is larger than the data block size, Oracle
uses the operating system block size, reserving at least one data block per cluster or hash value.
Oracle
also considers the length of the cluster key when determining how much space to reserve for the rows
having a cluster key value. Larger cluster keys require larger sizes. To see the actual size, query
the KEY_SIZE column of the USER_CLUSTERS data dictionary view. This does not apply to hash clusters
because hash values are not actually stored in the cluster.
Although
the maximum number of cluster and hash key values per data block is fixed on a per `-cluster basis,
Oracle does not reserve an equal amount of space for each cluster or hash key value. Varying this
space stores data more efficiently, because the data stored per cluster or hash key value is rarely
fixed.
A
SIZE value smaller than the space needed by the average cluster or hash key value may require the
data for one cluster key or hash key value to occupy multiple data blocks. A SIZE value much larger
results in wasted space.
When you create a hash cluster, Oracle
immediately allocates space for the cluster based on the values of the SIZE and HASHKEYS parameters.
Adding Tables
to a Cluster
You
can add tables to an existing cluster by issuing a CREATE TABLE statement with the CLUSTER clause. A
cluster can contain as many as 32 tables, although the performance gains of clustering are often
lost in clusters of more than four or five tables.
All
tables in the cluster have the cluster's storage characteristics as specified by the PCTUSED,
PCTFREE, INITRANS, MAXTRANS, TABLESPACE, and STORAGE parameters.
Example
I
The
following statement creates an indexed cluster named PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a cluster size of 512 bytes, and storage parameter values:
CREATE
CLUSTER personnel
( department_number NUMBER(2) )
SIZE 512
STORAGE (INITIAL 100K NEXT 50K PCTINCREASE 10);
The
following statements add the EMP and DEPT tables to the cluster:
CREATE
TABLE emp
(empno NUMBER
PRIMARY KEY,
ename VARCHAR2(10)
NOT NULL
CHECK (ename = UPPER(ename)),
job VARCHAR2(9),
mgr NUMBER
REFERENCES scott.emp(empno),
hiredate DATE
CHECK (hiredate >= SYSDATE),
sal NUMBER(10,2)
CHECK (sal > 500),
comm NUMBER(9,0)
DEFAULT NULL,
deptno NUMBER(2)
NOT NULL )
CLUSTER personnel (deptno);
CREATE
TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc VARCHAR2(9))
CLUSTER personnel (deptno);
The
following statement creates the cluster index on the cluster key of PERSONNEL:
CREATE
INDEX idx_personnel ON CLUSTER personnel;
After
creating the cluster index, you can insert rows into either the EMP or DEPT tables.
Example
II
The
following statement creates a hash cluster named PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a maximum of 503 hash key values, each of size 512 bytes, and storage parameter
values:
CREATE
CLUSTER personnel
(
department_number NUMBER )
SIZE
512 HASHKEYS 500
STORAGE (INITIAL 100K NEXT 50K
PCTINCREASE 10);
Because
the above statement omits the HASH IS clause, Oracle uses the internal hash function for the
cluster.
Example
III
The
following statement creates a hash cluster named PERSONNEL with the cluster key made up of the
columns HOME_AREA_CODE and HOME_PREFIX, and uses a SQL expression containing these columns for the
hash function:
CREATE
CLUSTER personnel
( home_area_code NUMBER,
home_prefix NUMBER )
HASHKEYS 20
HASH IS MOD(home_area_code + home_prefix, 101);
|