Oracle8i Parallel Server Concepts and Administration
Release 8.1.5

A67778-01


1  Parallel Processing and Parallel Databases

This chapter introduces parallel processing and parallel database technologies. Both offer great advantages for Online Transaction Processing (OLTP) and Decision Support Systems (DSS). The administrator's challenge is to selectively deploy these technologies to fully use their multiprocessing powers.

To do this successfully you must understand how multiprocessing works, what resources it requires, and when you can and cannot effectively apply it. This chapter answers the following questions:

What Is Parallel Processing?

What Is a Parallel Server?

What Are the Key Elements of Parallel Processing?

What Are the Benefits of Parallel Processing?

What Are the Benefits of Parallel Databases?

Do You Need Parallel Server?

What Is the Role of Parallel Execution?

What Is Parallel Processing?

This section defines parallel processing and describes its use.

Parallel Processing Defined

Parallel processing divides a large task into many smaller tasks and executes the smaller tasks concurrently on several nodes. As a result, the larger task completes more quickly.


Note:

A node is a separate processor, often on a separate machine. Multiple processors, however, can reside on a single machine.  


Some tasks can be effectively divided and are good candidates for parallel processing. For example, in a bank with one teller, customers must form one line to be served. With two tellers, the task of waiting on customers can be effectively split between the two tellers so customers are served twice as fast. This is an instance where parallel processing is an effective solution.

Not all tasks can be effectively divided. Assume that for the previous example, the bank manager must approve all loan requests. In this case, parallel processing does not necessarily improve service. No matter how many tellers are available to process loans, all requests must form a single line for bank manager approval. No amount of parallel processing can overcome this built-in restriction.

The following figures contrast sequential processing of a single query with parallel processing of the same query. Figure 1-1 illustrates sequential processing, in which a query executes as a single task.

Figure 1-1 Sequential Processing of a Single Task


Figure 1-2 illustrates parallel processing in which a query is divided into multiple, smaller tasks, and each component task executes on a separate instance.

Figure 1-2 Parallel Processing: Executing Component Tasks in Parallel


These figures contrast sequential processing with parallel processing of multiple independent tasks from online transaction processing (OLTP) environments. Figure 1-3 shows sequential processing of multiple independent tasks.

Figure 1-3 Sequential Processing of Multiple Independent Tasks


Figure 1-4 shows parallel processing of independent tasks.

Figure 1-4 Parallel Processing: Executing Independent Tasks in Parallel


In sequential processing, independent tasks compete for a single resource. Only task 1 runs without having to wait. Task 2 must wait until task 1 completes. Task 3 must wait until tasks 1 and 2 complete, and so on. Although the figure shows the independent tasks as the same size, the sizes of the tasks may vary.

By contrast, if you have a parallel server on a symmetric multiprocessor, you can assign more CPU power to the tasks, depending on how your CPUs are partitioned. Each independent task executes immediately on its own processor; no wait time is involved.

Problems of Parallel Processing

Effective implementation of parallel processing involves two challenges: structuring tasks so they can execute concurrently, and coordinating the concurrent tasks with as little synchronization overhead as possible.

Characteristics of a Parallel System

A parallel processing system has the following characteristics: multiple nodes, each with one or more processors; memory that is either shared by all processors or distributed among the nodes; and an interconnect over which the nodes communicate and coordinate.

Parallel Processing for SMPs and MPPs

Parallel processing architectures support single CPU and symmetric multiprocessing (SMP) machines, as well as clustered and massively parallel processing (MPP) systems.

Clustered and MPP machines have multiple memories, with each node typically having its own memory. Such systems promise significant price and performance benefits by using commodity memory and bus components to eliminate memory bottlenecks.

Database management systems supporting only one type of hardware limit the portability of applications, the potential to migrate applications to new hardware systems, and the scalability of applications. Oracle Parallel Server (OPS) exploits both clusters and MPP systems, and has no such limitations. Oracle without the Parallel Server Option exploits single CPU or SMP machines.

Parallel Processing for Integrated Operations

Parallel database software must effectively deploy the system's processing power to handle diverse applications such as online transaction processing (OLTP) applications, decision support system (DSS) applications, and mixtures of OLTP and DSS systems or "hybrid" systems.

OLTP applications are characterized by short transactions with low CPU and I/O usage. DSS applications have large transactions, with high CPU and I/O usage.

Parallel database software is often specialized, usually designed to serve as a query processor. Because specialized servers are built for a single function, however, they do not provide a common foundation for integrated operations such as online decision support, batch reporting, data warehousing, OLTP, distributed operations, and high availability systems. Specialized servers have been used most successfully for very large databases: in DSS applications, for example.

Versatile parallel database software should offer excellent price/performance on open systems hardware, and serve a wide variety of enterprise computing needs. Features such as online backup, data replication, portability, interoperability, and support for a wide variety of client tools can enable a parallel server to support application integration, distributed operations, and mixed application workloads.

What Is a Parallel Server?

A variety of hardware architectures allow multiple computers to share access to data, software, or peripheral devices. A parallel server is designed to take advantage of such architectures by running multiple instances that "share" a single physical database. In appropriate applications, a parallel server allows users on multiple machines to access a single database with increased performance: speedup of individual tasks and scaleup to process larger workloads.

A parallel server processes transactions in parallel by servicing a stream of transactions using multiple CPUs on different nodes, where each CPU processes an entire transaction. Using parallel data manipulation language (PDML), one transaction can also be executed by multiple nodes. This is an efficient approach because many applications consist of online insert and update transactions that tend to have short data access requirements. In addition to balancing the workload among CPUs, the parallel database provides concurrent access to data and ensures data integrity.

See Also:

"Do You Need Parallel Server?" for a discussion of the Oracle configurations.  

What Are the Key Elements of Parallel Processing?

This section describes the key elements of parallel processing: speedup and scaleup, synchronization, locking, and messaging.

Speedup and Scaleup: the Goals of Parallel Processing

You can measure the performance goals of parallel processing in terms of two important properties: speedup and scaleup.

Speedup

Speedup is the extent to which more hardware can perform the same task in less time than the original system. With added hardware, speedup holds the task constant and measures time savings. Figure 1-5 shows how each parallel hardware system performs half of the original task in half the time required to perform it on a single system.

Figure 1-5 Speedup


With good speedup, additional processors reduce system response time. You can measure speedup using this formula:

Speedup = Time_Original / Time_Parallel

Where:

Time_Original is the elapsed time spent by the original, smaller system on the given task.

Time_Parallel is the elapsed time spent by the larger, parallel system on the same task.

For example, if the original system took 60 seconds to perform a task and the parallel system takes 30 seconds, then the value of speedup equals 2.


However, you may not experience direct, linear speedup. Instead, speedup may be more logarithmic. That is, assume the system can perform a task of size "x" in a time duration of "n". But for a task of size 2x, the system may require a time duration of 3n.
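To make the formula concrete, here is a minimal Python sketch (the numbers and the serial-fraction model are illustrative assumptions, not measurements). It reproduces the 60-second example above and then shows how a fixed serial portion of a task, like the bank manager who must approve every loan, bounds the speedup that added processors can deliver; this is the reasoning formalized by Amdahl's law:

    def speedup(time_original, time_parallel):
        # Speedup = Time_Original / Time_Parallel
        return time_original / time_parallel

    print(speedup(60, 30))  # 2.0 -- the example from the text

    def parallel_time(total, serial_fraction, n_processors):
        # Hypothetical task: the serial portion cannot be divided;
        # the rest splits evenly across processors.
        serial = total * serial_fraction
        parallel_part = total * (1 - serial_fraction)
        return serial + parallel_part / n_processors

    for n in (1, 2, 4, 8):
        t = parallel_time(60, 0.25, n)  # assume 25% of the work is serial
        print(n, round(speedup(60, t), 2))
    # Prints 1.0, 1.6, 2.29, 2.91: speedup approaches 4 (that is, 1/0.25),
    # no matter how many processors are added.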

Note:

For most OLTP applications, no speedup can be expected: only scaleup. The overhead due to synchronization can, in fact, cause speed-down.  

Scaleup

Scaleup is the factor that expresses how much more work can be done in the same time period by a larger system. With added hardware, scaleup holds the time constant and measures the increased size of the job that can be done in that time.

Figure 1-6 Scaleup


If transaction volumes grow and you have good scaleup, you can keep response time constant by adding hardware resources such as CPUs.

You can measure scaleup using this formula:

Scaleup = Volume_Parallel / Volume_Original

Where:

Volume_Original is the transaction volume processed in a given amount of time on the original system.

Volume_Parallel is the transaction volume processed in the same amount of time on the parallel system.

For example, if the original system processes 100 transactions in a given amount of time and the parallel system processes 200 transactions in the same amount of time, then the value of scaleup equals 2. A value of 2 indicates the ideal of linear scaleup: twice as much hardware can process twice the data volume in the same amount of time.
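The same arithmetic can be checked in a couple of lines of Python (a trivial sketch; the transaction counts are the ones from the example above):

    def scaleup(volume_parallel, volume_original):
        # Scaleup = Volume_Parallel / Volume_Original
        return volume_parallel / volume_original

    print(scaleup(200, 100))  # 2.0 -- linear scaleup: twice the hardware
                              # processes twice the volume in the same time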

Synchronization: A Critical Success Factor

Coordination of concurrent tasks is called synchronization. Synchronization is necessary for correctness. The key to successful parallel processing is to divide tasks so very little synchronization is necessary. The less synchronization necessary, the better the speedup and scaleup.

For parallel processing among nodes, a high-speed interconnect among the parallel processors is essential, because the overhead of synchronization can be very expensive if a great deal of inter-node communication is necessary. For parallel processing within a node, messaging is not necessary: shared memory is used instead. Messaging and locking between nodes is handled by the Integrated Distributed Lock Manager (IDLM).

The amount of synchronization depends on the amount of resources and the number of users and tasks working on the resources. Little synchronization may be needed to coordinate a small number of concurrent tasks, but significant synchronization may be necessary to coordinate many concurrent tasks.
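The difference between heavy and light synchronization is easy to see in a sketch. The following Python fragment (illustrative only; the counting workload is a hypothetical stand-in for real tasks) performs the same job two ways: one synchronization point per unit of work versus one per worker:

    import threading

    N_TASKS, N_WORKERS = 100_000, 4
    chunk = N_TASKS // N_WORKERS
    counter = 0
    lock = threading.Lock()

    # Heavy synchronization: every unit of work acquires a shared lock.
    def contended():
        global counter
        for _ in range(chunk):
            with lock:              # one synchronization point per task
                counter += 1

    # Light synchronization: each worker accumulates privately and
    # coordinates only once, when merging its subtotal.
    def partitioned(results, i):
        local = 0
        for _ in range(chunk):
            local += 1              # no coordination needed here
        results[i] = local          # one synchronization point per worker

    threads = [threading.Thread(target=contended) for _ in range(N_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)                  # 100000, after 100,000 lock acquisitions

    results = [0] * N_WORKERS
    threads = [threading.Thread(target=partitioned, args=(results, i))
               for i in range(N_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(sum(results))             # 100000, after only 4 merges

Both versions compute the same answer; they differ only in how often the concurrent tasks must coordinate.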

Overhead

A great deal of time spent in synchronization indicates high contention for resources.


Note:

Too much time spent in synchronization can diminish the benefits of parallel processing. With less time spent in synchronization, better speedup and scaleup can be achieved.  


Response time equals time spent waiting plus time spent doing useful work. Table 1-1 illustrates how overhead increases as more concurrent processes are added. If 3 processes request a service at the same time, and they are served serially, then response time for process 1 is 1 second. Response time for process 2 is 2 seconds (waiting 1 second for process 1 to complete, then being serviced for 1 second). Response time for process 3 is 3 seconds (2 seconds waiting time plus 1 second service time).

Table 1-1 Increased Overhead with Increased Processes

Process Number   Service Time   Waiting Time   Response Time
1                1 second       0 seconds      1 second
2                1 second       1 second       2 seconds
3                1 second       2 seconds      3 seconds

A single task, in fact, may require multiple messages: if tasks must continually wait to synchronize, several messages may be needed per task.
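The arithmetic in Table 1-1 generalizes directly. A few lines of Python (a sketch of serial service only, ignoring messaging costs) show response time growing linearly with the number of queued processes:

    service_time = 1  # seconds per request, as in Table 1-1

    def response_times(n_processes):
        # Serial service: each process waits for all earlier ones to finish.
        return [i * service_time + service_time for i in range(n_processes)]

    print(response_times(3))  # [1, 2, 3] -- matches Table 1-1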

Cost of Synchronization

While synchronization is a necessary element of parallel processing to preserve correctness, you need to manage its cost in terms of performance and system resources. Different types of parallel processing software synchronize in different ways, and a given approach may or may not be cost-effective.

Sometimes you can achieve synchronization very inexpensively. In other cases its cost may be too high. For example, if one table takes inserts from many nodes, a great deal of synchronization is necessary: there may be high contention among the nodes to insert into the same data block, and the block must be passed back and forth between nodes. This type of synchronization can be done, but in some cases the overhead is significant.

See Also:

Chapter 8, "Integrated Distributed Lock Manager", Chapter 12, "Application Analysis", and Chapter 19, "Tuning to Optimize Performance".  

Locking

Locks are resource control mechanisms that synchronize tasks. Parallel processing requires many different types of locking mechanisms to synchronize its tasks.

The Integrated Distributed Lock Manager (Integrated DLM, or IDLM) is the internal locking facility used with OPS. It coordinates resource sharing between nodes running a parallel server. The instances of a parallel server use the IDLM to communicate with each other and coordinate modification of database resources. Each node operates independently of other nodes, except when contending for the same resource.

The IDLM allows applications to synchronize access to resources such as data, software, and peripheral devices, so concurrent requests for the same resource are coordinated among applications running on different nodes.

The IDLM performs the following services for applications: it keeps track of which node currently owns each resource, accepts and queues lock requests on those resources, and grants locks to requesters as the resources become available.
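The following toy lock manager sketches the general idea in Python. It is an illustration of the concept only: the class, its methods, and the grant/queue protocol are hypothetical simplifications, not the IDLM's actual interface or algorithm:

    from collections import defaultdict, deque

    class ToyLockManager:
        # Hypothetical sketch -- not the IDLM's real interface or protocol.
        def __init__(self):
            self.owner = {}                    # resource -> owning node
            self.waiters = defaultdict(deque)  # resource -> queued nodes

        def request(self, node, resource):
            # Grant the lock if the resource is free; otherwise queue.
            if resource not in self.owner:
                self.owner[resource] = node
                return True
            self.waiters[resource].append(node)
            return False

        def release(self, node, resource):
            # Pass ownership to the next waiter, if any.
            assert self.owner.get(resource) == node
            if self.waiters[resource]:
                self.owner[resource] = self.waiters[resource].popleft()
            else:
                del self.owner[resource]

    dlm = ToyLockManager()
    print(dlm.request("node1", "block 42"))  # True: granted immediately
    print(dlm.request("node2", "block 42"))  # False: queued behind node1
    dlm.release("node1", "block 42")         # ownership passes to node2
    print(dlm.owner["block 42"])             # node2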

Messaging

Parallel processing performs best when you have fast and efficient communication among nodes. The optimal system has high bandwidth and low latency, and communicates efficiently with the IDLM. Bandwidth is the total size of messages that can be sent per second. Latency is the time in seconds it takes to place a message on the interconnect and receive a response.

Most MPP systems and clusters have networks with reasonably high bandwidths. Latency, on the other hand, is an operating system issue that is mostly influenced by interconnect software and interconnect protocols. MPP systems, and most clusters, characteristically use interconnects with high bandwidth and low latency; other clusters may use Ethernet connections with relatively low bandwidth and high latency.
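A simple cost model makes the bandwidth and latency trade-off concrete. In the Python sketch below, the interconnect figures are invented for illustration, not measured values for any product:

    def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
        # Rough model: fixed latency plus transmission time.
        return latency_s + size_bytes / bandwidth_bytes_per_s

    block = 8192  # assume one 8 KB database block per message

    fast = message_time(block, 20e-6, 100e6)  # low latency, high bandwidth
    slow = message_time(block, 1e-3, 1e6)     # high latency, low bandwidth
    print(f"{fast * 1e3:.3f} ms vs {slow * 1e3:.3f} ms")
    # ~0.102 ms vs ~9.192 ms to ship the same block between nodes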

What Are the Benefits of Parallel Processing?

Parallel processing can benefit certain types of applications by providing improved response time (speedup) and enhanced throughput (scaleup).

You can achieve improved response time either by breaking up a large task into smaller components or by reducing wait time, as shown in Figure 1-3.

Table 1-2 shows which types of workload can attain speedup and scaleup with properly implemented parallel processing.

Table 1-2 Speedup and Scaleup with Different Workloads

Workload         Speedup    Scaleup
OLTP             No         Yes
DSS              Yes        Yes
Batch (Mixed)    Possible   Yes
Parallel Query   Yes        Yes

Enhanced Throughput: Scaleup

If tasks can run independently of one another, they can be distributed to different CPUs or nodes and you can achieve scaleup: more processes can run through the database in the same amount of time.

If processes can run ten times faster, then the system can accomplish ten times more in the original amount of time. The parallel query feature, for example, permits scaleup: a system might maintain the same response time if the data queried increases tenfold, or if more users can be served. OPS without the parallel query feature also provides scaleup, but by running the same query sequentially on different nodes.

With a mixed workload of DSS, OLTP, and reporting applications, you can achieve scaleup by running multiple programs on different nodes. You can also achieve speedup by rewriting the batch programs and separating them into a number of parallel streams to take advantage of the multiple CPUs that are available.
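As a sketch of that idea, the following Python fragment splits one batch job into parallel streams with a process pool. The per-account function is a hypothetical stand-in for real batch work:

    from multiprocessing import Pool

    def process_account(account_id):
        # Stand-in for one independent unit of a batch program.
        return account_id * 2

    if __name__ == "__main__":
        accounts = list(range(1000))
        # One stream: the whole batch runs serially on a single CPU.
        serial = [process_account(a) for a in accounts]
        # Four streams: the same batch split across worker processes.
        with Pool(processes=4) as pool:
            parallel = pool.map(process_account, accounts)
        assert serial == parallel  # same answer, work spread over 4 CPUs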

Improved Response Time: Speedup

DSS applications and parallel query can attain speedup with parallel processing: each transaction can run faster. For OLTP applications, however, no speedup can be expected: only scaleup. With OLTP applications, each process is independent. Even with parallel processing, each insert or update on an order table still runs at the same speed. In fact, the overhead due to synchronization may cause a slight speed-down. Since each of the operations is small, it is inappropriate to attempt to parallelize them; the synchronization overhead would be greater than the benefit.

You can also achieve speedup with batch processing. The degree of speedup, however, depends on the cost and amount of synchronization between tasks.

What Are the Benefits of Parallel Databases?

Parallel database technology can benefit certain kinds of applications by enabling higher performance, high availability, greater flexibility, and support for more users.

Higher Performance

With more CPUs available to an application, higher speedup and scaleup can be attained. The performance improvement depends on the degree of inter-node locking and synchronization activities. Each lock operation is processor- and message-intensive; there can be a lot of latency. The volume of lock operations and database contention, as well as the throughput and performance of the IDLM, ultimately determine the scalability of the system.

High Availability

Nodes are isolated from each other, so a failure at one node does not bring the entire system down. One of the surviving nodes recovers the failed instance, and the system continues to provide data access to users. The result is significantly higher database availability: data remains accessible after a node failure, which would not be the case with a single node.

Greater Flexibility

An OPS environment is extremely flexible. You can allocate or deallocate instances as necessary. For example, as database demand increases, you can temporarily allocate more instances. Then you can deallocate the instances and use them for other purposes once they are no longer required.

More Users

Parallel database technology can make it possible to overcome memory limits, enabling a single system to serve thousands of users.

Do You Need Parallel Server?

This section describes the following Oracle configurations that deliver high performance for different types of applications: single instance with exclusive access, multi-instance database systems, distributed database systems, and client-server systems.

The parallel server is one of several Oracle options that provide high-performance relational databases serving many users. You can combine these configurations to suit your needs. A parallel server can be one of several servers in a distributed database environment, and the client-server configuration can combine various Oracle configurations into a hybrid system to meet specific application requirements.


Note:

Support for any given Oracle configuration is platform-dependent; check to confirm that your platform supports the configuration you want.  


For optimal performance, configure your system according to your particular application requirements and available resources, then design and tune the database and applications to make the best use of the configuration. Consider also the migration of existing hardware or software to the new system or to future systems.

The following sections help you determine which Oracle configuration best meets your needs.

See Also:

Chapter 3, "Parallel Hardware Architecture".  

Single Instance with Exclusive Access

Figure 1-7 illustrates a single instance database system running on a symmetric multiprocessor (SMP). The database itself is located on a set of disks.

Figure 1-7 Single Instance Database System


A single instance accessing a single database can improve performance by running on a larger computer. A large single computer does not require coordination between several nodes and generally performs better than two small computers in a multinode system. However, two small computers often cost less than one large one.

The cost of redesigning and tuning your database and applications for OPS might be significant if you want to migrate from a single computer to a multi-node system. In situations like this, consider whether a larger, single computer might be a better solution than moving to a parallel server.

See Also:

Oracle8i Concepts for more information about single instance Oracle.  

Multi-Instance Database Systems

Figure 1-8 illustrates the OPS option running on a cluster or MPP. This configuration is also referred to as a "multi-instance database system". OPS is an excellent solution for applications that can be configured to minimize the passing of data between instances on different nodes.

Figure 1-8 Multi-Instance Database System


This system requires the LMD, LCK, and BSP processes as well as foreground processes on each instance. These processes coordinate global locking by communicating directly from one instance to another by way of the interconnect.


Note:

BSP is the Block Server Process, which exists only in an OPS environment. BSP manages outgoing messages to requesting nodes as well as the transmission of consistent read blocks from one node to another. The LMD process manages incoming lock requests for instances holding locks needed by other instances.


See Also:

For more information about BSP, please refer to "Cache Fusion Processing and the Block Server Process" and Chapter 20, "Cache Fusion and Inter-instance Performance".  

In OPS, instances are decoupled from databases. In exclusive mode, there is a one-to-one correspondence of instance to database. In shared (parallel) mode, however, there can be many instances to a single database.

In general, any single application performs best when it has exclusive access to a database on a larger system, as compared with its performance on a smaller node of a multinode environment. This is because the cost of synchronization automatically increases if you migrate to a multinode environment. The performance difference depends on characteristics of that application and all other applications sharing access to the database.

Applications with one or both of the following characteristics are well suited to run on separate instances of a parallel server: applications that predominantly read data, and applications whose updates can be partitioned so that different instances rarely modify the same data.

Distributed Database Systems

You can link several Oracle servers and databases to form a distributed database system. This configuration includes multiple databases, each of which is accessed directly by a single server and which can be accessed indirectly by other instances through server-to-server cooperation. You can use each node for database processing, but the data is permanently partitioned among the nodes. A parallel server, in contrast, has multiple instances with direct access to one database.


Note:

Oracle Parallel Server can be one component of a distributed database.


Figure 1-9 illustrates a distributed database system. This database system requires the RECO background process on each instance. There is no LCK, LMON, or LMD background process because this is not an OPS configuration, and the Integrated Distributed Lock Manager is not needed.

Figure 1-9 Distributed Database System


The multiple databases of a distributed system can be treated as one logical database, because servers can access remote databases transparently using Net8.

If you can partition your data into multiple databases with minimal overlap, you can use a distributed database system instead of a parallel server, sharing data between the databases through Net8. A parallel server, by contrast, provides automatic data sharing among nodes through the common database.

A distributed database system allows data to reside at several widely separated sites. Users can access data from geographically separated databases, provided network connections exist between the separate nodes. A parallel server requires all data to be at a single site because of the requirement for low latency, high bandwidth communication among nodes. A parallel server can, however, also be part of a distributed database system, as illustrated in Figure 1-10.

Figure 1-10 Oracle Parallel Server as Part of a Distributed Database


Multiple databases require separate database administration, and a distributed database system requires coordinated administration of the databases and network protocols. A parallel server can consolidate several databases to simplify administrative tasks.

Multiple databases can provide greater availability than a single instance accessing a single database, because an instance failure in a distributed database system does not prevent access to data in the other databases: only the database owned by the failed instance is inaccessible. A parallel server, however, allows continued access to all data when one instance fails, including data accessed by the instance running on the failed node.

A parallel server accessing a single consolidated database avoids the need for distributed updates, inserts, or deletions and more expensive two-phase commits by allowing a transaction on any node to write to multiple tables simultaneously, regardless of which nodes usually write to those tables.

See Also:

Oracle8i Backup and Recovery Guide for more information about instance recovery and Oracle8i Distributed Database Systems for more information about Oracle distributed database features.  

Client-Server Systems

Any Oracle configuration can run in a client-server environment. In Oracle, a client application runs on a remote computer using Net8 to access an Oracle server through a network. The performance of this configuration is typically limited by the power of the single server node.

Figure 1-11 illustrates an Oracle client-server system.

Figure 1-11 Client-Server System


Note:

Client-server processing is suitable for any Oracle configuration. Check your Oracle platform-specific documentation to see whether it is implemented on your platform.  


The client-server configuration allows you to off-load processing from the computer that runs an Oracle server. If you have too many applications running on one machine, you can off-load them to improve performance. However, if your database server is reaching its processing limits you might want to move either to a larger machine or to a multinode system.

For compute-intensive applications, you could run some applications on one node of a multinode system while running Oracle and other applications on another node or on several other nodes. In this way you could use various nodes of a parallel machine as client nodes and one as a server node.

If the database has several distinct, high-throughput parts, a parallel server running on high-performance nodes can provide quick processing for each part of the database while also handling occasional access across parts.

A client-server configuration requires the network to convey all communications between clients and the database. This may not be appropriate for the high-volume communications required by many batch applications.

See Also:

"Client-Server Architecture" in Oracle8i Concepts.  

What Is the Role of Parallel Execution?

With its parallel execution features, Oracle can divide the work of processing certain types of SQL statements among multiple query server processes.

OPS provides the framework for parallel execution to work between nodes. Parallel execution features behave the same way in Oracle with or without the Parallel Server Option. The only difference is that OPS enables multiple nodes to execute on behalf of a single query or other parallel operation.

In some applications, notably data warehousing applications, individual queries consume a great deal of CPU resources and require significant disk I/O, unlike most online insert or update transactions. To take advantage of multiprocessing systems, the data server must parallelize individual queries into units of work that can be processed simultaneously. Figure 1-12 shows an example of parallel query processing.

Figure 1-12 Example of Parallel Execution Processing


If the query were not processed in parallel, disks would be read serially with a single I/O. A single CPU would have to scan all rows in the LINE_ITEMS table and total the revenues across all rows. With the query parallelized, disks are read in parallel, with multiple I/Os. Several CPUs can each scan a part of the table in parallel and aggregate their results. Parallel query benefits not only from multiple CPUs but also from more of the available I/O bandwidth.
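The shape of this execution is easy to mimic outside the database. The Python sketch below is a conceptual analogy only: real query server processes are Oracle processes, and the LINE_ITEMS list here is a fabricated in-memory stand-in for the table. It scans partitions of the data in parallel and then aggregates the partial results:

    from multiprocessing import Pool

    # Fabricated stand-in for the LINE_ITEMS table: (item, revenue) rows.
    LINE_ITEMS = [(f"item{i}", i * 0.5) for i in range(100_000)]

    def scan_partition(bounds):
        # Scan one slice of the table, returning a partial revenue total.
        lo, hi = bounds
        return sum(revenue for _, revenue in LINE_ITEMS[lo:hi])

    if __name__ == "__main__":
        n_servers = 4  # analogous to query server processes
        step = len(LINE_ITEMS) // n_servers
        parts = [(i * step, (i + 1) * step) for i in range(n_servers)]
        with Pool(n_servers) as pool:
            partials = pool.map(scan_partition, parts)  # parallel scans
        print(sum(partials))  # final aggregation of the partial results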

See Also:

Oracle8i Concepts and Oracle8i Tuning for detailed explanations of parallel execution.  



