Performance is a requirement for all databases, but the requirements for that performance vary greatly, depending on the types of applications accessing the database. Online transaction processing (OLTP) and e-commerce applications are characterized by increasingly large user populations concurrently accessing large volumes of data for short and frequent insert or update transactions. Such environments require support for high throughput, a variety of available indexing strategies, and excellent data concurrency.
Data warehouses are very large databases specifically designed for periodically loading massive amounts of data for frequent and ad-hoc complex queries. A data warehouse has constantly growing data volumes and numbers of accessing applications and users. It should be optimized to perform well for a wide variety of long-running, ad-hoc queries. Such environments require adequate support for query rewrite capabilities, efficient indexing capabilities, a wide selection of partitioning strategies, and extended support for parallel execution.
Oracle Database outperforms SQL Server 2000 on a wide variety of industry and ISV-specific benchmarks, and is recognized as the industry leader in database scalability.
This document describes the technical differences between Oracle Database 10g, the latest version of Oracle Database, and SQL Server 2000 that explain Oracle Database's superior performance in both benchmarks as well as real customer environments. After a short introduction to the differences in platform availability, the document focuses on the major techniques commonly used to ensure good performance and scalability in modern, enterprise-class, relational database systems: concurrency model, indexing, partitioning, parallel execution, and clustering.
Additionally, it gives an introduction to some of the unique features that Oracle Database provides to enhance the performance of ETL processes in Data Warehousing environments.
The main differentiators between the two products are summarized in the table below:
Concurrency Model
Multi-version read consistency
Non-Escalating row-level locking
Shared read locks or dirty reads
Locks escalate
Indexing capabilities
B-Tree indexes
Index-organized Tables
Bitmap indexes
Bitmap Join Indexes
B-Tree indexes
Clustered Indexes
Not supported
Not supported
Partitioning options
Range, hash, list and composite partitioning
Local and global indexes
Not supported
Only local indexes with member tables
Parallel execution
Queries, INSERT, UPDATE, DELETE
Queries only
Clustered configurations
Transparent scalability with Real Application Clusters
Requires data partitioning in member tables and Distributed Partitioned Views
Additional data warehousing capabilities
Materialized Views
MERGE
Multi-table INSERT
Pipelined table Functions
Indexed Views
Not supported
Not supported
Not supported
PLATFORM AVAILABILITY
Even though product portability is not directly linked to performance, software availability across a wide variety of hardware and operating systems enables users to seamlessly upgrade or replace their hardware systems without having to worry about changing, redesigning or rebuilding their applications. In other words, cross-platform availability helps preserve the initial investments in application software and helps deliver performance consistency across multiple platforms.
Oracle Database is available on a large selection of hardware and operating systems, scaling from low-end uni-processor servers to large symetrical multi-processor machines to multi-node clusters. Oracle Database supports all major Unix platforms, including Linux, Microsoft operating systems, and a variety of other systems, including OS/390 mainframes. With Oracle, users are able to upgrade hardware and operating systems without changing or rewriting their applications.
SQL Server 2000 only runs on Microsoft's operating systems. Customers wishing to upgrade hardware are limited to platforms running these systems and must face the cost of converting their systems completely if they ever outgrow the capacity of their platform
CONCURRENCY MODEL
Oracle Database
SQL Server 2000
Multi-version read consistency
Not available
No read locks
Requires shared read locks to avoid dirty reads
No dirty reads
Dirty reads if not using shared locks
Non-escalating row-level locking
Locks escalate
Readers don't block writers
Readers block writers
Writers don't block readers
Writers block readers
Minimal deadlocks under load
Deadlocks can be a serious problem under load
Multi-Version Read Consistency
Oracle's implementation of multi-version read consistency always provides consistent and accurate results. When an update occurs in a transaction, the original data values are recorded in the database's undo records. Oracle uses the current information in the undo records to construct a read-consistent view of a table's data, and to ensure that a version of the information, consistent at the beginning of the uncommitted transaction, can always be returned to any user. Other databases, such as SQL Server 2000, have to choose a workaround to avoid concurrency problems, such as locking data to prevent it from changing while being read or preventing or queries from reading changed but uncommitted information.
SQL Server 2000 does not provide multi-version read consistency. Instead it requires applications to either use shared locks for read operations, with various levels of isolation, or to accept dirty reads. Shared locks prevent data that is read from being changed by concurrent transactions. Clearly, this implementation restricts the ability of the system to properly service concurrent requests in environments involving a mix of reads and writes, as explained in Microsoft's documentation: "SQL Server, in contrast, uses shared locks to ensure that data readers only see committed data. These readers take and release shared locks as they read data. These shared locks do not affect other readers. A reader waits for a writer to commit the changes before reading a record. A reader holding shared locks also blocks a writer trying to update the same data."
A consequence is that "releasing locks quickly for applications that support high numbers of users is more important in SQL Server than in Oracle."
The only alternative developers have to t his problem is to build separate workload environments, where intensive read activities, such as reporting, cannot interfere with on-line transactional applications. Regardless of which approach is used, SQL Server 2000 developers usually have to find some compromise in their application design in order to get acceptable data concurrency and accuracy.
In Oracle, writers and readers never block each other. Oracle's powerful multi-version read consistency allows mixed workload environments to function properly without incurring any performance penalty for the users.
PARTITIONING
Partitioning allows large database structures (tables, indexes, etc.) to be decomposed into smaller and more manageable pieces. Although it is primarily considered a feature for manageability and availability, partitioning also provides a number of performance benefits.
Oracle 's Partitioning Options
Oracle Database offers several table partitioning methods designed to handle different application scenarios7:
Range partitioning uses ranges of column values to map rows to partitions. Partitioning by range is particularly well suited for historical databases. Range partitioning is also the ideal partitioning method to support 'rolling window' operations in a data warehouse.
Hash partitioning uses a hash function on the partitioning columns to stripe data into partitions. Hash partitioning is an effective means of evenly distributing data.
List partitioning allows users to have explicit control over how rows map to partitions. This is done by specifying a list of discrete values for the partitioning column in the description for each partition.
In addition, Oracle supports range-hash and range-list composite partitioning.
SQL Server 2000's Partitioning Options
SQL Server 2000 does not support partitioning as generally defined in the database industry. Instead, it as if from one table. The data is partitioned between the member tables based on ranges of data values in one of the columns, called the partitioning column. The data ranges for each member table are defined in a CHECK constraint specified on the partitioning column. The view uses UNION ALL to combine selects of all the member tables into a single result set.
SQL Server 2000 distinguishes between local and distributed supports partitioned views.
A partitioned view joins horizontally partitioned data from a set of member tables across one or more servers, making the data appear partitioned views. In a local partitioned view, all participating tables and the view reside on the same instance of SQL Server. In a distributed partitioned view, at least one of the participating tables resides on a different (remote) server. Distributed Partitioned Views are further discussed in the Clustering section of this document.
PARALLEL EXECUTION OF OPERATIONS
Parallel execution of SQL operations can vastly improve the performance for operations involving large volumes of data. It helps reduce response time for data-intensive operations on large databases typically associated with decision support systems and data warehouses.
Oracle will execute INSERT, UPDATE, DELETE, and MERGE8 statements in parallel when accessing both partitioned and non-partitioned database objects.
With SQL Server 2000, INSERT, UPDATE, and DELETE statements are executed serially (MERGE is not supported).
CLUSTERING
Clusters are groups of independent servers, or nodes, connected via a private network (called a cluster interconnect), that work collaboratively as a single system. Clusters allow applications to scale beyond the limits imposed by single node systems when processing loads exceed the capacity of large individual servers.
Only Oracle provides real support for clustered configurations, where full and transparent scalability can be obtained by simply adding new nodes as the demand increases. As indicated in Microsoft's documentation, "SQL Server 2000 does not support this type of clustering". Instead, users are forced to use a federation of databases to achieve some kind of scalability. The following sections explain in more details how both architectures differ and the impact on performance and scalability.
Oracle Real Application Clusters
Real Application Clusters (RAC) is the Oracle Database option that supports hardware clusters.
Oracle Real Application Clusters uses a shared disk approach. In a shared disk database architecture, database files are logically shared among the nodes of a loosely coupled system with each instance having access to all the data.
Oracle Real Application Clusters uses the patented Cache FusionTM architecture, a technology that utilizes the interconnected caches of all the nodes in the cluster to satisfy database requests for any type of application (OLTP, DSS, packaged applications). Query requests can now be satisfied both by the local cache as well as any of the other caches. Update operations do not require successive disk write and read operations for synchronization since the local node can obtain the needed block directly from any of the other cluster node's database caches. Oracle Cache FusionTM exploits low latency cluster interconnect protocols to directly ship needed data blocks from the remote node's cache to the local cache. This removes slow disk operations from the critical path of inter-node synchronization. Expensive disk accesses are only performed when none of the caches contain the necessary data and when an update transaction is committed, requiring disk write guarantees. This implementation effectively expands the working set of the database cache and reduces disk I/O operations to dramatically speed up database operations.
The utilization of Cache Fusion in Oracle allows customers to easily take advantage of the scalability provided by Oracle Real Application Clusters, with little or no performance cost. Customers can horizontally scale the database tier as demand grows. Because the full Cache Fusion implementation in Oracle eliminates the latencies associated with disk based cache coordination, applications can now scale effectively without having to be cluster aware.
Additionally, cluster load balancing features enable a large number of user connections to be automatically and transparently distributed among cluster nodes, making optimal use of the overall cluster processing capabilities
SQL Server 2000 Federated Databases Architecture
Distributed partitioned views can be used to implement a federation of database servers10. A federation is a group of servers administered independently, but which cooperate to share the processing load of a system.
With a federated database, the data is divided between the different servers. When a user normally connects to a federated database, they are connected to one server. If the user requests data that happens to reside on a different server, the retrieval takes significantly longer than retrieving data stored on the local server.
When distributing data across multiple servers, there is a performance hit for querying a remote server. Analysis should be done on the type of queries implemented in the OLTP environment to get a baseline on what data is being touched by specific queries. Because there is a certain degree of overhead in running distributed queries, this overhead may in some cases outweigh the benefits of distributing your tables."
Because of this, developers and administrators must build their systems to avoid this occurrence
Create multiple databases, each on a different member server running an instance of SQL Ser. To build a federation of database servers, administrators and developers must13: ver 2000.
Partition the individual tables in the original database so that most related data is placed together on a member server. This may require different methods of distributing the data in the various tables across all the member databases; partitioning some tables; making complete copies of other tables in each member database; and leaving some tables intact on the original server.
Devise data routing rules that can be incorporated in the business services tier, so that applications can send each SQL statement to the member server that stores most of the data required by the statement.
Obviously this implementation process requires an excellent knowledge and a deep understanding of both the application logic and the database layout. When scaling out an existing application, database administrators need to identify which tables are good candidates for being partitioned and which tables should be replicated on all servers. This process assumes that the partition keys on which the partitions are based were correctly selected. Developers then may have to redesign large parts of the business logic of these applications to make sure that each SQL statement is routed to a member server that contains most, if not all, of the data required to process the statement. The end result of these workarounds means that SQL Server 2000's federated database design can only be used for certain applications. The difference in the architecture adopted in the two products has many consequences in terms of performance and scalability.
ADDITIONAL DATA WAREHOUSING CAPABILITIES
Oracle provides several unique features useful in data warehousing environments, in particular during the Extraction, Transformation and Loading (ETL) process. During these phases, various methods are used to access and manipulate source data and load it into a data warehouse.
Merge
The MERGE statement is a new SQL statement that provides the ability to update or insert rows conditionally into a table or a view, depending upon which is needed, reducing the number of application statements and application complexity.
The MERGE statement can be used to select rows from one table for update or insertion into another table. Such operations are frequently used in a number of data warehousing applications where tables need to be periodically refreshed with new data arriving from on-line systems. This new data might contain changes to the existing rows of the warehouse table or might introduce a new row altogether. The MERGE statement typically addresses these types of situations.
MERGE brings significant performance improvement due to the optimization of execution and the reduction of scan and join operations compared to what would be performed using an equivalent sequence of DML statements.
SQL Server 2000 does not support an equivalent of the MERGE statement. Without MERGE, these operations can only be expressed as a sequence of INSERT and UPDATE statements. This approach suffers from deficiencies in both performance and usability.
Multi-Table Inserts
Multi-table INSERTs allow data to be inserted into more than one table using a single SQL statement, which is more efficient than using multiple, separate SQL statements for each table.
This feature is very useful in data warehousing systems, where data is transferred from one or more operational data sources to a set of target tables. Multi-table inserts extend the scope of the INSERT . . . SELECT statement to insert rows into multiple tables as part of a single DML statement.
This new feature brings significant performance improvement19 due to optimization of execution and reduction of scan operations on the source data. No materialization in temporary tables is required and source data is scanned once for the entire operation instead of once for each target table with multiple statements.
Multi-table inserts make SQL more useful for data transformations and conditional handling and allow faster loading and transformations of large volumes of data.
SQL Server 2000 does not support multi-table inserts, meaning that similar operations can only be expressed as a sequence of INSERT statements, requiring more scan operations on the source data.
Pipelined Table Functions
Table functions are functions written in PL/SQL, Java, or C, which can accept cursors or sets of rows as input arguments and produce a set of rows as output.
Oracle allows table functions to pipeline results by returning result rows or subsets of rows incrementally for further SQL processing as soon as they are created, instead of waiting for the whole set to be completely constructed before being returned.
Oracle also provides support for parallel execution of table functions, allowing a set of input rows to be partitioned among multiple instances of a parallel function.
Pipelining and parallel execution of table functions help to improve the performance and scalability of a number of applications. In a data warehouse environment, table functions can be used in the ETL (Extraction-Transformation- Load) process that builds a data warehouse by extracting data from an OLTP system. The extracted data passes through a sequence of transformations before it is loaded into a data warehouse. In Oracle, the transformations can be pipelined and executed in parallel using table functions, avoiding the cost of materializing the outputs of each transformation in intermediate staging tables. The whole process becomes more efficient, more scalable and non interruptive.
SQL Server 2000 does not provide support for pipelined table functions.