Oracle Sharding 18c – Multi-shard Query Enhancements

In the previous post of the 18c blog post series, we looked at the Oracle RAC Sharding feature that allows you to virtually shard data across instances of an Oracle RAC database. In this post, I will discuss the Multi-shard query enhancements –  in Oracle Sharding 18c.

Multi-shard Query Enhancements

In Oracle Sharding 12.2, multi-shard queries (via proxy routing) supported system-managed sharding method. In Oracle Sharding 18c, multi-shard queries have been enhanced to now support the composite and user-defined sharding methods as well.

For multi-shard queries, one can set different consistency levels by the initialization parameter MULTISHARD_QUERY_DATA_CONSISTENCY.

The query explain plan is also enhanced to display the information for all shards participating in the multi-shard query.

For centralized diagnostics, the SQL SHARDS() clause can be leveraged to query the V$, DBA/USER/ALL views, dictionary tables across all shards.

For example:

Screen Shot 2018-12-04 at 1.13.36 PM

For in-depth collateral on Sharding, visit the Oracle Sharding OTN portal – https://www.oracle.com/goto/sharding

Must-attend Oracle Sharding Sessions of OOW 2018

Here is the list of must-attend sessions on Oracle Sharding at the Oracle OpenWorld 2018. Do update your calendars accordingly.  Great opportunity to meet with the development team 1:1 to learn more about this technology at the Demo booth.

Sharding Sessions - OOW2018

Please see the details on Sharding conference sessions below:

High Availability and Sharding Deep Dive with Next-Generation Oracle Database [TRN4032]

Wei Hu, Vice President, Oracle

Monday, Oct 22, 9:00 a.m. – 9:45 a.m. | Moscone West – Room 3007

Do you need global availability and scalability? Then the next generation of Oracle Database sharding is for you. In this session learn how to design geo-replicated transaction processing systems for linear scalability with total fault isolation. Whether you want to use commodity shared-nothing hardware or span data centers worldwide, the latest enhancements to existing capabilities for planned maintenance, high availability, disaster recovery, and self-repair enable best-in-class availability for Oracle Databases of any size on any platform.

Industrial-strength Microservice Architectures with Next-Generation Oracle Database [TRN5515]

Dominic Giles, Master Product Manager, Oracle

Wei Hu, Vice President, Oracle

Anil Madan, Oracle

Monday, Oct 22, 3:45 p.m. – 4:30 p.m. | Moscone West – Room 3003

Microservice Architectures is a very exciting way to develop modern applications. As with any advance, this architecture can impose significant availability and scalability demands on the underlying infrastructure. This session reviews the challenges and shows how Oracle Database features such as multi-tenant, sharding, and the asynchonous messaging capabilities provided by Oracle Database Advanced Queuing, can simplify the development and operation of industrial-strength microservice architectures. This session will also include real-life examples to illustrate how large scale microservice-based applications are constructed.

Oracle Sharding: Geo-Distributed, Scalable, Multimodel Cloud-Native DBMS [PRO4037]

Mark Dilman, Director, Software Development, Oracle

Srinagesh Battula, Sr. Principal Product Manager, Oracle

Gairik Chakraborty, Senior Director, Database Administration, Epsilon

Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3007

Oracle Database with Oracle Sharding is a globally distributed multimodel (relational and document) cloud-native (and on-premises) DBMS. It is built on shared-nothing architecture in which data is horizontally partitioned across databases that share no hardware or software. It provides linear scalability, fault isolation, and geographic data distribution for shard-amenable applications. Oracle Sharding does all this while rendering the strong consistency, full power of SQL and the Oracle Database ecosystem. Attend this session to learn how you can deploy a sharded database and elastically scale your transactions, database capacity, and concurrent users. The session covers Oracle Sharding 18c features such as user-defined sharding, RAC sharding, Oracle Multitenant support, and more.

Data and Application Modeling in the Brave New World of Oracle Sharding [BUS1845]

John Kanagaraj, Sr. Member of Technical Staff, Architecture, PayPal Inc.

Tuesday, Oct 23, 3:45 p.m. – 4:30 p.m. | Moscone West – Room 3007

The newly introduced Oracle Sharding feature promises horizontal scalability and higher availability for relational applications. However, success or failure in designing and running a sharded application depends on understanding sharding principles and adapting the application to use the right data model. Based on considerable shared experiences with designing, developing, and maintaining sharded applications, this session explores how to qualify and design applications that use the new Oracle Sharding feature.

Oracle Maximum Availability Architecture: Best Practices for Oracle Database 18c [TIP4028]

Lawrence To, Senior Director, MAA Best Practices (Database On-Premise, Exadata, Recovery Appliance and Cloud Database), Oracle

Michael Smith, Consulting Member of Technical Staff, Oracle

Thursday, Oct 25, 11:00 a.m. – 11:45 a.m. | Moscone South – Room 3007

Join Oracle development for the latest updates on high availability best practices in this well-established and heavily attended technical deep-dive session. Learn how to optimize all aspects of Oracle Data Guard 18c. See how to use session draining, application continuity, Oracle Database In-Memory with Oracle RAC, and Oracle GoldenGate with Oracle Data Guard to mask outages and planned maintenance from users and to accelerate time to repair. Hear about the latest high availability best practices with Oracle Multitenant and understand how the new sharded architecture for OLTP applications can achieve even higher levels of high availability and fault isolation. Find out how everything you know about Oracle Maximum Availability Architecture on-premises can be deployed in the cloud.

Using Oracle Sharding on Oracle Cloud Infrastructure [CAS5896]

Charles Baker, Senior Director of Product Management and Strategy, Oracle

Velimir Radanovic, Architect, Oracle

Thursday, Oct 25, 1:00 p.m. – 1:45 p.m. | Moscone South – Room 160

Oracle Sharding offers unlimited scalability, fault isolation, and geographic distribution for web applications. Deployed on the Oracle Cloud, Oracle Sharding provides a cost-efficient way to build high-performance and low-latency applications for OLTP, IoT, DNS analytics; machine learning; and more. Come to this session to learn about how to deploy a sharded database on Oracle Cloud, migrate from a nonsharded to a sharded database, do high-speed data ingestion into the sharded database, deploy microservices using sharding and multitenant features, and much more.

Have a great OOW. We look forward to seeing you there.

Oracle Podcast: Automated Workload Management for Replicated Databases with Oracle Database 12c – Global Data Services

This podcast provides an overview of Global Data Services, a revolutionary new high availability technology introduced with Oracle Database 12c. Hear how Global Data Services helps Active Data Guard and Oracle GoldenGate customers improve availability, performance and ROI with automated workload balancing, routing and service high-availability for geographically distributed, replicated Oracle databases.

Here is the link to the GDS podcast

For more info on GDS, do visit OTN GDS Portal.

Oracle Global Data Services (GDS) Recorded OOW Session

Title: Using Oracle Global Data Services (GDS) for automated workload management across Active Data Guard or Oracle Goldengate replicas

Oracle GoldenGate and Active Data Guard allow for distribution of application workloads across replicated databases, but it is a challenge to efficiently use all the databases for best performance and availability. In the 2014 Oracle OpenWorld, I discussed how the Global Data Services feature of Oracle Database provides locality-based workload routing, load balancing, and service failover across replicas. Learn how you can load-balance read/write workloads in an active/active Oracle GoldenGate configuration and load-balance read-only workloads over Oracle Active Data Guard reader farm, among numerous other use cases.

Here is the link to the recording of the session.

For more info on GDS, please visit GDS portal on Oracle Technology Network.

 

Oracle Sharding 18c Blog Post Series – #3: Oracle RAC Sharding

In the first post of the Oracle Database 18c Sharding features blog post series, I covered User-defined sharding method and discussed in detail on how on it can be leveraged for global data distribution and hybrid cloud deployment. In the second post, I  discussed the PDB Sharding feature with which we can distribute data with Sharding and yet consolidate databases with Multitenant – a best of both worlds. In this post, I will discuss the Oracle RAC Sharding feature that allows you to virtually shard data across instances of an Oracle RAC database.

#3:  Oracle RAC Sharding

Oracle RAC Sharding is a feature that logically establishes affinity for table partitions to  RAC instances. This reduces block pings across instances while yielding better cache affinity.

RACsharding

Figure 1.  High-performance for shard-aware RAC applications with RAC Sharding

This feature which is applicable to non-sharded RAC databases, takes advantage of direct routing API of Oracle Sharding – to specify the partitioning key as part of the check-out of a given connection from the connection pool. The application must use integrated Oracle clients such as Oracle Universal Connection Pool (UCP), Oracle Call Interface (OCI) Session Pool, Oracle Data Provider for .NET (ODP.NET) Connection Pool etc. Apache Tomcat, JBoss, IBM WebSphere and Oracle WebLogic can use UCP support for sharding. PHP, Python, Perl, and Node.js can use OCI support.

Requests that specify partitioning key are routed to the instance that logically holds the corresponding partition while the requests that don’t specify partitioning key still work transparently.

This capability empowers Oracle RAC with the performance and linear scalability of an Oracle Sharded Database with minimal application changes. The application just specifies the sharding key (as part of the connection check-out) for the most performance critical operations.  Here is an example on how to use Oracle Universal Connection Pooling (UCP) Sharding API for RAC Sharding feature.

OracleShardingKey keyMaryEmail =
pds.createShardingKeyBuilder()
.subkey(“mary.smith@xyz.com”, OracleType.VARCHAR2)
.build();

Connection connection =
pds.createConnectionBuilder()
.shardingKey(keyMaryEmail)
.build();

The RAC Sharding feature is enabled on the RAC database by executing the following:

alter system enable affinity <TableName>;

No changes to the database schema are required. There is no requirement to deploy Sharding infrastructure as well.

Note: Application changes which are required to supply the partition key, don’t have to affect all modules of the application – can be done only for the most performance intensive operations. Access via partitioning  key will go to the RAC instance that holds the logical subset of the data. All instances can still access all the data. Requests that don’t provide the partitioning key will be routed based on the default load balancing policy. 

For more information on RAC Sharding, take a look at the following documentation Links:

Oracle RAC Sharding 

https://docs.oracle.com/en/database/oracle/oracle-database/18/racad/workload-management-with-dynamic-database-services.html#GUID-1100990E-18BD-4C92-A0C7-3C08C40AAD14

UCP APIs for Database Sharding Support

https://docs.oracle.com/en/database/oracle/oracle-database/18/jjucp/ucp-sharding-api.html#GUID-3B91E8F7-AE45-4CC8-B87F-6F4FC9EEA001

JDBC Support for Database Sharding

https://docs.oracle.com/en/database/oracle/oracle-database/18/jjdbc/database-sharding.html#GUID-1D7795CA-79DC-452B-9FCC-0EF430F87461

For in-depth collateral on Sharding, visit the Oracle Sharding OTN portal – https://www.oracle.com/goto/sharding and do follow me on Twitter  – https://twitter.com/nageshbattula

In the next blog post, I will discuss the Multi-shard query enhancements in Oracle Sharding 18c.

Oracle Sharding 18c Blog Post Series – #2: PDB Sharding

In the first of the Oracle Database 18c Sharding features blog post series, I covered User-defined sharding method and discussed in detail on how on it can be leveraged for global data distribution and hybrid cloud deployment. In this post, I will discuss the PDB Sharding feature.

#2 PDB Sharding

The multitenant architecture allows an Oracle database to function as a multitenant container database (CDB). A CDB includes zero, one, or many customer-created pluggable databases (PDBs). A PDB is a portable collection of schemas, schema objects, and nonschema objects that appears to an Oracle Net client as a non-CDB.  Read this doc for the background on CDBs.

From Oracle Database 18c onwards, all Oracle databases will be container (CDB) or pluggable (PDB) databases by default. This PDB Sharding feature adds support for the ability to use PDBs as shards or shard catalog databases within the sharding architecture. PDB Sharding capability renders good manageability benefits for a sharded database. For example – consolidation of shards, manage many as one, database upgrades etc.

In Oracle Database 18c, we support shard as a single PDB in a given CDB. In addition to a single shard PDB, a CDB can contain other non-shard PDBs.

 

pdbsharding

Figure 1.  Oracle Sharded Database with pluggable databases as shards

Use ‘add cdb’ and ‘add shard’ GDSCTL commands to add pre-created containers and PDBs to the sharded database.

add cdb -connect <CDB$ROOT_connect_string> -pwd <gsmuser_pwd>

add shard -connect <PDB_connect_string> -shardgroup <sg_name> -pwd <gsmuser_pwd> -cdb <db_unique_name_of_CDB_as_displayed_during_add_cdb>

deploy

Note: The ‘add cdb’ command just adds metadata to the catalog for the CDB so that we know the connect string to CDB$ROOT and the GSMUSER password.  The new “-cdb” option on ‘add shard’ tells the shard catalog that the shard is a PDB and that it lives in the CDB specified by -cdb.

For more information on PDB sharding, please peruse the following documentation links:

Using Oracle Multitenant with Oracle Sharding

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-deployment.html#GUID-4FFB5665-B925-4DB6-ACDD-807D6E699365

For in-depth collateral on Sharding, visit the Oracle Sharding OTN portal – https://www.oracle.com/goto/sharding and do follow me on Twitter  – https://twitter.com/nageshbattula

In my next blog post, I will discuss RAC Sharding feature – a capability which empowers Oracle RAC with the performance and scalability of a Sharded database, with minimal application changes.

Oracle Sharding 18c Blog Post Series – #1: User-defined Sharding

When the Oracle Sharding feature was introduced in Oracle Database 12c Release 2, the focus was on internet scale applications that require linear scalability and fault isolation. These applications deal with 100s of millions or billions of users, massive concurrent user base and extremely large databases. With Oracle 18c, Oracle Sharding has been made more inclusive to applications and various other use cases by increasing its flexibility and strategies.

Here is the list of the salient Oracle Database 18c Sharding features:

  1. User-defined Sharding
  2. PDB Sharding
  3. RAC Sharding
  4. JSON & Spatial capabilities and Multi-shard query enhancements
  5. Midtier Sharding

I plan to cover the details of these Sharding features in this blog post series. In the first post, we will discuss the new user-defined sharding method that aids in global data distribution and hybrid cloud deployment.

#1 User-defined Sharding:

In Oracle Database 12c, we introduced two sharding methods – system-managed sharding and composite sharding. 

System-managed sharding is based on partitioning by consistent hash. This sharding method randomly and evenly distributes data across shards and automatically redistributes it when shards are added to or removed from the sharded database. Consistent Hash is good for application where there are millions and even billions of values of the sharding key and it is not practical to manage them individually. This method enables linear scalability of transactions, concurrent users and the database capacity.

With composite sharding method, data is first partitioned by list or range (super_sharding_key) and then further partitioned by consistent hash (sharding_key). The two levels of sharding make it possible to map data to a set of shards, and then automatically maintain balanced distribution of data across that set of shards. Composite sharding is ideal for global data distribution where shards are placed in each geography and within a given geography data is uniformly distributed. Another use case for this method is – a set of powerful shards dedicated to Gold class users and set of low powered shards for Silver class users. Within each class of users, data is uniformly distributed.

But what if there are much less distinct values of the sharding key, maybe thousands, and the customer wants tighter control in mapping data to shards. For such cases, in Oracle Database 18c, we introduced user-defined sharding – a sharding method which is based on partitioning by RANGE or LIST and allows the user to explicitly specify mapping of data to shards. User-defined sharding is ideal for performance, regulatory, or other reasons when the user needs to store related data in the same shard and have full control on moving data between shards.

User-controlled data distribution provides:

  • Regulatory compliance – data remains in country of origin
  • Hybrid clouds – some shards are on-premises and other shards in the cloud
  • Cloud bursting – ability to move data from on-premises to the cloud during peak seasons
  • Increased visibility into planned maintenance – when a shard needs to be brought down for maintenance, the administrator knows exactly which data will not be available
  • Each shard can have different hardware and HA configuration
  • More efficient range queries

With user-defined sharding, the user owns the control to maintain balanced data distribution.

The term shardspace is used to describe a shard or a set of fully replicated shards that is used to store a part of an SDB partitioned by user-defined subset of key values. The shardspaces comprising an SDB may be in the same data center or in different data centers. With user-defined sharding, you can scale by adding additional shardspaces – extend the range or list of sharding_keys. Each shardspace can have different location, platform and replication topology.

UDS1

Figure 1.  Oracle Sharded Database with User-defined sharding

High-Level Steps for Creating a User-defined Sharded Schema

  • Create Shardspaces for each shard and all of its replicas
  • Manually create a tablespace for each partition in the primary shard of a given shardspace
  • Specify partitioning scheme and column that is the sharding key for the root table
    • The DBA explicitly maps partitions to the tablespaces
    • Child and grandchild tables all inherit the same partitioning scheme using the partition by reference clause
  • Specify which tables are duplicated on all shards (common reference data)

Here are the GDSCTL commands to create a sharded table for data sovereignty using the user-defined sharding method. In this example there are three shardspaces: NA_space, EU_space and ASIA_space.

ADD SHARDSPACE –SHARDSPACE NA_space

ADD SHARDSPACE –SHARDSPACE EU_space

ADD SHARDSPACE –SHARDSPACE ASIA_space

CREATE SHARD -SHARDSPACE NA_space -DEPLOY_AS primary  -DESTINATION host01 -CREDENTIAL oracle_cred -REGION NA

CREATE SHARD -SHARDSPACE EU_space -DEPLOY_AS primary -DESTINATION host02 -CREDENTIAL oracle_cred -REGION EU

CREATE SHARD -SHARDSPACE ASIA_space -DEPLOY_AS primary -DESTINATION host03 -CREDENTIAL oracle_cred -REGION ASIA

DEPLOY

UDS2

Figure 2.  Dedicated partitions for different user-defined shardspaces

Now create the shardspace specific tablespaces using SQL.

CREATE TABLESPACE NA_tbs IN SHARDSPACE NA_space;

CREATE TABLESPACE EU_tbs IN SHARDSPACE EU_space;

CREATE TABLESPACE ASIA_tbs IN SHARDSPACE ASIA_space;

Once the shards and the tablespaces are created, you can go ahead and create the sharded tables using SQL as shown below.

CREATE SHARDED TABLE accounts

( id NUMBER, account_number NUMBER , customer_id NUMBER

, branch_id NUMBER , geo VARCHAR2(20), status VARCHAR2(1)
)
PARTITION BY LIST (geo)
( PARTITION p_NA VALUES (‘NA’) TABLESPACE NA_tbs

, PARTITION p_EU VALUES (‘EU’) TABLESPACE EU_tbs

, PARTITION p_ASIA VALUES (‘ASIA’) TABLESPACE ASIA_tbs
);

It is as simple as that. This blog covered how user-defined sharding helps prepare organizations to honor data sovereignty requirements by preventing access to customer data outside of their home legal jurisdiction.

For more information on user-defined sharding, please peruse the following documentation links:

Sharding Methods:

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-methods.html#GUID-3B07D91C-CEAA-4170-A94B-ACF47BEE617B

SDB Deployment:

https://docs.oracle.com/en/database/oracle/oracle-database/18/shard/sharding-deployment.html#GUID-F99B8742-4089-4E77-87D4-4691EA932207

For in-depth collateral on Sharding, visit the Oracle Sharding OTN portal – https://www.oracle.com/goto/sharding and do follow me on Twitter  – https://twitter.com/nageshbattula

 

How to use Data Pump to import data into the shards of a Sharded Database (SDB)?

Data Pump Import (impdp)  has a new capability starting from Oracle Database 12.2 to filter appropriate records for loading into individual shards of an SDB. It supports all methods of Sharding – namely System Managed, Composite and User-defined Sharding.

If you have a stand-alone database, then you need to export the table data using the data_options=group_partition_table_data. This unloads all table data at once instead of unloading each table partition independently.

Here is an example that walks you through the steps:

On the Source Database:

SQL>grant DATAPUMP_EXP_FULL_DATABASE to  testuser1;

SQL>create or replace directory def_dir1 as ‘/tmp’

SQL>grant read, write on directory def_dir1 to testuser1;

On the non-sharded database, export the table data using “data_options=group_partition_table_data” as shown below. This unloads all table data at once instead of doing each table partition independently. Data Pump exports all data into the dump file.

$ expdp testuser1/testuser1@src_db                   \

directory=def_dir1                                         \

schemas=testuser1                                         \

dumpfile=no_part%u                                     \

logfile=tkdpshrd_exp.log                              \

data_options=group_partition_table_data \

parallel=4

 

Pre-reqs (on the Target Sharded Database)

On the Shard Catalog

Create the Data Pump directory objects (on shardcat and all shards) and grant appropriate role to the user as shown below.

SQL>alter session enable shard ddl

SQL>create or replace directory def_dir1 as ‘/tmp

SQL>grant read, write on directory def_dir1 TO <user>;

SQL>grant DATAPUMP_IMP_FULL_DATABASE  to  <user>;

<create the sharded tables that match the structure of the source tables>

<Create other schema objects>

On each of the shards:

Execute the import specifying CONTENT=DATA_ONLY.  Because the tables already exist in the shard, attempts to recreate them will fail. The Data Pump Import detects that you are importing into a shard and only load rows that belong to that shard.  You need to run the import command on each shard.

impdp testuser1/testuser1@shard1          \

directory=def_dir1                                 \

dumpfile=no_part%u                             \

logfile=tkdpshrd_imp_01.log                \

table_exists_action=append                 \

content=data_only                                \

parallel=4

At the time of import, the Data Pump filters out rows that do not belong in the given shard. Data Pump passes key values for each row to a internal GSM function that tells  whether the row belongs in that shard.

 

Oracle GDS Licensing

Oracle Global Data Services (GDS) is a feature of Oracle Database 12c that provides connect-time and run-time load balancing, region affinity, replication lag tolerance based workload routing, and enables inter-database service failover across a set of replicas.

Here are the details of how Oracle GDS is licensed.

  • Databases in a GDS configuration:
    • Must be Database Enterprise Edition (EE) +  Licensed for Oracle Active Data Guard Option and/or Oracle GoldenGate
  • GSM Software
    • No separate license is required
  • GDS Catalog Database
    • No Database EE license is required, if this is a schema in an existing repository (e.g. Oracle Enterprise Manager) database
    • No Database EE license is required (same as Oracle RMAN/EM repository license), if hosted as a separate single instance database
    • If Oracle RAC / Data Guard is used, Oracle RAC Option and Database EE license (for the extra nodes and/or standby) is required

For more information on GDS, do visit OTN GDS portal – https://www.oracle.com/goto/gds

AskTOM Office Hours: Oracle Sharding Session (April 23rd, 2018)

Exploring Oracle Sharding:
AskTOM Office Hours offers free, open Q&A sessions with Oracle Database experts. Here is the recording of the Oracle Sharding session that I conducted on April 23rd, 2018.
Description:
Oracle Database with Sharding is a globally distributed multi-model (relational & document) cloud-native (and on-premises) DBMS. It is built on shared-nothing architecture where data is horizontally partitioned across databases that share no hardware or software. It provides linear scalability, fault isolation and geographic data distribution for shard-amenable applications. Sharding does all this while rendering strong consistency, full power of SQL, and the Oracle Database ecosystem. Listen to this recorded session to learn how you can deploy a sharded DB and elastically scale your transactions, database capacity and concurrent users.

 

Highlights:

@1.52 – Introduction to Oracle Sharding

@4:08 – Benefits of Sharding

@6:28 – Architecture and Key Features

@21:02 – Data Modeling and Application Considerations for a Sharded Database (SDB)

@29.51 –  Sharded Schema Creation

@36:14 – Direct Routing against an SDB

@41:23 – Proxy Routing for reporting workload

@48:35 – Results of Oracle Sharding Scalability Characterization study