June 1st 2020
Benchmark a Hybrid Transactional and Analytical RDBMS
By Sergio Ferragut and Monte Zweben
Have you ever waited overnight for that report from yesterday’s sales? Or maybe you longed for the updated demand forecast that predicts inventory requirements from real-time point-of-sale and order management data. We are always waiting for our analytics. And worse yet, it usually takes weeks to request changes to our reports. To add insult to injury, you keep getting taxed for the increasing costs of the specialized analytics database.
But what if this were no longer necessary? What if you could consolidate the analytics to run on the same database as the operational workloads? Your first response is probably, “you’re nuts. We moved the analytics off the database because it was never fast enough.” That used to be true, but it no longer is. It is now possible to reduce your database costs by consolidating your analytics onto a different kind of SQL RDBMS called a Hybrid Transactional/Analytical Processing (HTAP) platform. Some HTAP databases are specialized, engineered systems that can cost millions of dollars. But there is a new kind of HTAP built on a scale-out architecture. This architecture distributes data across multiple servers and performs computations on different engines on each server to achieve scale and performance for hybrid workloads.
HTAP is a term coined by Gartner to describe the emerging database technologies and applications that can deliver both online transaction processing (OLTP) and analytical processing (OLAP) to produce richer application experiences and more business value. Forrester calls this the Translytical Data Platform. Without this kind of platform, businesses are forced to transform and move the data out of their application database(s) through a complex pipeline that utilizes operational data stores, data warehouses and data marts in order to eventually enable analysis and data science. This is extremely time consuming and causes delays in:
- Producing reports
- Updating features for deployed machine learning models, resulting in predictions made on old data
- Re-training machine learning models, resulting in less accurate models
In this blog, we show how to measure the performance of HTAP workloads. To do so we combine the TPC-C and TPC-H benchmarks running concurrently on the same set of tables. TPC-C emulates wholesale retail business users taking sales orders, processing payments, doing stock level checks, and recording deliveries with high concurrency on a high volume of transactional data. The TPC-H benchmark consists of 22 complex analytical queries that scan large amounts of historic data with relatively low concurrency.
TPC-C New Order Transaction
The transactions measured in the workload runs we tested are not single SQL statements, but complex application logic: a series of select, insert, and update statements packaged in an ACID transaction that is committed or rolled back as a whole. Each New Order transaction covers an average of ten different line items and requires an average of 46 SQL statements to process.
The New Order transaction begins with a customer and district lookup to fetch the discount, credit, and tax information and to reserve an order ID, all of which are needed to process the new order. We use a Python cell here to store the results of the SQL and set up some BeakerX variables for polyglot access:
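The original notebook cell isn’t reproduced here, but the lookup can be sketched in a minimal stand-alone form. This uses SQLite in place of Splice Machine; the column names follow the standard TPC-C schema, while the row values are invented for the example:

```python
import sqlite3

# Minimal TPC-C-style schema and sample data (illustrative values only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE warehouse(w_id INT, w_tax REAL);
CREATE TABLE district (d_w_id INT, d_id INT, d_tax REAL, d_next_o_id INT);
CREATE TABLE customer (c_w_id INT, c_d_id INT, c_id INT,
                       c_discount REAL, c_credit TEXT, c_last TEXT);
INSERT INTO warehouse VALUES (1, 0.08);
INSERT INTO district  VALUES (1, 1, 0.02, 3001);
INSERT INTO customer  VALUES (1, 1, 42, 0.05, 'GC', 'BARBARBAR');
""")

w_id, d_id, c_id = 1, 1, 42

# Look up the customer's discount and credit standing plus the warehouse and
# district tax rates, and read the district's next order ID, which this
# transaction will reserve for the new order.
c_discount, c_credit = cur.execute(
    "SELECT c_discount, c_credit FROM customer "
    "WHERE c_w_id = ? AND c_d_id = ? AND c_id = ?",
    (w_id, d_id, c_id)).fetchone()
w_tax, = cur.execute(
    "SELECT w_tax FROM warehouse WHERE w_id = ?", (w_id,)).fetchone()
d_tax, d_next_o_id = cur.execute(
    "SELECT d_tax, d_next_o_id FROM district WHERE d_w_id = ? AND d_id = ?",
    (w_id, d_id)).fetchone()
```

In the benchmark notebook these results land in BeakerX variables so later cells in other languages can reuse them; here they are simply Python locals.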
Next we increment and store the district’s next order ID, create a new order entry in both the ORDER and NEW_ORDER tables, and check the warehouse stock level for the items being ordered:
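A stand-alone sketch of this step, again using SQLite with TPC-C-style table names and invented sample rows (the ORDER table is spelled `orders` here because ORDER is a reserved word in most SQL dialects):

```python
import sqlite3

# Minimal schema and sample data for the order-creation step (illustrative).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE district (d_w_id INT, d_id INT, d_next_o_id INT);
CREATE TABLE orders   (o_id INT, o_d_id INT, o_w_id INT, o_c_id INT, o_ol_cnt INT);
CREATE TABLE new_order(no_o_id INT, no_d_id INT, no_w_id INT);
CREATE TABLE stock    (s_i_id INT, s_w_id INT, s_quantity INT);
INSERT INTO district VALUES (1, 1, 3001);
INSERT INTO stock VALUES (501, 1, 37), (502, 1, 4);
""")

w_id, d_id, c_id = 1, 1, 42
item_ids = [501, 502]

# Reserve the order ID by reading and incrementing the district counter...
d_next_o_id, = cur.execute(
    "SELECT d_next_o_id FROM district WHERE d_w_id = ? AND d_id = ?",
    (w_id, d_id)).fetchone()
cur.execute(
    "UPDATE district SET d_next_o_id = d_next_o_id + 1 "
    "WHERE d_w_id = ? AND d_id = ?", (w_id, d_id))

# ...create the order in both the ORDER and NEW_ORDER tables...
cur.execute("INSERT INTO orders VALUES (?,?,?,?,?)",
            (d_next_o_id, d_id, w_id, c_id, len(item_ids)))
cur.execute("INSERT INTO new_order VALUES (?,?,?)", (d_next_o_id, d_id, w_id))

# ...and check the current warehouse stock level for each ordered item.
stock_levels = {
    i: cur.execute(
        "SELECT s_quantity FROM stock WHERE s_i_id = ? AND s_w_id = ?",
        (i, w_id)).fetchone()[0]
    for i in item_ids
}
```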
Next, the application logic calculates order totals and stock replenishment needs for each item in the order, updates the stock record, and creates an order line record. In the sample code we only do this once, but in the actual benchmark, it occurs for each line item in the order, on average 10 per order.
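The per-line-item logic can be sketched as follows. This is an illustrative SQLite version, not the benchmark’s actual code; the stock replenishment rule (restock by 91 units when quantity would fall below 10) follows the TPC-C specification, and the prices and quantities are made up:

```python
import sqlite3

# Minimal schema and sample data for the line-item loop (illustrative).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE item      (i_id INT, i_price REAL);
CREATE TABLE stock     (s_i_id INT, s_w_id INT, s_quantity INT,
                        s_ytd INT, s_order_cnt INT);
CREATE TABLE order_line(ol_o_id INT, ol_d_id INT, ol_w_id INT, ol_number INT,
                        ol_i_id INT, ol_quantity INT, ol_amount REAL);
INSERT INTO item VALUES (501, 9.99), (502, 4.50);
INSERT INTO stock VALUES (501, 1, 37, 0, 0), (502, 1, 12, 0, 0);
""")

w_id, d_id, o_id = 1, 1, 3001
order_lines = [(501, 3), (502, 5)]  # (item ID, quantity); real orders average 10 lines

total = 0.0
for ol_number, (i_id, qty) in enumerate(order_lines, start=1):
    i_price, = cur.execute(
        "SELECT i_price FROM item WHERE i_id = ?", (i_id,)).fetchone()
    s_qty, = cur.execute(
        "SELECT s_quantity FROM stock WHERE s_i_id = ? AND s_w_id = ?",
        (i_id, w_id)).fetchone()
    # TPC-C replenishment rule: if stock would dip below 10, restock by 91.
    new_qty = s_qty - qty if s_qty - qty >= 10 else s_qty - qty + 91
    cur.execute(
        "UPDATE stock SET s_quantity = ?, s_ytd = s_ytd + ?, "
        "s_order_cnt = s_order_cnt + 1 WHERE s_i_id = ? AND s_w_id = ?",
        (new_qty, qty, i_id, w_id))
    # Record the order line and accumulate the order total.
    amount = qty * i_price
    total += amount
    cur.execute("INSERT INTO order_line VALUES (?,?,?,?,?,?,?)",
                (o_id, d_id, w_id, ol_number, i_id, qty, amount))
conn.commit()  # the whole New Order transaction commits (or rolls back) as one unit
```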
The TPC-H benchmark was designed to test decision support systems that provide insights into business operations. Here, TPC-H query #2 searches for the suppliers in Europe that offer the minimum European price for size-15 brass parts:
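A condensed form of Q2 is shown below, run against a tiny in-memory SQLite database standing in for the TPC-H schema. The column names follow the TPC-H specification (the full query also returns supplier contact details and orders by account balance); the handful of rows is invented so the example is self-contained:

```python
import sqlite3

# Minimal TPC-H-style schema with made-up rows: one size-15 brass part
# supplied by a European supplier (cost 10.0) and an Asian one (cost 8.0).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE region  (r_regionkey INT, r_name TEXT);
CREATE TABLE nation  (n_nationkey INT, n_name TEXT, n_regionkey INT);
CREATE TABLE supplier(s_suppkey INT, s_name TEXT, s_nationkey INT, s_acctbal REAL);
CREATE TABLE part    (p_partkey INT, p_mfgr TEXT, p_size INT, p_type TEXT);
CREATE TABLE partsupp(ps_partkey INT, ps_suppkey INT, ps_supplycost REAL);
INSERT INTO region VALUES (1, 'EUROPE'), (2, 'ASIA');
INSERT INTO nation VALUES (10, 'FRANCE', 1), (20, 'CHINA', 2);
INSERT INTO supplier VALUES (100, 'Supplier#100', 10, 500.0),
                            (200, 'Supplier#200', 20, 900.0);
INSERT INTO part VALUES (1, 'Mfgr#1', 15, 'LARGE POLISHED BRASS');
INSERT INTO partsupp VALUES (1, 100, 10.0), (1, 200, 8.0);
""")

# TPC-H Q2 (condensed): European suppliers whose supply cost equals the
# minimum European supply cost for each size-15 brass part. The correlated
# subquery computes that per-part minimum.
Q2 = """
SELECT s.s_name, p.p_partkey, ps.ps_supplycost
FROM part p
JOIN partsupp ps ON p.p_partkey = ps.ps_partkey
JOIN supplier s  ON s.s_suppkey = ps.ps_suppkey
JOIN nation n    ON s.s_nationkey = n.n_nationkey
JOIN region r    ON n.n_regionkey = r.r_regionkey
WHERE p.p_size = 15
  AND p.p_type LIKE '%BRASS'
  AND r.r_name = 'EUROPE'
  AND ps.ps_supplycost = (
        SELECT MIN(ps2.ps_supplycost)
        FROM partsupp ps2
        JOIN supplier s2 ON s2.s_suppkey = ps2.ps_suppkey
        JOIN nation n2   ON s2.s_nationkey = n2.n_nationkey
        JOIN region r2   ON n2.n_regionkey = r2.r_regionkey
        WHERE ps2.ps_partkey = p.p_partkey
          AND r2.r_name = 'EUROPE')
ORDER BY s.s_acctbal DESC
"""
rows = cur.execute(Q2).fetchall()
```

Note that the Asian supplier’s lower cost (8.0) does not win: the minimum is taken over European suppliers only, so the French supplier at 10.0 is returned.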
The query first calculates the current minimum price across Europe and then finds the size-15 brass parts supplied at that minimum price. Answering it requires multiple joins and aggregations across large datasets, and this kind of query achieves higher throughput by horizontally scaling with parallel distributed processing.
The Splice Machine RDBMS has separate workers for OLTP and OLAP workloads. To achieve the following results, we utilized an AWS cluster of 8 m5.2xlarge nodes, split between 4 OLTP workers and 4 OLAP workers, and ran 100 concurrent TPC-C users and 8 concurrent TPC-H users. Splice Machine offers a free trial cluster with the same hardware configuration and an included HTAP notebook that lets you easily experiment with other concurrency configurations. We ran ten-minute HTAP workloads, measuring total counts by transaction category for each workload and recording all of the response times.
The first two charts show cumulative counts of completed transactions; the X-axis measures elapsed test time in seconds. In the TPC-C chart, counts are split by request type, where each transaction measured is a set of many SQL statements; “2I 3U 4S” in the legend means 2 inserts, 3 updates, and 4 selects as minimum operation counts. In many transactions the SQL statement count grows with the number of items in the order, and the average order size is 10 items. In the TPC-H Queries chart, we count the total number of executed queries. These are complex queries that involve table scans, large joins and aggregations, subqueries, and other costly query structures.
The latency chart shows us how transaction response times vary throughout the execution of the workloads. The 99th percentile line shows that 99 percent of all transactions completed in just over one second or less. The 50th percentile line shows that 50% of all queries consistently ran in under a quarter of a second.
In summary, on a 4×4 Splice Machine database with 100 TPC-C users and 8 TPC-H users running for 10 minutes, we saw 191,789 TPC-C transactions and 14 TPC-H queries complete, with 99% of queries running in under 1.085 seconds, apart from a single initial spike of 3.3 seconds due to test initialization overhead.
Next, we wanted to understand how this would scale if we doubled the resources. We set up an 8 OLTP x 8 OLAP Splice Machine database and re-ran the same test with 100 TPC-C users and 8 TPC-H users for 10 minutes:
The additional computational resources produced significant growth in throughput on both workloads:
- TPC-C transactions = 279,480 (+46%)
- TPC-H queries = 32 (+128%)
- SLAs were also better, with 99% of transactions running in under 0.79 seconds.
TPC-H growth seemed better than expected and TPC-C not as much as expected, so we ran another test to try to understand this better. Our theory was that 100 TPC-C users were not saturating the Splice Machine OLTP workers so we increased the concurrency on TPC-C to 200 and ran for another 10 minutes:
The results of this test were:
- TPC-C transactions = 330,728 (+72% compared to the first run)
- TPC-H queries = 24 (+71% compared to the first run)
- SLAs: 99% of transactions ran in under 1.3 seconds

These results confirm that we can still increase throughput on an 8×8 configuration by requesting more transactions.
We were able to run concurrent TPC-C and TPC-H workloads while maintaining steady transaction response times. We also tested increasing resources by doubling the OLTP and OLAP workers and saw a nearly linear improvement in both workloads. The drop in the TPC-H result in the final test, however, led us to investigate why it happened.
By inspecting the resource consumption across all services, we realized that the default Kubernetes configuration, which uses CPU requests (as opposed to CPU limits), allows one service to consume CPU beyond its allotted amount by grabbing unused CPU allocated to other services.
Variable resource allocation can be good or bad depending on your needs. Given variable workloads, we may want to dynamically use the resources of the OLTP workers for OLAP work, as we saw in the second test, or to reclaim CPU allocated for OLAP work as we increase the work on the OLTP side. On the other hand, we might want to set fixed CPU limits for the OLAP workers in order to guarantee OLTP SLAs. Further information on the available configurations, how to tune them, and their impact will be a fruitful area to explore in a future blog post.
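As a rough sketch of the second option, pinning an OLAP worker with a Kubernetes CPU limit rather than a request alone would look something like the pod spec fragment below. The resource values here are hypothetical, not Splice Machine’s actual configuration:

```yaml
# Hypothetical OLAP worker container resources.
# "requests" is the scheduler's guaranteed share; adding a matching
# "limits" caps the container so it cannot borrow idle CPU from the
# OLTP workers, trading peak OLAP throughput for predictable OLTP SLAs.
resources:
  requests:
    cpu: "4"
    memory: 16Gi
  limits:
    cpu: "4"
    memory: 16Gi
```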