HeatWave User Guide  /  HeatWave Quickstarts  /  tpch Analytics Quickstart

8.2 tpch Analytics Quickstart

The tpch Analytics Quickstart shows how to import data into the MySQL DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into HeatWave, and running queries.

This quickstart contains the following sections:

tpch Prerequisites

Review the HeatWave Quickstart Requirements.

Generating tpch Sample Data

Examples in this Quickstart use the tpch sample database, which is an ad-hoc decision support database derived from the TPC Benchmark™ H (TPC-H) specification. For an overview of the tpch schema, refer to the Logical Database Design section of the specification document.

The following instructions describe how to generate tpch sample data using the dbgen utility. The instructions assume you are on a Linux system that has gcc and make libraries installed.

To generate tpch sample data:

  1. Download the TPC-H tools zip file from TPC Download Current.

  2. Extract the zip file to a location on your system.

  3. Change to the dbgen directory and make a copy of the makefile template.

    Press CTRL+C to copy
    $> cd 2.18.0/dbgen $> cp makefile.suite makefile
  4. Configure the following settings in the makefile:

    Press CTRL+C to copy
    $> CC = gcc $> DATABASE= ORACLE $> MACHINE = LINUX $> WORKLOAD = TPCH
  5. Run make to build the dbgen utility:

    Press CTRL+C to copy
    $> make
  6. Issue the following dbgen command to generate a 1GB set of data files for the tpch database:

    Press CTRL+C to copy
    $> ./dbgen -s 1

    The operation may take a few minutes. When finished, the following data files appear in the working directory, one for each table in the tpch database:

    Press CTRL+C to copy
    $> ls -1 *.tbl customer.tbl lineitem.tbl nation.tbl orders.tbl partsupp.tbl part.tbl region.tbl supplier.tbl

Creating the tpch Sample Database and Importing Data

This topic describes how to create the tpch sample database on the MySQL DB System and import the sample data. The sample data must be available on the MySQL DB System before it can be loaded into the HeatWave Cluster.

Sample database creation and import operations are performed using MySQL Shell. The MySQL Shell Parallel Table Import Utility provides fast data import for large data files. The utility analyzes an input data file, divides it into chunks, and uploads the chunks to the target MySQL DB System using parallel connections. The utility is capable of completing a large data import many times faster than a standard single-threaded upload using a LOAD DATA statement. For additional information, see Parallel Table Import Utility.

To create the tpch sample database on the MySQL DB System and import data:

  1. Start MySQL Shell and connect to the MySQL DB System endpoint:

    Press CTRL+C to copy
    $> mysqlsh --mysql Username@DBSystem_IP_Address_or_Host_Name

    The --mysql option opens a ClassicSession, which is required when using the MySQL Shell Parallel Table Import Utility.

    MySQL Shell opens in JavaScript execution mode by default.

    Press CTRL+C to copy
    MySQL>JS>
  2. Change the MySQL Shell execution mode from JavaScript to SQL:

    Press CTRL+C to copy
    MySQL>JS> \sql
  3. Create the tpch sample database and tables:

    Press CTRL+C to copy
    mysql> CREATE DATABASE tpch character set utf8mb4; mysql> USE tpch; mysql> CREATE TABLE nation ( N_NATIONKEY INTEGER primary key, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INTEGER NOT NULL, N_COMMENT VARCHAR(152)); mysql> CREATE TABLE region ( R_REGIONKEY INTEGER primary key, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152)); mysql> CREATE TABLE part ( P_PARTKEY INTEGER primary key, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INTEGER NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL ); mysql> CREATE TABLE supplier ( S_SUPPKEY INTEGER primary key, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INTEGER NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL); mysql> CREATE TABLE partsupp ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL, primary key (ps_partkey, ps_suppkey) ); mysql> CREATE TABLE customer ( C_CUSTKEY INTEGER primary key, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INTEGER NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL); mysql> CREATE TABLE orders ( O_ORDERKEY INTEGER primary key, O_CUSTKEY INTEGER NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INTEGER NOT NULL, O_COMMENT VARCHAR(79) NOT NULL); mysql> CREATE TABLE lineitem ( L_ORDERKEY INTEGER NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL, primary key(L_ORDERKEY,L_LINENUMBER));
  4. Verify that the tpch schema and tables were created:

    Press CTRL+C to copy
    MySQL>SQL> SHOW TABLES; +----------------+ | Tables_in_tpch | +----------------+ | customer | | lineitem | | nation | | orders | | part | | partsupp | | region | | supplier | +----------------+
  5. Change back to JavaScript execution mode to use the Parallel Table Import Utility:

    Press CTRL+C to copy
    MySQL>SQL> \js
  6. Execute the following operations to import the data into the tpch database on the MySQL DB System.

    Note

    For information about the util.importTable() options used in the following commands, see Parallel Table Import Utility. The number of parallel threads specified using the threads option depends on the number of CPU cores of the shape. It is assumed that sample data fields are terminated by the pipe "|" character.

    Press CTRL+C to copy
    MySQL>JS> util.importTable("nation.tbl",{table: "nation", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("region.tbl",{table: "region", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("part.tbl",{table: "part", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("supplier.tbl",{table: "supplier", fieldsTerminatedBy:"|" , bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("partsupp.tbl",{table: "partsupp", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("customer.tbl",{table: "customer", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("orders.tbl",{table: "orders", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("lineitem.tbl",{table: "lineitem", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})

Loading tpch Data Into HeatWave MySQL

To load the tpch sample data into the HeatWave Cluster:

Note

For HeatWave on AWS, load data into HeatWave using the HeatWave Console. See Manage Data in HeatWave with Workspaces in the HeatWave on AWS Service Guide.

  1. Start MySQL Shell and connect to the MySQL DB System endpoint:

    Press CTRL+C to copy
    $> mysqlsh Username@DBSystem_IP_Address_or_Host_Name

    MySQL Shell opens in JavaScript execution mode by default.

    Press CTRL+C to copy
    MySQL>JS>
  2. Change the MySQL Shell execution mode to SQL:

    Press CTRL+C to copy
    MySQL>JS> \sql
  3. Execute the following operations to prepare the tpch sample database tables and load them into the HeatWave Cluster. The operations performed include defining string column encodings, defining the secondary engine, and executing SECONDARY_LOAD operations.

    Press CTRL+C to copy
    mysql> USE tpch; mysql> ALTER TABLE nation modify `N_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE nation modify `N_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE nation SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE nation SECONDARY_LOAD; mysql> ALTER TABLE region modify `R_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE region modify `R_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE region SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE region SECONDARY_LOAD; mysql> ALTER TABLE part modify `P_MFGR` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_BRAND` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_CONTAINER` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_COMMENT` VARCHAR(23) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE part SECONDARY_LOAD; mysql> ALTER TABLE supplier modify `S_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier modify `S_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier modify `S_PHONE` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE supplier SECONDARY_LOAD; mysql> ALTER TABLE partsupp modify `PS_COMMENT` VARCHAR(199) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE partsupp SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE partsupp SECONDARY_LOAD; mysql> ALTER TABLE customer modify `C_NAME` VARCHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_MKTSEGMENT` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_COMMENT` VARCHAR(117) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE customer SECONDARY_LOAD; mysql> ALTER TABLE orders modify `O_ORDERSTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders modify `O_ORDERPRIORITY` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders modify `O_CLERK` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE orders SECONDARY_LOAD; mysql> ALTER TABLE lineitem modify `L_RETURNFLAG` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_LINESTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_SHIPINSTRUCT` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_SHIPMODE` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_COMMENT` VARCHAR(44) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE lineitem SECONDARY_LOAD;
  4. Verify that the tpch sample database tables are loaded in the HeatWave Cluster by querying LOAD_STATUS data from the HeatWave Performance Schema tables. Loaded tables have an AVAIL_RPDGSTABSTATE load status.

    Press CTRL+C to copy
    MySQL>SQL> USE performance_schema; MySQL>SQL> SELECT NAME, LOAD_STATUS FROM rpd_tables,rpd_table_id WHERE rpd_tables.ID = rpd_table_id.ID; +------------------------------+---------------------+ | NAME | LOAD_STATUS | +------------------------------+---------------------+ | tpch.supplier | AVAIL_RPDGSTABSTATE | | tpch.partsupp | AVAIL_RPDGSTABSTATE | | tpch.orders | AVAIL_RPDGSTABSTATE | | tpch.lineitem | AVAIL_RPDGSTABSTATE | | tpch.customer | AVAIL_RPDGSTABSTATE | | tpch.nation | AVAIL_RPDGSTABSTATE | | tpch.region | AVAIL_RPDGSTABSTATE | | tpch.part | AVAIL_RPDGSTABSTATE | +------------------------------+---------------------+

Running tpch Queries

This topic describes how to query tpch data in the HeatWave Cluster. After tables are loaded into the HeatWave Cluster, queries that qualify are automatically offloaded to the HeatWave Cluster for accelerated processing. To run queries:

Note

For HeatWave on AWS, run queries from the Query Editor in the HeatWave Console. See Running Queries in the HeatWave on AWS Service Guide.

  1. Start MySQL Shell and connect to the MySQL DB System endpoint:

    Press CTRL+C to copy
    $> mysqlsh Username@DBSystem_IP_Address_or_Host_Name

    MySQL Shell opens in JavaScript execution mode by default.

    Press CTRL+C to copy
    MySQL>JS>
  2. Change the MySQL Shell execution mode to SQL:

    Press CTRL+C to copy
    MySQL>JS> \sql
  3. Change to the tpch database:

    Press CTRL+C to copy
    MySQL>SQL> USE tpch; Default schema set to `tpch`.Fetching table and column names from `tpch` for auto-completion... Press ^C to stop.
  4. Before running a query, use EXPLAIN to verify that the query can be offloaded to the HeatWave Cluster. For example:

    Press CTRL+C to copy
    MySQL>SQL> EXPLAIN SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: lineitem partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 56834662 filtered: 33.33 Extra: Using where; Using secondary engine RAPID

    If the query can be offloaded, the Extra column in the EXPLAIN output reports Using secondary engine RAPID.

  5. After verifying that the query can be offloaded, run the query and note the execution time.

    Press CTRL+C to copy
    MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; +------------------+ | revenue | +------------------+ | 82752894454.9036 | +------------------+ 1 row in set (0.04 sec)
  6. To compare the HeatWave execution time with MySQL DB System execution time, disable the use_secondary_engine variable to see how long it takes to run the same query on the MySQL DB System. For example:

    Press CTRL+C to copy
    MySQL>SQL> SET SESSION use_secondary_engine=OFF; MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; +------------------+ | revenue | +------------------+ | 82752894454.9036 | +------------------+ 1 row in set (24.20 sec)

    For other tpch sample database queries that you can run, see Additional tpch Queries. For more information about running queries, refer to Section 2.3, “Running Queries”.

Additional tpch Queries

This topic provides additional tpch queries that you can run to test the HeatWave Cluster.

  • TPCH-Q1: Pricing Summary Report Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date. The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of lineitems in each group is included."

    Press CTRL+C to copy
    mysql> SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price, SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price, SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, AVG(l_quantity) AS avg_qty, AVG(l_extendedprice) AS avg_price, AVG(l_discount) AS avg_disc, COUNT(*) AS count_order FROM lineitem WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY GROUP BY l_returnflag , l_linestatus ORDER BY l_returnflag , l_linestatus;
  • TPCH-Q3: Shipping Priority Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Shipping Priority Query retrieves the shipping priority and potential revenue, defined as the sum of l_extendedprice * (1-l_discount), of the orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped orders exist, only the 10 orders with the largest revenue are listed."

    Press CTRL+C to copy
    mysql> SELECT l_orderkey, SUM(l_extendedprice * (1 - l_discount)) AS revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < DATE '1995-03-15' AND l_shipdate > DATE '1995-03-15' GROUP BY l_orderkey , o_orderdate , o_shippriority ORDER BY revenue DESC , o_orderdate LIMIT 10;
  • TPCH-Q9: Product Type Profit Measure Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Product Type Profit Measure Query finds, for each nation and each year, the profit for all parts ordered in that year that contain a specified substring in their names and that were filled by a supplier in that nation. The profit is defined as the sum of [(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)] for all lineitems describing parts in the specified line. The query lists the nations in ascending alphabetical order and, for each nation, the year and profit in descending order by year (most recent first). "

    Press CTRL+C to copy
    mysql> SELECT nation, o_year, SUM(amount) AS sum_profit FROM (SELECT n_name AS nation, YEAR(o_ORDERdate) AS o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount FROM part STRAIGHT_JOIN partsupp STRAIGHT_JOIN lineitem STRAIGHT_JOIN supplier STRAIGHT_JOIN orders STRAIGHT_JOIN nation WHERE s_suppkey = l_suppkey AND ps_suppkey = l_suppkey AND ps_partkey = l_partkey AND p_partkey = l_partkey AND o_ORDERkey = l_ORDERkey AND s_nationkey = n_nationkey AND p_name LIKE '%green%') AS profit GROUP BY nation , o_year ORDER BY nation , o_year DESC;

Unloading tpch Tables

tpch tables can be unloaded from HeatWave using the following statements:

Note

For HeatWave on AWS, unload data from HeatWave using the HeatWave Console. See Manage Data in HeatWave with Workspaces in the HeatWave on AWS Service Guide.

Press CTRL+C to copy
mysql> USE tpch; mysql> ALTER TABLE customer SECONDARY_UNLOAD; mysql> ALTER TABLE lineitem SECONDARY_UNLOAD; mysql> ALTER TABLE nation SECONDARY_UNLOAD; mysql> ALTER TABLE orders SECONDARY_UNLOAD; mysql> ALTER TABLE part SECONDARY_UNLOAD; mysql> ALTER TABLE partsupp SECONDARY_UNLOAD; mysql> ALTER TABLE region SECONDARY_UNLOAD; mysql> ALTER TABLE supplier SECONDARY_UNLOAD;