Documentation Home
MySQL HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 1.0Mb
PDF (A4) - 1.0Mb


MySQL HeatWave User Guide  /  HeatWave Quickstarts  /  tpch Analytics Quickstart

6.1 tpch Analytics Quickstart

The tpch Analytics Quickstart walks you through importing data into the into the DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into HeatWave, and running queries.

The following topics are described:

Prerequisites

Generating tpch Sample Data

Examples in this Quickstart use the tpch sample database, which is an ad-hoc decision support database derived from the TPC Benchmark™ H (TPC-H) specification. For an overview of the tpch schema, refer to the Logical Database Design section of the specification document.

The following instructions describe how to generate tpch sample data using the dbgen utility. The instructions assume you are on a Linux system that has gcc and make libraries installed.

To generate tpch sample data:

  1. Download the TPC-H tools zip file from TPC Download Current.

  2. Extract the zip file to a location on your system.

  3. Change to the dbgen directory and make a copy of the makefile template.

    $> cd 2.18.0/dbgen
    $> cp makefile.suite makefile
  4. Configure the following settings in the makefile:

    CC = gcc
    DATABASE= ORACLE
    MACHINE = LINUX
    WORKLOAD = TPCH
  5. Run make to build the dbgen utility:

    $> make
  6. Issue the following dbgen command to generate a 1GB set of data files for the tpch database:

    $> ./dbgen -s 1

    The operation may take a few minutes. When finished, the following data files appear in the working directory, one for each table in the tpch database:

    $> ls -1 *.tbl
    customer.tbl
    lineitem.tbl
    nation.tbl
    orders.tbl
    partsupp.tbl
    part.tbl
    region.tbl
    supplier.tbl

Creating the Sample Database and Importing Data

This topic describes how to create the tpch sample database on the MySQL DB System and import the sample data. The sample data must be available on the MySQL DB System before it can be loaded into the HeatWave Cluster.

Sample database creation and import operations are performed using MySQL Shell. The MySQL Shell Parallel Table Import Utility provides fast data import for large data files. The utility analyzes an input data file, divides it into chunks, and uploads the chunks to the target MySQL DB System using parallel connections. The utility is capable of completing a large data import many times faster than a standard single-threaded upload using a LOAD DATA statement. For additional information, see Parallel Table Import Utility.

To create the tpch sample database on the MySQL DB System and import data:

  1. Start MySQL Shell and connect to the MySQL DB System's endpoint:

    $> mysqlsh --mysql Username@DBSystem_IP_Address_or_Host_Name

    The --mysql option opens a ClassicSession, which is required when using the MySQL Shell Parallel Table Import Utility.

    MySQL Shell opens in JavaScript execution mode by default.

    MySQL>JS>
  2. Change the MySQL Shell execution mode from JavaScript to SQL:

    MySQL>JS> \sql
  3. Create the tpch sample database and tables:

    CREATE DATABASE tpch character set utf8mb4;
    USE tpch;
    
    CREATE TABLE nation  ( N_NATIONKEY INTEGER primary key,
        N_NAME       CHAR(25) NOT NULL,
        N_REGIONKEY  INTEGER NOT NULL,
        N_COMMENT    VARCHAR(152));
    							
    CREATE TABLE region  ( R_REGIONKEY INTEGER primary key,
        R_NAME       CHAR(25) NOT NULL,
        R_COMMENT    VARCHAR(152));
    							
    CREATE TABLE part  ( P_PARTKEY INTEGER primary key,
        P_NAME        VARCHAR(55) NOT NULL,
        P_MFGR        CHAR(25) NOT NULL,
        P_BRAND       CHAR(10) NOT NULL,
        P_TYPE        VARCHAR(25) NOT NULL,
        P_SIZE        INTEGER NOT NULL,
        P_CONTAINER   CHAR(10) NOT NULL,
        P_RETAILPRICE DECIMAL(15,2) NOT NULL,
        P_COMMENT     VARCHAR(23) NOT NULL );
    						  
    CREATE TABLE supplier  ( S_SUPPKEY INTEGER primary key,
        S_NAME        CHAR(25) NOT NULL,
        S_ADDRESS     VARCHAR(40) NOT NULL,
        S_NATIONKEY   INTEGER NOT NULL,
        S_PHONE       CHAR(15) NOT NULL,
        S_ACCTBAL     DECIMAL(15,2) NOT NULL,
        S_COMMENT     VARCHAR(101) NOT NULL);
    							 
    CREATE TABLE partsupp  ( PS_PARTKEY INTEGER NOT NULL,
        PS_SUPPKEY     INTEGER NOT NULL,
        PS_AVAILQTY    INTEGER NOT NULL,
        PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
        PS_COMMENT     VARCHAR(199) NOT NULL, primary key (ps_partkey, ps_suppkey) );
    						
    CREATE TABLE customer  ( C_CUSTKEY INTEGER primary key,
        C_NAME        VARCHAR(25) NOT NULL,
        C_ADDRESS     VARCHAR(40) NOT NULL,
        C_NATIONKEY   INTEGER NOT NULL,
        C_PHONE       CHAR(15) NOT NULL,
        C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
        C_MKTSEGMENT  CHAR(10) NOT NULL,
        C_COMMENT     VARCHAR(117) NOT NULL);
    							 
    CREATE TABLE orders  ( O_ORDERKEY INTEGER primary key,
        O_CUSTKEY        INTEGER NOT NULL,
        O_ORDERSTATUS    CHAR(1) NOT NULL,
        O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
        O_ORDERDATE      DATE NOT NULL,
        O_ORDERPRIORITY  CHAR(15) NOT NULL,
        O_CLERK          CHAR(15) NOT NULL,
        O_SHIPPRIORITY   INTEGER NOT NULL,
        O_COMMENT        VARCHAR(79) NOT NULL);
    						   
    CREATE TABLE lineitem ( L_ORDERKEY INTEGER NOT NULL,
        L_PARTKEY     INTEGER NOT NULL,
        L_SUPPKEY     INTEGER NOT NULL,
        L_LINENUMBER  INTEGER NOT NULL,
        L_QUANTITY    DECIMAL(15,2) NOT NULL,
        L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
        L_DISCOUNT    DECIMAL(15,2) NOT NULL,
        L_TAX         DECIMAL(15,2) NOT NULL,
        L_RETURNFLAG  CHAR(1) NOT NULL,
        L_LINESTATUS  CHAR(1) NOT NULL,
        L_SHIPDATE    DATE NOT NULL,
        L_COMMITDATE  DATE NOT NULL,
        L_RECEIPTDATE DATE NOT NULL,
        L_SHIPINSTRUCT CHAR(25) NOT NULL,
        L_SHIPMODE     CHAR(10) NOT NULL,
        L_COMMENT      VARCHAR(44) NOT NULL,
        primary key(L_ORDERKEY,L_LINENUMBER));
  4. Verify that the tpch schema and tables were created:

    MySQL>SQL> SHOW TABLES;
    +----------------+
    | Tables_in_tpch |
    +----------------+
    | customer       |
    | lineitem       |
    | nation         |
    | orders         |
    | part           |
    | partsupp       |
    | region         |
    | supplier       |
    +----------------+
  5. Change back to JavaScript execution mode to use the Parallel Table Import Utility:

    MySQL>SQL> \js
  6. Execute the following operations to import the data into the tpch database on the MySQL DB System.

    Note

    For information about the util.importTable() options used in the following commands, see Parallel Table Import Utility. The number of parallel threads specified using the threads option depends on the number of CPU cores of the shape. It is assumed that sample data fields are terminated by the pipe "|" character.

    MySQL>JS> util.importTable("nation.tbl",{table: "nation", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("region.tbl",{table: "region", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("part.tbl",{table: "part", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("supplier.tbl",{table: "supplier", fieldsTerminatedBy:"|" , 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("partsupp.tbl",{table: "partsupp", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("customer.tbl",{table: "customer", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("orders.tbl",{table: "orders", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})
    
    MySQL>JS> util.importTable("lineitem.tbl",{table: "lineitem", fieldsTerminatedBy:"|", 
    bytesPerChunk:"100M", threads:16})

Loading tpch Data Into HeatWave

To load the tpch sample data into the HeatWave Cluster:

  1. Start MySQL Shell and connect to the MySQL DB System's endpoint:

    $> mysqlsh Username@DBSystem_IP_Address_or_Host_Name

    MySQL Shell opens in JavaScript execution mode by default.

    MySQL>JS>
  2. Change the MySQL Shell execution mode to SQL:

    MySQL>JS> \sql
  3. Execute the following operations to prepare the tpch sample database tables and load them into the HeatWave Cluster. The operations performed include defining string column encodings, defining the secondary engine, and executing SECONDARY_LOAD operations.

    USE tpch;
    
    ALTER TABLE nation modify `N_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE nation modify `N_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE nation SECONDARY_ENGINE=RAPID;
    ALTER TABLE nation SECONDARY_LOAD;
    
    ALTER TABLE region modify `R_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE region modify `R_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE region SECONDARY_ENGINE=RAPID;
    ALTER TABLE region SECONDARY_LOAD;
    
    ALTER TABLE part modify `P_MFGR` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE part modify `P_BRAND` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE part modify `P_CONTAINER` CHAR(10) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE part modify `P_COMMENT` VARCHAR(23) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE part SECONDARY_ENGINE=RAPID;
    ALTER TABLE part SECONDARY_LOAD;
    
    ALTER TABLE supplier modify `S_NAME` CHAR(25) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE supplier modify `S_ADDRESS` VARCHAR(40) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE supplier modify `S_PHONE` CHAR(15) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    
    ALTER TABLE supplier SECONDARY_ENGINE=RAPID;
    ALTER TABLE supplier SECONDARY_LOAD;
    ALTER TABLE partsupp modify `PS_COMMENT` VARCHAR(199) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE partsupp SECONDARY_ENGINE=RAPID;
    ALTER TABLE partsupp SECONDARY_LOAD;
    
    ALTER TABLE customer modify `C_NAME` VARCHAR(25) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE customer modify `C_ADDRESS` VARCHAR(40) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE customer modify `C_MKTSEGMENT` CHAR(10) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE customer modify `C_COMMENT` VARCHAR(117) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    
    ALTER TABLE customer SECONDARY_ENGINE=RAPID;
    ALTER TABLE customer SECONDARY_LOAD;
    
    ALTER TABLE orders modify `O_ORDERSTATUS` CHAR(1) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE orders modify `O_ORDERPRIORITY` CHAR(15) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE orders modify `O_CLERK` CHAR(15) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE orders SECONDARY_ENGINE=RAPID;
    ALTER TABLE orders SECONDARY_LOAD;
    
    ALTER TABLE lineitem modify `L_RETURNFLAG` CHAR(1) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE lineitem modify `L_LINESTATUS` CHAR(1) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE lineitem modify `L_SHIPINSTRUCT` CHAR(25) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE lineitem modify `L_SHIPMODE` CHAR(10) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE lineitem modify `L_COMMENT` VARCHAR(44) NOT NULL 
    COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
    ALTER TABLE lineitem SECONDARY_ENGINE=RAPID;
    ALTER TABLE lineitem SECONDARY_LOAD;
  4. Verify that the tpch sample database tables are loaded in the HeatWave Cluster by querying LOAD_STATUS data from the HeatWave Performance Schema tables. Loaded tables have an AVAIL_RPDGSTABSTATE load status.

    MySQL>SQL> USE performance_schema;
    MySQL>SQL> SELECT NAME, LOAD_STATUS FROM rpd_tables,rpd_table_id
               WHERE rpd_tables.ID = rpd_table_id.ID;
    +------------------------------+---------------------+
    | NAME                         | LOAD_STATUS         |
    +------------------------------+---------------------+
    | tpch.supplier                | AVAIL_RPDGSTABSTATE |
    | tpch.partsupp                | AVAIL_RPDGSTABSTATE |
    | tpch.orders                  | AVAIL_RPDGSTABSTATE |
    | tpch.lineitem                | AVAIL_RPDGSTABSTATE |
    | tpch.customer                | AVAIL_RPDGSTABSTATE |
    | tpch.nation                  | AVAIL_RPDGSTABSTATE |
    | tpch.region                  | AVAIL_RPDGSTABSTATE |
    | tpch.part                    | AVAIL_RPDGSTABSTATE |
    +------------------------------+---------------------+

Running tpch Queries

This topic describes how to query tpch data in the HeatWave Cluster. After tables are loaded into the HeatWave Cluster, queries that qualify are automatically offloaded to the HeatWave Cluster for accelerated processing. To run queries:

  1. Start MySQL Shell and connect to the MySQL DB System's endpoint:

    $> mysqlsh Username@DBSystem_IP_Address_or_Host_Name

    MySQL Shell opens in JavaScript execution mode by default.

    MySQL>JS>
  2. Change the MySQL Shell execution mode to SQL:

    MySQL>JS> \sql
  3. Change to the tpch database:

    MySQL>SQL> USE tpch;
    Default schema set to `tpch`.Fetching table and column names from `tpch` for 
    auto-completion... Press ^C to stop.
  4. Before running a query, use EXPLAIN to verify that the query can be offloaded to the HeatWave Cluster. For example:

    MySQL>SQL> EXPLAIN SELECT SUM(l_extendedprice * l_discount) AS revenue 
               FROM lineitem WHERE l_shipdate >= date '1994-01-01'\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: lineitem
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 56834662
         filtered: 33.33
            Extra: Using where; Using secondary engine RAPID

    If the query can be offloaded, the Extra column in the EXPLAIN output reports "Using secondary engine RAPID".

  5. After verifying that the query can be offloaded, run the query and note the execution time.

    MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue 
               FROM lineitem WHERE l_shipdate >= date '1994-01-01';
    +------------------+
    | revenue          |
    +------------------+
    | 82752894454.9036 |
    +------------------+
    1 row in set (0.04 sec)
  6. To compare the HeatWave execution time with MySQL DB System execution time, disable the use_secondary_engine variable to see how long it takes to run the same query on the MySQL DB System. For example:

    MySQL>SQL> SET SESSION use_secondary_engine=OFF;
    MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem
               WHERE l_shipdate >= date '1994-01-01';
    +------------------+
    | revenue          |
    +------------------+
    | 82752894454.9036 |
    +------------------+
    1 row in set (24.20 sec)

    For other tpch sample database queries that you can run, see Additional tpch Queries. For more information about running queries, refer to Section 2.3, “Running Queries”.

Additional tpch Queries

This topic provides additional tpch queries that you can run to test the HeatWave Cluster.

  • TPCH-Q1: Pricing Summary Report Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date. The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in ascending order of RETURNFLAG and LINESTATUS. A count of the number of lineitems in each group is included."

    SELECT
        l_returnflag,
        l_linestatus,
        SUM(l_quantity) AS sum_qty,
        SUM(l_extendedprice) AS sum_base_price,
        SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
        SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
        AVG(l_quantity) AS avg_qty,
        AVG(l_extendedprice) AS avg_price,
        AVG(l_discount) AS avg_disc,
        COUNT(*) AS count_order
    FROM
        lineitem
    WHERE
        l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
    GROUP BY l_returnflag , l_linestatus
    ORDER BY l_returnflag , l_linestatus;
  • TPCH-Q3: Shipping Priority Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Shipping Priority Query retrieves the shipping priority and potential revenue, defined as the sum of l_extendedprice * (1-l_discount), of the orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped orders exist, only the 10 orders with the largest revenue are listed."

    SELECT
        l_orderkey,
        SUM(l_extendedprice * (1 - l_discount)) AS revenue,
        o_orderdate,
        o_shippriority
    FROM
        customer,
        orders,
        lineitem
    WHERE
        c_mktsegment = 'BUILDING'
            AND c_custkey = o_custkey
            AND l_orderkey = o_orderkey
            AND o_orderdate < DATE '1995-03-15'
            AND l_shipdate > DATE '1995-03-15'
    GROUP BY l_orderkey , o_orderdate , o_shippriority
    ORDER BY revenue DESC , o_orderdate
    LIMIT 10;
  • TPCH-Q9: Product Type Profit Measure Query

    As described in the TPC Benchmark™ H (TPC-H) specification: "The Product Type Profit Measure Query finds, for each nation and each year, the profit for all parts ordered in that year that contain a specified substring in their names and that were filled by a supplier in that nation. The profit is defined as the sum of [(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)] for all lineitems describing parts in the specified line. The query lists the nations in ascending alphabetical order and, for each nation, the year and profit in descending order by year (most recent first). "

    SELECT
        nation, o_year, SUM(amount) AS sum_profit
    FROM
        (SELECT
            n_name AS nation,
                YEAR(o_ORDERdate) AS o_year,
                l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
        FROM
            part
        STRAIGHT_JOIN partsupp
        STRAIGHT_JOIN lineitem
        STRAIGHT_JOIN supplier
        STRAIGHT_JOIN orders
        STRAIGHT_JOIN nation
        WHERE
            s_suppkey = l_suppkey
                AND ps_suppkey = l_suppkey
                AND ps_partkey = l_partkey
                AND p_partkey = l_partkey
                AND o_ORDERkey = l_ORDERkey
                AND s_nationkey = n_nationkey
                AND p_name LIKE '%green%') AS profit
    GROUP BY nation , o_year
    ORDER BY nation , o_year DESC;

Unloading tpch Tables

tpch tables can be unloaded from HeatWave using the following statements:

USE tpch;

ALTER TABLE customer SECONDARY_UNLOAD;
ALTER TABLE lineitem SECONDARY_UNLOAD;
ALTER TABLE nation SECONDARY_UNLOAD;
ALTER TABLE orders SECONDARY_UNLOAD;
ALTER TABLE part SECONDARY_UNLOAD;
ALTER TABLE partsupp SECONDARY_UNLOAD;
ALTER TABLE region SECONDARY_UNLOAD;
ALTER TABLE supplier SECONDARY_UNLOAD;