The tpch Analytics Quickstart shows how to import data into the MySQL DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into HeatWave, and running queries.
This quickstart contains the following sections:
Review the HeatWave Quickstart Requirements.
Examples in this Quickstart use the tpch
sample database, which is an ad-hoc decision support database
derived from the TPC
Benchmark™ H (TPC-H) specification. For an overview of
the tpch
schema, refer to the
Logical Database Design section of the
specification
document.
The following instructions describe how to generate
tpch
sample data using the
dbgen
utility. The instructions assume you
are on a Linux system that has gcc
and
make
libraries installed.
To generate tpch
sample data:
Download the TPC-H tools zip file from TPC Download Current.
Extract the zip file to a location on your system.
-
Change to the
dbgen
directory and make a copy of the makefile template.$> cd 2.18.0/dbgen $> cp makefile.suite makefile
-
Configure the following settings in the makefile:
$> CC = gcc $> DATABASE= ORACLE $> MACHINE = LINUX $> WORKLOAD = TPCH
-
Run make to build the
dbgen
utility:$> make
-
Issue the following
dbgen
command to generate a 1GB set of data files for thetpch
database:$> ./dbgen -s 1
The operation may take a few minutes. When finished, the following data files appear in the working directory, one for each table in the
tpch
database:$> ls -1 *.tbl customer.tbl lineitem.tbl nation.tbl orders.tbl partsupp.tbl part.tbl region.tbl supplier.tbl
This topic describes how to create the tpch sample database on the MySQL DB System and import the sample data. The sample data must be available on the MySQL DB System before it can be loaded into the HeatWave Cluster.
Sample database creation and import operations are performed
using MySQL Shell. The MySQL Shell Parallel Table Import
Utility provides fast data import for large data files. The
utility analyzes an input data file, divides it into chunks, and
uploads the chunks to the target MySQL DB System using parallel
connections. The utility is capable of completing a large data
import many times faster than a standard single-threaded upload
using a LOAD DATA
statement. For additional
information, see
Parallel Table Import Utility.
To create the tpch sample database on the MySQL DB System and import data:
-
Start MySQL Shell and connect to the MySQL DB System endpoint:
$> mysqlsh --mysql Username@DBSystem_IP_Address_or_Host_Name
The
--mysql
option opens aClassicSession
, which is required when using the MySQL Shell Parallel Table Import Utility.MySQL Shell opens in JavaScript execution mode by default.
MySQL>JS>
-
Change the MySQL Shell execution mode from JavaScript to SQL:
MySQL>JS> \sql
-
Create the
tpch
sample database and tables:mysql> CREATE DATABASE tpch character set utf8mb4; mysql> USE tpch; mysql> CREATE TABLE nation ( N_NATIONKEY INTEGER primary key, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INTEGER NOT NULL, N_COMMENT VARCHAR(152)); mysql> CREATE TABLE region ( R_REGIONKEY INTEGER primary key, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152)); mysql> CREATE TABLE part ( P_PARTKEY INTEGER primary key, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INTEGER NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL ); mysql> CREATE TABLE supplier ( S_SUPPKEY INTEGER primary key, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INTEGER NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL); mysql> CREATE TABLE partsupp ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL, primary key (ps_partkey, ps_suppkey) ); mysql> CREATE TABLE customer ( C_CUSTKEY INTEGER primary key, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INTEGER NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL); mysql> CREATE TABLE orders ( O_ORDERKEY INTEGER primary key, O_CUSTKEY INTEGER NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INTEGER NOT NULL, O_COMMENT VARCHAR(79) NOT NULL); mysql> CREATE TABLE lineitem ( L_ORDERKEY INTEGER NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL, primary key(L_ORDERKEY,L_LINENUMBER));
-
Verify that the
tpch
schema and tables were created:MySQL>SQL> SHOW TABLES; +----------------+ | Tables_in_tpch | +----------------+ | customer | | lineitem | | nation | | orders | | part | | partsupp | | region | | supplier | +----------------+
-
Change back to JavaScript execution mode to use the Parallel Table Import Utility:
MySQL>SQL> \js
-
Execute the following operations to import the data into the
tpch
database on the MySQL DB System.NoteFor information about the
util.importTable()
options used in the following commands, see Parallel Table Import Utility. The number of parallel threads specified using thethreads
option depends on the number of CPU cores of the shape. It is assumed that sample data fields are terminated by the pipe "|
" character.MySQL>JS> util.importTable("nation.tbl",{table: "nation", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("region.tbl",{table: "region", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("part.tbl",{table: "part", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("supplier.tbl",{table: "supplier", fieldsTerminatedBy:"|" , bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("partsupp.tbl",{table: "partsupp", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("customer.tbl",{table: "customer", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("orders.tbl",{table: "orders", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16}) MySQL>JS> util.importTable("lineitem.tbl",{table: "lineitem", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
To load the tpch
sample data into the
HeatWave Cluster:
For HeatWave on AWS, load data into HeatWave using the HeatWave Console. See Manage Data in HeatWave with Workspaces in the HeatWave on AWS Service Guide.
-
Start MySQL Shell and connect to the MySQL DB System endpoint:
$> mysqlsh Username@DBSystem_IP_Address_or_Host_Name
MySQL Shell opens in JavaScript execution mode by default.
MySQL>JS>
-
Change the MySQL Shell execution mode to SQL:
MySQL>JS> \sql
-
Execute the following operations to prepare the
tpch
sample database tables and load them into the HeatWave Cluster. The operations performed include defining string column encodings, defining the secondary engine, and executingSECONDARY_LOAD
operations.mysql> USE tpch; mysql> ALTER TABLE nation modify `N_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE nation modify `N_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE nation SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE nation SECONDARY_LOAD; mysql> ALTER TABLE region modify `R_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE region modify `R_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE region SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE region SECONDARY_LOAD; mysql> ALTER TABLE part modify `P_MFGR` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_BRAND` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_CONTAINER` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part modify `P_COMMENT` VARCHAR(23) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE part SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE part SECONDARY_LOAD; mysql> ALTER TABLE supplier modify `S_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier modify `S_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier modify `S_PHONE` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE supplier SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE supplier SECONDARY_LOAD; mysql> ALTER TABLE partsupp modify `PS_COMMENT` VARCHAR(199) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE partsupp SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE partsupp SECONDARY_LOAD; mysql> ALTER TABLE customer modify `C_NAME` VARCHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_MKTSEGMENT` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer modify `C_COMMENT` VARCHAR(117) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE customer SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE customer SECONDARY_LOAD; mysql> ALTER TABLE orders modify `O_ORDERSTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders modify `O_ORDERPRIORITY` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders modify `O_CLERK` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE orders SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE orders SECONDARY_LOAD; mysql> ALTER TABLE lineitem modify `L_RETURNFLAG` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_LINESTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_SHIPINSTRUCT` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_SHIPMODE` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem modify `L_COMMENT` VARCHAR(44) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED'; mysql> ALTER TABLE lineitem SECONDARY_ENGINE=RAPID; mysql> ALTER TABLE lineitem SECONDARY_LOAD;
-
Verify that the
tpch
sample database tables are loaded in the HeatWave Cluster by queryingLOAD_STATUS
data from the HeatWave Performance Schema tables. Loaded tables have anAVAIL_RPDGSTABSTATE
load status.MySQL>SQL> USE performance_schema; MySQL>SQL> SELECT NAME, LOAD_STATUS FROM rpd_tables,rpd_table_id WHERE rpd_tables.ID = rpd_table_id.ID; +------------------------------+---------------------+ | NAME | LOAD_STATUS | +------------------------------+---------------------+ | tpch.supplier | AVAIL_RPDGSTABSTATE | | tpch.partsupp | AVAIL_RPDGSTABSTATE | | tpch.orders | AVAIL_RPDGSTABSTATE | | tpch.lineitem | AVAIL_RPDGSTABSTATE | | tpch.customer | AVAIL_RPDGSTABSTATE | | tpch.nation | AVAIL_RPDGSTABSTATE | | tpch.region | AVAIL_RPDGSTABSTATE | | tpch.part | AVAIL_RPDGSTABSTATE | +------------------------------+---------------------+
This topic describes how to query tpch
data
in the HeatWave Cluster. After tables are loaded into the
HeatWave Cluster, queries that qualify are automatically offloaded to
the HeatWave Cluster for accelerated processing. To run queries:
For HeatWave on AWS, run queries from the Query Editor in the HeatWave Console. See Running Queries in the HeatWave on AWS Service Guide.
-
Start MySQL Shell and connect to the MySQL DB System endpoint:
$> mysqlsh Username@DBSystem_IP_Address_or_Host_Name
MySQL Shell opens in JavaScript execution mode by default.
MySQL>JS>
-
Change the MySQL Shell execution mode to SQL:
MySQL>JS> \sql
-
Change to the
tpch
database:MySQL>SQL> USE tpch; Default schema set to `tpch`.Fetching table and column names from `tpch` for auto-completion... Press ^C to stop.
-
Before running a query, use
EXPLAIN
to verify that the query can be offloaded to the HeatWave Cluster. For example:MySQL>SQL> EXPLAIN SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: lineitem partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 56834662 filtered: 33.33 Extra: Using where; Using secondary engine RAPID
If the query can be offloaded, the
Extra
column in theEXPLAIN
output reportsUsing secondary engine RAPID
. -
After verifying that the query can be offloaded, run the query and note the execution time.
MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; +------------------+ | revenue | +------------------+ | 82752894454.9036 | +------------------+ 1 row in set (0.04 sec)
-
To compare the HeatWave execution time with MySQL DB System execution time, disable the
use_secondary_engine
variable to see how long it takes to run the same query on the MySQL DB System. For example:MySQL>SQL> SET SESSION use_secondary_engine=OFF; MySQL>SQL> SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01'; +------------------+ | revenue | +------------------+ | 82752894454.9036 | +------------------+ 1 row in set (24.20 sec)
For other
tpch
sample database queries that you can run, see Additional tpch Queries. For more information about running queries, refer to Section 2.3, “Running Queries”.
This topic provides additional tpch queries that you can run to test the HeatWave Cluster.
-
TPCH-Q1: Pricing Summary Report Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date. The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended price, and average discount. These aggregates are grouped by
RETURNFLAG
andLINESTATUS
, and listed in ascending order ofRETURNFLAG
andLINESTATUS
. A count of the number of lineitems in each group is included."mysql> SELECT l_returnflag, l_linestatus, SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price, SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price, SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge, AVG(l_quantity) AS avg_qty, AVG(l_extendedprice) AS avg_price, AVG(l_discount) AS avg_disc, COUNT(*) AS count_order FROM lineitem WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY GROUP BY l_returnflag , l_linestatus ORDER BY l_returnflag , l_linestatus;
-
TPCH-Q3: Shipping Priority Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Shipping Priority Query retrieves the shipping priority and potential revenue, defined as the sum of
l_extendedprice * (1-l_discount)
, of the orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped orders exist, only the 10 orders with the largest revenue are listed."mysql> SELECT l_orderkey, SUM(l_extendedprice * (1 - l_discount)) AS revenue, o_orderdate, o_shippriority FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < DATE '1995-03-15' AND l_shipdate > DATE '1995-03-15' GROUP BY l_orderkey , o_orderdate , o_shippriority ORDER BY revenue DESC , o_orderdate LIMIT 10;
-
TPCH-Q9: Product Type Profit Measure Query
As described in the TPC Benchmark™ H (TPC-H) specification: "The Product Type Profit Measure Query finds, for each nation and each year, the profit for all parts ordered in that year that contain a specified substring in their names and that were filled by a supplier in that nation. The profit is defined as the sum of
[(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)]
for all lineitems describing parts in the specified line. The query lists the nations in ascending alphabetical order and, for each nation, the year and profit in descending order by year (most recent first). "mysql> SELECT nation, o_year, SUM(amount) AS sum_profit FROM (SELECT n_name AS nation, YEAR(o_ORDERdate) AS o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount FROM part STRAIGHT_JOIN partsupp STRAIGHT_JOIN lineitem STRAIGHT_JOIN supplier STRAIGHT_JOIN orders STRAIGHT_JOIN nation WHERE s_suppkey = l_suppkey AND ps_suppkey = l_suppkey AND ps_partkey = l_partkey AND p_partkey = l_partkey AND o_ORDERkey = l_ORDERkey AND s_nationkey = n_nationkey AND p_name LIKE '%green%') AS profit GROUP BY nation , o_year ORDER BY nation , o_year DESC;
tpch
tables can be unloaded from HeatWave
using the following statements:
For HeatWave on AWS, unload data from HeatWave using the HeatWave Console. See Manage Data in HeatWave with Workspaces in the HeatWave on AWS Service Guide.
mysql> USE tpch;
mysql> ALTER TABLE customer SECONDARY_UNLOAD;
mysql> ALTER TABLE lineitem SECONDARY_UNLOAD;
mysql> ALTER TABLE nation SECONDARY_UNLOAD;
mysql> ALTER TABLE orders SECONDARY_UNLOAD;
mysql> ALTER TABLE part SECONDARY_UNLOAD;
mysql> ALTER TABLE partsupp SECONDARY_UNLOAD;
mysql> ALTER TABLE region SECONDARY_UNLOAD;
mysql> ALTER TABLE supplier SECONDARY_UNLOAD;