Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.2Mb
PDF (A4) - 2.2Mb


HeatWave User Guide  /  HeatWave MySQL  /  Bulk Ingest Data to MySQL Server

2.17 Bulk Ingest Data to MySQL Server

MySQL includes a bulk load extension to the LOAD DATA statement that is only available in HeatWave MySQL. It adds the following features:

  • The optimized loading of data sorted by primary key.

  • The optimized loading of unsorted data.

  • The optimized loading of data from an object store.

  • The optimized loading of data from a series of files.

MySQL 8.4.0 adds the following features:

  • Loading a MySQL Shell dump file.

  • Loading ZSTD compressed CSV files.

  • Monitor bulk load progress with the Performance Schema.

MySQL 9.0.0 adds the following features:

  • Large data support and additional data types.

MySQL 9.2.0 adds the following features:

  • Support for secondary indexes.

  • Improvements to CSV file parsing.

Use a second session to monitor bulk load progress:

  • If the data is sorted, there is a single stage: loading.

  • If the data is unsorted, there are two stages: sorting and loading.

See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

Bulk Ingest Data Type Support

LOAD DATA with ALGORITHM=BULK supports the following data types:

As of MySQL 9.0.0, LOAD DATA with ALGORITHM=BULK supports the following data types:

For the data types that HeatWave supports, see: Section 2.10, “Supported Data Types”

See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

Bulk Ingest Syntax

Press CTRL+C to copy
mysql> LOAD DATA [LOW_PRIORITY | CONCURRENT] [FROM] INFILE | URL | S3 'file_prefix' | 'options' [COUNT N] [IN PRIMARY KEY ORDER] INTO TABLE tbl_name [CHARACTER SET charset_name] [COMPRESSION = {'ZSTD'}] [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [TERMINATED BY 'string'] ] [IGNORE number {LINES | ROWS}] [PARALLEL = number] [MEMORY = M] [ALGORITHM = BULK] options: { JSON_OBJECT("key","value"[,"key","value"] ...) "key","value": { "url-prefix","prefix" ["url-sequence-start",0] ["url-suffix","suffix"] ["url-prefix-last-append","@"] ["is-dryrun",{true|false}] } }

The additional LOAD DATA clauses are:

  • FROM: Makes the statement more readable.

  • URL: A URL accessible with a HTTP GET request.

  • S3: The AWS S3 file location.

    This requires the user privilege LOAD_FROM_S3.

    See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

  • COUNT: The number of files in a series of files.

    For COUNT 5 and file_prefix set to data.csv., the five files would be: data.csv.1, data.csv.2, data.csv.3, data.csv.4, and data.csv.5.

  • IN PRIMARY KEY ORDER: Use when the data is already sorted. The values should be in ascending order within the file.

    For a file series, the primary keys in each file must be disjoint and in ascending order from one file to the next.

  • PARALLEL: The number of concurrent threads to use. A typical value might be 16, 32 or 48. The default value is 16.

    PARALLEL does not require CONCURRENT.

  • MEMORY: The amount of memory to use. A typical value might be 512M or 4G. The default value is 1G.

  • ALGORITHM: Set to BULK for bulk load. The file format is CSV.

  • COMPRESSION: The file compression algorithm. Bulk load supports the ZSTD algorithm.

  • options is a JSON object literal that includes:

    • url-prefix: The common URL prefix for the files to load.

    • url-sequence-start: The sequence number for the first file.

      The default value is 1, and the minimum value is 0. The value cannot be a negative number. The value can be a string or a number, for example, "134", or "default".

    • url-suffix: The file suffix.

    • url-prefix-last-append: The string to append to the prefix of the last file.

      This supports MySQL Shell dump files.

    • is-dryrun: Set to true to run basic checks and report if bulk load is possible on the given table. The default value is false.

      To enable is-dryrun, use any of the following values: true, "true", "1", "on" or 1.

      To disable is-dryrun, use any of the following values: false, "false", "0", "off" or 0.

LOAD DATA with ALGORITHM=BULK does not support these clauses:

Press CTRL+C to copy
LOAD DATA [LOCAL] [REPLACE | IGNORE] [PARTITION (partition_name [, partition_name] ...)] ] [LINES [STARTING BY 'string'] ] [(col_name_or_user_var [, col_name_or_user_var] ...)] [SET col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...]

Syntax Examples

  • An example that loads unsorted data from AWS S3 with 48 concurrent threads and 4G of memory:

    Press CTRL+C to copy
    mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost; mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' PARALLEL = 48 MEMORY = 4G ALGORITHM=BULK;
  • An example that loads eight files of sorted data from AWS S3. The file_prefix ends with a period. The files are lineitem.tbl.1, lineitem.tbl.2, ... lineitem.tbl.8:

    Press CTRL+C to copy
    mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost; mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl.' COUNT 8 IN PRIMARY KEY ORDER INTO TABLE lineitem FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' ALGORITHM=BULK;
  • An example that performs a dry run on a sequence of MySQL Shell dump files compressed with the ZSTD algorithm:

    Press CTRL+C to copy
    mysql> GRANT LOAD_FROM_URL ON *.* TO load_user@localhost; mysql> LOAD DATA FROM URL '{"url-prefix","https://example.com/bucket/test@lineitem@","url-sequence-start",0,"url-suffix",".tsv.zst","url-prefix-last-append","@","is-dryrun",true}' COUNT 20 INTO TABLE lineitem CHARACTER SET ???? COMPRESSION = {'ZSTD'} FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 20000 LINES ALGORITHM=BULK;
  • An example that monitors bulk load progress in a second session.

    • Review the list of stages with the following query:

      Press CTRL+C to copy
      mysql> SELECT NAME, ENABLED, TIMED FROM performance_schema.setup_instruments WHERE ENABLED='YES' AND NAME LIKE "stage/bulk_load%";
    • Enable the events_stages_current with the following query:

      Press CTRL+C to copy
      mysql> UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'events_stages_current';
    • Use one session to run bulk load, and monitor progress in a second session:

      Press CTRL+C to copy
      mysql> SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED FROM performance_schema.events_stages_current; -------------- SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED FROM performance_schema.events_stages_current -------------- +-----------+----------+----------------------------------+----------------+----------------+ | thread_id | event_id | event_name | WORK_ESTIMATED | WORK_COMPLETED | +-----------+----------+----------------------------------+----------------+----------------+ | 49 | 5 | stage/bulk_load_unsorted/sorting | 1207551343 | 583008145 | +-----------+----------+----------------------------------+----------------+----------------+ 1 row in set (0.00 sec)