MySQL includes a bulk load extension to the LOAD DATA statement that is only available in HeatWave MySQL. It adds the following features:

- The optimized loading of data sorted by primary key.
- The optimized loading of unsorted data.
- The optimized loading of data from an object store.
- The optimized loading of data from a series of files.

MySQL 8.4.0 adds the following features:

- Loading a MySQL Shell dump file.
- Loading ZSTD compressed CSV files.
- Monitoring bulk load progress with the Performance Schema.

MySQL 9.0.0 adds the following features:

- Large data support and additional data types.

Use a second session to monitor bulk load progress:

- If the data is sorted, there is a single stage: loading.
- If the data is unsorted, there are two stages: sorting and loading.

See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.
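
In its simplest form, bulk load adds ALGORITHM = BULK to a regular LOAD DATA statement. The following is a minimal sketch, not an example from the manual: the table name orders, the server-side file path, and the delimiters are hypothetical, and PARALLEL and MEMORY fall back to their defaults (16 threads and 1G) when omitted:

mysql> LOAD DATA FROM INFILE '/var/lib/mysql-files/orders.csv'
       INTO TABLE orders
       FIELDS TERMINATED BY ','
       OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       ALGORITHM = BULK;
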
As of MySQL 9.0.0, LOAD DATA with ALGORITHM=BULK supports the following additional data types:
- BIT
- ENUM
- JSON
- SET
- YEAR
- TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB
- TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT
- GEOMETRY, GEOMETRYCOLLECTION, POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, and MULTIPOLYGON

For the data types that HeatWave supports, see: Section 2.10, “Supported Data Types”.
See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.
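
As an illustration of the data types that become eligible in 9.0.0, the following sketch defines a hypothetical table (the table and column names are not from the manual); as of MySQL 9.0.0, a table like this can be loaded with the LOAD DATA ... ALGORITHM=BULK forms shown in this section:

mysql> CREATE TABLE product_catalog (
         id BIGINT PRIMARY KEY,
         status ENUM('draft','published'),  -- ENUM: bulk load support added in 9.0.0
         attributes JSON,                   -- JSON: bulk load support added in 9.0.0
         description TEXT                   -- TEXT: bulk load support added in 9.0.0
       );
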
mysql> LOAD DATA
       [LOW_PRIORITY | CONCURRENT]
       [FROM]
       INFILE | URL | S3 'file_prefix' | 'options' [COUNT N]
       [IN PRIMARY KEY ORDER]
       INTO TABLE tbl_name
       [CHARACTER SET charset_name] [COMPRESSION = {'ZSTD'}]
       [{FIELDS | COLUMNS}
           [TERMINATED BY 'string']
           [[OPTIONALLY] ENCLOSED BY 'char']
           [ESCAPED BY 'char']
       ]
       [LINES
           [TERMINATED BY 'string']
       ]
       [IGNORE number {LINES | ROWS}]
       [PARALLEL = number]
       [MEMORY = M]
       [ALGORITHM = BULK]

options: {
    JSON_OBJECT("key","value"[,"key","value"] ...)
    "key","value": {
        "url-prefix","prefix"
        ["url-sequence-start",0]
        ["url-suffix","suffix"]
        ["url-prefix-last-append","@"]
        ["is-dryrun",{true|false}]
    }
}

The additional LOAD DATA clauses are:

- FROM: Makes the statement more readable.
- URL: A URL accessible with an HTTP GET request.
- S3: The AWS S3 file location. This requires the LOAD_FROM_S3 user privilege. See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.
- COUNT: The number of files in a series of files. For COUNT 5 and file_prefix set to data.csv., the five files would be: data.csv.1, data.csv.2, data.csv.3, data.csv.4, and data.csv.5.
- IN PRIMARY KEY ORDER: Use when the data is already sorted. The values should be in ascending order within the file. For a file series, the primary keys in each file must be disjoint and in ascending order from one file to the next.
- PARALLEL: The number of concurrent threads to use. A typical value might be 16, 32, or 48. The default value is 16. PARALLEL does not require CONCURRENT.
- MEMORY: The amount of memory to use. A typical value might be 512M or 4G. The default value is 1G.
- ALGORITHM: Set to BULK for bulk load. The file format is CSV.
- COMPRESSION: The file compression algorithm. Bulk load supports the ZSTD algorithm.
- options: A JSON object literal that includes:
  - url-prefix: The common URL prefix for the files to load.
  - url-sequence-start: The sequence number for the first file. The default value is 1, and the minimum value is 0. The value cannot be a negative number. The value can be a string or a number, for example, "134" or "default".
  - url-suffix: The file suffix.
  - url-prefix-last-append: The string to append to the prefix of the last file. This supports MySQL Shell dump files; see the sketch after this list.
  - is-dryrun: Set to true to run basic checks and report whether bulk load is possible on the given table. The default value is false. To enable is-dryrun, use any of the following values: true, "true", "1", "on", or 1. To disable is-dryrun, use any of the following values: false, "false", "0", "off", or 0.

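The following sketch illustrates how these options combine for a MySQL Shell dump. The bucket URL, table name, COUNT value, and tab delimiter (assumed from the .tsv suffix) are hypothetical, and the file names in the comments are what these settings would be expected to address, assuming the sequence number is also applied to the last file, which receives the extra "@" from url-prefix-last-append to match MySQL Shell chunk naming. It also assumes the LOAD_FROM_URL privilege shown in the examples below:

-- Expected files for COUNT 3 with the options below (hypothetical):
--   https://example.com/bucket/test@lineitem@0.tsv.zst
--   https://example.com/bucket/test@lineitem@1.tsv.zst
--   https://example.com/bucket/test@lineitem@@2.tsv.zst
mysql> LOAD DATA FROM URL '{"url-prefix","https://example.com/bucket/test@lineitem@","url-sequence-start",0,"url-suffix",".tsv.zst","url-prefix-last-append","@","is-dryrun",true}'
       COUNT 3
       INTO TABLE lineitem
       COMPRESSION = 'ZSTD'
       FIELDS TERMINATED BY '\t'
       LINES TERMINATED BY '\n'
       ALGORITHM = BULK;
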
LOAD DATA with ALGORITHM=BULK does not support these clauses:

LOAD DATA
    [LOCAL]
    [REPLACE | IGNORE]
    [PARTITION (partition_name [, partition_name] ...)]
    [LINES
        [STARTING BY 'string']
    ]
    [(col_name_or_user_var
        [, col_name_or_user_var] ...)]
    [SET col_name={expr | DEFAULT}
        [, col_name={expr | DEFAULT}] ...]

- An example that loads unsorted data from AWS S3 with 48 concurrent threads and 4G of memory:

  mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
  mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl'
         INTO TABLE lineitem
         FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
         LINES TERMINATED BY '\n'
         PARALLEL = 48
         MEMORY = 4G
         ALGORITHM=BULK;

- An example that loads eight files of sorted data from AWS S3. The file_prefix ends with a period. The files are lineitem.tbl.1, lineitem.tbl.2, ... lineitem.tbl.8:

  mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
  mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl.'
         COUNT 8
         IN PRIMARY KEY ORDER
         INTO TABLE lineitem
         FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
         LINES TERMINATED BY '\n'
         ALGORITHM=BULK;

- An example that performs a dry run on a sequence of MySQL Shell dump files compressed with the ZSTD algorithm:

  mysql> GRANT LOAD_FROM_URL ON *.* TO load_user@localhost;
  mysql> LOAD DATA FROM URL '{"url-prefix","https://example.com/bucket/test@lineitem@","url-sequence-start",0,"url-suffix",".tsv.zst","url-prefix-last-append","@","is-dryrun",true}'
         COUNT 20
         INTO TABLE lineitem
         CHARACTER SET charset_name
         COMPRESSION = 'ZSTD'
         FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
         LINES TERMINATED BY '\n'
         IGNORE 20000 LINES
         ALGORITHM=BULK;

- An example that monitors bulk load progress in a second session.

  - Review the list of stages with the following query:

    mysql> SELECT NAME, ENABLED, TIMED
           FROM performance_schema.setup_instruments
           WHERE ENABLED='YES' AND NAME LIKE "stage/bulk_load%";

  - Enable the events_stages_current consumer with the following query:

    mysql> UPDATE performance_schema.setup_consumers
           SET ENABLED = 'YES'
           WHERE NAME LIKE 'events_stages_current';

  - Use one session to run bulk load, and monitor progress in a second session:

    mysql> SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED
           FROM performance_schema.events_stages_current;
    +-----------+----------+----------------------------------+----------------+----------------+
    | thread_id | event_id | event_name                       | WORK_ESTIMATED | WORK_COMPLETED |
    +-----------+----------+----------------------------------+----------------+----------------+
    |        49 |        5 | stage/bulk_load_unsorted/sorting |     1207551343 |      583008145 |
    +-----------+----------+----------------------------------+----------------+----------------+
    1 row in set (0.00 sec)
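
  - If the events_stages_current consumer is not normally enabled on the server, it can be switched back off once the load completes. A possible cleanup step:

    mysql> UPDATE performance_schema.setup_consumers
           SET ENABLED = 'NO'
           WHERE NAME LIKE 'events_stages_current';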