MySQL includes a bulk load extension to the
      LOAD DATA statement. It can do the
      following:
    
- Optimize the loading of data sorted by primary key. 
- Optimize the loading of unsorted data. 
- Optimize the loading of data from an object store. 
- Optimize the loading of data from a series of files. 
- Load a MySQL Shell dump file. 
- Load ZSTD compressed CSV files. 
- Monitor bulk load progress with the Performance Schema. 
- Support large data loads. 
Use a second session to monitor bulk load progress:
- If the data is sorted, there is a single stage: loading.
- If the data is unsorted, there are two stages: sorting and loading.
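
The stages are exposed as Performance Schema stage instruments. As a quick sketch (the full monitoring walkthrough appears at the end of this section), a query such as the following lists the bulk load stage instruments; the exact instrument names can vary by release:

mysql> SELECT NAME, ENABLED, TIMED
       FROM performance_schema.setup_instruments
       WHERE NAME LIKE 'stage/bulk_load%';
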
        LOAD DATA with
        ALGORITHM=BULK supports tables with at least one column with the
        VECTOR data type. If you attempt to load a column with a data
        type that is not supported, an error occurs.
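
        As an illustration, a minimal sketch of a table with a VECTOR column
        being bulk loaded from a local CSV file; the table name, file path,
        and vector dimension are hypothetical, and the textual format expected
        for the VECTOR values in the file is not covered here:

mysql> CREATE TABLE embeddings (
           id BIGINT NOT NULL PRIMARY KEY,
           doc VARCHAR(255),
           embedding VECTOR(4)
       );

mysql> LOAD DATA FROM INFILE '/var/lib/mysql-files/embeddings.csv'
       INTO TABLE embeddings
       FIELDS TERMINATED BY ','
       LINES TERMINATED BY '\n'
       ALGORITHM = BULK;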
      
        In addition to the VECTOR data type,
        LOAD DATA with
        ALGORITHM=BULK supports the following data
        types:
      
LOAD DATA
  [LOW_PRIORITY | CONCURRENT]
  [FROM]
  INFILE | URL | S3 'file_prefix' | 'options' [COUNT N]
  [IN PRIMARY KEY ORDER]
  INTO TABLE tbl_name
  [CHARACTER SET charset_name] [COMPRESSION = {'ZSTD'}]
  [{FIELDS | COLUMNS}
      [TERMINATED BY 'string']
      [[OPTIONALLY] ENCLOSED BY 'char']
      [ESCAPED BY 'char']
  ]
  [LINES
      [TERMINATED BY 'string']
  ]
  [IGNORE number {LINES | ROWS}]
  [PARALLEL = number]
  [MEMORY = M]
  [ALGORITHM = BULK]
options: {
   JSON_OBJECT("key","value"[,"key","value"] ...)
        "key","value": {
        "url-prefix","prefix"
        ["url-sequence-start",0]
        ["url-suffix","suffix"]
        ["url-prefix-last-append","@"]
        ["is-dryrun",{true|false}]
        }
}
        The additional LOAD DATA clauses
        are:
      
- FROM: Makes the statement more readable.
- URL: A URL accessible with an HTTP GET request. This requires the user privilege LOAD_FROM_URL.
- S3: The AWS S3 file location. This requires the user privilege LOAD_FROM_S3.
- COUNT: The number of files in a series of files. For COUNT 5 and file_prefix set to data.csv., the five files would be: data.csv.1, data.csv.2, data.csv.3, data.csv.4, and data.csv.5.
- IN PRIMARY KEY ORDER: Use when the data is already sorted. The values should be in ascending order within the file. For a file series, the primary keys in each file must be disjoint and in ascending order from one file to the next.
- PARALLEL: The number of concurrent threads to use. A typical value might be 16, 32, or 48. The default value is 16. PARALLEL does not require CONCURRENT.
- MEMORY: The amount of memory to use. A typical value might be 512M or 4G. The default value is 1G.
- ALGORITHM: Set to BULK for bulk load. The file format is CSV. A minimal example follows this list.
- COMPRESSION: The file compression algorithm. Bulk load supports the ZSTD algorithm.
- options is a JSON object literal that includes:
  - url-prefix: The common URL prefix for the files to load.
  - url-sequence-start: The sequence number for the first file. The default value is 1, and the minimum value is 0. The value cannot be a negative number. The value can be a string or a number, for example, "134" or "default".
  - url-suffix: The file suffix.
  - url-prefix-last-append: The string to append to the prefix of the last file. This supports MySQL Shell dump files.
  - is-dryrun: Set to true to run basic checks and report whether bulk load is possible on the given table. The default value is false. To enable is-dryrun, use any of the following values: true, "true", "1", "on", or 1. To disable is-dryrun, use any of the following values: false, "false", "0", "off", or 0.
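
Putting the main clauses together, a minimal sketch of a bulk load from a local CSV file; the table name, file path, and the PARALLEL and MEMORY values are hypothetical and should be adjusted to the data and the server:

mysql> LOAD DATA FROM INFILE '/var/lib/mysql-files/lineitem.csv'
       INTO TABLE lineitem
       FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       PARALLEL = 16
       MEMORY = 1G
       ALGORITHM = BULK;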
 
        LOAD DATA with
        ALGORITHM=BULK does not support these
        clauses:
      
LOAD DATA
  [LOCAL]
  [REPLACE | IGNORE]
  [PARTITION (partition_name [, partition_name] ...)]
  [LINES
      [STARTING BY 'string']
  ]
  [(col_name_or_user_var
      [, col_name_or_user_var] ...)]
  [SET col_name={expr | DEFAULT}
      [, col_name={expr | DEFAULT}] ...]
- An example that loads unsorted data from AWS S3 with 48 concurrent threads and 4G of memory:

mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;

mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl'
       INTO TABLE lineitem
       FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       PARALLEL = 48
       MEMORY = 4G
       ALGORITHM=BULK;
- An example that loads eight files of sorted data from AWS S3. The file_prefix ends with a period. The files are lineitem.tbl.1, lineitem.tbl.2, ... lineitem.tbl.8:

mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;

mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl.'
       COUNT 8
       IN PRIMARY KEY ORDER
       INTO TABLE lineitem
       FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       ALGORITHM=BULK;
- An example that performs a dry run on a sequence of MySQL Shell dump files compressed with the ZSTD algorithm:

mysql> GRANT LOAD_FROM_URL ON *.* TO load_user@localhost;

mysql> LOAD DATA FROM URL '{"url-prefix","https://example.com/bucket/test@lineitem@","url-sequence-start",0,"url-suffix",".tsv.zst","url-prefix-last-append","@","is-dryrun",true}'
       COUNT 20
       INTO TABLE lineitem
       CHARACTER SET charset_name
       COMPRESSION = 'ZSTD'
       FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       IGNORE 20000 LINES
       ALGORITHM=BULK;
- An example that loads data with the URI keyword (supported as of MySQL 9.4.0):

mysql> GRANT LOAD_FROM_URL ON *.* TO load_user@localhost;

mysql> LOAD DATA FROM URI 'https://data_files.com/data_files_1.tbl'
       INTO TABLE lineitem
       FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       ALGORITHM=BULK;
- An example that monitors bulk load progress in a second session.
  - Review the list of stages with the following query:

    mysql> SELECT NAME, ENABLED, TIMED
           FROM performance_schema.setup_instruments
           WHERE ENABLED='YES' AND NAME LIKE "stage/bulk_load%";
  - Enable the events_stages_current consumer with the following query:

    mysql> UPDATE performance_schema.setup_consumers
           SET ENABLED = 'YES'
           WHERE NAME LIKE 'events_stages_current';
  - Use one session to run the bulk load, and monitor progress in a second session:

    mysql> SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED
           FROM performance_schema.events_stages_current;
    +-----------+----------+----------------------------------+----------------+----------------+
    | thread_id | event_id | event_name                       | WORK_ESTIMATED | WORK_COMPLETED |
    +-----------+----------+----------------------------------+----------------+----------------+
    |        49 |        5 | stage/bulk_load_unsorted/sorting |     1207551343 |      583008145 |
    +-----------+----------+----------------------------------+----------------+----------------+
    1 row in set (0.00 sec)
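
WORK_COMPLETED and WORK_ESTIMATED can be combined to report progress as a percentage; for the row above, 583008145 / 1207551343 is roughly 48% of the sorting stage. A query along the following lines (a sketch against the same Performance Schema table) returns the figure directly:

mysql> SELECT event_name,
              ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 1) AS pct_complete
       FROM performance_schema.events_stages_current
       WHERE event_name LIKE 'stage/bulk_load%';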
 