Documentation Home
MySQL HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 1.6Mb
PDF (A4) - 1.6Mb


MySQL HeatWave User Guide  /  HeatWave  /  Bulk Ingest Data to MySQL Server

2.17 Bulk Ingest Data to MySQL Server

MySQL 8.1.0 introduces a bulk load extension to the LOAD DATA statement which is only available in HeatWave. It adds the following features:

  • The optimized loading of data sorted by primary key.

  • The optimized loading of unsorted data.

  • The optimized loading of data from an object store.

  • The optimized loading of data from a series of files.

LOAD DATA
    [LOW_PRIORITY | CONCURRENT]
    [FROM]
    INFILE | URL | S3 'file_prefix' [COUNT N]
    [IN PRIMARY KEY ORDER]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [PARALLEL = number]
    [MEMORY = M]
    [ALGORITHM = BULK]

The additional LOAD DATA clauses are:

  • FROM: Makes the statement more readable.

  • URL: The Oracle Cloud Infrastructure file location.

    This requires a pre-authenticated request, PAR. See: Section 4.5.1, “Pre-Authenticated Requests”. It also requires the user privilege LOAD_FROM_URL.

    See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

  • S3: The AWS S3 file location.

    This requires the user privilege LOAD_FROM_S3.

    See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

  • COUNT: The number of files in a series of files.

    For COUNT 5 and file_prefix set to data.csv., the five files would be: data.csv.1, data.csv.2, data.csv.3, data.csv.4, and data.csv.5.

  • IN PRIMARY KEY ORDER: Use when the data is already sorted. The values should be in ascending order within the file.

    For a file series, the primary keys in each file must be disjoint and in ascending order from one file to the next.

  • PARALLEL: The number of concurrent threads to use. A typical value might be 16, 32 or 48. The default value is 16.

    PARALLEL does not require CONCURRENT.

  • MEMORY: The amount of memory to use. A typical value might be 512M or 4G. The default value is 1G.

  • ALGORITHM: Set to BULK for bulk load. The file format is CSV.

LOAD DATA with ALGORITHM=BULK does not support these clauses:

LOAD DATA
    [LOCAL]
    [REPLACE | IGNORE]
    [PARTITION (partition_name [, partition_name] ...)]
    ]
    [LINES
        [STARTING BY 'string']
    ]
    [(col_name_or_user_var
        [, col_name_or_user_var] ...)]
    [SET col_name={expr | DEFAULT}
        [, col_name={expr | DEFAULT}] ...]

LOAD DATA with ALGORITHM=BULK supports the following data types:

For the data types that HeatWave supports, see: Section 2.10, “Supported Data Types”

See: Section 2.18.8, “Bulk Ingest Data to MySQL Server Limitations”.

Syntax Examples

  • An example that loads unsorted data from AWS S3 with 48 concurrent threads and 4G of memory.

    mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
    
    mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl'
              INTO TABLE lineitem
              FIELDS TERMINATED BY "|"
              OPTIONALLY ENCLOSED BY '"'
              LINES TERMINATED BY '\n'
              PARALLEL = 48
              MEMORY = 4G
              ALGORITHM=BULK;
  • An example that loads eight files of sorted data from AWS S3. The file_prefix ends with a period. The files are lineitem.tbl.1, lineitem.tbl.2, ... lineitem.tbl.8.

    mysql> GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
    
    mysql> LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl.' COUNT 8
              IN PRIMARY KEY ORDER
              INTO TABLE lineitem
              FIELDS TERMINATED BY "|"
              OPTIONALLY ENCLOSED BY '"'
              LINES TERMINATED BY '\n'
              ALGORITHM=BULK;