
7.2.2.2 Importing Data Using the Bulk Ingest Feature

Connect to a DB System and bulk ingest data from an Amazon S3 bucket that is in the same region as the DB System.
This task requires a DB System and an Amazon S3 bucket, in the same region as the DB System, that contains the data to load.
Do the following to bulk ingest data from an Amazon S3 bucket:
  1. Connect to a DB System. See Connecting to a DB System.
  2. Create a new database or use an existing database for the bulk ingest. For example, the following statements create a new database, airportdb, and select it for subsequent statements:
    CREATE DATABASE airportdb;
    USE airportdb;
  3. Use an existing empty table or create a new table; a check for confirming that an existing table is empty is shown after this procedure. For example, the following statement creates a new table, booking:
    CREATE TABLE IF NOT EXISTS `booking` 
       (`booking_id` int NOT NULL AUTO_INCREMENT,
        `flight_id` int NOT NULL,
        `seat` char(4) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
        `passenger_id` int NOT NULL,
        `price` decimal(10,2) NOT NULL,
         PRIMARY KEY (`booking_id`)) ENGINE=InnoDB AUTO_INCREMENT=55099799 DEFAULT CHARSET=utf8mb4
            COLLATE=utf8mb4_unicode_ci COMMENT='Verifying bulk ingest';
  4. Bulk ingest data from an Amazon S3 bucket into the table by using the LOAD DATA statement with the BULK algorithm:
    LOAD DATA FROM S3 '<Amazon-S3-URL>' INTO TABLE <Table-name>
            COLUMNS TERMINATED BY '\t' LINES TERMINATED BY '\n' ALGORITHM=BULK;
    • <Amazon-S3-URL>: Specify the URL of the Amazon S3 bucket, which must be in the same region as the DB System, in the following format (a single-file example is shown after this procedure):
      s3-<Region>://<Bucketname>/<FilenameOrPrefix>
      • <Region>: The AWS region that contains the Amazon S3 bucket to load.
      • <Bucketname>: The name of the Amazon S3 bucket that contains the data to load.
      • <FilenameOrPrefix>: The filename or prefix of one or more text files to load.
    • <Table-name>: Specify the empty table that you created or chose in the previous step.
    • Specify the column and line terminators used in the data files you are loading. The statement above assumes that columns are terminated by \t and lines are terminated by \n.
    To improve the performance of bulk ingest, it is recommended that you split the data into multiple files. For example, the following statement loads data from an Amazon S3 bucket that is split across 25 files named booking.tsv.1, booking.tsv.2, ..., booking.tsv.25:
    LOAD DATA FROM S3 's3-us-east-1://mysql-heatwave-data-us-east-1/airportdb/booking.tsv.'
            COUNT 25 IN PRIMARY KEY ORDER INTO TABLE booking
            COLUMNS TERMINATED BY '\t' LINES TERMINATED BY '\n' ALGORITHM=BULK;

    See Bulk Ingest Data to MySQL Server.
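For reference, a minimal single-file load is sketched below. The bucket and prefix are reused from the multi-file example above; the file name booking.tsv is assumed for illustration.

    LOAD DATA FROM S3 's3-us-east-1://mysql-heatwave-data-us-east-1/airportdb/booking.tsv'
            INTO TABLE booking
            COLUMNS TERMINATED BY '\t' LINES TERMINATED BY '\n' ALGORITHM=BULK;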
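The following queries are a minimal verification sketch, assuming the airportdb.booking example above. Before the load, confirm that the target table is empty; after the load, check the row count and spot-check a few rows to confirm that the terminators were interpreted correctly.

    -- Before LOAD DATA ... ALGORITHM=BULK: the target table must be empty (expect 0).
    SELECT COUNT(*) FROM airportdb.booking;

    -- After the load: check how many rows were ingested and inspect a sample.
    SELECT COUNT(*) FROM airportdb.booking;
    SELECT * FROM airportdb.booking LIMIT 5;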