Documentation Home
MySQL 5.7 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 37.9Mb
PDF (A4) - 37.9Mb
PDF (RPM) - 37.2Mb
HTML Download (TGZ) - 10.2Mb
HTML Download (Zip) - 10.3Mb
HTML Download (RPM) - 8.9Mb
Man Pages (TGZ) - 214.6Kb
Man Pages (Zip) - 327.5Kb
Info (Gzip) - 3.4Mb
Info (Zip) - 3.4Mb
Excerpts from this Manual

MySQL 5.7 Reference Manual  /  ...  /  ndb_import — Import CSV Data Into NDB

21.4.14 ndb_import — Import CSV Data Into NDB

ndb_import imports CSV-formatted data into NDB using the NDB API, which requires a connection to an NDB management server. A connection to a MySQL Server is not required.

Usage

ndb_import db_name file_name options

ndb_import requires two arguments. db_name is the name of the database in which the database where the table into which to import the data is found; file_name is the name of the CSV file from which to read the data, including the path to this file if it is not in the current directory. The name of the file must match the name of the table; the file's extention, if any, is not taken into consdieration. Additional arguments (described later in this section) are supported, including options for specifying field separators, escapes, and line terminators. ndb_import must be able to connect to an NDB Cluster management server; for this reason, there must be an unused [api] slot in the config.ini file.

To duplicate an existing table that uses a different storage engine, such as InnoDB, as an NDB table, use the mysql client to perform a SELECT INTO OUTFILE statement to export the existing table to a CSV file, then to execute a CREATE TABLE LIKE statement to create a new table having the same structure as the existing table, then perform ALTER TABLE ... ENGINE=NDB on the new table; after this, from the system shell, invoke ndb_import to load the data into the new NDB table. For example, an existing InnoDB table named myinnodb_table in a database named myinnodb can be exported into an NDB table named myndb_table in a database named myndb as shown here, assuming that you are already logged in as a MySQL user with the appropriate privileges:

  1. In the mysql client:

    mysql> USE myinnodb;
    
    mysql> SELECT * INTO OUTFILE '/tmp/myndb_table.csv'
    mysql>  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
    mysql>  LINES TERMINATED BY '\n'
    mysql>  FROM myinnodbtable;
    
    mysql> CREATE DATABASE myndb;
    
    mysql> USE myndb;
    
    mysql> CREATE TABLE myndb_table LIKE myinnodb.myinnodb_table;
    
    mysql> ALTER TABLE myndb_table ENGINE=NDB;
    
    mysql> EXIT;
    Bye
    shell>

    Once the target database and table have been created, a running mysqld is no longer required. You can stop it using mysqladmin shutdown or another method before proceeding, if you wish.

  2. In the system shell:

    # if you are not already in the MySQL bin directory:
    shell> cd path-to-mysql-bin-dir
    
    shell> ndb_import myndb /tmp/myndb_table.csv --fields-optionally-enclosed-by='"' \
        --fields-separated-by="," --fields-escaped-by='\\'

    The output should resemble what is shown here:

    job-1 import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 [running] import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 [success] import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 imported 19984 rows in 0h0m9s at 2277 rows/s
    jobs summary: defined: 1 run: 1 with success: 1 with failure: 0
    shell>

The following table includes options that are specific to ndb_import. Additional descriptions follow the table. For options common to most NDB Cluster programs (including ndb_import), see Section 21.4.32, “Options Common to NDB Cluster Programs — Options Common to NDB Cluster Programs”.

Table 21.97 This table describes command-line options for the ndb_import program

Format Description Added or Removed

--abort-on-error

Dump core on any fatal error; used for debugging

ADDED: NDB 7.6.2

--ai-increment=#

For table with hidden PK, specify autoincrement increment. See mysqld

ADDED: NDB 7.6.2

--ai-offset=#

For table with hidden PK, specify autoincrement offset. See mysqld

ADDED: NDB 7.6.2

--ai-prefetch-sz=#

For table with hidden PK, specify number of autoincrement values that are prefetched. See mysqld

ADDED: NDB 7.6.2

--connections=#

Number of cluster connections to create

ADDED: NDB 7.6.2

--continue

When job fails, continue to next job

ADDED: NDB 7.6.2

--db-workers=#

Number of threads, per data node, executing database operations

ADDED: NDB 7.6.2

--errins-type=name

Error insert type, for testing purposes; use "list" to obtain all possible values

ADDED: NDB 7.6.2

--errins-delay=#

Error insert delay in milliseconds; random variation is added

ADDED: NDB 7.6.2

--fields-enclosed-by=char

Same as FIELDS ENCLOSED BY option for LOAD DATA statements. For CSV input this is same as using --fields-optionally-enclosed-by

ADDED: NDB 7.6.2

--fields-escaped-by=name

Same as FIELDS ESCAPED BY option for LOAD DATA statements

ADDED: NDB 7.6.2

--fields-optionally-enclosed-by=char

Same as FIELDS OPTIONALLY ENCLOSED BY option for LOAD DATA statements

ADDED: NDB 7.6.2

--fields-terminated-by=char

Same as FIELDS TERMINATED BY option for LOAD DATA statements.

ADDED: NDB 7.6.2

--idlesleep=#

Number of milliseconds to sleep waiting for more to do

ADDED: NDB 7.6.2

--idlespin=#

Number of times to re-try before idlesleep

ADDED: NDB 7.6.2

--ignore-lines=#

Ignore first # lines in input file. Used to skip a non-data header.

ADDED: NDB 7.6.2

--input-type=name

Input type: random or csv

ADDED: NDB 7.6.2

--input-workers=#

Number of threads processing input. Must be 2 or more if --input-type is csv.

ADDED: NDB 7.6.2

--lines-terminated-by=name

Same as LINES TERMINATED BY option for LOAD DATA statements

ADDED: NDB 7.6.2

--max-rows=#

Import only this number of input data rows; default is 0, which imports all rows

ADDED: NDB 7.6.2

--monitor=#

Periodically print status of running job if something has changed (status, rejected rows, temporary errors). Value 0 disables. Value 1 prints any change seen. Higher values reduce status printing exponentially up to some pre-defined limit.

ADDED: NDB 7.6.2

--no-asynch

Run database operations as batches, in single transactions

ADDED: NDB 7.6.2

--no-hint

Do not use distribution key hint to select data node (TC)

ADDED: NDB 7.6.2

--opbatch=#

A db execution batch is a set of transactions and operations sent to NDB kernel. This option limits NDB operations (including blob operations) in a db execution batch. Therefore it also limits number of asynch transactions. Value 0 is not valid

ADDED: NDB 7.6.2

--opbytes=#

Limit bytes in execution batch (default 0 = no limit)

ADDED: NDB 7.6.2

--output-type=name

Output type: ndb is default, null used for testing

ADDED: NDB 7.6.2

--output-workers=#

Number of threads processing output or relaying database operations

ADDED: NDB 7.6.2

--pagesize=#

Align I/O buffers to given size

ADDED: NDB 7.6.2

--pagecnt=#

Size of I/O buffers as multiple of page size. CSV input worker allocates a double-sized buffer

ADDED: NDB 7.6.2

--polltimeout=#

Timeout per poll for completed asynchonous transactions; polling continues until all polls are completed, or error occurs

ADDED: NDB 7.6.2

--rejects=#

Limit number of rejected rows (rows with permanent error) in data load. Default is 0 which means that any rejected row causes a fatal error. The row exceeding the limit is also added to *.rej

ADDED: NDB 7.6.2

--resume

If job aborted (temporary error, user interrupt), resume with rows not yet processed

ADDED: NDB 7.6.2

--rowbatch=#

Limit rows in row queues (default 0 = no limit); must be 1 or more if --input-type is random

ADDED: NDB 7.6.2

--rowbytes=#

Limit bytes in row queues (0 = no limit)

ADDED: NDB 7.6.2

--state-dir=name

Where to write state files; currect directory is default

ADDED: NDB 7.6.2

--tempdelay=#

Number of milliseconds to sleep between temporary errors

ADDED: NDB 7.6.2

--temperrors=#

Number of times a transaction can fail due to a temporary error, per execution batch; 0 means any temporary error is fatal. Such errors do not cause any rows to be written to .rej file

ADDED: NDB 7.6.2

--verbose=#,

-v

Verbosity level for debug messages (maximum is 4 for debug builds)

ADDED: NDB 7.6.2


  • --abort-on-error

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--abort-on-error
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeboolean
    DefaultFALSE

    Dump core on any fatal error; used for debugging only.

  • --ai-increment=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--ai-increment=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1
    Min Value1
    Max Value4294967295

    For a table with a hidden primary key, specify the autoincrement increment, like the the auto_increment_increment system variable does in the MySQL Server.

  • --ai-offset=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--ai-offset=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1
    Min Value1
    Max Value4294967295

    For a table with hidden primary key, specify the autoincrement offset. Similar to the auto_increment_offset system variable.

  • --ai-prefetch-sz=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--ai-prefetch-sz=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1024
    Min Value1
    Max Value4294967295

    For a table with a hidden primary key, specify the number of autoincrement values that are prefetched. Behaves like the ndb_autoincrement_prefetch_sz system variable does in the MySQL Server.

  • --connections=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--connections=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1
    Min Value1
    Max Value4294967295

    Number of cluster connections to create.

  • --continue

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--continue
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeboolean
    DefaultFALSE

    When a job fails, continue to the next job.

  • --db-workers=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--db-workers=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1
    Min Value1
    Max Value4294967295

    Number of threads, per data node, executing database operations.

  • --errins-type=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--errins-type=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeenumeration
    Default[none]
    Valid Valuesstopjob
    stopall
    sighup
    sigint
    list

    Error insert type; use list as the name value to obtain all possible values. This option is used for testing purposes only.

  • --errins-delay=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--errins-delay=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1000
    Min Value0
    Max Value4294967295

    Error insert delay in milliseconds; random variation is added. This option is used for testing purposes only.

  • --fields-enclosed-by=char

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--fields-enclosed-by=char
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default[none]

    This works in the same way as the FIELDS ENCLOSED BY option does for the LOAD DATA statement, specifying a character to be interpeted as quoting field values. For CSV input, this is the same as --fields-optionally-enclosed-by.

  • --fields-escaped-by=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--fields-escaped-by=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default\

    Specify an escape character in the same way as the FIELDS ESCAPED BY option does for the SQL LOAD DATA statement.

  • --fields-optionally-enclosed-by=char

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--fields-optionally-enclosed-by=char
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default[none]

    This works in the same way as the FIELDS OPTIONALLY ENCLOSED BY option does for the LOAD DATA statement, specifying a character to be interpeted as optionally quoting field values. For CSV input, this is the same as --fields-enclosed-by.

  • --fields-terminated-by=char

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--fields-terminated-by=char
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default\t

    This works in the same way as the FIELDS TERMINATED BY option does for the LOAD DATA statement, specifying a character to be interpeted as the field separator.

  • --idlesleep=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--idlesleep=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1
    Min Value1
    Max Value4294967295

    Number of milliseconds to sleep waiting for more work to perform.

  • --idlespin=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--idlespin=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Number of times to retry before sleeping.

  • --ignore-lines=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--ignore-lines=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Cause ndb_import to ignore the first # lines of the input file. This can be employed to skip a file header that does not contain any data.

  • --input-type=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--input-type=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeenumeration
    Defaultcsv
    Valid Valuesrandom
    csv

    Set the type of input type. The default is csv; random is intended for testing purposes only. .

  • --input-workers=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--input-workers=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default2
    Min Value1
    Max Value4294967295

    Set the number of threads processing input.

  • --lines-terminated-by=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--lines-terminated-by=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default\n

    This works in the same way as the LINES TERMINATED BY option does for the LOAD DATA statement, specifying a character to be interpeted as end-of-line.

  • --max-rows=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--max-rows=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Import only this number of input data rows; the default is 0, which imports all rows.

  • --monitor=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--monitor=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default2
    Min Value0
    Max Value4294967295

    Periodically print the status of a running job if something has changed (status, rejected rows, temporary errors). Set to 0 to disable this reporting. Setting to 1 prints any change that is seen. Higher values reduce the frequency of this status reporting.

  • --no-asynch

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--no-asynch
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeboolean
    DefaultFALSE

    Run database operations as batches, in single transactions.

  • --no-hint

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--no-hint
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeboolean
    DefaultFALSE

    Do not use distribution key hinting to select a data node.

  • --opbatch=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--opbatch=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default256
    Min Value1
    Max Value4294967295

    Set a limit on the number of operations (including blob operations), and thus the number of asynchronous transactions, per execution batch.

  • --opbytes=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--opbytes=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Set a limit on the number of bytes per execution batch. Use 0 for no limit.

  • --output-type=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--output-type=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeenumeration
    Defaultndb
    Valid Valuesnull

    Set the output type. ndb is the default. null is used only for testing.

  • --output-workers=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--output-workers=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default2
    Min Value1
    Max Value4294967295

    Set the number of threads processing output or relaying database operations.

  • --pagesize=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--pagesize=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default4096
    Min Value1
    Max Value4294967295

    Align I/O buffers to the given size.

  • --pagecnt=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--pagecnt=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default64
    Min Value1
    Max Value4294967295

    Set the size of I/O buffers as multiple of page size. The CSV input worker allocates buffer that is doubled in size.

  • --polltimeout=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--polltimeout=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default1000
    Min Value1
    Max Value4294967295

    Set a timeout per poll for completed asynchonous transactions; polling continues until all polls are completed, or until an error occurs.

  • --rejects=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--rejects=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Limit the number of rejected rows (rows with permanent errors) in the data load. The default is 0, which means that any rejected row causes a fatal error. The row causing the limit to be exceeded is added to the .rej file.

  • --resume

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--resume
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeboolean
    DefaultFALSE

    If a job is aborted (due to a temporary db error or when interrupted by the user), resume with any rows not yet processed.

  • --rowbatch=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--rowbatch=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Set a limit on the number of rows per row queue. Use 0 for no limit.

  • --rowbytes=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--rowbytes=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default262144
    Min Value0
    Max Value4294967295

    Set a limit on the number of bytes per row queue. Use 0 for no limit.

  • --state-dir=name

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format --state-dir=name
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typestring
    Default.

    Where to write the state files (tbl_name.map, tbl_name.rej, tbl_name.res, and tbl_name.stt) produced by a run of the program; the default is the current directory.

  • --tempdelay=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--tempdelay=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default10
    Min Value0
    Max Value4294967295

    Number of milliseconds to sleep between temporary errors.

  • --temperrors=#

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--temperrors=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value4294967295

    Number of times a transaction can fail due to a temporary error, per execution batch. The default is 0, which means that any temporary error is fatal. Temporary errors do not cause any rows to be added to the .rej file.

  • --verbose=#, -v

    Introduced5.7.18-ndb-7.6.2
    Command-Line Format--verbose=#
    Permitted Values (>= 5.7.18-ndb-7.6.2)Typeinteger
    Default0
    Min Value0
    Max Value2

    Verbosity level for debugging messages.

As with LOAD DATA INFILE, options for field and line formatting much match those used to create the CSV file, whether this was done using SELECT INTO OUTFILE, or by some other means. There is no equivalent to the LOAD DATA INFILE statement's STARTING WITH option.

ndb_import was added in NDB 7.6.2.


User Comments
Sign Up Login You must be logged in to post a comment.