Do not use the external_tables
option
as of MySQL 8.4.0. This option will be deprecated in a
future release. Use the input_list
JSON
array instead. See
Section 2.2.4.2, “Auto Parallel Load Syntax” and
refer to the syntax and full descriptions for
input_list
.
The Auto Parallel Load external_tables
option is a
JSON array that includes one or more
db_object
items. It allows you to set the
details for schemas and tables to load.
While it is possible to define the entire load command on
a single line, for readability the configuration is
divided into option definitions using
SET
.
Define the following option sets:
-
Define the name of the database which will store the data.
Press CTRL+C to copymysql>SET @db_list = '["tpch"]';
This assumes that Lakehouse Auto Parallel Load will analyze the data, infer the table structure, and create the database and all tables. See: Section 5.2.4.2, “Lakehouse Auto Parallel Load with the external_tables Option”.
-
Define the
db_object
parameters that will load data from three external sources with Avro, CSV and Parquet format files:Press CTRL+C to copymysql>SET @ext_tables = '[ { "db_name": "tpch", "tables": [{ "table_name": "supplier_pq", "dialect": { "format": "parquet" }, "file": [{ "prefix": "src_data/parquet/tpch/supplier/", "bucket": "myBucket", "namespace": "myNamespace", "region": "myRegion" }] }, { "table_name": "nation_csv", "dialect": { "format": "csv", "field_delimiter": "|", "record_delimiter": "|\\n", "has_header": true }, "file": [{ "par": "https://objectstorage.../nation.csv" }] }, { "table_name": "region_avro", "dialect": { "format": "avro" }, "file": [{ "par": "https://objectstorage.../region.avro" }] }] } ]';
-
Define the
@options
variable withSET
:Press CTRL+C to copymysql>SET @options = JSON_OBJECT('external_tables', CAST(@ext_tables AS JSON));
Setting
mode
todryrun
generates the load script but does not create or load the external tables. For example:Press CTRL+C to copymysql>SET @options = JSON_OBJECT('mode', 'dryrun', 'external_tables', CAST(@ext_tables AS JSON));
To implement the changes as part of the load command, set
mode
tonormal
. This is the default, and it is not necessary to add it to the command.
Exclude columns from the loading process with the
exclude_list
option. See
Section 2.2.4.2, “Auto Parallel Load Syntax”.
Lakehouse Auto Parallel Load infers the column names for Avro and Parquet files,
and also for CSV files if has_header
is
true
. For these situations, use the
column names with the exclude_list
option.
If the table already exists, but no data has been loaded,
use the existing column names with the
exclude_list
option.
For CSV files if has_header
is
false
, use the generated schema names
with the exclude_list
option. These
are: col_1
, col_2
,
col_3
...