NaN values in Parquet files are loaded as NULL.
-
As of MySQL 9.3.2, Lakehouse supports the VECTOR data type. Consider the following limitations:
-
To load Parquet files with the VECTOR data type (from Parquet strings or from Parquet numeric Lists), you must load the table manually with is_strict_mode set to false. To load simple lists as JSON, you can load the data manually or automatically. To learn more, see Vector Data Type and External Table Syntax.
Tip
If you need to manually create a table but do not have the appropriate table definition, you can load the table automatically in dryrun mode and review the load script. You also need to convert columns from JSON to VECTOR as needed. The following commands show how to do this:
mysql> SET @input_list = '[{ "db_name": "data_db", "tables": [{ "table_name": "table_1", "engine_attribute": {"dialect": {"format": "parquet", "embed_model_id": "minilm"}, "file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/data_file_1.parquet"}]} }]}]';
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'dryrun'));
mysql> SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
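Hand-written JSON inside a SQL string literal is easy to get wrong, and a single missing comma makes the CAST to JSON fail. As a minimal sketch, assuming you prepare the input list from a client-side script, the following Python builds the same structure as a data object and serializes it. The helper name build_input_list is hypothetical and is not part of HeatWave; the database, table, model, and PAR values are the placeholders from the commands above, with the elided PAR path left as-is.

```python
import json


def build_input_list(db_name, table_name, par_url, embed_model_id="minilm"):
    """Return JSON text for the @input_list variable (hypothetical helper)."""
    input_list = [{
        "db_name": db_name,
        "tables": [{
            "table_name": table_name,
            "engine_attribute": {
                "dialect": {
                    "format": "parquet",
                    "embed_model_id": embed_model_id,
                },
                "file": [{"par": par_url}],
            },
        }],
    }]
    # json.dumps always emits syntactically valid JSON (commas, quoting).
    return json.dumps(input_list)


# Placeholder values copied from the example commands; the "..." is elided
# in the source and is intentionally not filled in here.
par = ("https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/"
       "tenancy_1/b/bucket_1/o/data_file_1.parquet")
input_list_text = build_input_list("data_db", "table_1", par)
print(input_list_text)
```

Passing the resulting text to the server as a bound parameter, rather than splicing it into the SQL text, avoids a second round of quoting problems.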
-
You cannot give the VECTOR data type a default value other than NULL.
-
You cannot directly convert VECTOR to JSON or JSON to VECTOR.
-
To convert VECTOR to JSON, first convert VECTOR to a string by using the VECTOR_TO_STRING() function. Then, cast this string as JSON.
For example, if col_1 in table_1 is a VECTOR column that you want to convert to JSON, you can do the following:
SELECT CAST(VECTOR_TO_STRING(col_1) AS JSON) FROM table_1;
-
To convert
JSON
toVECTOR
, first convertJSON
to a STRING data type, and then use the STRING_TO_VECTOR() function to convert toVECTOR
.For example, if
col_1
intable_1
is aJSON
column that you want to convert toVECTOR
, you can do the following:SELECT STRING_TO_VECTOR(CAST(col_1 as NCHAR)) FROM table_1;
-
The VECTOR_TO_STRING(col) function fails if the column contains any 0x values (the default for invalid vectors). To avoid errors, filter out these values.
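The two-step conversions and the 0x caveat above can be modeled outside the database. The following Python sketch is an illustration only, not how Lakehouse implements these functions. It assumes that a VECTOR value is a sequence of little-endian 4-byte floats and that 0x denotes an empty value. The functions vector_to_string and string_to_vector are hypothetical stand-ins for VECTOR_TO_STRING() and STRING_TO_VECTOR().

```python
import json
import struct


def vector_to_string(blob):
    """Model of VECTOR_TO_STRING(): binary float32 values -> '[x,y,...]'."""
    if len(blob) == 0 or len(blob) % 4 != 0:
        # Models the failure on 0x (empty) values described above.
        raise ValueError("invalid vector value")
    count = len(blob) // 4
    values = struct.unpack("<%df" % count, blob)
    return "[" + ",".join("%.5e" % v for v in values) + "]"


def string_to_vector(text):
    """Model of STRING_TO_VECTOR(): '[x,y,...]' text -> binary float32."""
    values = json.loads(text)
    return struct.pack("<%df" % len(values), *values)


# Two sample rows: one valid vector and one empty value standing in for 0x.
rows = [bytes.fromhex("00000040000040400000A040"), b""]

# Filter out the empty values first, then convert VECTOR -> string -> JSON.
valid = [blob for blob in rows if len(blob) > 0]
as_json = [json.loads(vector_to_string(blob)) for blob in valid]
print(as_json)  # [[2.0, 3.0, 5.0]]

# JSON -> string -> VECTOR restores the original bytes for this sample.
restored = string_to_vector(json.dumps(as_json[0]))
print(restored == rows[0])  # True
```

The string form is the common ground between the two types: it is valid JSON array text, which is why each conversion passes through a string rather than casting directly.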
-
As of MySQL 9.1.2, you can load Parquet files with row groups that total up to 10GB with large HeatWave Cluster shapes.
-
Lakehouse does not support the following data types in Parquet files. Lakehouse marks columns with these data types as NOT SECONDARY and does not load them.
BSON
ENUM
Interval
-
List
Not supported before MySQL 9.3.2. As of MySQL 9.3.2, limited support is available for Parquet Lists in Lakehouse. Simple (non-nested) lists of the following Parquet data types are supported: INT_8, INT_16, INT_32, INT_64, UINT_8, UINT_16, UINT_32, UINT_64, FLOAT (32-bit), DOUBLE, STRING, and BOOL.
Map
Unknown
UUID
Do not use strict SQL mode if the inferred schema differs from the table schema.