Documentation Home
HeatWave User Guide
Related Documentation Download this Manual
PDF (US Ltr) - 2.5Mb
PDF (A4) - 2.5Mb


HeatWave User Guide  /  ...  /  Lakehouse Limitations for the Parquet File Format

5.10.5 Lakehouse Limitations for the Parquet File Format

  • NaN values in Parquet files are loaded as NULL.

  • As of MySQL 9.3.2, Lakehouse supports the VECTOR data type. Consider the following limitations:

    • To load Parquet files with the VECTOR data type (from Parquet strings or from Parquet (numeric) Lists), you must load the table manually with is_strict_mode set to false. For loading simple lists as JSON, you can load the data manually or automatically. To learn more, see Vector Data Type and External Table Syntax.

      Tip

      If you need to manually create a table but do not have the appropriate table definition, you can load the table automatically in dryrun mode and review the load script. You also need to convert columns from JSON to VECTOR as needed. See the example commands below to do this.

      mysql> SET @input_list = '[{
        "db_name": "data_db",
        "tables": [{
          "table_name": "table_1",
          "engine_attribute": {"dialect": {"format": "parquet" "embed_model_id": "minilm"},
                               "file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/data_file_1.parquet"}]}
      }]}]';
      mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'dryrun'));
      mysql> SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
    • You cannot give the VECTOR data type a default value other than NULL.

    • You cannot directly convert VECTOR to JSON or JSON to VECTOR.

    • If you run Vector_to_string(col)as a Vector_to_string() function, it fails if there are any 0x (default for invalid vectors) values. To avoid errors, filter out these values.

  • As of MySQL 9.1.2, you can load Parquet files with row groups that total up to 10GB with large HeatWave Cluster shapes.

  • Lakehouse does not support the following data types in Parquet files. Lakehouse marks columns with these data types as NOT SECONDARY, and does not load them.

    • BSON

    • ENUM

    • Interval

    • List

      • Not supported before MySQL 9.3.2. As of MySQL 9.3.2, limited support is available for Parquet Lists in Lakehouse. Simple (non-nested) lists of the following Parquet data types are now supported: INT_8, INT_16, INT_32, INT_64, UINT_8, UINT_16, UINT_32, UINT_64, FLOAT (32-bit), DOUBLE, STRING, and BOOL.

    • Map

    • Unknown

    • UUID

  • Do not use strict SQL mode if the inferred schema differs from the table schema.