WL#10778: Parser: output deprecation warnings on utf8 references, where utf8mb3 is an alias of utf8

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

Currently, utf8 is an alias for the utf8mb3 character set.

Since we are going to completely replace utf8mb3 with utf8mb4, it is logical to notify customers that the utf8 character set alias and the related syntax constructs will change their meaning soon:

  • utf8 charset alias itself,

  • _utf8 charset prefix,

  • N'...' string literals,

  • NATIONAL, NCHAR etc. data types.

This WL investigates all affected cases in the grammar and emits deprecation warnings for them where applicable.

  • NF-1 A warning will be output whenever the parser sees NATIONAL/N[VAR]CHAR/N'...'.
  • NF-2 A warning will be output whenever the parser sees 'utf8' used as a character set name, or the _utf8'...' prefix.
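
As an illustration of NF-1 and NF-2, the statements below pair a deprecated spelling with SHOW WARNINGS. This is a minimal sketch that assumes an interactive MySQL 8.0 session; the exact warning codes and message text are not stated in this section, so none are shown here.

```sql
-- NF-1: national string literal; a deprecation warning is expected.
SELECT N'abc';
SHOW WARNINGS;

-- NF-2: _utf8 charset prefix; a deprecation warning is expected.
SELECT _utf8'abc';
SHOW WARNINGS;

-- Unambiguous spelling; no utf8 deprecation warning is expected.
SELECT _utf8mb4'abc';
SHOW WARNINGS;
```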

Affected grammar cases

There are a number of places in the current MySQL grammar where utf8 resolves to utf8mb3 rather than utf8mb4, so a warning is appropriate there:

1. String literals (both DML and DDL)

  • String constants with the "national" charset: N'...'.

  • Charset-prefixed string constants: _utf8 '...'.

2. "National" data types (DDL, DML)

There are a number of "national" data types that default to the utf8mb3 charset (utf8_general_ci collation):

  • NATIONAL (CHAR VARYING | VARCHAR)

  • NCHAR [VARCHAR | VARYING]

  • NVARCHAR

Note: UNICODE is for UCS2, not UTF8.
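
The type spellings listed above could be exercised with a table definition like the following. This is only a sketch: the table name t_national and the column names are hypothetical, and the national columns are expected, per NF-1, to trigger the deprecation warning.

```sql
CREATE TABLE t_national (
  c1 NATIONAL CHAR VARYING(10),
  c2 NATIONAL VARCHAR(10),
  c3 NCHAR(10),
  c4 NCHAR VARCHAR(10),
  c5 NCHAR VARYING(10),
  c6 NVARCHAR(10),
  -- Explicit charset for comparison; NF-1 does not apply to this column.
  c7 VARCHAR(10) CHARACTER SET utf8mb4
);
SHOW WARNINGS;

-- Shows which charset each column actually received.
SHOW CREATE TABLE t_national;
```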

3. utf8 charset references

The utf8 charset can be referenced from DML and DDL as part of:

  • type declarations (columns, SP parameters/variables/return data types),

  • table and schema declarations,

  • string conversion function calls,

  • administrative statements -- connection/client charset manipulation.

In those contexts the charset can be referenced in the form of:

  • identifier: utf8,

  • quoted identifier (MySQL extension): `utf8`,

  • standard quoted identifier/MySQL string: "utf8",

  • standard string: 'utf8'.

Examples:

  • Data type declarations:

    CHAR [BINARY] CHARSET utf8,
    VARCHAR [BINARY] CHARSET `utf8`,
    TEXT [BINARY] CHARSET "utf8",
    TINYTEXT [BINARY] CHARSET 'utf8',
    

    etc.

  • Some of them can be used as cast types in string conversion functions:

    CAST(... AS <cast type>)
    CONVERT(..., <cast type>)
    
  • Direct utf8 charset references in CHARSET/CHARACTER SET/COLLATION clauses of various DDL and administrative statements:

    CREATE DATABASE ... CHARSET utf8
    SET CHARSET utf8
    
  • Direct charset references (without a type definition) in function calls:

    CHAR(... USING utf8)
    CONVERT(... USING utf8)
    
  • SET NAMES utf8
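
The four spellings of the charset name and the contexts listed above can be tried together as in the sketch below. The schema name d1 and the table name t1 are hypothetical. Every statement except the last is expected, per NF-2, to trigger the deprecation warning; run SHOW WARNINGS directly after the statement of interest, since it reports the diagnostics of the most recent statement only.

```sql
-- Four spellings of the same charset name:
SET NAMES utf8;      -- identifier
SET NAMES `utf8`;    -- quoted identifier (MySQL extension)
SET NAMES "utf8";    -- string in the default SQL mode, identifier under ANSI_QUOTES
SET NAMES 'utf8';    -- standard string

-- Schema declaration and column type declaration:
CREATE DATABASE d1 CHARSET utf8;
CREATE TABLE d1.t1 (c VARCHAR(10) CHARSET utf8);

-- Cast types in string conversion functions:
SELECT CAST('abc' AS CHAR CHARSET utf8);
SELECT CONVERT('abc', CHAR CHARSET utf8);

-- Direct charset references in function calls:
SELECT CHAR(97 USING utf8);
SELECT CONVERT('abc' USING utf8);

-- Unambiguous replacement; no utf8 deprecation warning is expected.
SET NAMES utf8mb4;
```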