LDML Syntax Supported in MySQL

This section describes the LDML syntax that MySQL recognizes. This is a subset of the syntax described in the LDML specification available at http://www.unicode.org/reports/tr35/, which should be consulted for further information. The rules described here are all supported except that character sorting occurs only at the primary level. Rules that specify differences at secondary or higher sort levels are recognized (and thus can be included in collation definitions) but are treated as equality at the primary level.

Character Representation

Characters named in LDML rules can be written in \unnnn format, where nnnn is the hexadecimal Unicode code point value. Within hexadecimal values, the digits A through F are not case sensitive; \u00E1 and \u00e1 are equivalent. Basic Latin letters A-Z and a-z can also be written literally (this is a MySQL limitation; the LDML specification permits literal non-Latin1 characters in the rules). Only characters in the Basic Multilingual Plane can be specified. This notation does not apply to characters outside the BMP range of 0000 to FFFF.

The Index.xml file itself should be written using ASCII encoding.

Syntax Rules

LDML has reset rules and shift rules to specify character ordering. Orderings are given as a set of rules that begin with a reset rule that establishes an anchor point, followed by shift rules that indicate how characters sort relative to the anchor point.

  • A <reset> rule does not specify any ordering in and of itself. Instead, it resets the ordering for subsequent shift rules to cause them to be taken in relation to a given character. Either of the following rules resets subsequent shift rules to be taken in relation to the letter 'A':

  • The <p>, <s>, and <t> shift rules define primary, secondary, and tertiary differences of a character from another character:

    • Use primary differences to distinguish separate letters.

    • Use secondary differences to distinguish accent variations.

    • Use tertiary differences to distinguish lettercase variations.

    Either of these rules specifies a primary shift rule for the 'G' character:


Download this Manual
User Comments
Sign Up Login You must be logged in to post a comment.