WL#5170: Swedish collation
Affects: Server-Prototype Only
—
Status: Un-Assigned
Adopt a Swedish collation which follows standards and is consistent across many character sets. Start with a new Swedish collation for latin1.
The main recommendation for all new MySQL collations is: follow the Default Unicode Collation Element table (DUCET). This worklog task description is only about the departures from DUCET which are necessary for simplicity and Swedishness, in other words "tailoring".

Why not fix latin1_swedish_ci?
------------------------------

We already have "Swedish" collations, but they have imperfections. In particular: latin1_swedish_ci is not only "wrong" (i.e. non-standard/unexpected) for the backslash character mentioned in BUG#46659, it's just as "wrong" for real alphabetic characters (oe ligature, z caron, sharp s, y diaeresis, thorn, s caron, o with stroke) and for many punctuation or special characters
http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html

However, we cannot fix an existing collation without affecting indexes drastically. We tried to do that with a small change in utf8_general_ci for SHARP S. The results were terrible. We won't do that again.

So we have to have one or more new Swedish collations. The old collations will remain, they will not be deprecated, and latin1_swedish_ci will continue to be the default for the latin1 character set.

Name
----

WL#2673 "Unicode Collation Algorithm new version" proposes names like utf8_swedish_500_ci. Since the Unicode version will probably be 5.2 not 5.0, a name like utf8_swedish_520_ci is likely. So a latin1 "equivalent" would be latin1_swedish_520_ci.

We once had a complaint from somebody who thought utf8_general_ci must have the same behaviour as latin1_general_ci, since they're both named 'general'. Somebody might apply the same pseudo-logic to our Swedish collation names.

Simple Collation
----------------

We will follow current rules described in WL#2673 Unicode Collation Algorithm new version, section "Simple collations". That section includes the rules for expansions and ignorables.

CLDR
----

The CLDR (Common Locale Data Repository) is available on the Unicode site for download.
CLDR is clearly based on official standards; it seems to be more up to date than e.g. Posix; it is the basis for major products like ICU; and it is easy to acquire and read. So in general MySQL should take CLDR seriously.

The particular document we're taking from the CLDR repository is collation/sv.xml. The "sv" means "Swedish". It describes two types, "standard" (where w=v) and "reformed" (where w<>v). We care about "reformed".

Quotation from the 1982 Swedish standard
----------------------------------------

4.Swed.1982. Svensk Standard 03 81 04: Dokumentation – Administrativ filering – Alfanumerisk sortering (Documentation – Administrative filing rules – Alphanumerical ordering). 1. ed. (1982-06-25), 4.2.4:
“Specialbokstäver i språk skrivna med det latinska alfabetet konverteras vid filering till en eller flera av de latinska bokstäverna enligt följande tabell: isländskt ð, Ð = d ... polskt ł, Ł = l ... serbokroatiskt đ, Đ = d ... samiskt ŋ = n ... turkiskt ı = i ... isländskt þ = th ... grönländskt ĸ = k ... tyskt ü = y ... danskt, norskt ø = ö ... ungerskt ő = y ... ungerskt ű = y”
quoted in http://www.evertype.com/standards/wynnyogh/thorn.html

The introductory sentence says, in English: "Special letters in languages written with the Latin alphabet are converted, for filing, to one or more of the Latin letters according to the following table".

Translating the tabular part alone, we have:

ETH                  "Icelandic ð, Ð"              = d
D WITH STROKE        "Serbocroatian đ, Đ"          = d
DOTLESS I            "Turkish ı"                   = i
KRA                  "Greenlandic ĸ" U0138         = k
O WITH STROKE        "Danish, Norwegian ø"         = ö
L WITH STROKE        "Polish ł, Ł"                 = l
ENG                  "Sami ŋ" U014B, U014A         = n
THORN                "Icelandic þ"                 = th
U WITH DIAERESIS     "German ü"                    = y
O WITH DOUBLE ACUTE  "Hungarian ő"                 = y
U WITH DOUBLE ACUTE  "Hungarian ű"                 = y

The suggestions for DOTLESS I and KRA and ENG and O WITH DOUBLE ACUTE are different from sv.xml. The suggestion for O WITH DOUBLE ACUTE may be a typo.

We don't see all the rules here, for example the recent change concerning w <> v. We should get a full copy of the current standard from sis.se (about 60 euros).
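As a reading aid, the translated 1982 table can be written down as a plain lookup. This is only a sketch of the quoted table, not of sv.xml and not of any MySQL code; the names SWED_1982_FOLD and fold_1982 are made up for the illustration.

```python
# Sketch only: the 1982 filing table as a character-to-string lookup.
# SWED_1982_FOLD and fold_1982 are hypothetical names, not MySQL identifiers.
SWED_1982_FOLD = {
    "ð": "d", "Ð": "d",   # ETH
    "đ": "d", "Đ": "d",   # D WITH STROKE
    "ı": "i",             # DOTLESS I
    "ĸ": "k",             # KRA
    "ø": "ö",             # O WITH STROKE
    "ł": "l", "Ł": "l",   # L WITH STROKE
    "ŋ": "n",             # ENG
    "þ": "th",            # THORN: an expansion, one letter becomes two
    "ü": "y",             # U WITH DIAERESIS
    "ő": "y",             # O WITH DOUBLE ACUTE (possibly a typo in the quotation)
    "ű": "y",             # U WITH DOUBLE ACUTE
}


def fold_1982(text):
    """Replace each special letter by its filing equivalent from the table."""
    return "".join(SWED_1982_FOLD.get(ch, ch) for ch in text)


print(fold_1982("þing"))   # thing
print(fold_1982("ŋı"))     # ni
```

Note that THORN is the only entry whose replacement is longer than one letter, which is what makes it awkward for a simple collation.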
Also could somebody please check these references re V <> W:
http://www.svenskaakademien.se/web/Svenska_Akademiens_ordlista.aspx
http://www.dn.se/dnbok/alfabetet-blir-langre-vaxer-med-w-1.666436

Other implementations
---------------------

Alexander Barkov maintains a site showing how other DBMS vendors or OS vendors collate.
The Oracle10g Swedish collation is:
http://www.collation-charts.org/oracle10g/ora10g.WE8MSWIN1252.SWEDISH.html
The Microsoft Vista Swedish collation is:
http://www.collation-charts.org/vista/vista.041D.CP1252.Swedish_Sweden.html

The Rules
---------

These are all the sv.xml rules, expressed as Unicode names with the symbol "=" meaning "primary-level equivalent". The columns "Swed 1982", "Microsoft" and "Oracle" contain "yes" when there is agreement with sv.xml, "no" when there is disagreement, and "-" when there is no comparison (because we don't have the complete standard, and because the Microsoft/Oracle collations are only for an 8-bit character set).

                                Swed 1982    Microsoft   Oracle
                                ---------    ---------   ------
A RING BEFORE EZH               -            yes         yes
A DIAERESIS AFTER A RING        -            yes         yes
O DIAERESIS AFTER A DIAERESIS   -            yes         yes
ETH = D                         yes          yes         yes
D STROKE = D                    yes          -           -
THORN = TH                      yes          yes (?)     no, = T
O STROKE = O DIAERESIS          yes          yes         yes
AE = A DIAERESIS                -            no          yes
O DOUBLE ACUTE = O DIAERESIS    no (typo?)   -           -
U DIAERESIS = Y                 yes          yes         yes
U DOUBLE ACUTE = Y              yes          -           -
L STROKE = L                    yes          -           -
A DIAERESIS = E OGONEK          no           -           -
OE = O DIAERESIS                no           no          no
O CIRCUMFLEX = O DIAERESIS      no           no          no

The first three rules -- A RING BEFORE EZH, A DIAERESIS AFTER A RING, O DIAERESIS AFTER A DIAERESIS -- are basic; this is what every Swede would agree to without question. EZH is the first real letter after all the variants of Z in the DUCET, so "before EZH" is just a formal way to say "after Z, and after Z caron, and after Visigothic Z, etc.".

ETH = D is also generally agreed, indeed it's in the DUCET now.
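The three basic rules can be illustrated with a tiny primary-level sort key. This is a minimal sketch under simplifying assumptions (only a-z plus the three Swedish letters, case ignored, everything else sorted last); swedish_basic_key is a hypothetical helper, not part of MySQL or CLDR.

```python
import unicodedata

# Sketch only: a-z followed by A RING, A DIAERESIS, O DIAERESIS, in that order.
BASE = "abcdefghijklmnopqrstuvwxyz"
TAILORED = "\u00e5\u00e4\u00f6"  # å, ä, ö placed after z
RANK = {ch: i for i, ch in enumerate(BASE + TAILORED)}


def swedish_basic_key(word):
    """Primary-level key: case-insensitive; unknown characters sort last."""
    # NFC makes sure each accented letter is a single precomposed character.
    folded = unicodedata.normalize("NFC", word).lower()
    return tuple(RANK.get(ch, len(RANK)) for ch in folded)


words = ["öl", "zebra", "ärta", "apa", "åka"]
print(sorted(words, key=swedish_basic_key))
# ['apa', 'zebra', 'åka', 'ärta', 'öl']
```

A plain code-point sort would put "ärta" before "åka", which is exactly the kind of result the tailoring is meant to prevent.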
D WITH STROKE = D is probably here because D WITH STROKE looks very much like ETH, not because it's Swedish.

THORN = TH is an expansion, so we cannot follow this rule with a simple collation (with a simple collation we take the first expanded letter, which is 'T'). There does appear to be agreement that THORN should be with 'T', or a separate letter between 'T' and 'U', or = 'TH'.

O STROKE = O DIAERESIS treats a Danish/Norwegian letter; everyone agrees.

AE = A DIAERESIS treats a Danish/Norwegian letter, and one could have expected that everyone would agree about this one too. It is a mystery that Microsoft treats AE in the non-Scandinavian way.

O DOUBLE ACUTE = O DIAERESIS may be taking into account that, in Swedish handwriting, ö may look like ő, according to http://en.wikipedia.org/wiki/Double_acute_accent

U DIAERESIS = Y treats a German letter; everyone agrees.

U DOUBLE ACUTE = Y is consistent with the "Swedish handwriting" considerations described for O DOUBLE ACUTE, and with the U DIAERESIS rule.

L STROKE = L is just basic DUCET, nowadays. It's shown separately because the Swedish standard document mentioned it, and it wasn't in the older versions of the DUCET.

A DIAERESIS = E OGONEK will confuse you if you only look at the official Unicode name for the character, because ogoneks are Polish. In reality this concerns E CAUDATA http://en.wikipedia.org/wiki/E_caudata which looks the same as E OGONEK but has a different heritage, from Old Norse.

OE = O DIAERESIS might be controversial because most people, probably including most Swedes nowadays, think of the OE ligature as something French. But actually this is the letter "ethel", and again we can turn to Wikipedia for explanation: "Œ is used in the modern scholarly orthography of Old West Norse, representing the long vowel /øː/, contrasting with ø, which represents the short vowel /ø/." http://en.wikipedia.org/wiki/%C5%92 Well, given that ø i.e.
O STROKE is accepted to be equal to O DIAERESIS, this makes sense after all.

O CIRCUMFLEX = O DIAERESIS defies easy explanation. This time Wikipedia has only a vague mention that might justify it: "In Swedish, when transcribing dialectal speech, the circumflex is often used to denote an a or o which is pronounced dialectally as if it has been written ä [æ] or ö [ø]." http://en.wikipedia.org/wiki/Circumflex Okay, but if that's the explanation, then it's odd that we don't see the same thing for A CIRCUMFLEX.

Most of the above rules, including the one for O CIRCUMFLEX, are also in sv_SE.UTF-8.src from posix.zip. But not all of them are in the "real" Posix list http://www.collation-charts.org/fc6/fc6.sv_SE.iso88591.html.

Existing collations
-------------------

Existing Swedish collations like latin1_swedish_ci will continue to be supported indefinitely. In fact latin1_swedish_ci will continue to be the default collation for latin1.

BUG#36144 "Add latin1_swedish_cs collation" won't happen.

Non-Swedish
-----------

This will not become the collation for Finnish. Although in the past it was common to have a Swedish/Finnish combined collation, the CLDR fi.xml differs from the CLDR sv.xml in significant ways.

The new rules for OE and O CIRCUMFLEX are inappropriate for French. This new collation, unlike latin1_swedish_ci, will not be recommended for general use outside Sweden.

The complete character list
---------------------------

The rest of this document is latin1_swedish_5xx_ci detail.

Here is the character order, showing UCA 5.2 primary weights (for a simple collation we have to replace them with 1-byte weights), with '*' meaning optional ignorable and '!' meaning there is a difference from UCA 5.2, e.g. Swedish tailoring.

latin1  ucs2  weight      name
00      0000  ! 0000      (control)
01      0001  ! 0001      (control)
02      0002  ! 0002      (control)
03      0003  ! 0003      (control)
04      0004  ! 0004      (control)
05      0005  ! 0005      (control)
06      0006  ! 0006      (control)
07      0007  ! 0007      (control)
08      0008  ! 0008      (control)
0e      000e  !*000e      (control)
0f      000f  !*000f      (control)
10      0010  ! 0010      (control)
11      0011  ! 0011      (control)
12      0012  ! 0012      (control)
13      0013  ! 0013      (control)
14      0014  ! 0014      (control)
15      0015  ! 0015      (control)
16      0016  ! 0016      (control)
17      0017  ! 0017      (control)
18      0018  ! 0018      (control)
19      0019  !*0019      (control)
1a      001a  !*001a      (control)
1b      001b  !*001b      (control)
1c      001c  !*001c      (control)
1d      001d  !*001d      (control)
1e      001e  !*001e      (control)
1f      001f  !*001f      (control)
7f      007F  !*0020      DELETE
81      0081  !*0021      (control)
8d      008d  ! 0022      PARTIAL LINE FEED
8f      008f  ! 0023      SINGLE SHIFT THREE
90      0090  ! 0024      DEVICE CONTROL STRING
9d      009d  ! 0025      OPERATING SYSTEM COMMAND
09      0009  !*0201      (control) HORIZONTAL TABULATION
0a      000a  !*0202      (control) LINE FEED
0b      000b  !*0203      (control) VERTICAL TABULATION
0c      000c  !*0204      (control) FORM FEED
0d      000d  !*0205      (control) CARRIAGE RETURN
20      0020   *020a      SPACE
a0      00a0   *020A      NO-BREAK SPACE
60      0060   *020E      GRAVE ACCENT
b4      00B4   *020F      ACUTE ACCENT
98      02dc   *0210      SMALL TILDE
5e      005E   *0211      CIRCUMFLEX ACCENT
af      00af   *0212      MACRON
a8      00a8   *0216      DIAERESIS
b8      00B8   *021B      CEDILLA
5f      005F   *021D      LOW LINE
ad      00ad   *0222      SOFT HYPHEN
2d      002D   *0223      HYPHEN-MINUS
96      2013   *022B      EN DASH
97      2014   *022C      EM DASH
2c      002C   *0234      COMMA
3b      003B   *0243      SEMICOLON
3a      003A   *0247      COLON
21      0021   *026E      EXCLAMATION MARK
a1      00a1   *026F      INVERTED EXCLAMATION MARK
3f      003F   *0273      QUESTION MARK
bf      00BF   *0274      INVERTED QUESTION MARK
2e      002E   *0281      FULL STOP
85      2026  !*0281+1    HORIZONTAL ELLIPSIS [expansion]
b7      00B7   *0292      MIDDLE DOT
27      0027   *02EE      APOSTROPHE
91      2018   *02EF      LEFT SINGLE QUOTATION MARK
92      2019   *02F0      RIGHT SINGLE QUOTATION MARK
82      201a   *02F1      SINGLE LOW-9 QUOTATION MARK
8b      2039   *02F3      SINGLE LEFT-POINTING ANGLE QUOTATION MARK
9b      203a   *02F4      SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
22      0022   *02F5      QUOTATION MARK
93      201C   *02F6      LEFT DOUBLE QUOTATION MARK
94      201d   *02F7      RIGHT DOUBLE QUOTATION MARK
84      201e   *02F8      DOUBLE LOW-9 QUOTATION MARK
ab      00ab   *02FD      LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
bb      00BB   *02FE      RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
28      0028   *02FF      LEFT PARENTHESIS
29      0029   *0300      RIGHT PARENTHESIS
5b      005B   *0301      LEFT SQUARE BRACKET
5d      005D   *0302      RIGHT SQUARE BRACKET
7b      007B   *0303      LEFT CURLY BRACKET
7d      007D   *0304      RIGHT CURLY BRACKET
a7      00a7   *0351      SECTION SIGN
b6      00B6   *0352      PILCROW SIGN
a9      00a9   *0354      COPYRIGHT SIGN
ae      00ae   *0355      REGISTERED SIGN
40      0040   *0356      COMMERCIAL AT
2a      002A   *0357      ASTERISK
2f      002F   *035C      SOLIDUS
5c      005C   *035E      REVERSE SOLIDUS
26      0026   *035F      AMPERSAND
23      0023   *0362      NUMBER SIGN
25      0025   *0363      PERCENT SIGN
89      2030   *0365      PER MILLE SIGN
86      2020   *036A      DAGGER
87      2021   *036B      DOUBLE DAGGER
95      2022   *036C      BULLET
88      02c6   *03F0      MODIFIER LETTER CIRCUMFLEX ACCENT
b0      00B0   *044B      DEGREE SIGN
2b      002B   *0550      PLUS SIGN
b1      00B1   *0551      PLUS-MINUS SIGN
f7      00f7   *0552      DIVISION SIGN
d7      00d7   *0553      MULTIPLICATION SIGN
3c      003C   *0554      LESS-THAN SIGN
3d      003D   *0555      EQUALS SIGN
3e      003E   *0556      GREATER-THAN SIGN
ac      00ac   *0557      NOT SIGN
7c      007C   *0558      VERTICAL LINE
a6      00a6   *0559      BROKEN BAR
7e      007E   *055B      TILDE
a4      00a4    11DF      CURRENCY SIGN
a2      00a2    11E0      CENT SIGN
24      0024    11E1      DOLLAR SIGN
a3      00a3    11E2      POUND SIGN
a5      00a5    11E3      YEN SIGN
80      20AC    11F8      EURO SIGN
30      0030    1205      DIGIT ZERO
31      0031    1206      DIGIT ONE
b9      00B9    1206      SUPERSCRIPT ONE
bc      00BC  ! 1206+1/2  VULGAR FRACTION ONE QUARTER [expansion]
bd      00BD  ! 1206+1/2  VULGAR FRACTION ONE HALF [expansion]
32      0032    1207      DIGIT TWO
b2      00B2    1207      SUPERSCRIPT TWO
b3      00B3    1208      SUPERSCRIPT THREE
33      0033    1208      DIGIT THREE
be      00BE  ! 1208+1/2  VULGAR FRACTION THREE QUARTERS [expansion]
34      0034    1209      DIGIT FOUR
35      0035    120A      DIGIT FIVE
36      0036    120B      DIGIT SIX
37      0037    120C      DIGIT SEVEN
38      0038    120D      DIGIT EIGHT
39      0039    120E      DIGIT NINE
aa      00aa    120F      FEMININE ORDINAL INDICATOR
41      0041    120F      LATIN CAPITAL LETTER A
61      0061    120F      LATIN SMALL LETTER A
c0      00c0    120F      LATIN CAPITAL LETTER A WITH GRAVE
c1      00c1    120F      LATIN CAPITAL LETTER A WITH ACUTE
c2      00c2    120F      LATIN CAPITAL LETTER A WITH CIRCUMFLEX
c3      00c3    120F      LATIN CAPITAL LETTER A WITH TILDE
e0      00e0    120F      LATIN SMALL LETTER A WITH GRAVE
e1      00e1    120F      LATIN SMALL LETTER A WITH ACUTE
e2      00e2    120F      LATIN SMALL LETTER A WITH CIRCUMFLEX
e3      00e3    120F      LATIN SMALL LETTER A WITH TILDE
42      0042    1225      LATIN CAPITAL LETTER B
62      0062    1225      LATIN SMALL LETTER B
43      0043    123D      LATIN CAPITAL LETTER C
63      0063    123D      LATIN SMALL LETTER C
c7      00c7    123D      LATIN CAPITAL LETTER C WITH CEDILLA
e7      00e7    123D      LATIN SMALL LETTER C WITH CEDILLA
44      0044    1250      LATIN CAPITAL LETTER D
64      0064    1250      LATIN SMALL LETTER D
d0      00d0    1250      LATIN CAPITAL LETTER ETH
f0      00f0    1250      LATIN SMALL LETTER ETH
45      0045    126B      LATIN CAPITAL LETTER E
65      0065    126B      LATIN SMALL LETTER E
c8      00c8    126B      LATIN CAPITAL LETTER E WITH GRAVE
c9      00c9    126B      LATIN CAPITAL LETTER E WITH ACUTE
ca      00ca    126B      LATIN CAPITAL LETTER E WITH CIRCUMFLEX
cb      00cb    126B      LATIN CAPITAL LETTER E WITH DIAERESIS
e8      00e8    126B      LATIN SMALL LETTER E WITH GRAVE
e9      00e9    126B      LATIN SMALL LETTER E WITH ACUTE
ea      00ea    126B      LATIN SMALL LETTER E WITH CIRCUMFLEX
eb      00eb    126B      LATIN SMALL LETTER E WITH DIAERESIS
46      0046    12A3      LATIN CAPITAL LETTER F
66      0066    12A3      LATIN SMALL LETTER F
83      0192    12AA      LATIN SMALL LETTER F WITH HOOK
47      0047    12B0      LATIN CAPITAL LETTER G
67      0067    12B0      LATIN SMALL LETTER G
48      0048    12D3      LATIN CAPITAL LETTER H
68      0068    12D3      LATIN SMALL LETTER H
49      0049    12EC      LATIN CAPITAL LETTER I
69      0069    12EC      LATIN SMALL LETTER I
cc      00cc    12EC      LATIN CAPITAL LETTER I WITH GRAVE
cd      00cd    12EC      LATIN CAPITAL LETTER I WITH ACUTE
ce      00ce    12EC      LATIN CAPITAL LETTER I WITH CIRCUMFLEX
cf      00cf    12EC      LATIN CAPITAL LETTER I WITH DIAERESIS
ec      00ec    12EC      LATIN SMALL LETTER I WITH GRAVE
ed      00ed    12EC      LATIN SMALL LETTER I WITH ACUTE
ee      00ee    12EC      LATIN SMALL LETTER I WITH CIRCUMFLEX
ef      00ef    12EC      LATIN SMALL LETTER I WITH DIAERESIS
4a      004A    1305      LATIN CAPITAL LETTER J
6a      006a    1305      LATIN SMALL LETTER J
4b      004B    131E      LATIN CAPITAL LETTER K
6b      006b    131E      LATIN SMALL LETTER K
4c      004C    1330      LATIN CAPITAL LETTER L
6c      006c    1330      LATIN SMALL LETTER L
4d      004D    135F      LATIN CAPITAL LETTER M
6d      006d    135F      LATIN SMALL LETTER M
4e      004E    136D      LATIN CAPITAL LETTER N
6e      006e    136D      LATIN SMALL LETTER N
d1      00d1    136D      LATIN CAPITAL LETTER N WITH TILDE
f1      00f1    136D      LATIN SMALL LETTER N WITH TILDE
4f      004F    138E      LATIN CAPITAL LETTER O
6f      006f    138E      LATIN SMALL LETTER O
d2      00d2    138E      LATIN CAPITAL LETTER O WITH GRAVE
d3      00d3    138E      LATIN CAPITAL LETTER O WITH ACUTE
d5      00d5    138E      LATIN CAPITAL LETTER O WITH TILDE
f2      00f2    138E      LATIN SMALL LETTER O WITH GRAVE
f3      00f3    138E      LATIN SMALL LETTER O WITH ACUTE
f5      00f5    138E      LATIN SMALL LETTER O WITH TILDE
ba      00BA    138E      MASCULINE ORDINAL INDICATOR
50      0050    13B3      LATIN CAPITAL LETTER P
70      0070    13B3      LATIN SMALL LETTER P
51      0051    13C8      LATIN CAPITAL LETTER Q
71      0071    13C8      LATIN SMALL LETTER Q
52      0052    13DA      LATIN CAPITAL LETTER R
72      0072    13DA      LATIN SMALL LETTER R
53      0053    1410      LATIN CAPITAL LETTER S
73      0073    1410      LATIN SMALL LETTER S
9a      0161    1410      LATIN SMALL LETTER S WITH CARON
df      00df  ! 1410      LATIN SMALL LETTER SHARP S [expansion]
8a      0160    1410      LATIN CAPITAL LETTER S WITH CARON
54      0054    1433      LATIN CAPITAL LETTER T
74      0074    1433      LATIN SMALL LETTER T
de      00de    1433      LATIN CAPITAL LETTER THORN [tailoring]
fe      00fe    1433      LATIN SMALL LETTER THORN [tailoring]
99      2122  ! 1433+1    TRADE MARK SIGN [expansion]
55      0055    1453      LATIN CAPITAL LETTER U
75      0075    1453      LATIN SMALL LETTER U
d9      00d9    1453      LATIN CAPITAL LETTER U WITH GRAVE
da      00da    1453      LATIN CAPITAL LETTER U WITH ACUTE
db      00db    1453      LATIN CAPITAL LETTER U WITH CIRCUMFLEX
f9      00f9    1453      LATIN SMALL LETTER U WITH GRAVE
fa      00fa    1453      LATIN SMALL LETTER U WITH ACUTE
fb      00fb    1453      LATIN SMALL LETTER U WITH CIRCUMFLEX
56      0056    147B      LATIN CAPITAL LETTER V
76      0076    147B      LATIN SMALL LETTER V
57      0057    148D      LATIN CAPITAL LETTER W
77      0077    148D      LATIN SMALL LETTER W
58      0058    1497      LATIN CAPITAL LETTER X
78      0078    1497      LATIN SMALL LETTER X
59      0059    149C      LATIN CAPITAL LETTER Y
79      0079    149C      LATIN SMALL LETTER Y
dd      00dd    149C      LATIN CAPITAL LETTER Y WITH ACUTE
fd      00fd    149C      LATIN SMALL LETTER Y WITH ACUTE
9f      0178    149C      LATIN CAPITAL LETTER Y WITH DIAERESIS
ff      00ff    149C      LATIN SMALL LETTER Y WITH DIAERESIS
dc      00dc    149C      LATIN CAPITAL LETTER U WITH DIAERESIS [tailoring]
fc      00fc    149C      LATIN SMALL LETTER U WITH DIAERESIS [tailoring]
5a      005A    14AD      LATIN CAPITAL LETTER Z
7a      007a    14AD      LATIN SMALL LETTER Z
8e      017d    14AD      LATIN CAPITAL LETTER Z WITH CARON
9e      017e    14AD      LATIN SMALL LETTER Z WITH CARON
c5      00c5    14AD+1    LATIN CAPITAL LETTER A WITH RING ABOVE [tailoring]
e5      00e5    14AD+1    LATIN SMALL LETTER A WITH RING ABOVE [tailoring]
c6      00c6  ! 14AD+2    LATIN CAPITAL LETTER AE [tailoring]
c4      00c4    14AD+2    LATIN CAPITAL LETTER A WITH DIAERESIS [tailoring]
e4      00e4    14AD+2    LATIN SMALL LETTER A WITH DIAERESIS [tailoring]
e6      00e6    14AD+2    LATIN SMALL LETTER AE [tailoring]
d6      00d6    14AD+3    LATIN CAPITAL LETTER O WITH DIAERESIS [tailoring]
f6      00f6    14AD+3    LATIN SMALL LETTER O WITH DIAERESIS [tailoring]
8c      0152  ! 14AD+3    LATIN CAPITAL LIGATURE OE [tailoring]
9c      0153  ! 14AD+3    LATIN SMALL LIGATURE OE [tailoring]
d8      00d8    14AD+3    LATIN CAPITAL LETTER O WITH STROKE [tailoring]
f8      00f8    14AD+3    LATIN SMALL LETTER O WITH STROKE [tailoring]
d4      00d4    14AD+3    LATIN CAPITAL LETTER O WITH CIRCUMFLEX [tailoring]
f4      00f4    14AD+3    LATIN SMALL LETTER O WITH CIRCUMFLEX [tailoring]
b5      00B5    1557      MICRO SIGN

References
----------

dev-private thread "Re: WL#2673 Unicode Collation Algorithm new version"
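The character list says that, for a simple collation, the UCA weights have to be replaced with 1-byte weights. One possible way to do that is to rank the distinct weights in order and number them densely. The sketch below uses a handful of rows copied from the list; modelling "14AD+1" as the pair (0x14AD, 1), starting the numbering at 1, and the names ROWS, one_byte_weights and sort_key are all assumptions made for the illustration, not the actual server implementation.

```python
import unicodedata

# Sample rows from the character list: (latin1 byte, (UCA weight, "+n" offset)).
# Treating "14AD+1" as the pair (0x14AD, 1) is an assumption of this sketch.
ROWS = [
    (0x59, (0x149C, 0)),  # LATIN CAPITAL LETTER Y
    (0xFC, (0x149C, 0)),  # LATIN SMALL LETTER U WITH DIAERESIS [tailoring]
    (0x5A, (0x14AD, 0)),  # LATIN CAPITAL LETTER Z
    (0xE5, (0x14AD, 1)),  # LATIN SMALL LETTER A WITH RING ABOVE
    (0xE4, (0x14AD, 2)),  # LATIN SMALL LETTER A WITH DIAERESIS
    (0xE6, (0x14AD, 2)),  # LATIN SMALL LETTER AE
    (0xF6, (0x14AD, 3)),  # LATIN SMALL LETTER O WITH DIAERESIS
    (0xF8, (0x14AD, 3)),  # LATIN SMALL LETTER O WITH STROKE
]


def one_byte_weights(rows):
    """Map each latin1 byte to a dense weight that preserves primary order."""
    distinct = sorted({weight for _, weight in rows})
    # Numbering from 1 is an arbitrary choice for this sketch.
    rank = {weight: i + 1 for i, weight in enumerate(distinct)}
    if len(rank) > 255:
        raise ValueError("too many distinct weights for one byte")
    return {byte: rank[weight] for byte, weight in rows}


WEIGHTS = one_byte_weights(ROWS)


def sort_key(text):
    """Key for strings made only of the sample characters above."""
    encoded = unicodedata.normalize("NFC", text).encode("latin-1")
    return bytes(WEIGHTS[b] for b in encoded)


print(WEIGHTS[0xFC] == WEIGHTS[0x59])   # True: u diaeresis is equal to Y
print(sorted(["ö", "Z", "å", "Y", "ä"], key=sort_key))
# ['Y', 'Z', 'å', 'ä', 'ö']
```

Characters with equal primary weight (for example A DIAERESIS and AE) end up with the same byte, so they compare equal, as expected for a _ci collation.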
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.