WL#1875: Case insensitive Czech collation
Affects: Server-6.1
—
Status: Assigned
We have only case-sensitive Czech collations for cp1250 and latin2. We want to add case-insensitive collations too. See also Feature Request BUG#3444 and a contributed patch from Pavel Stehule implementing a Czech case-insensitive collation for cp1250: http://lists.mysql.com/internals/34318
Assume we progress with WL#5170 Swedish collation. Assume we do the same sort of thing for Czech. Then we'll follow UCA DUCET as described in WL#2673 "Unicode Collation Algorithm new version", and tailor according to CLDR.

The CLDR (Unicode Common Locale Data Repository)
http://unicode.org/repos/cldr/trunk/docs/web/repository_access.html
has one Czech-rule file cs.xml with three sets of rules, "standard" and "digits-after" and "search". Tailoring at the primary level is:

For collation type="standard":
  C BEFORE C WITH CARON
  H BEFORE CH
  R BEFORE R WITH CARON
  S BEFORE S WITH CARON
  Z BEFORE Z WITH CARON
This corresponds to the Czech Wikipedia page "Abecední řazení":
  A B C Č D E F G H Ch I J K L M N O P Q R Ř S Š T U V W X Y Z Ž
http://cs.wikipedia.org/wiki/Abecedn%C3%AD_%C5%99azen%C3%AD
MySQL's current utf8_czech_ci collation
http://www.collation-charts.org/mysql60/mysql604.utf8_czech_ci.html
already is "standard".

For collation type="digits-after":
  same as collation type="standard", except that the digits 0123456789 come after letters.

For collation type="search": (looking only at Czech-specific rules)
  A BEFORE A WITH ACUTE
  C BEFORE C WITH CARON
  D BEFORE D WITH CARON
  E BEFORE E WITH ACUTE
  E WITH ACUTE BEFORE E WITH CARON
  H BEFORE CH
  I BEFORE I WITH ACUTE
  N BEFORE N WITH CARON
  O BEFORE O WITH ACUTE
  R BEFORE R WITH CARON
  S BEFORE S WITH CARON
  T BEFORE T WITH CARON
  U BEFORE U WITH ACUTE
  U WITH ACUTE BEFORE U WITH RING ABOVE
  Y BEFORE Y WITH ACUTE
  Z BEFORE Z WITH CARON

Two bug reports (BUG#32404, BUG#61615) are asking for sensitivity with vowel accents, which only "search" delivers.

Rules can be applicable to Unicode character sets. Suggested name = utf8_czech_600_ci etc. The expectation is that 8-bit character sets matter less.

It's unclear whether Czech with standard rules is one of the collations that we call "tricky".

References
----------

Follow the same sort of ideas as seen in WL#5170 Swedish collation.
BUG#3444 Case sensitivity in czech comparisons
BUG#8644 make the cp1250_czech_cs act like latin2_czech_cs
BUG#32404 Cannot obtain accent sensitive czech collation
BUG#34371 Czech collation ("not a bug")
BUG#61615 Mysql evaluates chars with a comma and chars without a comma as the same ("duplicate")
Email thread "utf8_czech_ci"
[ mysql intranet archive ]/secure/mailarchive/mail.php?folder=4&mail=13244
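
Illustration
------------

The primary-level "standard" and "digits-after" tailorings described in this worklog can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server implementation and not a UCA implementation: the names czech_primary_key, CZECH_ALPHABET, PRIMARY and the digits_after parameter are hypothetical, only primary weights are modelled (so case and vowel accents are ignored), "ch" is always treated as one letter, and non-letter characters are ordered by code point, which is an arbitrary choice made for the sketch.

```python
import unicodedata

# Czech alphabet order from the "standard" tailoring; "ch" is one letter.
CZECH_ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
PRIMARY = {letter: rank for rank, letter in enumerate(CZECH_ALPHABET)}


def _base_letter(ch):
    """Return ch itself if it is a primary letter, else ch without accents."""
    if ch in PRIMARY:
        return ch
    # NFD puts the base character first, e.g. "á" -> "a" + combining acute.
    return unicodedata.normalize("NFD", ch)[0]


def czech_primary_key(text, digits_after=False):
    """Build a case-insensitive, primary-level sort key (hypothetical helper).

    Each element is (group, weight). Groups: 0 = other characters,
    2 = letters, and digits are group 1 (before letters) or 3 (after
    letters) depending on digits_after.
    """
    s = unicodedata.normalize("NFC", text).lower()
    digit_group = 3 if digits_after else 1
    key = []
    i = 0
    while i < len(s):
        # Assumption: every "ch" sequence is the contraction CH.
        if s.startswith("ch", i):
            unit, step = "ch", 2
        else:
            unit, step = _base_letter(s[i]), 1
        if unit in PRIMARY:
            key.append((2, PRIMARY[unit]))
        elif unit.isdigit():
            key.append((digit_group, ord(unit)))
        else:
            key.append((0, ord(unit)))
        i += step
    return key


words = ["chata", "cihla", "hrad", "čaj", "Car", "ihned",
         "řeka", "ruka", "žena", "zebra"]
print(sorted(words, key=czech_primary_key))
# ['Car', 'cihla', 'čaj', 'hrad', 'chata', 'ihned', 'ruka', 'řeka', 'zebra', 'žena']
```

In this sketch "dal" and "dál" produce the same key, which is the behaviour that BUG#32404 and BUG#61615 complain about; distinguishing them would need the extra primary rules listed for type="search" (or a secondary level), which the sketch deliberately leaves out.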
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.