WL#2598: Make field->max_length work better with unicode

Affects: Benchmarks-3.0 — Status: Un-Assigned

Description

In the MySQL C API, you can access field->max_length to get the maximum length for 
a particular column in a data set. 

The problem is that with Unicode strings there are really three kinds of length:

1) the length in bytes
2) the length in characters 
3) the display width

(3) differs from (2) because some characters are zero-width and some are
double-width. Existing code often uses max_length for purpose (3). For example,
the mysql command-line client reads field->max_length to decide how wide each
column should be when it formats the pretty tables in its output.
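
To make the three lengths concrete, here is a minimal sketch in C that measures
all three for a NUL-terminated UTF-8 string. Everything in it is hypothetical and
not part of the C API: the names measure_utf8(), utf8_decode(), cp_width() and
struct str_lengths are made up for illustration, and the width table covers only
a handful of code point ranges. Real code would more likely rely on wcwidth() or
on the Unicode East Asian Width data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: the three lengths of one string. */
struct str_lengths {
    size_t bytes;   /* (1) length in bytes */
    size_t chars;   /* (2) length in characters (code points) */
    size_t width;   /* (3) display width in terminal columns */
};

/* Decode one UTF-8 sequence at s (n bytes remain). Stores the code point in
   *cp and returns the number of bytes consumed. Malformed or truncated input
   is consumed one byte at a time as U+FFFD. Overlong encodings are not
   rejected; this is a sketch, not a validator. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c = s[0];

    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; }
    else { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Deliberately tiny width table (an assumption for illustration only). */
static int cp_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0;    /* combining diacritics */
    if (cp == 0x200B) return 0;                    /* zero width space */
    if ((cp >= 0x4E00 && cp <= 0x9FFF) ||          /* CJK unified ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||          /* Hangul syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))            /* fullwidth forms */
        return 2;
    return 1;
}

/* Measure bytes, characters and display width in a single pass. */
struct str_lengths measure_utf8(const char *s)
{
    struct str_lengths r = {0, 0, 0};
    const unsigned char *p = (const unsigned char *)s;
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    r.bytes = n;
    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(p, n, &cp);
        r.chars++;
        r.width += (size_t)cp_width(cp);
        p += used;
        n -= used;
    }
    return r;
}
```

With this table, the three-ideograph string "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"
measures 9 bytes, 3 characters and 6 columns, while "e\xCC\x81" (an "e" followed
by a combining acute accent) measures 3 bytes, 2 characters and 1 column.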

If we ever want the mysql command-line client to work well with Unicode strings,
then we have to either

(A) rearchitect it (mysql_store_result() vs. mysql_use_result()) and do some 
inefficient looping over the result set 
 
or

(B) somehow provide support in the client protocol and in the C API to get the 
"display width" analogue of max_length 

I am not sure whether A or B is the better solution, but here I am proposing B.
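
For comparison, the looping that option (A) implies might look roughly like the
sketch below. To stay self-contained it does not call the C API at all: a plain
array of string arrays stands in for a buffered result set (the same char **
shape as MYSQL_ROW), and the names row_t, display_width() and
max_display_widths() are hypothetical. The width rules are a simplified
assumption, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one buffered row: an array of NUL-terminated UTF-8 strings,
   with NULL for SQL NULL. */
typedef const char *const *row_t;

/* Display width of one value, using a simplified width rule set
   (an assumption for illustration; input is assumed to be valid UTF-8). */
static size_t display_width(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t width = 0;

    while (*p) {
        uint32_t cp = *p++;
        int extra = cp >= 0xF0 ? 3 : cp >= 0xE0 ? 2 : cp >= 0xC0 ? 1 : 0;

        if (extra)
            cp &= 0x3F >> extra;            /* keep payload bits of lead byte */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (*p++ & 0x3F); /* append continuation bits */

        if (cp >= 0x0300 && cp <= 0x036F)
            continue;                       /* combining mark: zero width */
        if ((cp >= 0x4E00 && cp <= 0x9FFF) ||
            (cp >= 0xAC00 && cp <= 0xD7A3) ||
            (cp >= 0xFF01 && cp <= 0xFF60))
            width += 2;                     /* wide character */
        else
            width += 1;
    }
    return width;
}

/* Scan every row once and record the widest value seen in each column.
   This is the extra pass over the result set that option (A) requires. */
void max_display_widths(const row_t *rows, size_t nrows, size_t ncols,
                        size_t *widths)
{
    size_t r, c;

    assert(rows != NULL && widths != NULL);
    for (c = 0; c < ncols; c++)
        widths[c] = 0;
    for (r = 0; r < nrows; r++) {
        for (c = 0; c < ncols; c++) {
            /* The mysql client prints SQL NULL as the text NULL (4 columns). */
            size_t w = rows[r][c] ? display_width(rows[r][c]) : 4;
            if (w > widths[c])
                widths[c] = w;
        }
    }
}
```

The cost is one full decode of every value before the first line of output can
be printed, which only works when the whole result set is buffered on the
client. That is the inefficiency option (B) is meant to avoid.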

[notes from mark]

The JDBC driver already handles this in one way: when someone asks for the
display length, it issues a 'SHOW CHARACTER SET' for the charset of the field in
question, and then caches the value connection-wide so that it can calculate
length-in-chars. This works okay for everything but UTF-8, which is a
variable-length encoding. I think one would _have_ to scan the result set to get
length-in-chars for UTF-8.

This works okay for the JDBC driver, as the display-length-in-chars method isn't
called very often.
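
The limitation with UTF-8 can be seen in a small sketch. It assumes the cached
value is the charset's maximum bytes per character (the Maxlen column of SHOW
CHARACTER SET) and that length-in-chars is estimated by dividing the byte length
by it; the notes above do not spell out the exact arithmetic, so treat that as an
assumption. The function names estimate_chars() and count_utf8_chars() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate length-in-chars from a byte length and the charset's maximum
   bytes per character. Exact for fixed-width charsets, and only a lower
   bound for variable-length ones such as UTF-8. */
size_t estimate_chars(size_t byte_len, unsigned maxlen)
{
    assert(maxlen > 0);
    return byte_len / maxlen;
}

/* Exact character count for valid UTF-8: every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new character.
   This requires scanning the data itself. */
size_t count_utf8_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For a single-byte charset (Maxlen 1) the estimate is exact. For the 6-byte UTF-8
string "h\xC3\xA9llo" with Maxlen 3, estimate_chars(6, 3) gives 2, while
count_utf8_chars() gives 5, so only a scan of the actual values yields the right
answer.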