WL#2598: Make field->max_length work better with Unicode
Affects: Benchmarks-3.0
—
Status: Un-Assigned
In the MySQL C API, you can read field->max_length to get the maximum length,
in bytes, of the values in a particular column of a result set.
The problem is that with Unicode strings there are really three kinds of length:
1) the length in bytes
2) the length in characters
3) the display width
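As a rough illustration of the three lengths, the C sketch below walks a UTF-8 string once and counts bytes, characters, and terminal columns. The width table is a deliberately simplified assumption (a few zero-width ranges and a few East Asian wide ranges), not the full Unicode East Asian Width data that a real wcwidth() implementation uses, and the decoder assumes well-formed input. The names utf8_decode, codepoint_width, utf8_lengths, and struct str_lengths are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence at s (n bytes remaining) into *cp and return
 * the number of bytes consumed. Continuation bytes are not validated
 * (well-formed input is assumed); an unrecognized lead byte or a truncated
 * sequence consumes one byte and decodes as U+FFFD. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && n >= 2) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && n >= 3) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && n >= 4) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Simplified display-width table (an assumption, not the full Unicode data). */
static int codepoint_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F) return 0;   /* combining diacritical marks */
    if (cp == 0x200B) return 0;                    /* zero width space */
    if ((cp >= 0x1100 && cp <= 0x115F) ||          /* Hangul Jamo initials */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||          /* CJK radicals .. Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||          /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||          /* CJK compatibility ideographs */
        (cp >= 0xFF00 && cp <= 0xFF60) ||          /* fullwidth forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6))            /* fullwidth signs */
        return 2;
    return 1;
}

struct str_lengths {
    size_t bytes;   /* (1) length in bytes: what max_length measures */
    size_t chars;   /* (2) length in characters (code points) */
    size_t width;   /* (3) display width in terminal columns */
};

/* Compute all three lengths of a NUL-terminated UTF-8 string in one pass. */
static struct str_lengths utf8_lengths(const char *str)
{
    struct str_lengths r = {0, 0, 0};
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);

    r.bytes = n;
    while (n > 0) {
        uint32_t cp;
        size_t k = utf8_decode(s, n, &cp);
        s += k;
        n -= k;
        r.chars++;
        r.width += (size_t)codepoint_width(cp);
    }
    return r;
}
```

For example, "e" followed by U+0301 (combining acute accent) is 3 bytes, 2 characters, and 1 column wide, while the two CJK characters U+65E5 U+672C are 6 bytes, 2 characters, and 4 columns wide.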
(3) differs from (2) because some characters are zero-width and some
characters can be double-width. Existing code often uses max_length for
purpose (3). For example, the mysql command-line client reads field->max_length
to calculate how wide each column should be when it formats the pretty tables
in its output.
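The C sketch below shows why a purely byte-based maximum misaligns such a table once wide characters appear. The struct cell type and both functions are hypothetical, and each cell's display width is assumed to be precomputed by the caller; the C API reports nothing like it today. In byte mode the column is as wide as the longest value in bytes and each cell is padded by its byte shortfall, so a 6-byte, 4-column value ends 2 columns short of a 3-byte ASCII value in the same column; in display-width mode both end at column 4.

```c
#include <stddef.h>
#include <stdio.h>

/* One cell of a result column (hypothetical). 'width' is assumed to be
 * supplied by the caller, e.g. from a wcwidth()-style function. */
struct cell {
    const char *text;   /* UTF-8 bytes */
    size_t bytes;       /* byte length: what field->max_length is based on */
    size_t width;       /* terminal columns the text actually occupies */
};

/* Column width, measured either in bytes or in display columns. */
static size_t column_width(const struct cell *cells, size_t n,
                           int use_display_width)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = use_display_width ? cells[i].width : cells[i].bytes;
        if (len > w)
            w = len;
    }
    return w;
}

/* Print "| text<padding> |" and return how many terminal columns the text
 * plus padding occupies. col must be >= the cell's measure in that mode. */
static size_t print_cell(const struct cell *c, size_t col,
                         int use_display_width)
{
    size_t used = use_display_width ? c->width : c->bytes;
    size_t pad = col - used;
    printf("| %s%*s |\n", c->text, (int)pad, "");
    return c->width + pad;
}
```

When every cell in a column returns the same value from print_cell(), the column lines up; the byte-based mode only guarantees that for pure ASCII data.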
If we ever want the mysql command-line client to work well with Unicode strings,
then we have to either
(A) rearchitect it (mysql_store_result() vs. mysql_use_result()) and do some
inefficient looping over the result set
or
(B) somehow provide support in the client protocol and in the C API to get the
"display width" analogue of max_length
I am not sure whether A or B is the better solution, but here I am proposing B.
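A minimal sketch of what (B) could look like on the client side, under stated assumptions: in the classic protocol the column definitions arrive before the rows, and max_length is not part of that metadata; mysql_store_result() fills it in after it has read every row. A display-width maximum could plausibly be accumulated in that same pass. struct field_stats, max_display_width, and update_field_stats() are hypothetical, and the per-value widths are assumed to come from some wcwidth()-style function.

```c
#include <stddef.h>

/* Hypothetical per-column statistics. max_length mirrors the existing
 * MYSQL_FIELD member; max_display_width is the proposed addition and does
 * not exist in the real C API. */
struct field_stats {
    unsigned long max_length;         /* longest value, in bytes */
    unsigned long max_display_width;  /* widest value, in terminal columns */
};

/* Fold one row into the statistics. lengths[i] is the byte length of
 * column i (as mysql_fetch_lengths() would report it); widths[i] is its
 * display width, computed by an assumed wcwidth()-style function. */
static void update_field_stats(struct field_stats *stats, size_t num_fields,
                               const unsigned long *lengths,
                               const unsigned long *widths)
{
    for (size_t i = 0; i < num_fields; i++) {
        if (lengths[i] > stats[i].max_length)
            stats[i].max_length = lengths[i];
        if (widths[i] > stats[i].max_display_width)
            stats[i].max_display_width = widths[i];
    }
}
```

Keeping the byte maximum alongside the width maximum means existing callers of max_length would be unaffected.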
[Notes from Mark]
The JDBC driver already handles this one way: when someone asks for the
display length, it issues a 'SHOW CHARACTER SET' for the field's character set
and caches the value connection-wide so that it can calculate the length in
characters. This works fine for everything but UTF-8, whose characters vary in
length, so the byte length alone does not determine the character count. I
think one would _have_ to scan the result set to get the length in characters
for UTF-8.
This works okay for the JDBC driver, because its method that returns the
display length in characters isn't called very often.
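The C sketch below assumes that the cached value is the Maxlen column of SHOW CHARACTER SET (the maximum number of bytes per character). It shows why dividing a byte length by it is exact for a fixed-width character set but not for UTF-8, and what the UTF-8 scan amounts to. It is not Connector/J's actual code; both function names are hypothetical.

```c
#include <stddef.h>

/* Character count from the byte length and the charset's Maxlen.
 * Exact only for fixed-width character sets; for variable-width ones it
 * is merely a lower bound. maxlen must be > 0. */
static size_t chars_from_maxlen(size_t byte_len, unsigned maxlen)
{
    return byte_len / maxlen;
}

/* Exact character count for UTF-8: count every byte that is not a
 * continuation byte (10xxxxxx). Assumes well-formed UTF-8. */
static size_t utf8_char_count(const char *s, size_t byte_len)
{
    size_t n = 0;
    for (size_t i = 0; i < byte_len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

With ucs2 (Maxlen 2), a 4-byte value is exactly 2 characters. With MySQL's utf8 (Maxlen 3), the 3-byte value "abc" is estimated as 1 character but actually holds 3, so the exact count needs the per-value scan.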
Copyright (c) 2000, 2025, Oracle Corporation and/or its affiliates. All rights reserved.