WL#2637: Byte Order Mark for LOAD DATA INFILE and mysqlimport
Affects: Server-7.0
—
Status: Assigned
Sometimes Unicode files begin with a Byte Order Mark (BOM).
We want to ignore the BOM when reading files using LOAD
DATA INFILE or mysqlimport.
We could support a new clause "IGNORE n BYTES"
(analogous to "IGNORE n LINES") but that would
not work well: sometimes there might be a BOM,
sometimes not, and we only want to ignore the
bytes if they are there.
For a new LOAD DATA INFILE clause, we'll use the
Oracle10g SQL*Loader syntax "BYTEORDERMARK CHECK | NOCHECK".
It's described in the Oracle10g Utilities Manual:
http://dbis.informatik.uni-freiburg.de/oracle-docs/doc1001/server.101/b10825/ldr_field_list.htm#i1011032
(Search for the word "BYTEORDERMARK".)
We won't worry about Oracle's clause order, though,
so put the "BYTEORDERMARK" clause after "INTO TABLE tbl_name".
The default is "BYTEORDERMARK NOCHECK".
Thus, if there is some MySQL-supported character set
where the BOM bytes are meaningful data, they are loaded
unchanged unless the user asks for CHECK.
(This is not how Oracle handles the default.)
For mysqlimport, the flag is --byteordermark=check
or --byteordermark=nocheck (the default).
SELECT ... INTO OUTFILE does not generate BOMs.
"BYTEORDERMARK CHECK" doesn't really mean that we
check. All it means is: we skip the first 2 or 3 bytes of
the file if they are equal to any of the following:
0xEFBBBF  UTF8
0xFEFF    UCS2 big-endian
0xFFFE    UCS2 little-endian
Even if BYTEORDERMARK CHECK is on, the absence of a BOM
will not be an error. Just treat the first bytes as data.
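The skip rule above can be sketched in a few lines. This is only an
illustration in Python, not the server implementation: the names BOMS and
skip_bom and the boolean "check" parameter are hypothetical, and "check"
stands in for the proposed BYTEORDERMARK CHECK | NOCHECK setting.

```python
# BOM byte sequences listed in this worklog.
# Names and structure here are assumptions made for illustration.
BOMS = (
    (b"\xef\xbb\xbf", "UTF8"),
    (b"\xfe\xff", "UCS2 big-endian"),
    (b"\xff\xfe", "UCS2 little-endian"),
)


def skip_bom(data: bytes, check: bool = False) -> bytes:
    """Return data without a leading BOM when check is True.

    check=False models BYTEORDERMARK NOCHECK (the proposed default):
    the input is returned untouched, even if it starts with a BOM.
    check=True models BYTEORDERMARK CHECK: a leading BOM is dropped,
    and a missing BOM is not an error.
    """
    if not check:
        return data
    for bom, _name in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    # No BOM present: treat the first bytes as ordinary data.
    return data
```

With check=True, a file starting with 0xEFBBBF loses exactly those three
bytes, and a file with no BOM comes back unchanged. With check=False
nothing is ever removed, which is why a fixed "IGNORE n BYTES" clause would
not be an equivalent substitute.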
See also: WL#993 "LOAD DATA INFILE and character sets".
Feature requests:
BUG#4960 Mysql cmdline client fails on byte order mark
See also:
http://bugs.mysql.com/bug.php?id=10573
http://bugs.mysql.com/bug.php?id=29323