WL#2637: Byte Order Mark for LOAD DATA INFILE and mysqlimport
Affects: Server-7.0 — Status: Assigned — Priority: Medium
Sometimes Unicode files begin with a Byte Order Mark (BOM). We want to ignore the BOM when reading files using LOAD DATA INFILE or mysqlimport.

We could support a new clause "IGNORE n BYTES" (analogous to "IGNORE n LINES"), but that would not work well: sometimes there is a BOM and sometimes there is not, and we only want to ignore the bytes if they are there.

For a new LOAD DATA INFILE clause, we'll use the Oracle10g SQL*Loader syntax "BYTEORDERMARK CHECK | NOCHECK". It's described in the Oracle10g Utilities Manual:
http://dbis.informatik.uni-freiburg.de/oracle-docs/doc1001/server.101/b10825/ldr_field_list.htm#i1011032
(Search for the word "BYTEORDERMARK".)
We won't worry about Oracle's clause order, though, so put the "BYTEORDERMARK" clause after "INTO TABLE tbl_name".

The default is "BYTEORDERMARK NOCHECK". Thus, if there is some MySQL-supported character set where the BOM characters are meaningful, no problem: they are read as data unless the user asks for the check. (This is not how Oracle handles the default.)

For mysqlimport, the flag is --byteordermark=check or --byteordermark=nocheck (the default).

SELECT ... INTO OUTFILE does not generate BOMs.

"BYTEORDERMARK CHECK" doesn't really mean that we check. All it means is: we skip the first 2 or 3 bytes of the file if they are equal to any of the following:

  0xEFBBBF   UTF8
  0xFEFF     UCS2 bigendian
  0xFFFE     UCS2 littleendian

Even if BYTEORDERMARK CHECK is on, the absence of a BOM will not be an error. Just treat the first bytes as data.

See also: WL#993 "LOAD DATA INFILE and character sets".
Feature requests: BUG#4960 Mysql cmdline client fails on byte order mark
See also:
http://bugs.mysql.com/bug.php?id=10573
http://bugs.mysql.com/bug.php?id=29323
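
The skip rule described above for "BYTEORDERMARK CHECK" is small enough to sketch. The following Python is only an illustration, not MySQL source code: the names BOMS, bom_length and read_payload are hypothetical, and the check argument stands in for the CHECK / NOCHECK setting. It assumes the loader can look at the first three bytes of the file before parsing begins.

```python
# Hypothetical sketch of the "BYTEORDERMARK CHECK" skip rule; not MySQL code.

# The three byte sequences listed in this worklog entry.
BOMS = (
    b"\xEF\xBB\xBF",  # UTF8
    b"\xFE\xFF",      # UCS2 bigendian
    b"\xFF\xFE",      # UCS2 littleendian
)


def bom_length(head: bytes, check: bool = False) -> int:
    """Return how many leading bytes to skip: 0, 2 or 3."""
    if not check:
        # NOCHECK is the default: the first bytes are always data.
        return 0
    for bom in BOMS:
        if head.startswith(bom):
            return len(bom)
    # A missing BOM is not an error; the first bytes are data.
    return 0


def read_payload(data: bytes, check: bool = False) -> bytes:
    """Return the file contents with a leading BOM removed when check is on."""
    return data[bom_length(data[:3], check):]
```

With check left at its default, read_payload returns the input unchanged, which mirrors the "BYTEORDERMARK NOCHECK" default. With check=True, a file that starts with one of the listed sequences loses exactly those 2 or 3 bytes, and any other file is returned as is.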
Copyright (c) 2000, 2015, Oracle Corporation and/or its affiliates. All rights reserved.