WL#8920: Improve usability of UUID manipulations
Affects: Server-8.0
—
Status: Complete
The goal of this WL is to improve the experience of users working with UUID while not implementing a new data type. * Some additional insight into UUIDs the standard format can be found here: https://www.ietf.org/rfc/rfc4122.txt Problem ======= 1) The column type binary(16) is extensively used as a compact storage for UUID's. But in order to transform the text (human-readable) format into binary format, users must use some text manipulation functions like unhex() and replace(). Example: unhex(replace('aab5d5fd-70c1-11e5-a4fb-b026b977eb28', '-', '')) 2) in order to do the reverse transformation, from the binary(16) to human-readable format, users must use even more text manipulation functions. Example: insert( insert( insert( insert(hex(uuid_bin),9,0,'-'), 14,0,'-'), 19,0,'-'), 24,0,'-') 3) A valid UUID text must have a certain length and be made up of only certain characters. In MySQL there is no simple way to validate text UUIDs. 4) UUIDs version 1 store the time-low at the beginning of the string making the index inserts very slow (since the variations between two consecutive generated UUIDs are very high at the beginning of the string). The way this problem will be solved is by providing the user with 3 new SQL functions: - two functions will take care of the conversion to/from binary/text format, with an optional twist to help indexing, - one function will validate the text format.
F-1.1: UUID_TO_BIN(arg1) should return valid VARBINARY(16) data for any valid UUID-string in. All specified UUID-formats should be supported: - UUID without dashes, ex 12345678123456781234567812345678 - UUID with braces, ex {12345678-1234-5678-1234-567812345678} - standard UUID, ex 12345678-1234-5678-1234-567812345678, uppercase/lowercase letters should be accepted. For all those examples, the result should be a string starting with a byte of hex ASCII code 0x12, then another byte of code 0x34, etc. The validation will be simple, the version number bit will not be check, neither the point in time of the timestamp will not be checked. The binary data stored should be in the same order as the text. F-1.2: UUID_TO_BIN(arg1,arg2) should behave as in F-1.1 if is FALSE. If instead it is true, the binary data stored should be shuffled so that the time- low (first group) and time-high parts (third group) are swapped. The rest of the bytes will remain in the same order as the text version. This can give better indexing as the rapidly-varying part is moved to the right. The function assumes that the UUID is v1 without complaining if it's not. F-1.3: if the function UUID_TO_BIN is called with invalid UUIDs, an error will be thrown. F-1.4: if the function UUID_TO_BIN is called with NULL argument for the UUID string, the function should return NULL without any warning/error. F-2.1 BIN_TO_UUID should return a valid UUID string for the VARBINARY(16) UUID provided as argument. It should be in the standard format, with downcase letters. The validation will be simple, it will check that the binary data has the size 16: the version number bit will not be tested, nor the timestamp. The UUID string result should have the same byte-order. This is the inverse of UUID_TO_BIN. F-2.2 BIN_TO_UUID(arg1,arg2) should behave as in F-2.1 if is FALSE. If instead it is true, the function should return a valid UUID string having the time-low and time-high parts swapped back to their original position.The function assumes that the UUID is v1 without complaining if it's not. F-2.3: if the function BIN_TO_UUID is called with invalid arguments, an error will be thrown. F-2.3: if the function BIN_TO_UUID is called with NULL argument, the NULL value will be returned without any warnings/errors. F-3.1: IS_UUID should return TRUE for valid UUIDs. The function will only validate that the string is correctly formatted: correct size, if it uses the format with dashes- all dashes are in the right place and all the groups have the right amount of characters per group, if it contains curly brackets- both exists and are in the right place. No other validation is done. F-3.2: IS_UUID should return FALSE if the argument is not a valid UUID, and NULL if the argument is NULL. No warnings/errors will be thrown.
binary_log::Uuid will be changed to parse more UUID formats. Previously the only allowed formats were XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX now: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX or XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX or {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX} can be validated and parsed. This required some changes on the interface: - static bool is_valid(const char *string); became static bool is_valid(const char *string, size_t len); - int parse(const char *string); became int parse(const char *string, size_t len); - and a new function was added: static int parse(const char* string, size_t len, char *out_string); Three new classes were added to the Item tree: - class Item_func_uuid_to_bin : public Item_str_func - class Item_func_bin_to_uuid : public Item_str_ascii_func - class Item_func_is_uuid : public Item_bool_func Also classes used to create the functions were added - class Create_func_uuid_to_bin : public Create_native_func (function has variable number of arguments) - class Create_func_bin_to_uuid : public Create_native_func (since function has variable number of arguments) - class Create_func_is_uuid : public Create_func_arg1
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.