WL#5510: Functions to_base64 and from_base64

Affects: Server-5.6   —   Status: Complete   —   Priority: Very High

Expose to_base64() and from_base64() as SQL functions for users.
In 2005 Lars Thalmann coded base64 functions as described in
WL#2835 "base64". Now let us expose those functions so that
users can say TO_BASE64(...) or FROM_BASE64(...) and get
results via the base64 calculations described in WL#2835.

Syntax  
------  
  
to_base64 (input)  
The input data type should be any string data type
(if input data type is not a string data type,
then automatic to-string conversion happens first).
The output data type is VARCHAR, or TEXT if it's long,
with connection character set and connection collation.
If the input is NULL, then the output is NULL.

from_base64 (input)  
The input data type should be any string data type
(if input data type is not a string data type,
then automatic to-string conversion happens first).
The output data type is BLOB.  
If the input is NULL, then the output is NULL.  
If the input is a badly formed base64 string, then the output is NULL.
(A badly formed base64 string is any string which
from_base64 is unable to interpret as a base64 string.)

Incompatibility (low risk)
--------------------------
In order to provide NULL result for badly formed from_base64() input,
we'll fix the low level function base64_decod() to reject bad input.
That means some BINLOG statements which previously returned:

mysql> BINLOG 'A=';
ERROR 1149 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use

will now return another error:

mysql> BINLOG 'A=';
ERROR 1575 (HY000): Decoding of base64 string failed

This is because wrong input is now detected earlier
on base64 stage, rather than on binlog event execution.
This should not be a problem for replication purposes
because BINLOG is always executed with well formed base64 string,
produced with mysqlbinlog.


Encoding/decoding details
-------------------------
- Encoding for alphabet value 62 is '+'
- Encoding for alphabet value 63 is '/'
- Pad character is '='
- to_base64 delimits every 76 characters with NL
- from_base64 recognizes NL, as well as these delimiters: TAB, SPACE, CR
- from_base64 does not require delimiters after 76 characters.
  It will work with delimited input, with non-delimited input.
  In other words, from_base64 allows and ignores all above delimiters
  anywhere in input.
- from_base64 returns NULL if any invalid character is met.
  Valid characters include: alphabet, delimiters, padding.
  All other characters are invalid.
- from_base64 returns NULL if padding is missing,
  or if there are more non-delimiter characters after padding.


New Reserved Words
------------------

None.


Testing
-------
Tests must include "RFC4648 Test Vectors":

BASE64("") = ""
BASE64("f") = "Zg=="
BASE64("fo") = "Zm8="
BASE64("foo") = "Zm9v"
BASE64("foob") = "Zm9vYg=="
BASE64("fooba") = "Zm9vYmE="
BASE64("foobar") = "Zm9vYmFy"


References
----------

Feature request:
BUG#39328 provide base64 decoding function for blob data 

Background information is in the seven emails in dev-private thread
"[Fwd: RE: Adding native base64 functions to MySQL server?]"
starting with
https://intranet.mysql.com/secure/mailarchive/mail.php?folder=4&mail=32758

There is a user contribution. Look at the thread starting with
http://lists.mysql.com/internals/37668
but there's no statement that Jan Steeman submitted a final patch.
We should accept it as a user contribution anyway.