WL#7387: Unreliable Failure Detector support in Connector/Python

Affects: Connector/Python-1.2   —   Status: Complete

GOAL
====
The goal is to make Connector/Python report to Fabric the errors it encounters 
while accessing a MySQL instance. The reported data will then be used to update 
the backing store and trigger a failover operation, provided the faulty MySQL 
instance is a primary and Fabric has received enough complaints from different 
connectors.

REMARKS
========
. See WL#7455 for information on how we are planning to handle security issues.

User Documentation
==================

http://dev.mysql.com/doc/relnotes/connector-python/en/news-1-2-1.html
http://dev.mysql.com/doc/mysql-utilities/1.4/en/connector-python-fabric-connect.html

Requirements
============

1) It shall be possible to configure whether a connection will report errors back 
to Fabric or not.

2) It shall be possible to dynamically extend the set of errors that will trigger 
a notification to Fabric.

3) By default the set in item 2 must contain the following errors: CR_SERVER_LOST, 
CR_SERVER_GONE_ERROR, etc.

4) There will be a distinction between errors that are reported back to Fabric and 
errors that invalidate the connector's cache.

5) If the report function fails, an error is reported but no exception shall be raised.

Avoid thundering herds
======================
The ability to report errors back to Fabric must be used wisely, otherwise Fabric 
may suffer from the thundering herd effect. If all connections attempt to report 
an error after a server failure, Fabric will be swamped with many requests at 
around the same time. To avoid this problem, we advise users to designate key 
connection(s) in the application to report errors, or to devise a routine that 
periodically checks the servers and reports errors.

This routine would work as a distributed failure detector and might be spawned in 
a different thread within the application's context or as a separate process.
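
The periodic routine suggested above could look roughly like the sketch below. 
It is only an illustration under stated assumptions: run_failure_detector, probe 
and report are hypothetical names, not part of Connector/Python. The application 
supplies probe (raises when a server cannot be reached) and report (forwards the 
failure to Fabric, for example through a connection configured to report errors). 
A random jitter is added so that several detectors do not report in lockstep.

```python
import random
import threading


def run_failure_detector(servers, probe, report, interval, stop_event,
                         max_jitter=1.0):
    """Periodically probe each server and report the ones that fail.

    servers    -- iterable of server identifiers (for example UUIDs)
    probe      -- hypothetical callable; probe(server) raises on failure
    report     -- hypothetical callable; report(server, exc) notifies Fabric
    interval   -- seconds to wait between two rounds
    stop_event -- threading.Event used to stop the loop
    max_jitter -- upper bound, in seconds, of the random extra wait
    """
    while not stop_event.is_set():
        for server in servers:
            try:
                probe(server)
            except Exception as exc:
                # Only the detector reports; ordinary connections stay quiet.
                report(server, exc)
        # wait() returns early once stop_event is set.
        stop_event.wait(interval + random.uniform(0, max_jitter))
```

To run it in the background, the function can be passed as the target of a 
threading.Thread together with a threading.Event that the application sets on 
shutdown.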


Handling Errors
===============
Any error has the Error class as its base class. In the context of this work, 
there are two important errors that deserve attention:

InterfaceError - This exception is raised whenever the connector is not able to 
establish a connection to a server. For example, this may be raised because Fabric 
is not accessible and there is no valid cache entry.

MySQLFabricError - This exception is raised whenever there is an error while 
processing a request (i.e. statement) and the error triggers a cache invalidation. 
The connector catches the original exception, invalidates the cache and raises the  
MySQLFabricError.

This makes it easy to develop fault-tolerant applications, as the developer knows 
that after getting such an error the issue was reported back to Fabric, the cache 
was invalidated and the faulty server might have been replaced or at least tagged 
as faulty.
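
An application can take advantage of this by retrying the failed transaction. 
The sketch below is a generic helper, not part of the connector: run_with_retry 
and its parameters are hypothetical names. It assumes the caller passes the 
exception class to retry on (MySQLFabricError in this context) and a callable 
that runs the whole transaction.

```python
import time


def run_with_retry(work, retry_on, attempts=3, delay=0.0):
    """Call work() and retry while it raises a retry_on exception.

    work     -- callable that runs one complete transaction
    retry_on -- exception class (or tuple of classes) that means "try again"
    attempts -- maximum number of calls to work()
    delay    -- seconds to sleep between two attempts
    """
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            if delay > 0:
                time.sleep(delay)
```

Since the cache was invalidated before the exception was raised, the next call 
to work() is expected to look up the servers again instead of reusing the 
faulty one.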


Security issues
===============
Security issues are handled as described in WL#7455.

User Interface
==============

Making a connection report errors
---------------------------------
The option to report errors is part of the Fabric configuration and can be set 
as follows:

    fabric_config = {
        'host': ..,
        'report_errors': True,
    }

    cnx = mysql.connector.connect(fabric=fabric_config)


Defining which errors to report
-------------------------------
The set of errors that may be reported can be dynamically updated as follows:

    from mysql.connector.fabric import extra_failure_report
    extra_failure_report([error_code_0, error_code_1, ...])

Defining which errors trigger a cache invalidation
--------------------------------------------------
There is no function to change the set of errors that trigger a cache 
invalidation. However, the RESET_CACHE_ON_ERROR global variable, which stores 
this information, can be updated as follows:

    from mysql.connector.fabric import RESET_CACHE_ON_ERROR
    RESET_CACHE_ON_ERROR.append(error_code_0)


Inside Connector/Python
=======================

Defining which errors to report
-------------------------------
Two global variables are used to store the set of errors that are reported back 
to Fabric: REPORT_ERRORS and REPORT_ERRORS_EXTRA. The extra_failure_report 
function, previously described, is implemented as follows:

    def extra_failure_report(error_codes):
        global REPORT_ERRORS_EXTRA

        if not error_codes:
            REPORT_ERRORS_EXTRA = []
            return

        if not isinstance(error_codes, (list, tuple)):
            error_codes = [error_codes]

        for code in error_codes:
            if not isinstance(code, int) or not (code >= 1000 and code < 3000):
                raise AttributeError("Unknown or invalid error code.")
            REPORT_ERRORS_EXTRA.append(code)

REPORT_ERRORS, in contrast, holds a pre-defined set of errors and cannot be changed:

    REPORT_ERRORS = (
         errorcode.CR_SERVER_LOST,
         errorcode.CR_SERVER_GONE_ERROR,
         errorcode.CR_CONN_HOST_ERROR,
         errorcode.CR_CONNECTION_ERROR,
         errorcode.CR_IPSOCK_ERROR,
         errorcode.ER_OPTION_PREVENTS_STATEMENT,
    )
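
The way the two sets combine can be illustrated with a small stand-alone 
sketch. It does not import the connector: the numeric values below are the 
standard MySQL codes for the names listed above, and should_report is a 
hypothetical helper that mirrors the membership test used when reporting.

```python
# Numeric values of the error codes in the fixed default set.
REPORT_ERRORS = (
    2013,  # CR_SERVER_LOST
    2006,  # CR_SERVER_GONE_ERROR
    2003,  # CR_CONN_HOST_ERROR
    2002,  # CR_CONNECTION_ERROR
    2004,  # CR_IPSOCK_ERROR
    1290,  # ER_OPTION_PREVENTS_STATEMENT
)

# Stand-in for the list that extra_failure_report() extends.
REPORT_ERRORS_EXTRA = []


def should_report(errno):
    """Return True if errno is in the default set or in the extra set."""
    return errno in REPORT_ERRORS or errno in REPORT_ERRORS_EXTRA


# 1205 (ER_LOCK_WAIT_TIMEOUT) is not reported until it is added explicitly.
before = should_report(1205)
REPORT_ERRORS_EXTRA.append(1205)
after = should_report(1205)
```

Keeping the defaults in a tuple and the extensions in a separate list means 
the application can reset its additions without losing the built-in codes.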

Handling errors
---------------
The following function handles the error and the cache invalidation:

    class MySQLFabricConnection(object):

        ...

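        def handle_mysql_error(self, exc):
            # Server stored when the connection was established.
            server = self._fabric_mysql_server
            if exc.errno in RESET_CACHE_ON_ERROR:
                # Drop the connection, report the error and invalidate
                # the cache before asking the caller to retry.
                self.disconnect()
                self._fabric.report_error(server.uuid, exc.errno)
                self.reset_cache()
                raise MySQLFabricError(
                    "Temporary error ({error}); "
                    "retry transaction".format(error=str(exc)))
            # Other errors are reported (if configured) and re-raised as is.
            self._fabric.report_error(server.uuid, exc.errno)
            raise exc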

It is called whenever there is an error while processing a statement. However, 
errors while trying to get a connection are handled as follows:

    class MySQLFabricConnection(object):

        ...

        while True:
            counter += 1

            ...

            dbconfig['host'] = mysqlserver.host
            dbconfig['port'] = mysqlserver.port
            try:
                self._mysql_cnx = mysql.connector.connect(**dbconfig)
            except Error as exc:
                if counter == attempts:
                    self._fabric.report_error(mysqlserver.uuid, exc.errno)
                    self.reset_cache(mysqlserver.group)
                    raise InterfaceError(
                        "Reported faulty server to Fabric ({0})".format(exc))
                if attempt_delay > 0:
                    time.sleep(attempt_delay)
                continue
            else:
                self._fabric_mysql_server = mysqlserver
                break

Report error function
---------------------
The report error function is implemented as follows:

    class Fabric(object):

        ...

        def report_error(self, server_uuid, errno):
            if not self._report_errors:
                return

            errno = int(errno)
            current_host = socket.getfqdn()

            if errno in REPORT_ERRORS or errno in REPORT_ERRORS_EXTRA:
                inst = self.get_instance()
                try:
                    inst.proxy.threat.report_error(server_uuid, current_host,
                                                   errno)
                except (Fault, socket.error):
                    # A failed report must not raise an exception.
                    pass
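
Requirement 5 asks that a failed report be recorded without raising an 
exception. A minimal sketch of that behaviour follows, assuming Python 3 and 
the standard logging module. The names report_quietly, send_report and 
reporter are hypothetical; send_report stands for whatever call delivers the 
report to Fabric.

```python
import logging
import socket
from xmlrpc.client import Fault

_LOGGER = logging.getLogger("fabric_report_sketch")


def report_quietly(send_report, server_uuid, errno, reporter=None):
    """Send a report to Fabric; log a failure instead of raising.

    send_report -- hypothetical callable taking (server_uuid, reporter, errno)
    reporter    -- name of the reporting host; defaults to this host's FQDN
    Returns True if the report was sent, False otherwise.
    """
    if reporter is None:
        reporter = socket.getfqdn()
    try:
        send_report(server_uuid, reporter, int(errno))
    except (Fault, OSError) as exc:
        # Fault covers XML-RPC errors, OSError covers socket errors.
        _LOGGER.warning("Failed reporting server %s to Fabric (%s)",
                        server_uuid, exc)
        return False
    return True
```

The return value lets a caller count failed reports if it wishes, while the 
statement that triggered the report is never interrupted by a second error.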