WL#5825: Using C++ Standard Library with MySQL code

Affects: Server-Prototype Only   —   Status: Complete

The goal of this worklog is to allow the use of the C++ Standard Library
inside the code and to enable exceptions and RTTI for the MySQL code
base. The goal is *not* to start using the standard C++ library
throughout the code base, just to ensure that it is possible.

Motivation
~~~~~~~~~~

There are a number of advantages to using the standard C++ library.
Chiefly, it is already-written code that has been tested and tuned over
several years and that in many cases provides better performance and
maintainability than the "homegrown" alternatives. The STL in particular
provides a wide range of well-documented and standardized containers and
algorithms that can be applied interchangeably in many scenarios.

In particular, it can be immediately applied in the following ways:

- Gracefully handle out of memory conditions with std::nothrow.
- An associative container (map or similar), which is needed for WL#3584.
- Potential gain in performance by using std::sort instead of my_qsort.
- Improve maintainability by using std::vector instead of Dynamic_array.
- Remove the non-working overloading of new and delete operators.
- Demangled stack backtrace on crashes.
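
A few of the items above can be illustrated with a minimal sketch. The
helper names below are hypothetical and are not taken from the server
code; the sketch only shows the standard facilities involved
(std::nothrow, std::sort and std::map).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <new>
#include <string>
#include <vector>

// Hypothetical helper: allocate without throwing. The nothrow form of
// operator new returns a null pointer on failure instead of raising
// std::bad_alloc, so the caller can report out-of-memory gracefully.
int *alloc_ints(std::size_t count) {
  return new (std::nothrow) int[count];
}

// Hypothetical helper: std::sort on a std::vector, standing in for a
// my_qsort call on a hand-written array type.
std::vector<int> sorted_copy(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  return values;
}

// Hypothetical helper: lookup in an associative container keyed by name.
int lookup_or_default(const std::map<std::string, int> &table,
                      const std::string &key, int fallback) {
  std::map<std::string, int>::const_iterator it = table.find(key);
  return it == table.end() ? fallback : it->second;
}
```

A caller would check the pointer returned by alloc_ints against NULL
and release it with delete[].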

When-Which-How 
~~~~~~~~~~~~~~~~

When, which and how parts of the C++ Standard Library are to be used
will be regulated by the Coding Standard Committee.

http://forge.mysql.com/wiki/MySQL_Internals_Coding_Guidelines#How_we_maintain_the_server_coding_guidelines

Using a C++ Standard Library function or class 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an experiment, I replaced the DYNAMIC_ARRAY instance saved_table_locks
in sql_test.cc with an std::vector instance. The goal of this test was
to take an isolated use and see if a normal use of DYNAMIC_ARRAY could
be replaced with a C++ Standard Library container and what effects it
would have on the build.

Note that the goal is *not* to evaluate the performance of the C++
Standard Containers.
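
The kind of replacement tried in this experiment might look roughly like
the sketch below. This is not the actual patch: Lock_info and both
functions are hypothetical stand-ins, and the real element type in
sql_test.cc has different fields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the element type kept in saved_table_locks.
struct Lock_info {
  unsigned long thread_id;
  std::string table_name;
  bool waiting;
};

// A typed container: no element size to pass at initialization and no
// raw byte copies when adding or reading elements.
typedef std::vector<Lock_info> Lock_list;

// Append one entry; the vector grows automatically.
void add_lock(Lock_list &locks, unsigned long thread_id,
              const std::string &table_name, bool waiting) {
  Lock_info info;
  info.thread_id = thread_id;
  info.table_name = table_name;
  info.waiting = waiting;
  locks.push_back(info);
}

// Iterate with the container's own iterator type.
std::size_t count_waiting(const Lock_list &locks) {
  std::size_t n = 0;
  for (Lock_list::const_iterator it = locks.begin(); it != locks.end(); ++it)
    if (it->waiting)
      ++n;
  return n;
}
```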

Using this container instead of DYNAMIC_ARRAY proved uncontroversial,
but one construct that is widely used in the server caused a problem:
the min and max macros.

These macros are defined in my_global.h and clash with the
definitions of std::min and std::max in <algorithm>. To handle
this, it is necessary to remove the macros from all C++ code.

It is not possible to just replace the uses of min and max with
std::min and std::max because these macros are used in three different
ways:

1. The macros rely on the standard conversions.
2. They are used in C code.
3. They are used in constant expressions, where function calls are not
   allowed.

Note that not all uses of min and max in the current code base are
correct, since there are comparisons between unsigned and signed
integral values. Under the standard conversions, a negative signed
value is converted to the unsigned type modulo 2^N (N being the width
of the unsigned type), which can have unexpected side effects. As an
example, consider "max(some_ulong, some_int)", and suppose that
"some_int" happens to be negative. In this case, it is converted to a
very large number (since the other type is unsigned), which may lead
to strange results.
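
The pitfall can be reproduced with a small sketch. DEMO_MAX is a
hypothetical macro written the way such macros are commonly defined; it
is assumed, not copied from my_global.h.

```cpp
#include <climits>

// Assumed shape of a max macro; the name is hypothetical.
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

// The usual arithmetic conversions turn some_int into unsigned long
// before the comparison, so a negative value wraps around and compares
// as a very large number.
unsigned long demo_max(unsigned long some_ulong, int some_int) {
  return DEMO_MAX(some_ulong, some_int);
}
```

With these definitions, demo_max(10UL, -1) returns ULONG_MAX rather than
10, while demo_max(10UL, 5) returns 10 as expected. Most compilers can
warn about the mixed-sign comparison.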

There are two ways to handle the conflict between the macros and the
standard functions:

- Replace all instances of min and max with std::min and std::max.

  This has the advantage of being the best way to switch to the standard
  library, but it requires a search-and-replace patch, which can
  conflict with existing code (resolving the conflicts just takes time;
  it is unlikely to introduce problems).

  It would require the type to be explicitly stated, for example:
  "std::max<ulong>(int_value, ulong_value)".

- Write our own version of min and max that support the correct usage.

  This approach would allow us to not change any of the existing code
  (except to handle the constant-expression case discussed below), but
  does require us to maintain our own version of min and max. See the
  example code I used below.

To handle the use in constant expressions, I think the best path is to
introduce macros MY_MIN and MY_MAX (or maybe just MIN and MAX) and use
those. The alternative is to expand the expressions in-place.

The introduction of MIN and MAX can also be used to provide the min
and max functions in the C code, with the alternative of introducing
inline functions min and max.
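
As a sketch of the constant-expression case: the macro bodies below are
assumed (the worklog only names MY_MIN and MY_MAX), and the enumerators
and the struct are hypothetical.

```cpp
// Assumed definitions; only the macro names come from the text above.
#define MY_MIN(a, b) ((a) < (b) ? (a) : (b))
#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))

// Hypothetical compile-time limits.
enum { HYPOTHETICAL_NAME_LEN = 64, HYPOTHETICAL_PATH_LEN = 512 };

// The macro expands to a plain conditional expression, so it can be used
// as an array bound. In C++98/03 a call to std::max is not a constant
// expression and could not appear here.
struct Name_buffer {
  char data[MY_MAX(HYPOTHETICAL_NAME_LEN, HYPOTHETICAL_PATH_LEN) + 1];
};
```

Because the macros use only operators, the same definitions also work
unchanged in C code.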

Code for creating a min/max that honors standard conversions. 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

// Maps a pair of argument types to the type both are converted to.
template <class A, class B> struct MaxType;
// Identical types need no conversion.
template <class T> struct MaxType<T, T> {
  typedef T Type;
};
// Fallback: look the pair up in the opposite order, so that each
// combination only has to be listed once below.
template <class A, class B> struct MaxType {
  typedef typename MaxType<B, A>::Type Type;
};
#define MAX_TYPE(A, B, C) template <> struct MaxType<A, B> {typedef C Type;}
MAX_TYPE(double, int, double);
MAX_TYPE(double, unsigned int, double);
MAX_TYPE(int, unsigned char, int);
MAX_TYPE(long long, int, long long);
MAX_TYPE(long, int, long);
MAX_TYPE(unsigned int, unsigned short, unsigned int);
MAX_TYPE(unsigned int, char, unsigned int);
MAX_TYPE(unsigned int, int, unsigned int);
MAX_TYPE(unsigned int, short, unsigned int);
MAX_TYPE(unsigned int, unsigned char, unsigned int);
MAX_TYPE(unsigned long long, int, unsigned long long);
MAX_TYPE(unsigned long long, unsigned char, unsigned long long);
MAX_TYPE(unsigned long long, unsigned int, unsigned long long);
MAX_TYPE(unsigned long long, unsigned long, unsigned long long);
MAX_TYPE(unsigned long, int, unsigned long);
MAX_TYPE(unsigned long, unsigned char, unsigned long);
MAX_TYPE(unsigned long, unsigned int, unsigned long);
#undef MAX_TYPE
// Both arguments are converted to the common type before comparing.
template <class A, class B>
typename MaxType<A, B>::Type min(A a, B b) {
  typedef typename MaxType<A, B>::Type ReturnType;
  return ReturnType(a) > ReturnType(b) ? b : a;
}
template <class A, class B>
typename MaxType<A, B>::Type max(A a, B b) {
  typedef typename MaxType<A, B>::Type ReturnType;
  return ReturnType(a) > ReturnType(b) ? a : b;
}

Linking with the C++ standard libraries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Making use of the standard C++ library inside the server code requires
linking the server with the library implementations of each supported
platform. When linking dynamically, care must be taken so that the final
binary is compatible with the most widely available C++ standard library
binary of the platform.

GCC (GNU Compiler Collection) (Linux/FreeBSD/Mac OS X)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given that GCC is the main compiler platform for MySQL on Linux, FreeBSD
and Mac OS X, it's only natural to make use of GCC's implementation of the
standard C++ library. The GNU Standard C++ Library (hereafter abbreviated
as libstdc++) is also the most common and available implementation of
the standard C++ library on the aforementioned operating systems.

One major issue associated with the use of libstdc++ is making binaries
that will work properly across the supported Linux distributions. Linking
statically is not an option due to the restrictions it imposes, such as
license related concerns and not being able to load dynamic libraries
(e.g. plugins) linked with libstdc++. Linking dynamically poses the
problem of binary compatibility with regard to varying libstdc++ versions
across Linux distributions.

Historically, the libstdc++ ABI used to change quite a bit, making it
incompatible with previous versions. But since gcc-3.4.0 (libstdc++
version 6.0.x), the ABI has somewhat stabilized¹ and now guarantees
forward compatibility, but not backwards compatibility. The default ABI
version (-fabi-version=2) introduced in gcc-3.4.x is forward compatible
up to gcc-4.[0-5].x, but incompatible with previous versions.

Consequently, and in order to provide portable binaries, MySQL should be
linked dynamically with libstdc++ version 6.x (-fabi-version=2) in order
to maximize compatibility across the supported Linux distributions. In
addition, the use of flags that may change the ABI as a side-effect (as
stated in the ABI Policy and Guidelines document), such as -fno-exceptions,
should be avoided.
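
One way to check which ABI version a translation unit was compiled with
is the predefined macro __GXX_ABI_VERSION, which GCC sets to 1002 for
-fabi-version=2. The sketch below is only an illustration; the function
names are hypothetical, and on compilers that do not define the macro the
first function simply returns 0.

```cpp
// Returns the value of the G++ ABI version macro, or 0 when the
// compiler does not define it (for example, non-GNU compilers).
long gxx_abi_version() {
#ifdef __GXX_ABI_VERSION
  return __GXX_ABI_VERSION;
#else
  return 0;
#endif
}

// Hypothetical policy check: true only when the code was compiled
// with ABI version 2, which GCC reports as 1002.
bool uses_abi_version_2() {
  return gxx_abi_version() == 1002;
}
```

A build could log or assert on this value at startup; whether such a
check belongs in the build system or in the server itself is left open
here.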

One byproduct of linking libstdc++ dynamically is that it also
causes libgcc_s (the GCC low-level runtime library) to be linked
dynamically. According to the documentation, GCC generates calls to
routines in this library automatically whenever it needs to perform
some operation that is too complicated to emit inline code for. Since
the release versioning of libgcc_s follows closely that of libstdc++,
this extra library shouldn't pose any problem.

¹ http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

Microsoft Visual C++ (Windows)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If any of the C++ Standard Library headers is included in
the server code, the Standard C++ Library will be linked in automatically
by Visual C++ at compile time. The library will be linked statically as
the build system forces static² runtime libraries via the /MT and /MTd
options. The linked libraries are LIBCPMT.LIB (Multithreaded, static
link, /MT option) or LIBCPMTD.LIB (Multithreaded, static link, debug,
/MTd option). These libraries are already linked with the server given
that a header which is part of the standard library is already used
throughout the code base (through my_global.h).

² Static linking is used to avoid having to ship DLLs and due to the
license restrictions on redistributing the debug versions of the runtime
libraries.

Oracle Solaris Studio (Oracle Solaris)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The server is already dynamically linked with the C++ standard library
(libCstd). Certain restrictions may apply. See ³ for details.

³ http://developers.sun.com/solaris/articles/cmp_stlport_libCstd.html

However, libCstd, the default C++ library for the SunPro compiler, is
based on an old library from Rogue Wave and is not standards compliant
(see ³). So we need to use libstlport rather than libCstd.

libstlport is not installed on Solaris by default, but it is redistributable.
So during packaging of MySQL binaries, we put libstlport.so in the
/lib directory together with the MySQL libraries.
All C++ executables must be linked in such a way that they can find libstlport
at runtime. See

http://www.oracle.com/technetwork/articles/servers-storage-dev/redistrib-libs-344133.html
http://developers.sun.com/sunstudio/documentation/techart/stdlibdistr.html
http://developers.sun.com/solaris/articles/cmp_stlport_libCstd.html
http://developers.sun.com/sunstudio/documentation/ss12/mr/READMEs/runtime.libraries.html


Current status
~~~~~~~~~~~~~~

In summary, the server already links with the Standard C++ Library in
all cases, except when the compiler used is GCC.

Exceptions, RTTI, and other general features of C++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C++ compilers have improved a lot in recent years, especially their
handling of advanced C++ features such as exceptions and run-time type
information. Although once associated with significant run-time overhead,
nowadays these features have a negligible (if any at all) performance
impact if the specific statements (e.g. try, catch, throw, etc.) are not
used. For example, exception handling tends to be optimized for the case
where exceptions are not thrown simply because it's the more common use.

Since the finer details of C++ usage in MySQL are being revisited,
and in light of the implications of disabling certain C++ features (see
remarks above with respect to exceptions), it makes sense to start with
a clean slate by using the default behavior provided by the compilers
and/or established for the C++ language, unless they pose a negative
impact on performance. This means not explicitly turning off exceptions
and other C++ features.

Later down the road this also allows these features to be used in an
isolated manner (e.g. inside plugins) and makes it simpler to use or
incorporate external packages/modules that make use of these features.
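
As a sketch of what such isolated use could look like, the code below
relies on RTTI and contains exceptions at a module boundary, turning them
into return codes. All names are hypothetical; nothing here is taken from
the server code.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

struct Base {
  virtual ~Base() {}
};
struct Derived : Base {
  int value;
  Derived() : value(42) {}
};

// RTTI: dynamic_cast yields a null pointer when the object is not a
// Derived. This requires building without -fno-rtti.
int value_or_minus_one(Base *b) {
  Derived *d = dynamic_cast<Derived *>(b);
  return d ? d->value : -1;
}

// Exceptions: catch standard library exceptions inside the module and
// translate them into an error code for callers that do not use them.
int checked_get(const std::vector<int> &v, std::size_t index, int *out) {
  try {
    *out = v.at(index);  // throws std::out_of_range on a bad index
    return 0;
  } catch (const std::out_of_range &) {
    return 1;
  } catch (const std::bad_alloc &) {
    return 2;
  }
}
```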

Experimental evaluation
~~~~~~~~~~~~~~~~~~~~~~~

A separate test of removing the -fno-exceptions and the -fno-rtti flags
shows that there is no significant difference in execution time between
having and not having these flags.

Benchmark is Sysbench, oltp_complex_ro.

Run on ndbamd-6, a box with 12 2.8 GHz Opteron cores and 32GB of RAM.

  Threads   vanilla   stdc++   % Change  
 ---------+---------+--------+----------
       16      4573     4609       0.79  
       32      4547     4561       0.31  
       64      4519     4548       0.64  
      128      4493     4501       0.18  
      256      4455     4456       0.02  


Server linked with g++ (libstdc++) and compiled without the flags
-fno-implicit-templates, -fno-exceptions and -fno-rtti.

- Tweak build system to link the server using g++.

  Remove associated hacks (e.g. gunit's CMakeLists, etc).
  Use and enforce the required C++ ABI (version 2).

- Rename min/max macros to MY_MIN/MY_MAX.

- Remove the MySQL specific new/delete operators.

  Use std::nothrow where applicable.  

- Do not disable specific C++ features.

  Remove flags -fno-exceptions, -fno-rtti, etc.

- Set terminate and unexpected handler functions if necessary.

- Showcase usage with a new example UDF.

  Add a UDF to udf_example that makes use of containers
  and algorithms of the C++ Standard Library. The intent
  is to ensure that the server links fine with the C++
  library and that it is able to load plugins (UDF) that
  make use of it too.

- PushBuild2 and Release integration.

  PB2 must no longer set CXX to gcc.

  Generic Linux packages should be built with gcc-3.4 [*],
  or packages should be produced for each major distribution.

  * A GCC version (with respective libstdc++ version) that
    is most widely available across Linux distributions.
    There might be some performance implications in using
    an older GCC version.
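
For the item on terminate and unexpected handlers, a minimal sketch of
installing a terminate handler is given below. The handler name and its
message are hypothetical. The unexpected handler is left out of the
sketch because std::set_unexpected was deprecated and later removed in
newer C++ standards.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical handler: report the condition and abort. A terminate
// handler must not return to its caller.
void hypothetical_terminate_handler() {
  std::fputs("fatal: unhandled C++ exception\n", stderr);
  std::abort();
}

// Installs the handler and returns the previously installed one, so the
// caller can restore it later if needed.
std::terminate_handler install_terminate_handler() {
  return std::set_terminate(hypothetical_terminate_handler);
}
```

The server would presumably call install_terminate_handler() once during
startup, before any code that may throw is run.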