WL#5825: Using C++ Standard Library with MySQL code
Affects: Server-Prototype Only
Status: Complete
The goal of this worklog is to allow the use of the C++ Standard Library inside the code and to enable exceptions and RTTI for the MySQL code base. The goal is *not* to start using the standard C++ library throughout the code base, just to ensure that it is possible.

Motivation
~~~~~~~~~~

There are a number of advantages to using the standard C++ library. Chiefly, it is already-written code that has been tested and tuned over several years, and in various cases it provides better performance and maintainability than the "homegrown" alternatives. The STL in particular provides a wide range of well-documented and standardized containers and algorithms that can be applied interchangeably in many scenarios.

In particular, it can be immediately applied in the following ways:

- Gracefully handle out-of-memory conditions with std::nothrow.
- An associative container (map or similar), which is needed for WL#3584.
- Potential gain in performance by using std::sort instead of my_qsort.
- Improve maintainability by using std::vector instead of Dynamic_array.
- Remove the non-working overloading of new and delete operators.
- Demangled stack backtrace on crashes.

When-Which-How
~~~~~~~~~~~~~~

When, which and how parts of the C++ Standard Library are to be used will be regulated by the Coding Standard Committee.

http://forge.mysql.com/wiki/MySQL_Internals_Coding_Guidelines#How_we_maintain_the_server_coding_guidelines

Using a C++ Standard Library function or class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As an experiment, I replaced the DYNAMIC_ARRAY instance saved_table_locks in sql_test.cc with a std::vector instance. The goal of this test was to take an isolated use and see whether a normal use of DYNAMIC_ARRAY could be replaced with a C++ Standard Library container, and what effects it would have on the build. Note that the goal is *not* to evaluate the performance of the C++ Standard Containers.
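As a rough illustration of the kind of replacement described above, the following sketch stores records in a std::vector and orders them with std::sort. The Saved_lock struct, its fields, and the function names are hypothetical stand-ins chosen for this example; they are not the actual contents of sql_test.cc.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type; the real element type in sql_test.cc differs.
struct Saved_lock
{
  std::string table_name;
  int lock_type;
};

// Strict weak ordering: by table name first, then by lock type.
static bool lock_less(const Saved_lock &a, const Saved_lock &b)
{
  if (a.table_name != b.table_name)
    return a.table_name < b.table_name;
  return a.lock_type < b.lock_type;
}

// Copy the records into a std::vector (growth and cleanup are automatic)
// and order them with std::sort instead of a qsort-style callback.
std::vector<Saved_lock> collect_sorted_locks(const Saved_lock *locks,
                                             std::size_t count)
{
  std::vector<Saved_lock> result(locks, locks + count);
  std::sort(result.begin(), result.end(), lock_less);
  return result;
}
```

The vector releases its storage when it goes out of scope, so no explicit cleanup call is needed on any return path.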
Using this container instead of DYNAMIC_ARRAY proved uncontroversial, but one construction that is widely used in the server caused a problem: the min and max macros. These macros are defined in my_global.h and clash with the definitions of std::min and std::max in <algorithm>. To handle this, it is necessary to remove the macros from all C++ code.

It is not possible to just replace the uses of min and max with std::min and std::max, because these macros are used in three different ways:

1. The macros rely on the standard conversions.
2. They are used in C code.
3. They are used in constant expressions, where function calls are not allowed.

Note that not all uses of min and max in the current code base are correct, since there are comparisons between unsigned and signed integral values. Under the standard conversions, a negative signed value is converted to the unsigned type modulo 2^N (N being the width of the unsigned type), which can have unexpected side effects. As an example, consider "max(some_ulong, some_int)", and suppose that "some_int" happens to be negative. In this case, it is converted to a very large number (since the other type is unsigned), which may lead to strange results.

There are two ways to handle the conflict between the macros and the standard functions:

- Replace all instances of min and max with std::min and std::max. This has the advantage of being the best way to switch to the standard library, but it requires a search-and-replace patch, which can conflict with existing code (resolving the conflicts just takes time; it is unlikely to introduce problems). It would require the type to be explicitly stated, for example: "std::max<T>(int_value, ulong_value)", where T is the intended common type.
- Write our own version of min and max that support the correct usage. This approach would allow us to not change any of the existing code (except to handle the last case below), but does require us to maintain our own version of min and max. See the example code I used below.

To handle the use in constant expressions, I think the best path is to introduce macros MY_MIN and MY_MAX (or maybe just MIN and MAX) and use those. The alternative is to expand the expressions in-place. The introduction of MIN and MAX can also be used to provide the min and max functions in the C code, with the alternative of introducing inline functions min and max.

Code for creating a min/max that honors standard conversions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    // MaxType<A, B>::Type is the type in which A and B are compared.
    template <typename A, typename B> struct MaxType;

    // Identical types need no conversion.
    template <typename T> struct MaxType<T, T> { typedef T Type; };

    // Fallback: look the pair up in the opposite order, so that each
    // pair only has to be listed once below.
    template <typename A, typename B> struct MaxType
    { typedef typename MaxType<B, A>::Type Type; };

    #define MAX_TYPE(A, B, C) template <> struct MaxType<A, B> {typedef C Type;}
    MAX_TYPE(double, int, double);
    MAX_TYPE(double, unsigned int, double);
    MAX_TYPE(int, unsigned char, int);
    MAX_TYPE(long long, int, long long);
    MAX_TYPE(long, int, long);
    MAX_TYPE(unsigned int, unsigned short, unsigned int);
    MAX_TYPE(unsigned int, char, unsigned int);
    MAX_TYPE(unsigned int, int, unsigned int);
    MAX_TYPE(unsigned int, short, unsigned int);
    MAX_TYPE(unsigned int, unsigned char, unsigned int);
    MAX_TYPE(unsigned long long, int, unsigned long long);
    MAX_TYPE(unsigned long long, unsigned char, unsigned long long);
    MAX_TYPE(unsigned long long, unsigned int, unsigned long long);
    MAX_TYPE(unsigned long long, unsigned long, unsigned long long);
    MAX_TYPE(unsigned long, int, unsigned long);
    MAX_TYPE(unsigned long, unsigned char, unsigned long);
    MAX_TYPE(unsigned long, unsigned int, unsigned long);
    #undef MAX_TYPE

    template <typename A, typename B>
    typename MaxType<A, B>::Type min(A a, B b)
    {
      typedef typename MaxType<A, B>::Type ReturnType;
      return ReturnType(a) > ReturnType(b) ? b : a;
    }

    template <typename A, typename B>
    typename MaxType<A, B>::Type max(A a, B b)
    {
      typedef typename MaxType<A, B>::Type ReturnType;
      return ReturnType(a) > ReturnType(b) ? a : b;
    }
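
The signed/unsigned pitfall and the constant-expression case discussed in this section can be illustrated with a short sketch. MY_MAX follows the macro name proposed in the text, but its definition here is an assumption; BUFFER_SIZE, buffer_capacity, macro_style_max and wraps_to_ulong_max are hypothetical names used only for this example.

```cpp
#include <climits>
#include <cstddef>

// Macro form, usable in constant expressions and in C code. The name
// follows the proposal in the text; the definition is an assumption.
#define MY_MAX(a, b) ((a) > (b) ? (a) : (b))

// A constant expression: an array bound cannot be a function call here.
enum { BUFFER_SIZE = MY_MAX(16, 64) };
static char buffer[BUFFER_SIZE];

std::size_t buffer_capacity()
{
  return sizeof(buffer);
}

// Mimics a macro-style max with mixed signedness: the int operand is
// converted to unsigned long before the comparison takes place.
unsigned long macro_style_max(unsigned long some_ulong, int some_int)
{
  return MY_MAX(some_ulong, some_int);
}

// -1 converts to ULONG_MAX, so it "wins" against any unsigned long value.
bool wraps_to_ulong_max(unsigned long some_ulong)
{
  return macro_style_max(some_ulong, -1) == ULONG_MAX;
}
```

Calling macro_style_max(10, -1) returns ULONG_MAX rather than 10, which is the kind of strange result the text warns about.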
Linking with the C++ standard libraries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Making use of the standard C++ library inside the server code requires linking the server with the library implementation of each supported platform. When linking dynamically, care must be taken so that the final binary is compatible with the most widely available C++ standard library binary of the platform.

GCC (GNU Compiler Collection) (Linux/FreeBSD/Mac OS X)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given that GCC is the main compiler platform for MySQL on Linux, FreeBSD and Mac OS X, it's only natural to make use of GCC's implementation of the standard C++ library. The GNU Standard C++ Library (hereby abbreviated as libstdc++) is also the most common and available implementation of the standard C++ library on the aforementioned operating systems.

One major issue associated with the use of libstdc++ is making binaries that will work properly across the supported Linux distributions. Linking statically is not an option due to the restrictions it imposes, such as license-related concerns and not being able to load dynamic libraries (e.g. plugins) linked with libstdc++. Linking dynamically poses the problem of binary compatibility with regard to varying libstdc++ versions across Linux distributions.

Historically, the libstdc++ ABI used to change quite a bit, making each version incompatible with previous versions. But since gcc-3.4.0 (libstdc++ version 6.0.x), the ABI has somewhat stabilized¹ and now guarantees forward compatibility, but not backwards compatibility. The default ABI version (-fabi-version=2) introduced in gcc-3.4.x is forward compatible up to gcc-4.[0-5].x, but incompatible with previous versions.

Consequently, and in order to provide portable binaries, MySQL should be linked dynamically with libstdc++ version 6.x (-fabi-version=2) in order to maximize compatibility across the supported Linux distributions.
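
One way to check which ABI and library a build actually sees is to inspect the compiler's predefined macros. The sketch below is only an illustration: describe_cxx_runtime is a hypothetical helper, and my understanding of the GCC documentation is that __GXX_ABI_VERSION reports 1002 under -fabi-version=2 and that __GLIBCXX__ holds the libstdc++ release date, which should be verified against the toolchain in use.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reports the C++ ABI version macro and the
// libstdc++ release-date macro visible to this translation unit.
std::string describe_cxx_runtime()
{
  std::ostringstream out;
#ifdef __GXX_ABI_VERSION
  out << "C++ ABI version macro: " << __GXX_ABI_VERSION;
#else
  out << "C++ ABI version macro: not defined by this compiler";
#endif
#ifdef __GLIBCXX__
  // Defined by libstdc++ once any standard header has been included.
  out << ", libstdc++ release date: " << __GLIBCXX__;
#else
  out << ", libstdc++ not in use";
#endif
  return out.str();
}
```

Printing the returned string from a small test program on each target distribution gives a quick sanity check that the expected library is being picked up.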
In addition, the use of flags that may change the ABI as a side effect (as stated in the ABI Policy and Guidelines document), such as -fno-exceptions, should be avoided.

One byproduct of linking libstdc++ dynamically is that it also causes libgcc_s (the GCC low-level runtime library) to be linked dynamically. According to the documentation, GCC generates calls to routines in this library automatically whenever it needs to perform some operation that is too complicated to emit inline code for. Since the release versioning of libgcc_s closely follows that of libstdc++, this extra library shouldn't pose any problem.

¹ http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

Microsoft Visual C++ (Windows)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If one of the C++ Standard Library headers is included in the server code, the Standard C++ Library will be linked in automatically by Visual C++ at compile time. The library will be linked statically, as the build system forces static² runtime libraries via the /MT and /MTd options. The linked libraries are LIBCPMT.LIB (multithreaded, static link, /MT option) or LIBCPMTD.LIB (multithreaded, static link, debug, /MTd option). These libraries are already linked with the server, given that a header that is part of the standard library is used throughout the code base (through my_global.h).

² Static linking is used to avoid having to ship DLLs and due to the license restrictions on redistributing the debug versions of the runtime libraries.

Oracle Solaris Studio (Oracle Solaris)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The server is already dynamically linked with the C++ standard library (libCstd). Certain restrictions may apply. See ³ for details.

³ http://developers.sun.com/solaris/articles/cmp_stlport_libCstd.html

The default C++ library for the SunPro compiler is really old, and not standards compliant.
See http://developers.sun.com/solaris/articles/cmp_stlport_libCstd.html

So we need to use libstlport rather than libCstd (which is based on an old library from Rogue Wave). libstlport is not installed on Solaris by default, but it is redistributable. So during packaging of MySQL binaries, we put libstlport.so in the /lib directory together with the MySQL libraries. All C++ executables must be linked in such a way that they can find libstlport at runtime.

See also:

http://www.oracle.com/technetwork/articles/servers-storage-dev/redistrib-libs-344133.html
http://developers.sun.com/sunstudio/documentation/techart/stdlibdistr.html
http://developers.sun.com/sunstudio/documentation/ss12/mr/READMEs/runtime.libraries.html

Current status
~~~~~~~~~~~~~~

In summary, the server already links with the Standard C++ Library in all cases, except when the compiler used is GCC.

Exceptions, RTTI, and other general features of C++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

C++ compilers have improved a lot in recent years, especially their handling of advanced C++ features such as exceptions and run-time type information. Although once associated with significant run-time overhead, nowadays these features have a negligible (if any) performance impact when the specific statements (e.g. try, catch, throw, etc.) are not used. For example, exception handling tends to be optimized for the case where exceptions are not thrown, simply because that is the more common case.

Since the finer details of C++ usage in MySQL are being revisited, and in light of the implications of disabling certain C++ features (see remarks above with respect to exceptions), it makes sense to start with a clean slate by using the default behavior provided by the compilers and/or established for the C++ language, unless it has a negative impact on performance. This means not explicitly turning off exceptions and other C++ features.
Later down the road this also allows these features to be used in an isolated manner (e.g. inside plugins) and makes it simpler to use or incorporate external packages/modules that make use of these features.

Experimental evaluation
~~~~~~~~~~~~~~~~~~~~~~~

A separate test of removing the -fno-exceptions and -fno-rtti flags shows that there is no significant difference in execution time between having and not having these flags.

The benchmark is Sysbench, oltp_complex_ro, run on ndbamd-6, a box with 12 2.8 GHz Opteron cores and 32GB of RAM.

Threads  | vanilla | stdc++ | % Change
---------+---------+--------+----------
      16 |    4573 |   4609 |     0.79
      32 |    4547 |   4561 |     0.31
      64 |    4519 |   4548 |     0.64
     128 |    4493 |   4501 |     0.18
     256 |    4455 |   4456 |     0.02

The "stdc++" column is the server linked with g++ (libstdc++) and compiled without the flags -fno-implicit-templates, -fno-exceptions and -fno-rtti.
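
With exceptions left enabled, code that wants to use them in an isolated manner (for example inside a plugin) can confine them behind an entry point that reports failures through a return code. The following is a minimal sketch under that assumption; Plugin_status, sum_non_negative and plugin_sum are hypothetical names and do not come from the MySQL sources.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical status codes for this example only.
enum Plugin_status
{
  PLUGIN_OK = 0,
  PLUGIN_BAD_INPUT = 1,
  PLUGIN_OUT_OF_MEMORY = 2,
  PLUGIN_FAILED = 3
};

// Internal helper that is free to throw.
static long sum_non_negative(const std::vector<int> &values)
{
  long total = 0;
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (values[i] < 0)
      throw std::invalid_argument("negative value");
    total += values[i];
  }
  return total;
}

// Entry point: no exception escapes; failures become status codes.
Plugin_status plugin_sum(const int *input, std::size_t count, long *result)
{
  try
  {
    std::vector<int> values(input, input + count);
    *result = sum_non_negative(values);
    return PLUGIN_OK;
  }
  catch (const std::invalid_argument &)
  {
    return PLUGIN_BAD_INPUT;
  }
  catch (const std::bad_alloc &)
  {
    return PLUGIN_OUT_OF_MEMORY;
  }
  catch (...)
  {
    return PLUGIN_FAILED;
  }
}
```

Callers that never use try, catch or throw themselves only see the status code, so the exception-based code stays contained in one module.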
- Tweak the build system to link the server using g++. Remove associated hacks (e.g. gunit's CMakeLists, etc.). Use and enforce the required C++ ABI (version 2).
- Rename the min/max macros to MY_MIN/MY_MAX.
- Remove the MySQL-specific new/delete operators. Use std::nothrow where applicable.
- Do not disable specific C++ features. Remove the flags -fno-exceptions, -fno-rtti, etc.
- Set terminate and unexpected handler functions if necessary.
- Showcase usage with a new example UDF. Add a UDF to udf_example that makes use of containers and algorithms of the C++ Standard Library. The intent is to ensure that the server links fine with the C++ library and that it is able to load plugins (UDFs) that make use of it too.
- PushBuild2 and Release integration. PB2 must no longer set CXX to gcc. Generic Linux packages should be built with gcc-3.4 [*], or packages should be produced for each major distribution.

[*] A GCC version (with its respective libstdc++ version) that is most widely available across Linux distributions. There might be some performance implications in using an older GCC version.
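
Two of the items above, using std::nothrow and setting a terminate handler, can be sketched with standard facilities only. This is an illustration rather than the server's actual code: allocate_buffer, release_buffer, server_terminate_handler and install_terminate_handler are hypothetical names, and a real handler would presumably log before aborting.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <new>

// Allocation with the standard nothrow form of operator new[]: it
// returns a null pointer on failure instead of throwing std::bad_alloc.
char *allocate_buffer(std::size_t size)
{
  return new (std::nothrow) char[size];
}

// Matching release; deleting a null pointer is a no-op.
void release_buffer(char *buffer)
{
  delete[] buffer;
}

// Hypothetical terminate handler; it must not return to its caller.
static void server_terminate_handler()
{
  std::abort();
}

// Installs the handler and returns the previously installed one, so the
// caller can restore it later if needed.
std::terminate_handler install_terminate_handler()
{
  return std::set_terminate(server_terminate_handler);
}
```

A caller checks the pointer returned by allocate_buffer against null and handles the out-of-memory case explicitly, which matches the return-code style used in the rest of the server.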
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.