WL#11541: Allow switching the SSL options for a running server
Affects: Server-8.0
—
Status: Complete
For long-running servers the certificate may expire, and one must be able to replace it without restarting the server. This worklog will make all of the SSL options dynamic by preparing a new SSL_CTX for the listening socket and substituting it for the old one.
FR1: We shall add a command to refresh the SSL key material, e.g. ALTER INSTANCE RELOAD TLS. When called, it will re-initialize the SSL_CTX used to accept connections, reading from the same sources that the config variables point to.

FR 1.1: All sessions that existed at the time RELOAD TLS is called will continue to use the old SSL context and operate normally.

FR 1.2: All sessions created after RELOAD TLS takes effect will use the new SSL context.

FR 1.3: The resources of the old SSL context will be freed after the last active session using it is closed.

FR 1.4: RELOAD TLS will not kill any of the existing sessions. If one wants to do this they can use KILL.

FR 1.5: New values for the SSL context related system variables (--ssl-ca, --ssl-cert, --ssl-key, --ssl-capath, --ssl-crl, --ssl-crlpath, --ssl-cipher, --tls-version) will become effective only after a subsequent call to ALTER INSTANCE RELOAD TLS.

FR 1.6: ALTER INSTANCE RELOAD TLS will not be replicated: the SSL config is local and resides in files that need to be maintained too.

FR 1.7: CONNECTION_ADMIN will be required to execute ALTER INSTANCE RELOAD TLS.

FR 1.8: A call to ALTER INSTANCE RELOAD TLS will not disable SSL connections if the parameter values in the ssl system variables do not form an acceptable set of options and/or one of the functions called to create the new SSL_CTX fails. It will roll back to the previous set of values. If the optional clause NO ROLLBACK ON ERROR is specified, the rollback will not be performed and SSL will be disabled in case of errors setting up the context.

FR 1.9: A call to ALTER INSTANCE RELOAD TLS will enable SSL connections if the parameter values in the ssl system variables form an acceptable set of options and all of the functions called to create the new SSL_CTX succeed.

FR2. We will make the (--ssl-ca, --ssl-cert, --ssl-key, --ssl-capath, --ssl-crl, --ssl-crlpath, --ssl-cipher, --tls-version) system variables settable at runtime.

FR2.1. Changing the values of the SSL context related system variables (--ssl-ca, --ssl-cert, --ssl-key, --ssl-capath, --ssl-crl, --ssl-crlpath, --ssl-cipher, --tls-version) won't have any effect until ALTER INSTANCE RELOAD TLS is called. For example, to change a certificate you need to change the related key too, so applying the change on every SET is not going to work well.

NF3. The current SSL context variable will be converted into an atomic pointer and all operations on it will be atomic. This might have some performance impact (TBD).

FR4. This worklog will *only* affect the server's context for the mysql protocol handler. All other SSL server contexts (X protocol etc.) will need to undergo a similar operation.

NF5. By default (if one doesn't call ALTER INSTANCE RELOAD TLS) the operation of the server is backward compatible. A slight delay may be observed at connect/SHOW STATUS time due to the atomic read of the SSL context pointer.

FR6. All the SSL context related system variables (--ssl-ca, --ssl-cert, --ssl-key, --ssl-capath, --ssl-crl, --ssl-crlpath, --ssl-cipher, --tls-version) are to be mirrored by a set of status variables named Current_tls_* (e.g. the currently effective --ssl-ca value is to be found in "show status like 'Current_tls_ca'") reflecting the *currently active* values. These change when you start the server or call ALTER INSTANCE RELOAD TLS.

FR7. The --ssl option will only have an effect at server startup, in that disabling it will keep the server from preparing to accept SSL connections. Subsequent calls to ALTER INSTANCE RELOAD TLS will not take the --ssl option into account anymore.

FR8. Both Group Replication and the X plugin will copy the values of the ssl system variables only at the time they are activated. Subsequent changes to the values of the system variables or calls to ALTER INSTANCE RELOAD TLS will currently not change the SSL parameters for these components.
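The interplay of FR 1.5, FR 1.8, FR 1.9 and FR6 can be modelled in a few lines. The following is only a sketch under stated assumptions: TlsParams, TlsState, params_usable() and reload() are hypothetical names invented for illustration, and the "is this set acceptable" check is a trivial stand-in for actually building an SSL_CTX.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced parameter set (the real server has more variables).
struct TlsParams {
  std::string cert;
  std::string key;
};

// Hypothetical stand-in for "creating the new SSL_CTX succeeded".
bool params_usable(const TlsParams &p) {
  return !p.cert.empty() && !p.key.empty();
}

struct TlsState {
  TlsParams configured;      // the system variables, settable at any time
  TlsParams active;          // what Current_tls_* would report
  bool tls_enabled = false;  // whether SSL connections are accepted

  // Models ALTER INSTANCE RELOAD TLS [NO ROLLBACK ON ERROR].
  // Returns true when the configured values were activated.
  bool reload(bool no_rollback_on_error = false) {
    if (params_usable(configured)) {
      active = configured;  // FR 1.9: a valid set enables SSL
      tls_enabled = true;
      return true;
    }
    if (no_rollback_on_error) {
      tls_enabled = false;  // FR 1.8: no rollback, SSL is disabled
      active = TlsParams{};
    }
    // Otherwise FR 1.8: keep the previously active values untouched.
    return false;
  }
};
```

In this model, assigning to `configured` alone never changes `active`, which mirrors the requirement that SET has no effect until the reload command is issued.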
Current state
---------------

Currently the SSL context is kept in two global variables:

* ssl_acceptor_fd: the SSL_CTX itself
* ssl_acceptor: one fake SSL handle created at startup so that some status vars will work

Both are initialized in init_ssl_communications, called by mysqld_main right before network_init.
Both are deinitialized in end_ssl, called by clean_up right before vio_end().

The ssl_acceptor_fd is used in the following status vars:

* ssl_ctx_sess_accept
* ssl_ctx_sess_accept_good
* ssl_ctx_sess_connect_good
* ssl_ctx_sess_accept_renegotiate
* ssl_ctx_sess_connect_renegotiate
* ssl_ctx_sess_cb_hits
* ssl_ctx_sess_hits
* ssl_ctx_sess_cache_full
* ssl_ctx_sess_cache_misses
* ssl_ctx_sess_cache_timeouts
* ssl_ctx_sess_number
* ssl_ctx_sess_get_cache_size
* ssl_ctx_get_verify_mode
* ssl_ctx_get_verify_depth
* ssl_ctx_get_session_cache_mode

The ssl_acceptor SSL handle is used for the following status vars:

* ssl_server_not_before
* ssl_server_not_after

The ssl_acceptor_fd is also used:

* as a flag showing whether SSL is configured, in:
** send_server_handshake_packet, to set the CLIENT_SSL capabilities
** validate_user_plugin_records, when there are sha256 users
* to call SSL_accept in send_server_handshake_packet if it's time to set up SSL layering

There's also a flag called have_ssl. It is:

* initialized to YES if compiled with an SSL library, in mysql_init_variables
* initialized to NO (impossible) if compiled without an SSL library, in mysql_init_variables
* set to DISABLED in init_ssl_communications if allocating an SSL_CTX fails
* checked in check_secure_transport()
* exposed as have_openssl
* exposed as have_ssl

Proposed design
---------------

    class ssl_acceptor_context {
     protected:
      struct st_VioSSLFd *acceptor_fd;
      SSL *acceptor;
      std::string current_ca_, current_key_, ...

      ssl_acceptor_context();
      ~ssl_acceptor_context();

      // the pointer to hold the current SSL context
      static std::atomic<ssl_acceptor_context *> singleton;

      // Partial RCU implementation: a guard to ensure there are no readers.
      // This works since we expect that the readers will be very brief and
      // most of the time will be spent servicing the session.
      // We only need to mark a thread as a reader from the moment it reads
      // the singleton value until it allocates the new SSL for the session.
      // After that SSL has its own locking scheme.
      static std::atomic<int> rcu_readers;  // counter type assumed

     public:
      // operations
      static bool singleton_init(... ssl_params ...);  // the current init_ssl_communications()
      static void singleton_deinit();                  // the current end_ssl
      static bool singleton_flush(... ssl params ...);

      // info functions, to be called for the session vars
      static ssl_ctx_sess_*(...);
      static ssl_ctx_get_*(...);
      static ssl_server_not_*(...);
      static bool have_ssl();
    };

Syntactic sugar is added to the parser: "ALTER INSTANCE RELOAD TLS [NO ROLLBACK ON ERROR]" calls singleton_flush().
All SSL variables become settable at runtime.

singleton_flush() will:

1. Create a new ssl_acceptor_context instance, complete with SSL_CTX and SSL, and copy the values of the SSL variables into it.
2. Atomically swap the new instance with the old one in the static atomic singleton pointer.
3. Free the old instance when there are no readers.

Note that SSL_CTX has an (atomic) reference count, and SSL_free calls SSL_CTX_free for the SSL_CTX the SSL was created on. So the last SSL_free for the last session using the old CTX will free the CTX.
The info functions can safely use the SSL_CTX data because the acceptor SSL holds a reference on it.

The home grown RCU (Read-Copy-Update) lockless implementation
------------------------------------------------------------------

We want to impose a minimal penalty on reading the SSL_CTX. That may come at the cost of some extra effort spent on updating it, since updates are going to be very infrequent compared to the reads.
The best algorithm for that seems to be an adapted version of RCU.
See the test driver attachment for comparison benchmarking. It compares the implementation against one using a mutex and one using a read-write lock.
It also includes a proof-of-concept SSL_CTX/SSL implementation.

We have a single global atomic pointer to hold the SSL context. We can safely prepare a new one and replace the old one atomically. The issue is that we need to make sure nobody is using the old SSL context before we can call SSL_CTX_free() on it.

OpenSSL has its own reference counters for SSL_CTX. Thus we only need to make sure nobody is reading the global atomic pointer without the help of an SSL: once we allocate an SSL from the SSL_CTX, OpenSSL's reference count kicks in and the SSL_CTX is safe and will be disposed of properly.

So we're left with having to account ourselves for a very small window: from the moment the SSL_CTX global atomic pointer is read to the moment SSL_new completes.

Since the window is so small, there's a high chance that, even on a busy system, there will be many moments when no thread is in that state. Thus all we need is a reliable way to tell whether a reader is currently in that window. We achieve this via a simple atomic reference count, and the writer just busy-waits for it to reach zero before disposing of the old CTX.

In the unlikely event that this wait fails (too many sessions reading) we could either leak the CTX or add it to a global queue for the next writer to dispose of.
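The scheme above can be sketched in portable C++ without OpenSSL. This is a minimal illustration, not the worklog's actual code: FakeCtx, AcceptorContext, new_session(), swap_and_reclaim(), flush() and deinit() are hypothetical names, std::shared_ptr stands in for the SSL_CTX reference count that SSL_new bumps, the spin limit is an arbitrary assumption, and only one writer at a time is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-in for SSL_CTX; std::shared_ptr plays the role of
// OpenSSL's internal reference count in this sketch.
struct FakeCtx {
  std::string cert;
};

// Stand-in for ssl_acceptor_context: a wrapper holding one reference.
struct AcceptorContext {
  std::shared_ptr<FakeCtx> ctx;
};

std::atomic<AcceptorContext *> g_singleton{nullptr};
std::atomic<int> g_rcu_readers{0};

// Reader side: the reader is counted only between loading the global
// pointer and taking its own reference (the SSL_new analogue).
std::shared_ptr<FakeCtx> new_session() {
  g_rcu_readers.fetch_add(1);
  AcceptorContext *current = g_singleton.load();
  std::shared_ptr<FakeCtx> session;
  if (current != nullptr) session = current->ctx;  // reference count bump
  g_rcu_readers.fetch_sub(1);
  return session;
}

// Writer side: publish the new wrapper, then busy-wait until no reader is
// inside the window before freeing the old wrapper.
bool swap_and_reclaim(AcceptorContext *fresh, int max_spins = 1000000) {
  AcceptorContext *old = g_singleton.exchange(fresh);
  for (int spin = 0; spin < max_spins; ++spin) {
    if (g_rcu_readers.load() == 0) {
      delete old;  // the FakeCtx itself lives on while sessions hold it
      return true;
    }
    std::this_thread::yield();
  }
  return false;  // wait failed: the old wrapper is deliberately leaked
}

// Analogue of singleton_flush(): install a context built from new values.
bool flush(const std::string &cert) {
  return swap_and_reclaim(
      new AcceptorContext{std::make_shared<FakeCtx>(FakeCtx{cert})});
}

// Analogue of singleton_deinit(): drop the current context.
bool deinit() { return swap_and_reclaim(nullptr); }
```

The default sequentially consistent atomics are what make the zero check meaningful here: a reader that observed the old pointer must have incremented the counter before the exchange, so the writer cannot see zero until that reader has left the window. A session obtained before a flush() keeps its old FakeCtx alive, which corresponds to FR 1.1 and FR 1.3.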
Results from running the test driver on a Windows laptop with OpenSSL 1.1 (times in secs):

    Starting type rcu
    all threads started in t=20
    all readers ended t=20
    all threads ended in t=20
    Stats: reads=100000000 writes=50

    Starting type mutex
    all threads started in t=34
    all readers ended t=34
    all threads ended in t=34
    Stats: reads=100000000 writes=50

    Starting type smutex
    all threads started in t=78
    all readers ended t=78
    all threads ended in t=79
    Stats: reads=100000000 writes=50

    Starting type SSL_rcu
    all threads started in t=144
    all readers ended t=145
    all threads ended in t=145
    Stats: reads=0 writes=50

Results from running the test driver on a Linux server with OpenSSL 1.0 (times in secs):

    Starting type rcu
    all threads started in t=2
    all readers ended t=6
    all threads ended in t=6
    Stats: reads=100000000 writes=50

    Starting type mutex
    all threads started in t=0
    all readers ended t=18
    all threads ended in t=18
    Stats: reads=100000000 writes=50

    Starting type smutex
    all threads started in t=0
    all readers ended t=25
    all threads ended in t=25
    Stats: reads=100000000 writes=50

    Starting type SSL_rcu
    all threads started in t=1
    all readers ended t=387
    all threads ended in t=387
    Stats: reads=0 writes=50

Effects on other users of the server SSL parameters
---------------------------------------------------

Currently the X and GR plugins piggyback on the server parameters for their own SSL config.
After this work is done there will be no change there, except that they will take the startup parameters and freeze them: i.e. if the server values change, GR and X will not notice.
Separate work needs to be done by the relevant teams to ensure that server changes are handled properly, or that the plugins do not depend on the server parameters (which is the preferred mode).