WL#12364: Kill administration for system users

Affects: Server-8.0   —   Status: Complete

WL#12098 introduced a new dynamic privilege 'SYSTEM_USER'. The user who is
granted this privilege can be treated as power user. In other words, a user
can modify another user with SYSTEM_USER privilege only if former too has 
SYSTEM_USER privilege. 

This also means that a user who is not granted 'SYSTEM_USER' privilege should
not be able to kill another user's session/query who is granted 'SYSTEM_USER'
privilege (i.e. a power user).

Currently one can use the KILL command to kill other user's session/query only if
they have the CONNECTION_ADMIN/SUPER privilege. There's no discrimination of what
users they can use it on.

The goal of this worklog is to disallow the KILL command terminating the power
user's session/query from another user's session who is granted 
CONNECTION_ADMIN/SUPER but not SYSTEM_USER privilege. Only one power user should 
be able to kill session/query of another power user.
Definitions
===========
Power_user 
----------
A user who is granted at least SYSTEM_USER privilege through GRANT statement.

Regular_user
------------
A user who is granted at least CONNECTION_ADMIN privilege but not the
SYSTEM_USER privilege.

Power_session
-------------
A user session that gets SYSTEM_USER privilege through role activation. 
                        OR 
User already had SYSTEM_USER privilege before the session was established.
                  
Regular_session
---------------
A user session that is not power_session.

Functional Requirements
=======================

FR-01: Regular_user MUST NOT be able to KILL the power_session.
        
    FR-01.1: Regular_user MAY be able to KILL the EXISTING connections of a
             user who is just elevated as power_user through GRANT statement.

    FR-01.2: Regular_user MAY NOT be able to KILL the EXISTING connections of
             a user who is just demoted from power_user through REVOKE
             statement.
             
FR-02: Regular user MUST NOT be able to KILL the queries running in
       power_session.
      
FR-03: A regular_session MUST become power_session by activating the role
       that has SYSTEM_USER privilege granted.
       
FR-04: A power_session MUST become regular_session after deactivating the
       role has SYSTEM_USER privilege granted.
       
FR-05: A regular_session MAY NOT become power_session by simply granting
       SYSTEM_USER privilege to the user unless connection is re-established
       or a role is activated/deactivated.
       
FR-06: A power_session MAY NOT become regular_session by simply revoking
       SYSTEM_USER privilege from user unless connection is re-established or
       a role is activated/deactivated.
       
FR-07: Running a SQL statement as definer who has SYSTEM_USER privilege MUST
       NOT elevate the regular_session as power_session even temporarily.
       
Interface Specification
=======================
No changes required. 

High-Level Specification
========================

S-1: Design Considerations:
     
     We evaluated the following design approaches.

S-1.1: Probe, the security context of current session and the security context
       of the session asked to be killed, for the SYSTEM_USER privilege. 
       If former does not have SYSTEM_USER privilege but latter has then former
       throws error. 
       This seems easiest and simplest approach but it would have a potential
       synchronization issues. Security_context may change temporarily while
       executing SQL statements.
       Imagine, there are two sessions t1 and t2; t2 wants to kill t1. t1 is
       in the middle of executing a query that has changed the
       Security_context temporarily. The changed security context does not have
       SYSTEM_USER privilege granted. When t2 probes the Security_context of t1
       it sees that t1 doesn't have SYSTEM_USER privilege hence, it can kill
       the t1 session which is not correct.

S-1.2 THD maintains the original Security_context as m_main_security_ctx
      member variable. we could consider probing the original Security_context
      instead of the current security context. But we identified at least
      following circumstances when the original security_context may be changed
      as well.
      - The client connection may change the current user by issuing the
        COM_CHANGE_USER command.
      - SET ROLE statement may change the original security_context
      - Plugin/component may also change the original security_context through
        security_context services. 
      
      Hence, this approach would not solve the synchronization issue identified
      with the S-1.1 completely.
   
S-1.3 Add an atomic flag in THD class. Turn ON the flag when user's session
      is authenticated or a role that has SYSTEM_USER privilege is activated.
      Turn OFF the flag when a role that has SYSTEM_USER privilege is
      deactivated.
      This approach solves the synchronization issues identified in the above
      two approaches. But, it has following two limitations:
      
       - The existing sessions of a user are not updated if user is granted
         SYSTEM_USER privilege through other user. It could mean that
         connections, before SYSTEM_USER privilege was granted to an user,
         could still be killed by the user who does not have SYSTEM_USER
         privilege.
         
       - There could be small window of race conditions in between of 
         SQL statement, changing the security_context and updating the atomic
         flag in the THD. For instance:
         
         mysql_set_active_role() {
            Security_context *sctx = thd->security_context();
            ret = sctx->activate_role(role->user, role->host, true);
            
            /* 
              Here, session could still be killed just before the user got 
              system_user privilege but flag in the THD not yet updated 
            */
            
            thd->set_system_user_flag();
         }
          
Either of S-1.1 or S-1.2 would require to take the lock (i.e. Lock_thd_data)
when the security_context is accessed through THD::security_context() API.
This may have potential performance impact. S-1.3 has two limitations but
We can live them since these are uncommon scenarios.

After carefully evaluating above three approaches, we decided to go with S-1.3. 

S-2: Rationale behind FR-03,04 is that role activation/de-activation is a
     session specific task. Therefore, SET ROLE statement can promote a session
     from regular to power or vice-versa. We probe the user's security_context
     for SYSTEM_USER privilege before executing the SET ROLE statement.
     
S-3  Rationale behind Fr-05,06 is that usually privileges are granted/revoked
     to a user through another user's session. It would be very expensive
     operation to reflect the change in user's privilege to its existing
     connections. Therefore, we update the system_user flag in the connection
     either when connection establishes or roles activated/deactivated.
     
S-4: Rationale behind FR-07 is that running a SQL statement with power user as
     definer does not affect the user's session in anyways. Instead, it might 
     be required  to  execute the SQL statements. Therefore, regular session 
     might get killed  even if it is running a SQL statement with definer who 
     has SYSTEM_USER privilege.

S-5: Impact on replication :- None

S-6: Impact on upgrade :- None   

S-7:  Threads not created by the users are categorized as system threads in
      server. That raises an interesting question

      "Can regular user kill the system threads ?".

     To answer above question. We prepare following matrix threads created by
     server. Objective of the matrix is to categorize the threads in four
     quadrant so that:

     - we can identify which system_threads that gets the SYSTEM_USER privilege
       by the virtue of skip-grants flag. Hence, they fall into to (1) quadrant
       and it is safe to not allow end-users to kill them. Since end-users
       cannot them spawn them anyways.

     - We can identify which system threads are safe for users in (4) quadrant
       to kill.

     - We can identify system threads in (3) quadrant that needs to be moved to
       1st quadrant so that users in (4) quadrant cannot kill them.

     - The remaining system threads in the 3rd quadrant will probably will be
       safe to be killed by users in 4th quadrant. Threads added in (3)
       quadrant need to be evaluated for the fact if they can be run by
       SYSTEM_USER as definer.



             Internal threads                         External threads
        ┌──────────────────────────────────────┬─────────────────────────────┐
        │         (1)                          │        (2)                  │
With    │  SYSTEM_THREAD_SLAVE_IO              │                             │
system  │  SYSTEM_THREAD_SLAVE_SQL             │   User connections with     │
_user   │  SYSTEM_THREAD_EVENT_SCHEDULER       │   SYSTEM_USER privilege     │
        │  SYSTEM_THREAD_NDBCLUSTER_BINLOG     │                             │
        │  SYSTEM_THREAD_SLAVE_WORKER          │                             │
        │  SYSTEM_THREAD_COMPRESS_GTID_TABLE   │                             │ 
        │                                      │                             │
        │                                      │                             │
        ├──────────────────────────────────────┼─────────────────────────────┤
        │          (3)                         │         (4)                 │
        │                                      │                             │
        │  SYSTEM_THREAD_EVENT_WORKER          │  User connections without   │
without │                                      │  SYSTEM_USER privilege      │
system  │                                      │                             │
_user   │  Threads that will be moved to (1)   │                             │
        │    SYSTEM_THREAD_INFO_REPOSITORY     │                             │
        │    SYSTEM_THREAD_DD_INITIALIZE       │                             │
        │    SYSTEM_THREAD_DD_RESTART          │                             │
        │    SYSTEM_THREAD_SERVER_INITIALIZE   │                             │
        │    SYSTEM_THREAD_INIT_FILE           │                             │
        │    SYSTEM_THREAD_SERVER_UPGRADE      │                             │
        │    SYSTEM_THREAD_BACKGROUND          │                             │
        │                                      │                             │
        └──────────────────────────────────────┴─────────────────────────────┘


     - SYSTEM_THREAD_EVENT_WORKER
       Is not granted any privilege. It runs with the definer's security
       context at the time of executing the event.  It is safe for them to be
       killed by users in 4th quadrant if definer does not have SYSTEM_USER
       privilege. Hence, we can keep them in the 3rd quadrant.

    We could grant SYSTEM_USER privilege to the following threads even though
    most of them are bootstrap threads so end-users will not see them.

     - SYSTEM_THREAD_INFO_REPOSITORY
       Created during bootstrap process or mysql startup. Used to read rpl_info
       table

     - SYSTEM_THREAD_DD_INITIALIZE
       Bootstrap thread. Runs during server initialization. No privileges
       granted.

     - SYSTEM_THREAD_DD_RESTART
       Bootstrap thread. Runs during server initialization. No privileges
       granted.

     - SYSTEM_THREAD_SERVER_INITIALIZE
       Runs during server initialization. No privileges granted.

     - SYSTEM_THREAD_INIT_FILE
       Runs during server initialization. No privileges granted.

     - SYSTEM_THREAD_SERVER_UPGRADE
       Bootstrap thread. Runs during server initialization. No privileges
       granted.

     - SYSTEM_THREAD_BACKGROUND
       Do not list processlist_id in the performance_schema.threads table.
       It means they cannot be killed by KILL command.
       Hence, we can keep them in the 3rd quadrant.