WL#11926: GR: IPv6 support

Affects: Server-8.0   —   Status: Complete   —   Priority: Medium

EXECUTIVE SUMMARY
=================

This worklog implements IPv6 support for MySQL Group
Replication. After this worklog, the user will be able to fully deploy
Group Replication not only on an IPv4 network but also on an IPv6 network.

USER/DEV STORIES
================

As a system administrator I want to deploy an IPv6 network while
running MySQL Group Replication at the same time, so I can make use of
IPv6 features.

SCOPE
=====

The scope of this worklog is to make XCom support IPv6.

REFERENCES
==========

- Add IPv6 support for Group Replication
  https://bugs.mysql.com/bug.php?id=90217


PROTOTYPE
=========

To estimate how much effort this task would take, we built a working
prototype for GCS running on IPv6. It runs the simple_xcom example over
IPv6.

It is located in the branch mysql-trunk-xcom-ipv6

Considerations about the prototype can be found at
https://confluence.oraclecorp.com/confluence/pages/viewpage.action?pageId=727464494

FR1: GCS/XCom must support IPv6 as a valid addressing protocol
FR2: GCS/XCom must continue to support IPv4 as a valid addressing protocol

FR3: A node must allow its local_node_address to be an IPv6 address
FR4: A node must allow its list of peers to be a list of IPv6 addresses
FR5: A node must allow its list of peers to be a list of mixed IPv4 and IPv6
     addresses

FR6: A node must allow its whitelist to be configured using IPv6 addresses
FR7: A node must allow its whitelist to be configured using IPv4 and IPv6
     addresses

FR8: When a node is configured with IPv6, it must show its new address type
     in Performance Schema tables

FR9: If a node with this feature implemented wants to join a group that does not
     have this feature implemented, it must enter the group presenting itself
     with a local IPv4 address.

FR10: If a node does not respect FR9, an error must be thrown when joining.

FR11: If a node without this feature implemented wants to join a group in
      which there are nodes with this feature implemented, then all group
      members must present themselves with an IPv4 address configured

FR12: If FR11 is not respected, the seed node must reject the node entering the 
      group.

FR13: group_replication_force_members must support IPv6 as an input

NFR1: There must not be any performance regression due to the usage of IPv6
NFR2: None of the existing IPv4 functionality should be affected

1. Introduction
========================

As modern networks grow in size, IPv6 is finally taking its place even in
internal professional networks, as a replacement for the old and depleted IPv4.

XCom, and consequently GCS, was built only around IPv4 networking, with
all its limitations, such as:
- All internal references to addresses, and their parsing, consider only v4
  addresses
- Whitelisting only considers v4 input addresses
- Hardware queries only consider v4 networks and interfaces (ioctl)
- All low-level network layer code only creates v4 socket structures
- Client code only considers v4 servers

The goal of this WL is to eliminate all of these limitations and allow XCom to
be a dual-stacked application, supporting both IPv6 and IPv4 connections as
client and as server.

The following chapters detail which code areas need to be changed in order
to fully support IPv6.

2. IPv6 Address Storage
=========================

Both in GCS and in XCom, addresses are stored as string literals that assume
IPv4 and are interpreted in the format "IP:PORT". String literal inputs of
that type are used for:
- Member identification, which goes up to Group Replication;
- Member configuration, both the local address and member seeds;
- XCom server identification, for the sake of consensus;
- XCom server addressing, for the sake of physical connections.

Literal address configuration will continue to be a reality, but with some
challenges, such as:
- Different address formats
- Different parsing rules

IPv4 addresses are known for their XXX.XXX.XXX.XXX format, each block
representing 1 byte, commonly followed by a port. An example is the classic
localhost: "127.0.0.1:12345".
IPv6 has a longer 128-bit format in which blocks are separated by colons. The
generic format is 8 blocks, such as XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX. In
order to append a port, the standard recommends the usage of square brackets.
String literals fed into XCom will need to have this
format: [2606:b400:8f0:80:8000::705]:12345

In this process, one will have to check if the input is a v4 or v6 address. We
can simplify this by checking for square brackets in IP:PORT values and for
colons in pure IP entries, such as the whitelist, leaving the validation to
methods such as getaddrinfo. If an address still cannot be parsed, the last
line of defense is the failure of the connect itself. One can augment the
error messages, processing the returned error code, to report that an
incorrect address was used to connect to a remote member.
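
The rule for pure IP entries can be sketched as follows. This is a minimal
illustration, assuming a POSIX getaddrinfo; the helper name
classify_ip_literal is hypothetical and is not part of XCom. The colon is
only a hint for the address family, and the actual validation is delegated
to getaddrinfo with AI_NUMERICHOST so that no name lookup happens.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not an XCom function).
 * Returns AF_INET, AF_INET6, or AF_UNSPEC for a bare IP literal,
 * such as a whitelist entry without a port.
 */
static int classify_ip_literal(const char *ip) {
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  int family;

  assert(ip != NULL);

  memset(&hints, 0, sizeof(hints));
  /* A colon can only appear in an IPv6 literal. */
  hints.ai_family = strchr(ip, ':') != NULL ? AF_INET6 : AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_NUMERICHOST; /* literals only, never consult DNS */

  /* getaddrinfo performs the real validation of the literal. */
  if (getaddrinfo(ip, NULL, &hints, &res) != 0 || res == NULL) {
    return AF_UNSPEC;
  }
  family = res->ai_family;
  freeaddrinfo(res);
  return family;
}
```

With this sketch, "127.0.0.1" is classified as AF_INET,
"2606:b400:8f0:80:8000::705" as AF_INET6, and anything that getaddrinfo
rejects as AF_UNSPEC.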

3. IPv6 physical interface retrieval
=======================================

XCom uses physical interfaces for two reasons:
- Whitelisting, in order to automatically add private addresses to the whitelist
- Node identification, since one only adds nodes that match existing physical
  addresses

Currently, one uses the legacy method ioctl to accomplish the task of interface
retrieval. But ioctl only supports legacy IPv4 information.

More modern implementations advise the use of getifaddrs, which conveys the
same information as ioctl, but is able to retrieve both IPv4 and IPv6
addresses.

That method is only available in *Nix. In Windows, it is advisable to use
GetAdaptersInfo and GetAdaptersAddresses, which have a similar output to
getifaddrs.

4. Low-level network code
===========================

All the network code that opens sockets, receives connections and does name
resolution is not ready to cope with IPv6. Common issues are:
- Most of the code is hardcoded to use sockaddr_in, which is a v4 structure;
- Socket creation is made with IPv4 only, both for clients and for servers.

To overcome the first limitation, one must start using the generic getaddrinfo,
which returns a struct sockaddr of the correct type that can be used in a
generic fashion in all socket library methods.

Overcoming the second limitation depends on the way one decides to implement
v6. If we go for a dual-stack approach, meaning that we only expose v6 and
all v4 addresses we get are translated, one needs to separate the creation of
client and server sockets:
- Server sockets would be created with v6 type, but we would need to set
  dual-stack mode via setsockopt
- Client sockets would need to distinguish if the input address is v4 or v6 and
  act accordingly OR always use v6 with v4-translated addresses.

5. Whitelisting
==========================

Whitelisting has several challenges regarding configuration, since now we will
need to store 128-bit addresses and be aware of longer SIDs. IPv6 interfaces
also have their particularities, such as the notion of link-local and global
addresses.

In case of AUTOMATIC setup, we need to retrieve the correct hardware
configuration. This case is covered in the section above with the usage of
getifaddrs.

Finally, one needs to extend the current octet comparison, which is
limited to 4 bytes.
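
A length-agnostic comparison could look like the sketch below. It is an
illustration only: the helper names prefix_match and text_prefix_match are
hypothetical and do not exist in GCS/XCom, and the prefix length is assumed
to be given as a number of bits. The same routine covers 4-byte and 16-byte
addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if the first prefix_bits bits of the raw
 * addresses a and b (each addr_len bytes long) are equal, 0 otherwise.
 */
static int prefix_match(const unsigned char *a, const unsigned char *b,
                        size_t addr_len, unsigned prefix_bits) {
  size_t full_bytes = prefix_bits / 8;
  unsigned rem_bits = prefix_bits % 8;

  assert(a != NULL && b != NULL);

  if (prefix_bits > addr_len * 8) return 0;
  if (memcmp(a, b, full_bytes) != 0) return 0;
  if (rem_bits != 0) {
    /* Compare only the leading rem_bits of the next octet. */
    unsigned char mask = (unsigned char)(0xFFu << (8 - rem_bits));
    if ((a[full_bytes] & mask) != (b[full_bytes] & mask)) return 0;
  }
  return 1;
}

/*
 * Hypothetical textual wrapper: both strings must be literals of the given
 * family (AF_INET or AF_INET6).
 */
static int text_prefix_match(int family, const char *addr, const char *net,
                             unsigned prefix_bits) {
  unsigned char a[16];
  unsigned char n[16];
  size_t len = family == AF_INET6 ? 16 : 4;

  if (family != AF_INET && family != AF_INET6) return 0;
  if (inet_pton(family, addr, a) != 1) return 0;
  if (inet_pton(family, net, n) != 1) return 0;
  return prefix_match(a, n, len, prefix_bits);
}
```

For instance, 10.10.172.124 matches 10.10.172.0 with a 24-bit prefix, while
two distinct IPv6 addresses never match each other with a 128-bit prefix.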

6. Addresses in GR
==============================

In GR, one has 3 items where we configure addresses:
- Local Node address
- Whitelist
- Group Seeds

All of them are addresses, but some of them have both physical and 
logical attributes that are not user-visible.

The local address has two purposes:
- Uniquely identify a member in the group;
- Serve as the address that other group members will use to contact
  us back, after an add_node request is accepted.

The whitelist is purely physical. It is checked when one receives a
physical connection from another node. As such, no local address is at
play here, only physical counterparts.

Group Seeds are also purely physical. A seed is an address to which
we can send an add_node request. Since GR/GCS/XCom binds to all addresses
in the host, one can send the request to any available address.

6.1 Practical implications
===============================

Having said the above, when one adds a member to the group, it will
contact a seed node in order to send an add_node request. Consider 
node A, already in the group, and node B attempting to join:

1. B will create a physical connection to A, using the seed address
   configured in B's seed list;
2. A receives a physical connection from B and checks if B is allowed to
   connect, following the permissions configured in the whitelist.
2.1 If the physical address of B used to connect to A is not in A's
    whitelist, A will reject the connection.
2.2 If it belongs to the whitelist, the physical connection is allowed to
    continue.
3. B sends an add_node request to A that contains the Local Address of
   B.
4. A receives the add_node from B and runs a series of checks to see
   if B is allowed to join the group.
4.1 If B is rejected, it receives a REQUEST_FAIL answer.
5. A proposes B to be added to the group.
6. B will then receive physical connections from all group members,
   including A.
7. When B receives a physical connection from A, it will run step 2 of
   this algorithm.
   
Considering the steps above, we see no issues in the following scenarios:
- IPv4 only
- IPv6 only
- Mixed IPv4 and IPv6 with old IPv4 binaries, since they will all 
  talk to each other using pure IPv4 or pure IPv6 clients and servers.
 
With this WL implemented, there is an issue that emphasizes the
separation between what is logical and what is physical. Let's use the
following example:

Node A:
NIC 1:
10.10.172.123
2606:b400:8b0:40:3d9c:cc43:e006:19e4

Node B:
NIC 1:
10.10.172.124
2606:b400:8b0:40:3d9c:cc43:e006:19e8

Node A configuration
Bootstrap = YES
Local Address = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
Seeds = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
WhiteList = 10.10.172/24, 2606:b400:8b0:40:3d9c:cc43:e006:19e4

Node B configuration
Bootstrap = NO
Local Address = 10.10.172.124
Seeds = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
WhiteList = 10.10.172/24, 2606:b400:8b0:40:3d9c:cc43:e006:19e4

Node A will boot the group. Then node B will try to join the group and
it will fail. The question is: why?

Node B will try to contact node A through A's IPv6 address, which is the
configured seed. As such, Node B will use an IPv6 connection. When it arrives
on the other side, Node A will run step 2 of the join algorithm. The address
that it will see will be the IPv6 address of Node B, and that address is not
configured in Node A's whitelist.

6.2 Correct configurations in a mixed scenario using IPv6 capable binaries
============================================================================

If one wants to maintain a mixed scenario as described above, we need to
take into consideration that:
- The protocol in which the seed is configured is the protocol that we
  will use to create the connection.
- Whitelist verification will use the address that is used to create
  the connection.

The corollary of this is that we need to consider in the group whitelists
not only the logical local addresses but also the physical addresses of
each participating node.

And we need to remember that it needs to be reciprocal. As such, A needs
to have B in its whitelist and vice-versa.

A correct configuration of the scenario above will be:

Node A:
NIC 1:
10.10.172.123
2606:b400:8b0:40:3d9c:cc43:e006:19e4

Node B:
NIC 1:
10.10.172.124
2606:b400:8b0:40:3d9c:cc43:e006:19e8

Node A configuration
Bootstrap = YES
Local Address = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
Seeds = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
WhiteList = 10.10.172/24, 2606:b400:8b0:40:3d9c:cc43:e006:19e4, 2606:b400:8b0:40:3d9c:cc43:e006:19e8

Node B configuration
Bootstrap = NO
Local Address = 10.10.172.124
Seeds = 2606:b400:8b0:40:3d9c:cc43:e006:19e4
WhiteList = 10.10.172/24, 2606:b400:8b0:40:3d9c:cc43:e006:19e4

Note that we added Node B's IPv6 address to Node A's whitelist. Adding a range
would also do the trick.


7. General Code Refactoring
==========================

XCom network code is spread all around in the client code, in server code and 
in the whitelist code. This WL must take the chance to unify the socket and
sockaddr creation, to avoid having duplicated code all over to accomplish the
same task. If possible, also refactor the headers to have well-defined
interfaces.


1. Introduction
==========================

This section details which code will change, which new methods will be
used and, finally, which code will be refactored. The following sections
describe:
- How and when to change parsing for IPs;
- The implementation of a new way to retrieve physical interfaces;
- The replacement of legacy structures by getaddrinfo;
- Dual-stack: how and where to implement it;
- Whitelist augmentation.

2. Address input and parsing
==============================

From GCS, we have direct IP:PORT input from group_replication_local_address and
group_replication_group_seeds. We also have inputs in IP format from the
whitelist, but those are considered in another section.

From now on, one will need to accept both IPv4 and IPv6 addresses. The correct
way to accept those addresses is using the same notation that is recommended for
browser URLs, which is [IPv6]:PORT. IPv4 will remain the same.

IP parsing will start by checking whether we are in the presence of an IPv4 or
IPv6 address, detecting the existence of square brackets in the address. Its
validity is then checked by running it through getaddrinfo.

Parsing happens at the GCS level in:
- Gcs_xcom_node_address class, that decomposes a string into IP and PORT
- is_valid_hostname

And at the XCom level in:
- int end_token
- char *get_name
- xcom_port get_port
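
The bracket rule described above can be sketched as follows. This is not the
code of Gcs_xcom_node_address or of the XCom functions listed above;
split_host_port is a hypothetical helper that only shows how "[IPv6]:PORT"
and "IPv4:PORT" can be split by the same routine. Validation of the
extracted host is assumed to be left to getaddrinfo.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: splits "[IPv6]:PORT" or "IPv4-or-name:PORT" into a
 * host string and a port number. Returns 0 on success, -1 on malformed
 * input. The host is not validated here.
 */
static int split_host_port(const char *in, char *host, size_t host_len,
                           unsigned short *port) {
  const char *host_begin;
  const char *host_end;
  const char *port_str;
  char *end = NULL;
  unsigned long value;
  size_t len;

  assert(in != NULL && host != NULL && port != NULL);

  if (in[0] == '[') {
    /* Bracketed form: everything up to ']' is the IPv6 literal. */
    host_begin = in + 1;
    host_end = strchr(host_begin, ']');
    if (host_end == NULL || host_end[1] != ':') return -1;
    port_str = host_end + 2;
  } else {
    /* Unbracketed form: exactly one colon is allowed. */
    host_begin = in;
    host_end = strchr(in, ':');
    if (host_end == NULL) return -1;
    if (strchr(host_end + 1, ':') != NULL) return -1;
    port_str = host_end + 1;
  }

  len = (size_t)(host_end - host_begin);
  if (len == 0 || len >= host_len) return -1;
  memcpy(host, host_begin, len);
  host[len] = '\0';

  /* The port must be a plain decimal number in the range 1..65535. */
  if (*port_str < '0' || *port_str > '9') return -1;
  value = strtoul(port_str, &end, 10);
  if (*end != '\0' || value == 0 || value > 65535UL) return -1;
  *port = (unsigned short)value;
  return 0;
}
```

An unbracketed IPv6 literal followed by a port is rejected on purpose, since
the last colon-separated block cannot be told apart from the port.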

3. Physical address retrieval
===============================

ioctl is not an option when it comes to IPv6 physical interface address
retrieval. In GCS/XCom, this is used in two cases:
- Whitelist configuration
- To determine one's node index when adding a new member to the group.

Currently this is done in the triptych:

             sock_probe.c
                /  \ 
               /    \
              V      V
  sock_probe_ix.c   sock_probe_win32.c
  
sock_probe.c includes either sock_probe_ix.c or sock_probe_win32.c, depending
on the platform where the code is built. Both files implement their version
of the methods:
- static int init_sock_probe(sock_probe *s)
- static void close_sock_probe(sock_probe *s)
- static int number_of_interfaces(sock_probe *s)
- static bool_t is_if_running(sock_probe *s, int count)
- static sockaddr get_sockaddr(sock_probe *s, int count, struct sockaddr **out)

The current implementation retrieves interfaces from ioctl and creates an index
on top of it.

getifaddrs simplifies this task since it returns a linked list of all existing
interfaces. As such, one just needs to replace the current sock_probe content
with that linked list reference. Note that one needs to be careful to only
consider as valid the interfaces that belong to the AF_INET and AF_INET6
families.

In Windows, one must migrate the current solution, which is based on WSAIoctl,
to a more modern version using GetAdaptersAddresses, which retrieves all
adapter addresses for all address families. It works the same way as
getifaddrs.

4. Usage of getaddrinfo instead of raw structures
===================================================

Most of the raw network code uses sockaddr_in structures. Some notable examples
are all the client code within XCom, both in the synchronous client methods and
in the dial() and connect() methods used to connect back to joining nodes.

That code is tied to the usage of IPv4 and, in order to make it generic, one
must change it to use checked_getaddrinfo when possible, since the Socket API
methods can use the returned structures directly, without the need for casting
back and forth between "struct sockaddr" and "struct sockaddr_in".

An example is:

[snip]
struct addrinfo *addr = NULL;

/* getaddrinfo takes the port as a string */
char buffer[20];
snprintf(buffer, sizeof(buffer), "%d", port);

checked_getaddrinfo(server, buffer, 0, &addr);

if (addr == NULL) {
  return 0;
}
 
/* Connect socket to address */
 
SET_OS_ERR(0);
if (timed_connect(fd.val, addr->ai_addr, addr->ai_addrlen) == -1) {
[/snip]

The code becomes much cleaner, since checked_getaddrinfo fills all necessary
fields in a generic struct sockaddr. What needs to be taken care of is that
getaddrinfo returns a linked list of addresses. This means that, if
we are resolving a name, we need to check whether it returns
both V4 and V6 versions of the same name.

We need to take into consideration:
- the Upgrade and Downgrade scenario, described in chapter 7;
- the dual-stack ability of the new code.

The wise approach is to always default to V4, since it is the omnipresent
protocol in both old and new nodes. As such, if a node is configured in DNS
both with V4 and V6, the address to be used will always be the V4 address.

If one wants to use exclusively IPv6 with name configurations, name resolution
for those names should always point to the V6 address. If needed, one can
always create a new parameter in order to decide the default name resolution
decision: either v4 or v6.
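
The "default to V4" rule could be sketched as below. It uses plain
getaddrinfo rather than XCom's checked_getaddrinfo, and resolve_prefer_v4 is
a hypothetical name; the sketch only illustrates how to walk the returned
list and pick the V4 entry when both families are present.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolves host/port and copies the chosen address
 * into out. An IPv4 result wins over an IPv6 one; IPv6 is used only when
 * no IPv4 result exists. Returns 0 on success, -1 on failure.
 */
static int resolve_prefer_v4(const char *host, const char *port,
                             struct sockaddr_storage *out,
                             socklen_t *out_len) {
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  struct addrinfo *it;
  struct addrinfo *chosen = NULL;

  assert(host != NULL && out != NULL && out_len != NULL);

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC; /* ask for both families */
  hints.ai_socktype = SOCK_STREAM;

  if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

  for (it = res; it != NULL; it = it->ai_next) {
    if (it->ai_family == AF_INET) {
      chosen = it; /* V4 found: stop looking */
      break;
    }
    if (it->ai_family == AF_INET6 && chosen == NULL) {
      chosen = it; /* remember the first V6 entry as a fallback */
    }
  }

  if (chosen == NULL || chosen->ai_addrlen > sizeof(*out)) {
    freeaddrinfo(res);
    return -1;
  }
  memcpy(out, chosen->ai_addr, chosen->ai_addrlen);
  *out_len = chosen->ai_addrlen;
  freeaddrinfo(res);
  return 0;
}
```

If a configurable default were added later, only the selection loop would
need to change.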

5. Dual-Stack 
===========================

With this modification, one will support both IPv4 and IPv6 connections, as
MySQL does. There are two ways to implement this:
  - Have two sockets bound, one in V6 and another in V4
  - Use what MySQL uses, which is kernel support for dual stacking.

The second option works by creating an IPv6 server socket and clearing the
IPV6_V6ONLY option via setsockopt.
An example follows:

int sock = socket(AF_INET6, SOCK_STREAM, 0);

/* 0 disables IPV6_V6ONLY, so the socket also accepts IPv4 connections */
int mode = 0;
setsockopt(sock, IPPROTO_IPV6, IPV6_V6ONLY, (char *)&mode, sizeof(mode));

This needs to be done when creating the server socket, which is done in:

result announce_tcp(xcom_port port);

It will allow the application to have only one open socket, but receive both
IPv4 and IPv6 connections. The only caveat of this approach is that, when
converting IPv4 addresses to text mode, they will be represented as IPv4-mapped
addresses, which have the first 80 bits set to zero, followed by 16 bits set
to all ones and, finally, the last 32 bits holding the IPv4 address, written
in dotted decimal, forming a 128-bit IPv6 address.

As an example, the IPv4 Class A address 12.155.166.101 would look like
0000:0000:0000:0000:0000:FFFF:12.155.166.101 as an IPv4-mapped address, or
::FFFF:12.155.166.101 in IPv6's short form.

As the MySQL server does, we need to accept both formats as inputs in the
parameters:
- ::FFFF:12.155.166.101
- 12.155.166.101

As physical storage in the "struct server", we shall not store the mapped
version, since there is no need to maintain the IPv4-mapped version.
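
One possible way to avoid storing the mapped form is to normalize the peer
address before converting it to text, as sketched below. The helper
peer_to_text is hypothetical and is not part of XCom; it relies on the
standard IN6_IS_ADDR_V4MAPPED macro and on inet_ntop.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: writes the textual form of a peer address into out.
 * IPv4-mapped IPv6 addresses are written as plain dotted-decimal IPv4.
 * Returns 0 on success, -1 on failure or unsupported family.
 */
static int peer_to_text(const struct sockaddr *sa, char *out,
                        socklen_t out_len) {
  assert(sa != NULL && out != NULL);

  if (sa->sa_family == AF_INET) {
    const struct sockaddr_in *v4 = (const struct sockaddr_in *)sa;
    return inet_ntop(AF_INET, &v4->sin_addr, out, out_len) != NULL ? 0 : -1;
  }

  if (sa->sa_family == AF_INET6) {
    const struct sockaddr_in6 *v6 = (const struct sockaddr_in6 *)sa;
    if (IN6_IS_ADDR_V4MAPPED(&v6->sin6_addr)) {
      /* The embedded IPv4 address lives in the last 4 of the 16 bytes. */
      struct in_addr v4;
      memcpy(&v4, &v6->sin6_addr.s6_addr[12], sizeof(v4));
      return inet_ntop(AF_INET, &v4, out, out_len) != NULL ? 0 : -1;
    }
    return inet_ntop(AF_INET6, &v6->sin6_addr, out, out_len) != NULL ? 0 : -1;
  }

  return -1;
}
```

With this normalization, a connection accepted on the dual-stack socket from
an IPv4 client yields the same text as the IPv4 literal that the user
configured.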

6. Whitelist augmentation
=============================

Whitelisting in GCS/XCom has two moments where adding IPv6 becomes relevant:
- Configuration
- Runtime

When configuring the whitelist:

- bool Gcs_ip_whitelist::configure(const std::string &the_list)

This is the entry method where the input string from the whitelist is split
into several strings. One needs to add support for splitting IPv6 addresses and
detect if localhost is configured. If not, we must add both the IPv4 and IPv6
localhost addresses.

- bool Gcs_ip_whitelist::add_address(std::string addr, std::string mask)

Indirectly, add_address uses Gcs_ip_whitelist_entry derivatives, which are
Gcs_ip_whitelist_entry_ip and Gcs_ip_whitelist_entry_hostname.
Gcs_ip_whitelist_entry_ip uses bool get_address_for_whitelist, which needs to
be checked for IPv6 support.

When using the whitelist at runtime:

- bool Gcs_ip_whitelist::do_check_block_whitelist

This method already uses octet blocks to compare entries in the whitelist. One
must ensure that the input is generic and that it is able to compare either v4
or v6 addresses.

- bool Gcs_ip_whitelist::do_check_block_xcom

This method compares the new entry with the existing group. It also needs to
take into account the new entries regarding IPv6 addresses.

The whitelist also supports the AUTOMATIC feature, in which GCS automatically
fills the whitelist field with private addresses. For more detail on that,
please refer to WL#9345. In this WL, we need to augment this in order to add
IPv6 private addresses, which are of 3 types:
- Localhost, ::1
- Link-local addresses, which start with fe80::/10
- IPv6 reserved private addresses, which start with fc00::/7

For more detail on this subject, please refer to the standard in:
- https://tools.ietf.org/html/rfc4193
- https://tools.ietf.org/html/rfc5156#page-2

For general knowledge on IPv6 addressing, please refer to
https://tools.ietf.org/html/rfc4291


7. Upgrade/Downgrade
=============================

7.1 Upgrade
=============================

Regarding upgrading the group, one cannot join the group with an IPv6 address,
since old nodes won't be able to contact you back. You need to join the group
with an IPv4 address and, when all members are up-to-date, start switching
the local addresses to the desired IPv6 addresses.

To detect this misconfiguration and raise a meaningful error in the joiner
node, one should bump the XCom protocol version and validate that one is
using a correctly configured address when contacting a lower version that
does not support V6 connections.

Since this will happen on an add node client request level, we should consider
adding this check in the function:

static int64_t xcom_send_client_app_data(connection_descriptor *fd,
                                         app_data_ptr a, int force)

7.2 Downgrade
=============================

Regarding downgrading the group, we have the same issue as in Upgrade. Old nodes
can't speak IPv6 and, even if they are reachable via IPv4, when they
receive the new configuration, it will contain IPv6 addresses in string format,
which they do not know how to interpret.

Having said that, before starting a group downgrade, all nodes must
have their local addresses reconfigured to IPv4. After that, one can start
joining older nodes that do not support IPv6 to the group.

8. Security
=============================

There are no security considerations regarding adding IPv6.