WL#2387: Replication Master Filtering
Affects: WorkLog-3.4
Status: Un-Assigned
SUMMARY
-------
Be able to have the replication filters work on the master instead of on the
slave. (Currently, data is replicated to the slave even if the filters on the
slave discard that data.)

MOTIVATION
----------
Much less network bandwidth is used when replicating: tables and databases
that should be filtered away are filtered out already at the master.

REQUIREMENTS
------------
1. Filtering on the originating server (or originating cluster, if we
   implement that) could also be done on the master.

USER INTERFACE
--------------
The following options start to take effect on the master instead of the slave:

  --replicate-do-db=DB_NAME
  --replicate-do-table=DB_NAME.TBL_NAME
  --replicate-ignore-db=DB_NAME
  --replicate-ignore-table=DB_NAME.TBL_NAME
  --replicate-wild-do-table=DB_NAME.TBL_NAME
  --replicate-wild-ignore-table=DB_NAME.TBL_NAME

The following option still takes effect on the slave:

  --replicate-rewrite-db=FROM_NAME->TO_NAME

OPEN ISSUE
----------
Either the filtering is controlled by the master (so that slaves only get
what the master has defined), or each slave connects to the master with its
own filter definition. The latter alternative requires changes to the way the
slave asks the master for the binlog.

OPTIONAL EXTENSION
------------------
All of these options could be added to CHANGE MASTER in the following way:

  CHANGE MASTER 'foo' TO MASTER_HOST='127.0.0.1', REPLICATE-DO-DB='mydb';

IMPLEMENTATION
--------------
All filtering code is refactored into a separate file, rpl_filter.cc.

Part 1: When the slave registers on the master, it forwards information about
all filters that should be applied. This requires an extension to the function
slave.cc:register_slave_on_master().

Part 2: The master adds functionality in the dump thread to filter events.
Much of the code in rpl_filter.cc can be reused for this (functions like
slave.cc:db_ok()); see the sketch after the RELATED BUGS/WLs list below.

BINLOG EXTENSIONS
-----------------
There is a possibility to divide the filtered binlog into separate binlogs,
i.e. one binlog for one database and another binlog for another database
(Brian seems fond of this idea). If we choose this path, we need to rename
the binlog files accordingly, for instance like this:

  --bin.index
  --bin.NNNNNN

Note, however, that this is not really needed for filtering on the master:
one could just use a single binlog and apply the filtering in the dump thread
instead. There are, however, benefits in dividing it into multiple binlogs
(e.g. backups could be done of different binlogs at different times, and
purging could be done differently on different binlogs). It is not yet
decided whether this extension should be implemented.

Lars suggests that the naming of the binlogs be kept separate from the naming
of the schemas, i.e. no automatic naming. When you specify that you want this
schema in that binlog, you provide the binlog name at that point. This avoids
problems with renamed schemas etc. It also makes the mechanism more flexible
(e.g. perhaps we want binlogs based on filters other than schemas).

See also Guilhem's notes in WL#1401.

NOTES
-----
There are corresponding ideas for filtering the query log, see WL#3017.

RELATED BUGS/WLs
----------------
BUG#2917
BUG#21146
BUG#41267
BUG#55733
WL#1049
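To make Part 2 of the implementation concrete, below is a minimal,
self-contained sketch of the per-event decision the dump thread would have to
make for a registered slave. This is not the server code: the class name
Master_filter, its members, and the use of fnmatch() as a stand-in for
MySQL's wild-pattern matching are assumptions made purely to illustrate the
db_ok()/table_ok() style of check that rpl_filter.cc would provide.

  // Hypothetical illustration only -- not the actual rpl_filter.cc code.
  // Models the do-db / ignore-db / wild-table rules the dump thread would
  // apply before sending an event to a registered slave.
  #include <fnmatch.h>   // stand-in here for MySQL's own wild matching
  #include <set>
  #include <string>
  #include <vector>
  #include <iostream>

  class Master_filter {          // assumed name; corresponds to rpl_filter
  public:
    std::set<std::string> do_db;                 // --replicate-do-db
    std::set<std::string> ignore_db;             // --replicate-ignore-db
    std::vector<std::string> wild_do_table;      // --replicate-wild-do-table
    std::vector<std::string> wild_ignore_table;  // --replicate-wild-ignore-table

    // Rough analogue of slave.cc:db_ok(): should events for this db be sent?
    bool db_ok(const std::string &db) const {
      if (!do_db.empty())
        return do_db.count(db) != 0;   // with do-db rules, only listed dbs pass
      return ignore_db.count(db) == 0; // otherwise pass unless ignored
    }

    // Rough analogue of the wild-table checks, keyed on "db.table".
    bool table_ok(const std::string &db, const std::string &table) const {
      std::string key = db + "." + table;
      for (const std::string &pat : wild_ignore_table)
        if (fnmatch(pat.c_str(), key.c_str(), 0) == 0) return false;
      if (wild_do_table.empty()) return true;
      for (const std::string &pat : wild_do_table)
        if (fnmatch(pat.c_str(), key.c_str(), 0) == 0) return true;
      return false;
    }
  };

  int main() {
    // Filter definition a slave might forward in register_slave_on_master().
    Master_filter f;
    f.do_db = {"mydb"};
    f.wild_ignore_table = {"mydb.tmp_*"};

    // Decisions the dump thread would take per event.
    std::cout << f.db_ok("mydb") << "\n";                   // 1: send
    std::cout << f.db_ok("otherdb") << "\n";                // 0: filter out
    std::cout << f.table_ok("mydb", "orders") << "\n";      // 1: send
    std::cout << f.table_ok("mydb", "tmp_scratch") << "\n"; // 0: filter out
    return 0;
  }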
HIGH-LEVEL SPECIFICATION
------------------------
Use rpl_filter for the actual logic behind the filtering mechanisms (master
binlog filtering, master replication filtering and slave replication
filtering), but cache the result in a variable on the table object. Add
"uint32 table->s->flags" and the following enum in table->s:

  enum enum_flag {
    FILTER_BINLOG_SEND_F   = (1U << 0),
    FILTER_BINLOG_WRITE_F  = (1U << 1),
    FILTER_SLAVE_EXECUTE_F = (1U << 2)
  };

Whenever the table object is created, the corresponding rpl_filter object
should be asked how to set each flag.
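As a rough sketch of how these flags could be filled in, assuming a
simplified stand-in for the rpl_filter interface (the struct and function
names below, other than the enum values, are invented for illustration):

  // Hypothetical illustration only. Shows the intended flow: when a table
  // object is created, each flag in table->s->flags is set by asking the
  // corresponding rpl_filter object once, so later checks are a bit test.
  #include <cstdint>
  #include <string>
  #include <iostream>

  enum enum_flag {                       // as proposed above
    FILTER_BINLOG_SEND_F   = (1U << 0),
    FILTER_BINLOG_WRITE_F  = (1U << 1),
    FILTER_SLAVE_EXECUTE_F = (1U << 2)
  };

  struct Table_share {                   // stand-in for table->s
    std::string db;
    std::string table_name;
    uint32_t flags;
  };

  // Stand-in for the three rpl_filter instances (binlog send, binlog write,
  // slave execute); a real implementation would consult rpl_filter.cc.
  struct Rpl_filter_stub {
    bool tables_ok(const std::string &db, const std::string &table) const {
      (void)table;
      return db != "ignored_db";         // trivial rule, illustration only
    }
  };

  // Called once when the table object is created.
  void init_filter_flags(Table_share *share,
                         const Rpl_filter_stub &binlog_send_filter,
                         const Rpl_filter_stub &binlog_write_filter,
                         const Rpl_filter_stub &slave_execute_filter) {
    share->flags = 0;
    if (binlog_send_filter.tables_ok(share->db, share->table_name))
      share->flags |= FILTER_BINLOG_SEND_F;
    if (binlog_write_filter.tables_ok(share->db, share->table_name))
      share->flags |= FILTER_BINLOG_WRITE_F;
    if (slave_execute_filter.tables_ok(share->db, share->table_name))
      share->flags |= FILTER_SLAVE_EXECUTE_F;
  }

  int main() {
    Rpl_filter_stub f;
    Table_share s;
    s.db = "mydb";
    s.table_name = "orders";
    init_filter_flags(&s, f, f, f);
    // Later code only needs a cheap bit test instead of re-running the filter.
    std::cout << ((s.flags & FILTER_BINLOG_SEND_F) != 0) << "\n";  // 1
    return 0;
  }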