WL#2516: Filtering semantics change
Affects: Server-7.1
—
Status: Un-Assigned
SUMMARY ------- Fix the semantics of the filtering to consider *effective* database (that the statement effects) instead of *current* database (defined by USE statements). Since we in 5.1 have a class that is used for filtering, the same new semantics will apply to both binlog and replication slave filtering (and replication master filtering WL#2387, when that is implemented). DECISIONS/OPINIONS --------------------- - Lars thinks that it is unfortunate for RBR that DDL is filtered on current database and DML on effective database - Guilhem is ok with this as long as it is documented in update section of manual - Mats thinks: "I was thinking of considering this a bug when using RBR only. I see no problem with adding option of using the same filtering for SBR, but the default behaviour should be the same as before. The problem with using this filtering for SBR is the same as always: what to do with multi-update statements where one of the tables are filtered and one is not. To have complicated rules for deciding this under SBR is not a good thing; it's better to keep the existing (and very simple) rule of always using the current database." - This WL might be implemented together with scripting support, WL#4008. -- Lars, 2007-09-12 MOTIVATION ---------- 1. There are bug reports (e.g. BUG#8791) on the semantics of filtering and the semantics seems a bit tricky to grasp. 2. This is much simpler for user, since: 1. There is only three parameters to change (DO, IGNORE and REWRITE) 2. Rules are simpler 3. Uniform treatment between binlog and replication 3. It is unnatural to consider only the current database (Lars thinks. Mats? Guilhem?) when doing filtering. 4. For RBR, the rows are already filtered on effective database Guilhem (old) comments: I think that this topic is not the most urgent and should not consume too much energy, that there are now rather few users/customers having problems with the current filtering rules, and that this change will cause some of them some digging into their apps to find out why it does not work anymore. Some big customers with whom I'm dealing presently use the options you propose to remove. And the drawback of "effective" is that it requires parsing of the query, whereas "current" does not and is hence faster. So, if we need to change things is still an open question to me; if we do need, I agree that a new release branch is the place to change things. Given the nature of affected customers, I am not sure I am ok, so I would like Brian to explicitely give the ok on this topic. OLD SYNTAX ---------- The options: --binlog-do-db --binlog-ignore-db --replicate-do-db --replicate-do-table --replicate-ignore-db=DB_NAME --replicate-ignore-table=DB_NAME.TBL_NAME --replicate-wild-do-table=DB_NAME.TBL_NAME --replicate-wild-ignore-table=DB_NAME.TBL_NAME --replicate-rewrite-db=FROM_NAME->TO_NAME are all removed in this WL and cause an "unknown option" error. SUGGESTED NEW SYNTAX -------------------- Instead we have the following commands for replication: CHANGE MASTER 'foo' TO DO='db1.%', IGNORE='%.table2', REWRITE=db1.%->db2.%; The dot between database and table is mandatory. If there is a '%' as table or db, then this corresponds to the old options --binlog-do-db etc. (For REWRITE, first implementation can allow simple regular expressions only.) The following command can be implemented for binlog: CHANGE BINLOG 'bar' TO DO='db1.%', IGNORE='%_backup.table2'; NEW SEMANTICS ------------- For each statement having an effect on a database and/or a table, the following are the rules to see if it should be written or ignored: 1. Are there any rules defined? Yes: Go to next step No: Write statement and exit 2. Is there any do expression that matches effective database and table? Yes: Write statement, and exit No: Go to next step 3. Is there any ignore expression that matches effective database and table? Yes: Do not write statement, and exit No: Write statement, and exit NOTES ----- - Q: What if customers use USE for binlog on/off? A: If there are many customers that use USE to turn on/off binlogging, we should encourage them to instead use: SET SQL_LOG_BIN = {0|1} - Q: Isn't is slower? A: To get the same speed for IGNORE=db.% as for --replicate-ignore-db we should separately store all ignored databases so that there is no need to do real matching each time it is checked. This can be done in the rpl_filter object. - Q: Thought which arose from a question of a big customer "I want to include all tables like "abc.%" except the "%.EFG"". This can't be done now. If we supported Perl regexps we could do it with this pattern: /^abc\.(?!EFG)/ (I could not find an equivalent in the regex library MySQL uses). A: With the new semantics it is almost as powerful as having real regexps with negation. CURRENT SEMANTICS (4.1 and 5.0, but 4.0 does not completely follow it) ---------------------------------------------------------------------- (As copied from manual.) Binlog filtering ---------------- 1. Are there `binlog-do-db' or `binlog-ignore-db' rules? * No: Write the statement to the binary log and exit. * Yes: Go to the next step. 2. There are some rules (`binlog-do-db' or `binlog-ignore-db' or both). Is there a current database (has any database been selected by `USE'?)? * No: Do _not_ write the statement, and exit. * Yes: Go to the next step. 3. There is a current database. Are there some `binlog-do-db' rules? * Yes: Does the current database match any of the `binlog-do-db' rules? * Yes: Write the statement and exit. * No: Do _not_ write the statement, and exit. * No: Go to the next step. 4. There are some `binlog-ignore-db' rules. Does the current database match any of the `binlog-ignore-db' rules? * Yes: Do not write the statement, and exit. * No: Write the query and exit. Replication slave filtering --------------------------- 1. Are there some `--replicate-do-db' or `--replicate-ignore-db' rules? * Yes: Test them as for `--binlog-do-db' and `--binlog-ignore-db' (*note Binary log::). What is the result of the test? - Ignore the statement: Ignore it and exit. - Execute the statement: Don't execute it immediately, defer the decision, go to the next step. * No: Go to the next step. 2. Are there some `--replicate-*-table' rules? * No: Execute the query and exit. * Yes: Go to the next step. Only tables that are to be updated are compared to the rules (`INSERT INTO sales SELECT * FROM prices': only `sales' are compared to the rules). If several tables are to be updated (multiple-table statement), the first matching table (matching "do" or "ignore") wins. That is, the first table is compared to the rules. Then, if no decision could be mad, the second table is compared to the rules, and so forth. 3. Are there some `--replicate-do-table' rules? * Yes: Does the table match any of them? - Yes: Execute the query and exit. - No: Go to the next step. * No: Go to the next step. 4. Are there some `--replicate-ignore-table' rules? * Yes: Does the table match any of them? - Yes: Ignore the query and exit. - No: Go to the next step. * No: Go to the next step. 5. Are there some `--replicate-wild-do-table' rules? * Yes: Does the table match any of them? - Yes: Execute the query and exit. - No: Go to the next step. * No: Go to the next step. 6. Are there some `--replicate-wild-ignore-table' rules? * Yes: Does the table match any of them? - Yes: Ignore the query and exit. - No: Go to the next step. * No: Go to the next step. 7. No `--replicate-*-table' rule was matched. Is there another table to test against these rules? * Yes: Loop. * No: We have tested all tables to be updated and could not match any rule. Are there `--replicate-do-table' or `--replicate-wild-do-table' rules? - Yes: Ignore the query and exit. - No: Execute the query and exit. DISCUSSION TO TAKE INTO ACCOUNT ------------------------------------------------- On Tue, Jan 31, 2006 at 09:45:40AM +0100, Guilhem Bichot wrote: > Hi Lars, > > On Mon, Jan 30, 2006 at 11:47:24PM +0100, Lars Thalmann wrote: > > On Mon, Jan 30, 2006 at 04:38:30PM +0100, Mats Kindahl wrote: > > > Stefan Hinz wrote: > > > > > > >Mats, > > > > > > > >> > > > >>>Row-based replication : > > > >>> All table row changes are filtered individually. For > > > >>> multi-table updates, each table is filtered separately > > > >>> according to the rules. Some may be changed and some not, > > > >>> depending on the rules (and the actual changes, of course). > > > >> > > > >> > > > >>A minor thing, just as a clarification (this might need to be added > > > >>elsewhere): > > > >> > > > >> In addition, observe that the changes are filtered based on the > > > >>actual database as well, not based on the currently selected database. > > > > > > > > > > > >Good point. Let me see if I get this right. Assuming we replicate the > > > >'foo' database only, and the current (or default) database on the > > > >master is 'bar', this happens when I issue statements: > > > > > > > >- INSERT INTO foo.sometable VALUES (1) > > > > - SBR: not replicated > > > > - RBR: replicated > > > > > > > >- CREATE TABLE foo.sometable (i INT) > > > > - SBR: not replicated > > > > - RBR: replicated (sure ?) > > > > > > > > > Ugh... got me there (I think). Actually, it's only the binrow events > > > that are filtered based on the actual tables changed. The latter one, > > > the create statement, is replicated using statement-based replication, > > > so it is not replicated. Hmmm.... maybe we should consider this a bug to > > > get consistent behaviour for all row-based replication... > > > > Yes, I think that is a good idea. We need to support the old > > filtering too and it would be nice if this > > filter-by-actual-tables-changed would be possible also for SBR > > (perhaps with an option). > > > > This work actually already has a WL, namely WL#2516. Guilhem was a > > bit skeptical before, but now with RBR I think this becomes even more > > important. > > Well it becomes impossible to resist it, as the Rows_log_event don't > contain the default (USE'd) database... > As long as the big change is documented in the Upgrade section...
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.