Writing a custom conflict handler for Bucardo 4

Bucardo, an asynchronous multi-master replication system (or, maybe the asynchronous multi-master replication system for PostgreSQL, cause I know of no other actively developed ones), deals with replication conflicts by providing conflict handlers in a form of standard (built-in) or custom code procedures. The built-in conflict handlers resolve a conflict by taking a row from the source or the target databases (in Bucardo 4 only 2 masters are supported, called 'source' and 'target' below), or using a delta row timestamp to determine the winner; other options include skipping conflicting rows altogether or picking one at random, but they are not very useful if you need the data to be consistent between replicas. For more complex conflict resolution rules it's necessary to write a custom conflict handler and, since the documentation is really scarce on that matter, I've decided to show how to create a simple one.

For the test we'll be using a simple database named 'test' with a single table:

CREATE TABLE test.products(product_id INTEGER PRIMARY KEY, name VARCHAR, 
price float); 

Suppose the table exists on both replicas, initially empty. Let's setup a simple Bucardo master-to-master replication:

bucardo_ctl add database source_test name=test host=source.local user=postgres
bucardo_ctl add database target_test name=test host=target.local user=postgres
bucardo_ctl add table test.products db=source_test herd=sample
bucardo_ctl add sync test_swap source=sample targetdb=target_test type=swap
Yikes, we've got an error:
WARNING: Issuing rollback() due to DESTROY without explicit disconnect() of
DBD::Pg::db handle dbname=test;host=source.local;port=5432 at line 29.
CONTEXT:  PL/Perl function "validate_sync"
SQL statement "SELECT validate_sync('test_swap')"
PL/Perl function "validate_sync"
Failed to add sync: DBD::Pg::st execute failed: ERROR: Table "test.products"
must specify a way to handle conflicts at line 285. at line 30.
CONTEXT: PL/Perl function "validate_sync" at /usr/local/bin/bucardo_ctl line
3362.

The problem is that we haven't specified how replication conflicts should be resolved. For this, let's add a custom handler to the 'test' table and try to add the sync once more:

bucardo_ctl add customcode example src_code=code/sample.pl whenrun=conflict 
goat=1
bucardo_ctl add sync test_swap source=sample targetdb=target_test type=swap

The sync is added normally. You can start Bucardo with bucardo_ctl start and replicate a couple of rows to make sure it works.

bucardo_ctl start

psql -h source test -c "INSERT INTO test.products(id, name, price)
SELECT id, 'product '||id, id+0.99 from generate_series(1,3) id"
psql -h source test -c "INSERT INTO test.products(id, name, price)
SELECT id, 'product '||id, id+0.99 from generate_series(4,5) id"

...
psql -h source test -C 'SELECT count(*) FROM test.products
psql -h target test -C 'SELECT count(*) FROM test.products

If everything works normally, 5 should be returned for both source and target databases.

Let's examine the custom conflict handler line. Obviously, the code should only be executed during conflicts, as indicated by the whenrun option. The line also contains the ID of the goat (table) the code should be added to. I'm not aware of a variant of bucardo_ctl command to get that id, but it's pretty simple once you recall that Bucardo keeps all the settings in the database and per-table properties are stored in the bucardo.goat table:

  select id from goats where name='test.products';  

The command also suggests that the custom code should be written in Perl, much like Bucardo itself. Below is the contents of sample.pl. The conflicts are resolved by choosing the row with the highest price value. Here's the code from sample.pl

my ($args) = @_;
return if (exists $args->{dummy});
my $sourceprice = $args->{rowinfo}{sourcerow}{price};
my $targetprice = $args->{rowinfo}{targetrow}{price};
my $winningdb = "";
if ($sourceprice > $targetprice) {
       $args->{rowinfo}{action} = 1;
	   $winningdb = $args->{sourcename}
} else {
	   $winningdb = $args->{targetname}
       $args->{rowinfo}{action} = 2;
}
$args->{message} = "$winningdb won, price: ".$targetprice;

The $args->{dummy} is the magic that Bucardo verifies to identify its own custom code. The code following that line reads the price values from source and target rows, compares them and decides which database wins. The lines that make Bucardo perform specific actions are the ones that assign new values to the $args->{rowinfo}{action} attribute. The value should be a bitmap of the following flags (the description below is actually taken directly from Bucardo source code).

	1 = Add source row to the target db
	2 = Add target row to the source db
	4 = Add source row to the source db
	8 = Add target row to the target db

Let's update source and target database rows with the same product_id but with different price values, simulating a conflict:

psql -h source -c 'update test.products set price=10 where product_id = 1' 
test;
psql -h target -c 'update test.products set price=5 where product_id = 1' 
test
UPDATE 1
UPDATE 1

The commands above lead to some activity in the Bucardo log. The relevant messages are:

[Wed Nov 21 10:13:56 2012] CTL Got notice "bucardo_ctl_kick_test_swap" from
24779
[Wed Nov 21 10:13:56 2012]  KID Target delta count for test.products: 1
[Wed Nov 21 10:13:56 2012]  KID Total source delta count: 1
[Wed Nov 21 10:13:56 2012]  KID Total target delta count: 1
[Wed Nov 21 10:13:56 2012]  KID Total delta count: 2
[Wed Nov 21 10:13:56 2012] KID Logged details of conflict to
bucardo_conflict.log
[Wed Nov 21 10:13:56 2012]  KID Running conflict custom code 22: example
[Wed Nov 21 10:13:56 2012]  KID Finished custom code 22
[Wed Nov 21 10:13:56 2012] KID Message from conflict code 22: source_test won,
price: 10
[Wed Nov 21 10:13:56 2012]  KID Conflict handler action: 1
[Wed Nov 21 10:13:56 2012]  KID Action summary: 1:1
[Wed Nov 21 10:13:56 2012]  KID [1/1] test.products UPDATE source to target 
pk 1

Lines above indicate that the target database will be updated with the source row, since it has the highest price value among the two. Note the message from the custom code: it's the line written to $args->{message}. A good way to get yourself acquainted with various information stored in args is to use Data::Dumper to dump that information into $args->{message} (the other way is to read the source code, e.g. search for rowinfo hash inside Bucardo.pm).

Let's check that the conflict is actually resolved in favor of the source database:

psql -U bucardo -d test -h source -c 'SELECT price FROM test.products WHERE
product_id = 1'
 price 
-------
    10
(1 row)

psql -U bucardo -d test -h target  -c 'SELECT price FROM test.products
WHERE product_id = 1'
 price 
-------
    10
(1 row)

At this point you may wonder about the last two actions in the bitmap above. Why might it be necessary to add the source row to the source database or the target row to a the target one? The answer becomes obvious once you learn that the handler can modify the rows written to the respective databases by changing the contents of the %rowinfo hash. For instance, let's write a new custom code that updates both conflicting rows with the average of their price values. Since the updated rows will be different from the original source and target ones, they should be propagated to both databases. That's when those two final bitmap actions come in handy. The new code looks like this:

my ($args) = @_;
return if (exists $args->{dummy});
my $sourceprice = $args->{rowinfo}{sourcerow}{'price'};
my $targetprice = $args->{rowinfo}{targetrow}{'price'};

my $newprice = ($sourceprice + $targetprice)/2;
$args->{message} = "Updating $args->{sourcename} and 
$args->{targetname} with price $newprice";

$args->{rowinfo}{sourcerow}{'price'} = $newprice;
$args->{rowinfo}{targetrow}{'price'} = $newprice;

# update source db with source row + update target db with target row
$args->{rowinfo}{action} = 12; 
To install the new code it's necessary to reload the corresponding sync:
bucardo_ctl update code sample src_code=src/sample_new.pl
bucardo_ctl reload sync test_swap

bucardo_ctl reload test_swap
Reloading sync test_swap...

Let's check the result:

psql -h source -c 'update test.products set price=20 where product_id = 1' 
test;
psql -h target -c 'update test.products set price=40 where product_id = 1' 
test
UPDATE 1
UPDATE 1

The following lines appear in the log:

[Wed Nov 21 16:29:04 2012] CTL Got notice "bucardo_ctl_kick_test_swap" from
24779
[Wed Nov 21 16:29:04 2012]  KID Target delta count for test.products: 1
[Wed Nov 21 16:29:04 2012]  KID Total source delta count: 1
[Wed Nov 21 16:29:04 2012]  KID Total target delta count: 1
[Wed Nov 21 16:29:04 2012]  KID Total delta count: 2
[Wed Nov 21 16:29:04 2012] KID Logged details of conflict to
bucardo_conflict.log
[Wed Nov 21 16:29:04 2012]  KID Running conflict custom code 22: example
[Wed Nov 21 16:29:04 2012]  KID Finished custom code 22
[Wed Nov 21 16:29:04 2012] KID Message from conflict code 22: Updating
source_test and target_test with price 30
[Wed Nov 21 16:29:04 2012]  KID Conflict handler action: 12
[Wed Nov 21 16:29:04 2012]  KID Action summary: 12:1
[Wed Nov 21 16:29:04 2012] KID [1/1] test.products UPDATE source to source pk 1
[Wed Nov 21 16:29:04 2012] KID [1/1] test.products UPDATE target to target pk 1
[Wed Nov 21 16:29:04 2012] KID Updating bucardo_track for test.products on
source_test
[Wed Nov 21 16:29:04 2012] KID Updating bucardo_track for test.products on
target_test
[Wed Nov 21 16:29:04 2012]  KID Issuing final commit for source and target
And the result is exactly the average of the row price values:

psql -U bucardo -d test -h source -c 'SELECT price FROM test.products WHERE
product_id = 1'
 price 
-------
    30
(1 row)

psql -U bucardo -d test -h target -c 'SELECT price FROM test.products WHERE
product_id = 1'
 price 
-------
    30
(1 row)
This post highlights only a small subset of Bucardo custom code capabilities. At the moment, the only way to learn more is by digging into the source code or reading the test cases provided with it. I hope it will be useful to anyone learning about Bucardo multi-master capabilities. Bucardo 5, the new version of Bucardo, may add even more features to conflict resolution code (and, hopefully, make it easier and better documented). Meanwhile, if you are willing to share your examples of Bucardo 4 custom code, use the comments to this post.