The Multimaster Replication Problem

Replication has its problems, especially if you use a multimaster replication system. To make matters worse, none of the PHP frameworks support multimaster replication or handle master failover. Symfony uses Propel, which only supports master-slave replication systems. When the master fails, it's true that you have the slaves ready to replace it, but the process of detecting the failure and acting upon it requires human intervention. Zend Framework, on the other hand, doesn't support replication at all.

I strongly believe that a master failover needs to be handled appropriately on the application side. Of course, you can always use an SQL proxy or any other server-side solution, but they are either limited or unreliable.

From Digg’s Blog:

The Digg database access layer is written in PHP and lives at the level of the application server. Basically, when the application decides it needs to do a read query, it passes off the query with a descriptor to a method that grabs a list of servers for the database pool that can satisfy the query, then picks one at random, submits the query, and returns the results to the calling method.

If the server picked won’t respond in a very small amount of time, the code moves on to the next server in the list. So if MySQL is down on a database machine in one of the pools, the end-user of Digg doesn’t notice. This code is extremely robust and well-tested. We worry neither that shutting down MySQL on a read slave in the Digg cluster, nor a failure in alerting on a DB slave that dies will cause site degradation.

Every few months we consider using a SQL proxy to do this database pooling, failover, and balancing, but it’s a tough sell since the code is simple and works extremely well. Furthermore, the load balancing function itself is spread across all our Apaches. Hence there is no “single point of failure” as there would be if we had a single SQL proxy.
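
The pattern Digg describes can be sketched in a few lines of PHP. The callables below are hypothetical stand-ins for real mysqli/PDO connections; a real implementation would also set a short connection timeout before moving on:

```php
<?php
// Sketch of a read pool: pick a server at random, and fall through to
// the next one if it does not respond. $servers is a list of callables
// that each return a query function for one database server; these are
// placeholders for real connection objects.

function query_pool(array $servers, string $sql)
{
    shuffle($servers);                  // pick a server at random
    foreach ($servers as $connect) {
        try {
            $db = $connect();           // may time out or throw
            return $db($sql);           // run the query on a live server
        } catch (Exception $e) {
            continue;                   // dead server: try the next one
        }
    }
    throw new RuntimeException('All servers in the pool are down');
}
```

Because the fallback happens inside the application, a dead slave costs one failed connection attempt rather than a failed page.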

If you are building your own solution and need a sandbox to test it, I recommend using MySQL Sandbox. Also, you might find this script useful: MySQL Master-Master Replication Manager

Full-text searching with MySQL

MySQL’s full-text search functions provide a simple framework for an easily implemented, approximate site search. Many sites, written in an interpreted language and powered by MySQL, can use MySQL’s full-text search to avoid third party dependencies.

The basics of the MySQL full-text search functions are well-documented in the MySQL online documentation. For those lacking patience, here is a quick rundown.
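
As a quick taste of the syntax, here is a minimal example with a hypothetical articles table (the names are illustrative; at the time of writing, FULLTEXT indexes require the MyISAM storage engine):

```sql
-- Hypothetical table with a FULLTEXT index on two columns.
CREATE TABLE articles (
    id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    FULLTEXT (title, body)
) ENGINE=MyISAM;

-- Natural-language search, ordered by relevance score.
SELECT id, title,
       MATCH (title, body) AGAINST ('database replication') AS score
FROM articles
WHERE MATCH (title, body) AGAINST ('database replication')
ORDER BY score DESC;
```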

Continue reading

MySQL Master-Master Replication Manager

MySQL Master-Master replication (often in active-passive mode) is a popular pattern used by many companies that rely on MySQL for scaling out. Most of these companies have internal scripts to handle tasks such as automatic failover and slave cloning, but no Open Source solution was available.

In 2007, the High Performance Group at MySQL AB was asked to implement such a solution for one of their customers, who kindly agreed to let the team release the software under the GPL2 license.

The MySQL Master-Master Replication Manager (MMM) is a set of flexible scripts to perform monitoring, failover and management of MySQL Master-Master replication configurations (with only one node writable at any time). The toolset can also read-balance standard master/slave configurations with any number of slaves, so you can use it to move virtual IP addresses around a group of servers depending on whether they are behind in replication.

Links

Agile Database Deployment Using Phing

Phing allows you to use SQL to define changes to your database schema, making it possible to use a version control system to keep your schema in sync with the application code.

A common way to automate development and deployment tasks is to write shell scripts; however, Phing provides some advantages over shell scripts for task automation. Most of the shell scripts I've created so far help me automate tedious tasks, from configuring, building, testing and documenting applications to synchronizing files and migrating database schemas. But a large collection of shell scripts can become a maintenance nightmare, which is why I decided to port some of them to PHP as Phing tasks. I chose Phing because it's simple, powerful and very easy to extend.
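
For readers who haven't used Phing, a build file looks like this. The project, property and target names below are hypothetical, and the mysql command line is just one way to apply a script:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal, illustrative build.xml; names are placeholders. -->
<project name="myapp" default="migrate" basedir=".">
    <property name="db.dir" value="database"/>

    <target name="migrate" description="Apply pending schema changes">
        <echo msg="Applying SQL scripts from ${db.dir}/up ..."/>
        <exec command="mysql myapp &lt; ${db.dir}/up/create/foo.table.sql"
              checkreturn="true"/>
    </target>
</project>
```

Running `phing migrate` executes the target; `checkreturn` makes the build fail if the mysql client returns a non-zero exit code.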

DbDeployTask

Phing offers an optional task for making revisions to a database, called DbDeployTask. Dave Marshall wrote a nice article about it. After adding this task to my build script and testing it, I came to the conclusion that DbDeployTask is not very effective for rolling out incremental changes to a database, and it introduces new problems to the database migration process.

For example, if you use DbDeployTask, the SQL script names must begin with a number that indicates the order in which they should be run:

basedir/
  database/
    deltas/
      1-create_foo_table.sql
      2-alter_foo_table.sql
      3-add_constraint_id_to_foo.sql
    history/
      up_20080622120000.sql
      down_20080622120000.sql
  public/
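
A typical target for this layout might look like the following sketch; the connection details are placeholders, and note that DbDeployTask only generates the deployment SQL, which still has to be run against the database in a separate step:

```xml
<!-- Hypothetical dbdeploy target; url, userid and password are
     placeholders for real connection settings. -->
<target name="dbdeploy">
    <dbdeploy url="mysql:host=localhost;dbname=myapp"
              userid="deploy"
              password="secret"
              dir="database/deltas"
              outputfile="database/history/deploy.sql"
              undooutputfile="database/history/undo.sql"/>
</target>
```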

But what happens when two or more developers are making changes to the same database design? How do they know which scripts have already been executed? And if they don't use a central development database, how do they merge their individual changes? To be honest, I don't know.

Agile Database Deployment

A more Agile way to manage database changes is for the developers to connect to a central development database. After making any changes to the database, the developers will have to:

  1. Update the local database/ directory before modifying any script.
  2. Copy the changes made to the database to the corresponding SQL file (up and down).
  3. Check the files back into the repository.

Finally, the changes can be propagated to other databases by executing a script or setting up a Cron job.

This is how the database directory can be structured:

basedir/
  database/
    down/
      alter/
      constraints/
      create/
      data/
    up/
      alter/
      constraints/
      create/
      data/
  public/

To put your database definition under version control, you need an offline representation of that database. The up/ directory is used when migrating to a new version; down/ is used to roll back changes if needed. These directories contain all the changes made to the database:

up/
  alter/
    foo.table.sql
  constraints/
    foo.table.sql
  create/
    foo.table.sql
  data/

This directory structure is based on the one suggested by Mike Hostetler, and it offers clear advantages over numbered delta files. After the team has made all the necessary changes, the next step is to deploy the schema. If you are responsible for that task, you can get the most recent version of the database project from version control, build the deployment script, manually adjust it as necessary, and then run it to deploy the schema changes to development, staging or production.
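
The "build the deployment script" step can be sketched as a small PHP helper that gathers the SQL files in a sensible order: tables first, then alterations, then constraints, then data. The ordering array is my own assumption, not something enforced by any tool:

```php
<?php
// Collect the SQL files for a migration in dependency order, based on
// the database/ layout shown above. For a rollback (down/), the order
// is reversed so constraints are dropped before the tables they
// reference.

function ordered_sql_files(string $base, string $direction = 'up'): array
{
    $order = ($direction === 'up')
        ? ['create', 'alter', 'constraints', 'data']
        : ['data', 'constraints', 'alter', 'create'];
    $files = [];
    foreach ($order as $dir) {
        $found = glob("$base/$direction/$dir/*.sql") ?: [];
        sort($found);                       // deterministic within a dir
        $files = array_merge($files, $found);
    }
    return $files;
}
```

Concatenating the returned files and piping the result to the mysql client (or running them from a Phing target) produces the deployment script.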

Learn More

Database Schema Deployment (PDF) – By Lukas Smith

10 great articles for optimizing MySQL queries

Optimization is a complex task because ultimately it requires understanding of the entire system to be optimized. Although it may be possible to perform some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you must know about it.

The following 10 articles explain and give some examples of different ways to optimize MySQL:

  1. Identifying slow MySQL queries
  2. MySQL & PHP Performance Optimization Tips
  3. MySQL Presentations: Optimizing MySQL
  4. MySQL 5 Optimization and Tuning Guide
  5. Survive heavy traffic with your Web server
  6. Tools to use for MySQL Performance – Peter Zaitsev
  7. MySQL Queries Optimization (video) – Peter Zaitsev
  8. Table Design and MySQL Index Details
  9. MySQL Reference Manual: Optimization Overview
  10. MySQL Reference Manual: The MySQL Query Cache
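
Most of the articles above start from the same two steps: find the slow queries, then EXPLAIN them. A minimal illustration (the slow_query_log variable assumes MySQL 5.1 or later; the tables are hypothetical):

```sql
-- Step 1: log anything slower than one second.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Step 2: ask the optimizer how it executes a logged query.
EXPLAIN SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.name;
```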

Next generation data storage with CouchDB

Most digital data we deal with in the real world is not inherently relational in nature, yet most web applications use a SQL RDBMS for data persistence. CouchDB is fundamentally a different type of database, storing data as independent “Document” objects. They can be easily replicated to other instances of CouchDB for distributed and offline editing and querying, and also for high availability and load balancing reasons.

This presentation explains the mindset behind the Open Source database CouchDB. It explains scenarios where CouchDB’s features provide elegant solutions to problems where traditional database systems struggle, and vice-versa. Even without knowing much about data storage, you will gain some insight about what is possible today.

Jan Lehnardt is an Open Source software consultant specialized in internet technologies. He has years of experience building small- and large-scale database-backed applications.

Download Video (MPG4)


CouchDB: A document-oriented database accessible via a RESTful HTTP/JSON API

CouchDB was accepted for incubation at the Apache Software Foundation a couple of months ago. My congrats to the development team! It's a very interesting and challenging project, and the fact that it was accepted for incubation will definitely get the team excited.

What is CouchDB?

  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Queryable and indexable, featuring a table-oriented reporting engine that uses JavaScript as its query language.
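
To give a flavor of that JavaScript query language, here is a sketch of a CouchDB-style map function. The emit() harness and the sample documents below are stand-ins for what the server actually provides when it builds a view:

```javascript
// A CouchDB view is defined by a map function that is run over every
// document and emits key/value rows into an index.
function mapByAuthor(doc) {
  if (doc.type === 'post') {
    emit(doc.author, doc.title); // one row per matching document
  }
}

// Minimal stand-in for the server-side emit(): collect rows locally.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical documents, as CouchDB would feed them to the function.
var docs = [
  { type: 'post',    author: 'jan', title: 'Hello CouchDB' },
  { type: 'comment', author: 'bob', text: 'Nice!' }
];
docs.forEach(mapByAuthor);
```

On a real server, the function lives in a design document and the resulting index is queried over the HTTP/JSON API.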

Tutorials