The Multimaster Replication Problem

Replication has its problems, specially if you have a multimaster replication system. To make matters worse, none of the PHP frameworks support multimaster replication systems nor handle master failover. Symfony uses Propel and only supports master-slave replication systems. When the master fails, it’s true that you have the slaves ready to replace it, but the process of detecting the failure and acting upon it requires human intervention. Zend Framework, on the other hand, doesn’t support replication at all.

I strongly believe that a master failover needs to be handled appropriately on the application side. Of course, you can always use an SQL proxy or any other server-side solution, but they are either limited or unreliable.

From Digg’s Blog:

The Digg database access layer is written in PHP and lives at the level of the application server. Basically, when the application decides it needs to do a read query, it passes off the query with a descriptor to a method that grabs a list of servers for the database pool that can satisfy the query, then picks one at random, submits the query, and returns the results to the calling method.

If the server picked won’t respond in a very small amount of time, the code moves on to the next server in the list. So if MySQL is down on a database machine in one of the pools, the end-user of Digg doesn’t notice. This code is extremely robust and well-tested. We worry neither that shutting down MySQL on a read slave in the Digg cluster, nor a failure in alerting on a DB slave that dies will cause site degradation.

Every few months we consider using a SQL proxy to do this database pooling, failover, and balancing, but it’s a tough sell since the code is simple and works extremely well. Furthermore, the load balancing function itself is spread across all our Apaches. Hence there is no “single point of failure” as there would be if we had a single SQL proxy.

If you are building your own solution and need a sandbox to test it, I recommend using MySQL Sandbox. Also, you might find this script useful: MySQL Master-Master Replication Manager

Memcached consistent hashing mechanism

If you are using the Memcache functions through a PECL extension, you can set global runtime configuration options by specifying the values within your php.ini file. One of them is memcache.hash_strategy. This option sets the hashing mechanism used to select and specifies which hash strategy to use: Standard (default) or Consistent.

It’s recommended that you set to Consistent to allow servers to be added or removed from the pool without causing the keys to be remapped to other servers. When set to standard, an older strategy is used that potentially uses different servers for storage.

With PHP, the connections to the memcached instances are kept open as long as the PHP and associated Apache instance remain running. When adding a removing servers from the list in a running instance, the connections will be shared, but the script will only select among the instances explicitly configured within the script.

So, to ensure that changes to the server list within a script do not cause problems, make sure to use the consistent hashing mechanism.

Detect Replay Attacks in your Web Services

Many threats that are common to distributed systems are common to Web services as well. There are a few specific threats associated with the Web services processing model, such as:

  • Message replays: An attacker may re-play an entire message or a part of a SOAP message.
  • Man in the middle attack: An attacker may view and modify a SOAP message without the knowledge of either sender or the receiver.
  • Identity spoofing: An attempt to construct credentials that seems to be valid but not.
  • Denial of Service (DOS) attacks: An attempt to make a system expend its resources so that valid requests cannot access a service.
  • Message alteration: An attempt to alter a message compromising its integrity.
  • Confidentiality issues: Access to confidential information within a message by unauthorized parties.

Dimuthu wrote an interesting post about how to prevent replay attacks using WSF/PHP. He also shows how to detect them using WS-Addressing and WS-Username token headers.

Getting Started With Message Queues

When you’re building an infrastructure that is distributed all over the internet, you’ll come to a point where you can’t rely on synchronous remote calls that, for example, synchronize data on 2 servers:

  1. You don’t have any failover system that resends messages if something went wrong (network outages, software failures).
  2. Messages are processed over time and you have no control if something goes overloaded by too many requests.

Even if you don’t have to send messages all over the Internet there are enough points of failures where something can go wrong. You want a reliable and durable system that fails gracefully and ensure.

Solutions

Dropr

Dropr is a distributed message queue framework written in PHP. The main goals are:

  • Reliable and durable (failsafe)-messaging over networks.
  • Decentralized architecture without a single (point of failure) server instance.
  • Easy to setup and use.
  • Modularity for queue storage and message transports (currently filesystem storage and curl-upload are implemented).

More info

Beanstalkd

Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.

It was developed to improve the response time for the Causes on Facebook application (with over 9.5 million users). Beanstalkd drastically decreased the average response time for the most common pages to a tiny fraction of the original, significantly improving the user experience.

More info

Zend Platform Job Queues

Job Queues is an approach to streamline offline processing of PHP scripts. Job Queue Server provides the ability to reroute and delay the execution of PHP scripts that are not essential during user interaction with the Web Server. Job Queues improve the response time during interactive web sessions and utilizes unused resources.

More info

Memcached as simple message queue

In this post, Olly explains how to use memcached as a simple message queue:

Some months ago at work we were in the need of a message queue, a very simple one, basically just a message buffer. The idea is simple, the webservers send there messages to the queue, the queue always accepts all messages and waits until the ETL processes request messages for further processing. As the webservers are time critical and the ETL processes aren’t you need something in between.

More info

Links

How to Build a Web Hosting Infrastructure on EC2

Mike Brittain wrote:

In the months prior to leaving Heavy, I led an exciting project to build a hosting platform for our online products on top of Amazon’s Elastic Compute Cloud (EC2).  We eventually launched our newest product at Heavy using EC2 as the primary hosting platform.

We set out to build a fairly standard LAMP hosting infrastructure where we could easily and quickly add additional capacity.  In fact, we can add new servers to our production pool in under 20 minutes, from the time we call the “run instance” API at EC2, to the time when public traffic begins hitting the new server.  This includes machine startup time, adding custom server config files and cron jobs, rolling out application code, running smoke tests, and adding the machine to public DNS.

What follows is a general outline of how we do this.

Continue reading

Designing a CMS Architecture

François Zaninotto wrote:

When faced with the alternative between an off-the-shelf CMS or a custom development, many companies pick solutions like ezPublish or Drupal. In addition to being free, these CMS seem to fulfill all possible requirements. But while choosing an open-source solution is a great idea, going for a full-featured CMS may prove more expensive than designing and developing your own Custom Management System.

Given number of available open-source CMS solutions, building one on your own sounds like a stupid idea. But if your website is 50% content management and 50% something else, you probably need to start with a web application framework like Symfony or Django, rather than a CMS. These frameworks provide plugins that do part of the Content Management job already, so creating a CMS today is like assembling Lego bricks to build something that exactly fits your needs.

Continue reading

Code Refactoring Guidelines

In software engineering, “refactoring” source code means modifying it without changing its behaviour, and is sometimes informally referred to as “cleaning it up”. Refactoring neither fixes bugs nor adds new functionality, though it might precede either activity. Rather it improves the understandability of the code and changes its internal structure and design, and removes dead code, to make it easier to comprehend, more maintainable and amenable to change. Refactoring is usually motivated by the difficulty of adding new functionality to a program or fixing a bug in it.

Code Refactoring Guidelines

  1. Big Picture
  2. Extreme Abstraction
  3. Extreme Separation
  4. Extreme Readability
  5. Interfaces
  6. Error Handling
  7. General Issues
  8. Security
  9. General Objects

Favour object composition over class inheritance

What does “favour composition over inheritance” mean, and why is it a good thing to do?

Object composition and inheritance are two techniques for reusing functionality in object-oriented systems. In general, object composition should be favoured over inheritance. It promotes smaller, more focused classes and smaller inheritance hierarchies.

Troels Knak-Nielsen wrote:

Class inheritance is a mix of two concepts. The extending class inherits the parents implementation (functions/methods) and it inherits the parents type. In a statically typed language, the latter is fairly important, since you can’t freely mix types. So if some method expects an argument of a given type, you might use inheritance to satisfy this. In a dynamically typed language that is a non-issue. You can simply implement the expected behaviour and that’s all there is to it. If you need a more explicit contract, you can document it or – since PHP has sort of a middle-way on this matter – you could use a statically typed interface (Eg. implements Person, rather than extends Person) to do the same thing. Since PHP is dynamically typed, this (slightly more verbose and restrictive) solution is purely optional. You can just use an implicit contract (duck typing).

The other use of class inheritance is to reuse implementation. If your abstract class Person is extended by a subclass Employer, you would have access to the same code in Employer as you do in Person. You could achieve code-reuse with composition as well, but it takes a bit more work. Employer would have to implement a wrapper that delegates control to a Person instance in this case. Eg.:

class Person {
  function sayHello() {
    echo "Hello, World!";
  }
}
class Employer {
  protected $person;
  function __construct() {
    $this->person = new Person();
  }
  function sayHello() {
    $this->person->sayHello();
  }
}

rather than:

class Person {
  function sayHello() {
    echo "Hello, World!";
  }
}
class Employer extends Person {}

As you can see, slightly more work to do, which is why people often use inheritance in these cases. The cost however, is that the Person-Employer relationship is now set in stone; It can’t be changed or intercepted at runtime. There is also the matter of clarity. While the compositional code is more verbose, it is also very clear about what it does. You can look at the code and know what it does. With the inheritance version, you need to look at the superclass to find out what it does. Some times there are multiple levels of inheritance, making you trace up and down the chain to figure out exactly what code is available in the concrete class. Finally, there is the problem of multiple inheritance. In PHP, you can’t. You only have one shot at inheritance, so if you want to reuse code from two places, well, you’re out of luck.

Source: SitePoint Forums

Where is the include coming from?

The includes of the system map out the dependencies of the system, which files depend on which, which subsystem depends on which. When working with a system, it’s always useful to map out the dependencies before hand.

Here are some examples:

WordPress 2.2.1
http://wordpress.org

MediaWiki 1.12
http://www.mediawiki.org/

phpBB 3.0
http://www.phpbb.com/

phpMyAdmin 2.9.1.1
http://www.phpmyadmin.net/

Symfony 1.1
http://www.symfony-project.org/

Zend Framework 1.5.2
http://framework.zend.com/

CakePHP
http://www.cakephp.org/

CodeIgniter
http://codeigniter.com/

Seagull Framework
http://seagullproject.org/
(Demian Turner)

FileSyncTask: Using Phing to synchronize files and directories

I needed to automate the task of synchronizing files from one server to another, so I wrote a Phing task. Finally today I found some time to finish writing the documentation.

Overview

FileSyncTask is a Phing extension for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer. FileSyncTask can copy or display directory contents and copy files, optionally using compression and recursion.

Rather than using FTP or some other form of file transfer, FileSyncTask uses rsync to copy only the diffs of files that have actually changed. Only actual changed pieces of files are transferred, rather than the whole file, which results in transferring only a small amount of data and are very fast. FTP, for example, would transfer the entire file, even if only one byte changed. The tiny pieces of diffs are then compressed on the fly, further saving you file transfer time and reducing the load on the network.

FileSyncTask can be used to synchronize Website trees from staging to production servers and to backup key areas of the filesystems.

Usage

There are 4 different ways of using FileSyncTask:

  1. For copying local files.
  2. For copying from the local machine to a remote machine using a remote shell program as the transport (ssh).
  3. For copying from a remote machine to the local machine using a remote shell program.
  4. For listing files on a remote machine.

The SSH client called by FileSyncTask uses settings from the build.properties file:

sync.source.projectdir=/home/development.com/public
sync.destination.projectdir=/home/staging.com
sync.remote.host=server.com
sync.remote.user=user
sync.destination.backupdir=/home/staging.com/backup
sync.exclude.file=/home/development.com/build/sync.exclude

Listing files

The “listonly” option will cause the modified files to be listed instead of transferred. You must specify a source and a destination, one of which may be remote.

<taskdef name="sync" classname="phing.tasks.ext.FileSyncTask" />
<sync
    sourcedir="${sync.source.projectdir}"
    destinationdir="${sync.destination.projectdir}"
    listonly="true"
    verbose="true" />

Excluding irrelevant files

To exclude files from synchronizations, open and edit the sync.exclude file under the build/ directory. Each line can contain a file, a directory, or a pattern:

*~
.svn
.htaccess
public/index.development.php
public/images/uploads/*
build/*
log/*
tmp/*

Copying files to a remote machine

The following task definition will transfer files from a local source to a remote destination:

<taskdef name="sync" classname="phing.tasks.ext.FileSyncTask" />
<sync
    sourcedir="${sync.source.projectdir}"
    destinationdir="${sync.remote.user}@${sync.remote.host}:${sync.destination.projectdir}"
    excludefile="${sync.exclude.file}"
    verbose="true" />

Example

Directory structure

In order to separate the sync settings from the main build file, I’ve created a file called sync.properties:

development.com
|-- build
|   |-- build.properties
|   |-- build.xml
|   |-- sync.exclude
|   `-- sync.properties
`-- public
    `-- index.php

XML build file

Phing uses XML build files that contain a description of the things to do. The build file is structured into targets that contain the actual commands to perform:

<?xml version="1.0" ?>
<project name="example" basedir="." default="build">
<property name="version" value="1.0" />

<!-- Public targets -->
<target name="sync:list" description="List files">
  <phingcall target="-sync-execute-task">
    <property name="listonly" value="true" />
  </phingcall>
</target>

<target name="sync" description="Copy files">
  <phingcall target="-sync-execute-task">
    <property name="listonly" value="false" />
  </phingcall>
</target>

<!-- Private targets -->
<target name="-init" description="Load main settings">
  <tstamp />
  <property file="build.properties" />
</target>

<target name="-sync-execute-task" depends="-init">
  <property file="sync.properties" />
  <if>
    <not>
      <isset property="sync.verbose" />
    </not>
    <then>
      <property name="sync.verbose" value="true" override="true" />
      <echo message="The value of sync.verbose has been set to true" />
    </then>
  </if>
  <property name="sync.remote.auth" value="${sync.remote.user}@${sync.remote.host}" />
  <taskdef name="sync" classname="phing.tasks.ext.FileSyncTask" />
  <sync
    sourcedir="${sync.source.projectdir}"
    destinationdir="${sync.remote.auth}:${sync.destination.projectdir}"
    backupdir="${sync.remote.auth}:${sync.destination.backupdir}"
    excludefile="${sync.exclude.file}"
    listonly="${listonly}"
    verbose="${sync.verbose}" />
</target>
</project>

Execute task

$ phing sync:list

Outputs:

Buildfile: /home/development.com/build/build.xml

example > sync:list:
[phingcall] Calling Buildfile '/home/development.com/build/build.xml'
            with target '-sync-execute-task'

example > -init:
[property] Loading /home/development.com/build/build.properties

example > -sync-execute-task:
[property] Loading /home/development.com/build/sync.properties
[echo] The value of sync.verbose has been set to true

Execute Command
----------------------------------------
rsync -razv --list-only -b --backup-dir

Sync files to remote server
----------------------------------------
Source:        /home/development.com/public
Destination:   user@server.com:/home/staging.com
Backup:        user@server.com:/home/staging.com/backup

Exclude patterns
----------------------------------------
*~
.svn
.htaccess
public/index.development.php
public/images/uploads/*
build/*
log/*
tmp/*

(list of files that have changed)

BUILD FINISHED
Total time: 1.9763 second

More information

Related Articles