NoSQL solutions: Membase, Redis, CouchDB and MongoDB

Each database has specific use cases and every solution has a sweet spot in terms of data, hardware, setup and operation. Here are some of the most popular key-value and document data stores:

Key-value

Membase

  • Developed by members of the memcached core team.
  • Simple (key value store), fast (low, predictable latency) and elastic (effortlessly grow or shrink a cluster).
  • Extensions are possible through a plug-in architecture (full-text search, backup, etc).
  • Supports Memcached ASCII and Binary protocols (uses existent Memcached libraries and clients).
  • Guarantees data consistency.
  • High-speed failover (server failures recoverable in under 100ms).
  • User management, alerts and logging and audit trail.

Redis

  • Developed by Salvatore Sanfilippo and acquired by VMWare in 2010.
  • Very fast. Non-blocking I/O. Single threaded.
  • Data is held in memory but can be persisted by written to disk asynchronously.
  • Values can be strings, lists or sets.
  • Built-in support for master/slave replication.
  • Distributes the dataset across multiple Redis instances.

Document-oriented

The major benefit of using a document database comes from the fact that while it has all the benefits of a key/value store, you aren’t limited to just querying by key. However, documented-oriented databases and MapReduce aren’t appropriate for every situation.

CouchDB

  • High read performance.
  • Supports bulk inserts.
  • Good for consistent master-master replica databases that are geographically distributed and often offline.
  • Good for intense versioning.
  • Android, MeeGo and WebOS include services for syncing locally stored data with a CouchDB non-relational database in the cloud.
  • Better than MongoDB at durability.
  • Uses REST as its interface to the database. It doesn’t have “queries” but instead uses “views”.
  • Makes heavy use of the file system cache (so more RAM is always better).
  • The database must be compacted periodically.
  • Conflicts on transactions must be handled by the programmer manually (e.g. if someone else has updated the document since it was fetched, then CouchDB relies on the application to resolve versioning issues).
  • Scales through asynchronous replication but lacks an auto-sharding mechanism. Reads are distributed to any server while writes must be propagated to all servers.

MongoDB

  • High write performance. Good for systems with very high update rates.
  • It has the flexibility to replace a relational database in a wider range of scenarios.
  • Supports auto-sharding.
  • More oriented towards master/slave replication.
  • Compaction of the database is not necessary.
  • Both CouchDB and MongoDB support map/reduce operations.
  • Supports dynamic ad hoc queries via a JSON-style query language.
  • The pre-filtering provided by the query attribute doesn’t have a direct counterpart in CouchDB. It also allows post-filtering of aggregated values.
  • Relies on language-specific database drivers for access to the database.

Links

OSCON 2010, The O’Reilly Open Source Convention

A couple of weeks ago I attended the O’Reilly Open Source Convention (OSCON) in Portland. OSCON has hundreds of sessions and activities focused on all aspects of open source software. I met some great people, the talks were good and I saw some promising ideas and technologies.

Workshops attended

  • Android for Java Developers
    Marko Gargenta (Marakana)
  • Building a NoSQL Data Cloud
    Krishna Sankar (Cisco Systems Inc)
  • Building Native Mobile Apps Using Open Source
    Kevin Whinnery (Appcelerator)

Sessions attended

  • Building Mobile Apps with HTML, CSS, and JavaScript
    Jonathan Stark (Jonathan Stark Consulting)
  • Open Source Tool Chains for Cloud Computing
    Mark Hinkle (Zenoss), John Willis (Opscode, Inc.), Alex Honor
  • Doctor, I Have a Problem with My Innovation.
    Rolf Skyberg (eBay, Inc.)
  • Ingex: Bringing Open Source to the Broadcast Industry
    By Brendan Quinn (BBC R&D)
  • membase.org: The Simple, Fast, Elastic NoSQL Database
    Matt Ingenthron (NorthScale, Inc.)
  • Introducing WebM: High Quality, Royalty-Free, Open Source Video
    John Koleszar (Google, Inc.)
  • Whiskey, Tango, Foxtrot: Understanding API Activity
    Clay Loveless (Mashery)
  • Deploying an Open Source Private Cloud On a Shoe String Budget
    Louis Danuser (AT&T Labs, Inc.)
  • Eucalyptus: The Open Source Infrastructure for Cloud Computing
    Shashi Mysore (Eucalyptus Systems Inc.)
  • Hadoop, Pig, and Twitter
    Kevin Weil (Twitter, Inc.)
  • Mahout: Mammoth Scale Machine Learning
    Robin Anil (Apache Software Foundation)
  • BlackBerry development for Web Application Developers
    Kevin Falcone (Best Practical Solutions)
  • Practical Concurrency
    Tim Bray (Google, Inc.)
  • Scribe – Moving Data at Massive Scale
    Robert Johnson (Facebook)
  • Make Open Easy
    Dan Bentley (Google)

Implementing Dynamic Finders and Parsing Method Expressions

Most ORMs support the concept of dynamic finders. A dynamic finder looks like a normal method invocation, but the method itself doesn’t exist, instead, it’s generated dynamically and processed via another method at runtime.

A good example of this is Ruby. When you invoke a method that doesn’t exist, it raises a NoMethodError exception, unless you define “method_missing”. Rails ActiveRecord::Base class implements some of its magic thanks to this method. For example, find_by_title(title) and find_by_title_and_date(title, date) are turned into:

find(:first, :conditions => ["title = ?", title])
find(:first, :conditions => ["title = ? AND date = ?", title, date])

What’s nice about Ruby is that the language allows you to define methods dynamically using the “define_method” method. That’s how Rails defines each dynamic finder in the class after it is first invoked, so that future attempts to use it do not run through the “method_missing” method.

Method Expressions

GORM, Grails ORM library, introduces the concept of dynamic method expressions. A method expression is made up of the prefix such as “findBy” followed by an expression that combines one or more properties. Grails takes advantage of Groovy features to provide dynamic methods:

findByTitle("Example")
findByTitleLike("Exa%")

Method expressions can also use a boolean operator to combine two criteria:

findAllByTitleLikeAndDateGreaterThan("Exampl%", '2010-03-23')

In this case we are using AND in the middle of the query to make sure both conditions are satisfied, but you could equally use OR:

findAllByTitleLikeOrDateGreaterThan("Exampl%", '2010-03-23')

Parsing Method Expressions

MethodExpressionParser is a PHP library for parsing method expressions. It’s designed to quickly and easily parse method expressions and construct conditions based on attribute names and arguments.

Description

[finderMethod]([attribute][expression][logicalOperator])?[attribute][expression]

Expressions

  • LessThan: Less than the given value
  • LessThanEquals: Less than or equal a give value
  • GreaterThan: Greater than a given value
  • GreaterThanEquals: Greater than or equal a given value
  • Like: Equivalent to a SQL like expression
  • NotEqual: Negates equality
  • IsNotNull: Not a null value (doesn’t require an argument)
  • IsNull: Is a null value (doesn’t require an argument)

Examples

findByTitleAndDate('Example', date('Y-m-d'));
SELECT * FROM book WHERE title = ? AND date = ?

findByTitleOrDate('Example', date('Y-m-d'))
SELECT * FROM book WHERE title = ? OR date = ?

findByPublisherOrTitleAndDate('Name', 'Example', date('Y-m-d'))
SELECT * FROM book WHERE publisher = ? OR (title = ? AND date = ?)

findByPublisherInAndTitle(array('Name1', 'Name2'), 'Example')
SELECT * FROM book WHERE publisher IN (?, ?) AND date = ?

findByTitleLikeAndDateNotNull('Examp%')
SELECT * FROM book WHERE title LIKE ? AND date NOT NULL

findByIdOrTitleAndDateNotNull(1, 'Example')
SELECT * FROM book WHERE (id = ?) OR (title = ? AND date NOT NULL)

Example 1:

findByTitleLikeAndDateNotNull('Examp%');

Outputs:

array
  0 =>
    array
      0 =>
        array
          'attribute' => string 'title'
          'expression' => string 'Like'
          'format' => string '%s LIKE ?'
          'placeholders' => int 1
          'argument' => string 'Examp%'
      1 =>
        array
          'attribute' => string 'date'
          'expression' => string 'NotNull'
          'format' => string '%s IS NOT NULL'
          'placeholders' => int 0
          'argument' => null

Example 2:

findByTitleAndPublisherNameOrTitleAndPublisherName('Title', 'a', 'Title', 'b');

Outputs:

array
  0 =>
    array
      0 =>
        array
          'attribute' => string 'title'
          'expression' => string 'Equals'
          'format' => string '%s = ?'
          'placeholders' => int 1
          'argument' => string 'Title'
      1 =>
        array
          'attribute' => string 'publisher_name'
          'expression' => string 'Equals'
          'format' => string '%s = ?'
          'placeholders' => int 1
          'argument' => string 'a'
  1 =>
    array
      0 =>
        array
          'attribute' => string 'title'
          'expression' => string 'Equals'
          'format' => string '%s = ?'
          'placeholders' => int 1
          'argument' => string 'Title'
      1 =>
        array
          'attribute' => string 'publisher_name'
          'expression' => string 'Equals'
          'format' => string '%s = ?'
          'placeholders' => int 1
          'argument' => string 'b'

See more examples: Project Wiki

Usage

class EntityRepository
{
    private $methodExpressionParser;

    // Return a single instance of MethodExpressionParser
    public function getMethodExpressionParser() {
    }

    // Finder methods
    public function findBy($conditions) {
        var_dump($conditions);
    }
    public function findAllBy($conditions) {
        var_dump($conditions);
    }

    // Invoke finder methods
    public function __call($method, $args) {
        if ('f' === $method{0}) {
            try {
                $result = $this->getMethodExpressionParser()->parse($method, $args);
                $finderMethod = key($result);
                $conditions = $result[$finderMethod];
            } catch (MethodExpressionParserException $e) {
                $message = sprintf('%s: %s()', $e->getMessage(), $method);
                throw new EntityRepositoryException($message);
            }
            return $this->$finderMethod($conditions);
        }

        $message = 'Invalid method call: ' . __METHOD__;
        throw new BadMethodCallException($message);
    }
}

Performance

PHP doesn’t allow you to define methods dynamically, this means that every time you invoke a finder method the parser has to search, extract and map all the attribute names and expressions. To avoid introducing this performance overhead you can cache the attribute names. For example:

class EntityRepository
{
    private $methodExpressionParser;
    private $classMetadata;

    // Return a single instance of MethodExpressionParser
    public function getMethodExpressionParser() {
    }

    // Return a single instance of ClassMetadata
    public function getClassMetadata() {
    }

    // Invoke finder methods
    public function __call($method, $args) {
        if ('f' === $method{0}) {
            $parser = $this->getMethodExpressionParser();
            $classMetadata = $this->getClassMetadata();
            try {
                $finderMethod = $parser->determineFinderMethod($method);
                if ($classMetadata->hasMissingMethod($method)) {
                    $attributes = $classMetadata->getMethodAttributes($method);
                    $conditions = $parser->map($args, $attributes);
                } else {
                    $expressions = substr($method, strlen($finderMethod));
                    $attributes = $this->extractAttributeNames($expressions);
                    $conditions = $parser->map($args, $attributes);
                    $classMetadata->setMethodAttributes($method, $attributes);
                }
            } catch (MethodExpressionParserException $e) {
                $message = sprintf('%s: %s()', $e->getMessage(), $method);
                throw new EntityRepositoryException($message);
            }
            return $this->$finderMethod($conditions);
        }

        $message = 'Invalid method call: ' . __METHOD__;
        throw new BadMethodCallException($message);
    }
}

The Expression objects are lazy-loaded, depending on the expressions found in the method name.

Extensibility

The MethodExpressionParser class was designed with extensibility in mind, allowing you to add new Expressions to the library.

abstract class Expression {
}
class EqualsExpression extends Expression {
}

Source Code

Browse source code:
http://fedecarg.com/repositories/show/expressionparser

Check out the current development trunk with:

$ svn checkout http://svn.fedecarg.com/repo/Zf/Orm

Sky Named Britain’s Most Admired Company

Based on a survey of thousands of managers and investment analysts, Management Today has named BSkyB as Britain’s Most Admired Company for 2009. BSkyB is the youngest company ever to win this Award.

BSkyB beat off the superstore giant Tesco into second place. Johnson Mathey took the third slot, with Cadbury, GlaxoSmithKline and Rolls-Royce trailing at numbers four, five and six.

BSkyB headed off the sector in all of the criteria laid down by the organizers, coming out top in “quality of goods and services”, “quality of marketing” and “capacity to innovate”.

Most Admired Top 20, 2009

(Last year’s position in brackets)

1 (4)     BSkyB 72.25
2 (5)     Tesco 71.38
3 (2)     Johnson Matthey 71.00
4 (18)    Cadbury 70.40
5 (19)    GlaxoSmithKline 70.00
6 (7)     Rolls-Royce 69.96
7 (26)    BP 67.08
8 (11)    BG Group 67.03
9 (1)     Diageo 65.83
10 (47)   Cobham 65.75
11 (3)    Unilever 65.0
12 (52)   BAE Systems 64.9
13 (51)   Ultra Electronics 64.7
14 (154)  Centrica 64.4
14 (24)   Royal Dutch Shell 64.4
16 (81)   Admiral 63.9
16 (17)   Capita Group 63.9
18 (27)   Sainsbury 63.8
19 (55)   Balfour Beatty 63.1
20 (29)   Marks & Spencer 62.9

BSkyB is a great company to work for, filled with talented people. Congratulation for this prestigious award!

Links

Management Today
Sky News

Command-line memcached stat reporter

Nicholas Tang wrote a nice little perl script that shows a basic memcached top display for a list of servers. You can specify thresholds, for instance, and it’ll change color to red if you exceed the thresholds. You can also choose the refresh/sleep time, and whether to show immediate (per second) stats, or lifetime stats.

To install it you only need to download the script and make it executable:

$ curl http://memcache-top.googlecode.com/files/memcache-top-v0.6 > ~/bin/memcache-top
$ chmod +x ~/bin/memcache-top
$ memcache-top --sleep 3 --instances 10.50.11.3,10.50.11.4,10.50.11.5

Here’s some sample output:

memcache-top v0.6       (default port: 11211, color: on, refresh: 3 seconds)

INSTANCE                USAGE   HIT %   CONN    TIME    EVICT/s GETS/s  READ/s  WRITE/s
10.50.11.3:11211        88.9%   69.7%   1661    0.9ms   0.3     47      13.9K   9.8K
10.50.11.4:11211        88.8%   69.9%   2121    0.7ms   1.3     168     17.6K   68.9K
10.50.11.5:11211        88.9%   69.4%   1527    0.7ms   1.7     48      14.4K   13.6K
AVERAGE:                84.7%   72.9%   1704    1.0ms   1.3     69      13.5K   30.3K   

TOTAL:          19.9GB/ 23.4GB          20.0K   11.7ms  15.3    826     162.6K  363.6K
(ctrl-c to quit.)

Project Home
http://code.google.com/p/memcache-top/

Managing Multiple Build Environments

Last updated: 3 March, 2010

One of the challenges of Web development is managing multiple build environments. Most applications pass through several environments before they are released. These environments include: A local development environment, a shared development environment, a system integration environment, a user acceptance environment and a production environment.

Automated Builds

Automated builds provide a consistent method for building applications and are used to give other developers feedback about whether the code was successfully integrated or not. There are different types of builds: Continuous builds, Integration builds, Release builds and Patch builds.

A source control system is the main point of integration for source code. When your team works on separate parts of the code base, you have to ensure that your checked in code doesn’t break the Integration build. That’s why it is important that you run your unit tests locally before checking in code.

Here is a recommended process for checking code into source control:

  • Get the latest code from source control before running your tests
  • Verify that your local build is building and passing all the unit tests before checking in code
  • Use hooks to run a build after a transaction has been committed
  • If the Integration build fails, fix the issue because you are now blocking other developers from integrating their code

Hudson can help you automate these tasks. It’s extremely easy to install and can be configured entirely from a Web UI. Also, it can be extended via plug-ins and can execute Phing, Ant, Gant, NAnt and Maven build scripts.

Build File

We need to create a master build file that contains the actions we want to perform. This script should make it possible to build the entire project with a single command line.

First we need to separate the source from the generated files, so our source files will be in the “src” directory and all the generated files in the “build” directory. By default Ant uses build.xml as the name for a build file, this file is usually located in the project root directory.

Then, you have to define whatever environments you want:

project/
    build/
        files/
            local/
            development/
            integration/
            production/
        packages/
            development/
                project-development-0.1-RC.noarch.rpm
            integration/
            production/
        default.properties
        local.properties
        development.properties
        production.properties
    src/
        application/
            config/
            controllers/
            domain/
            services/
            views/
        library/
        public/
    tests/
    build.xml

Build files tend to contain the same actions:

  • Delete the previous build directory
  • Copy files
  • Manage dependencies
  • Run unit tests
  • Generate HTML and XML reports
  • Package files

The target element is used as a wrapper for a sequences of actions. A target has a name, so that it can be referenced from elsewhere, either externally from the command line or internally via the “depends” or “antcall” keyword. Here’s a basic build.xml example:

<?xml version="1.0" encoding="iso-8859-1"?>
<project name="project" basedir="." default="main">

    <target name="init"></target>
    <target name="test"></target>
    <target name="test-selenium"></target>
    <target name="profile"></target>
    <target name="clean"></target>
    <target name="build" depends="init, test, profile, clean"></target>
    <target name="package"></target>

</project>

The property element allows the declaration of properties which are like user-definable variables available for use within an Ant build file. Properties can be defined either inside the buildfile or in a standalone properties file. For example:

<?xml version="1.0" encoding="iso-8859-1"?>
<project name="project" basedir="." default="main">

    <property file="${basedir}/build/default.properties" />
    <property file="${basedir}/build/${build.env}.properties" />
    ...

</project>

The core idea is using property files which name accords to the environment name. Then simply use the custom build-in property build.env. For better use you should also provide a file with default values. The following example intends to describe a typical Ant build file, of course, it can be easily modified to suit your personal needs.

<?xml version="1.0" encoding="iso-8859-1"?>
<project name="project" basedir="." default="main">

    <property file="${basedir}/build/default.properties" />
    <condition property="build.env" value="${build.env}" else="local">
        <isset property="build.env" />
    </condition>
    <property file="${basedir}/build/${build.env}.properties" />

     <property environment="env" />
     <condition property="env.BUILD_ID" value="${env.BUILD_ID}" else="">
         <isset property="env.BUILD_ID" />
     </condition>

    <target name="init">
        <echo message="Environment: ${build.env}"/>
        <echo message="Hudson build ID: ${env.BUILD_ID}"/>
        <echo message="Hudson build number: ${env.BUILD_NUMBER}"/>
        <echo message="SVN revision: ${env.SVN_REVISION}"/>
        <tstamp>
            <format property="build.datetime" pattern="dd-MMM-yy HH:mm:ss"/>
        </tstamp>
        <echo message="Build started at ${build.datetime}"/>
    </target>

    <target name="test">
        ...
    </target>

    <target name="clean">
        <delete dir="${build.dir}/files/${build.env}"/>
        <delete dir="${build.dir}/packages/${build.env}"/>
        <mkdir dir="${build.dir}/files/${build.env}"/>
        <mkdir dir="${build.dir}/packages/${build.env}"/>
    </target>

    <target name="build" depends="init, test, profile, clean">
        ...
    </target>
    ...

</project>

Using ant -Dname=value lets you define values for properties on the Ant command line. These properties can then be used within your build file as any normal property: ${name} will put in value.

$ ant build -Dbuild.env=development

There are different ways to target multiple environments. I hope I have covered enough of the basic functionality to get you started.

Testing Zend Framework Action Controllers With Mocks

In this post I’ll demonstrate a unit test technique for testing Zend Framework Action Controllers using Mock Objects. Unit testing controllers independently has a number of advantages:

  1. You can develop controllers test-first (TDD).
  2. It allows you to develop and test all of your controller code before developing any of the view scripts.
  3. It helps you quickly identify problems in the controller, rather than problems in one of the combination of Model, View and Controller.

The Action Controller I’m going to test has only one method, profileAction():

tests/application/controllers/UserController.php

class UserController extends Zend_Controller_Action
{
    public function profileAction()
    {
        $this->view->userId = $this->_getParam('user_id');
        return $this->render();
    }
}

tests/application/ControllerTestCase.php

class ControllerTestCase extends Zend_Test_PHPUnit_ControllerTestCase
{
    public $application;

    public function setUp()
    {
        $this->application = new Zend_Application(
            APPLICATION_ENV,
            APPLICATION_PATH . '/config/application.ini'
        );

        $this->bootstrap = array($this, 'bootstrap');
        parent::setUp();
    }

    public function tearDown()
    {
        Zend_Controller_Front::getInstance()->resetInstance();

        $this->resetRequest();
        $this->resetResponse();

        $this->request->setPost(array());
        $this->request->setQuery(array());
    }

    public function bootstrap()
    {
        $this->application->bootstrap();
    }
}

tests/application/controllers/UserControllerTest.php

require_once TESTS_PATH . '/application/ControllerTestCase.php';
require_once APPLICATION_PATH . '/controllers/UserController.php';

class UserControllerTest extends ControllerTestCase
{
    public function testStubRenderMethodCall()
    {
        $request = $this->getRequest()
            ->setRequestUri('/user/profile/1')
            ->setParams(array('user_id'=>1))
            ->setPathInfo(null);

        $response = $this->getResponse();

        $this->getFrontController()
            ->setRequest($request)
            ->setResponse($response)
            ->throwExceptions(true)
            ->returnResponse(false);

        $controller = $this->getMock(
            'UserController',
            array('render'),
            array($request, $response, $request->getParams())
        );
        $controller->expects($this->once())
                 ->method('render')
                 ->will($this->returnValue(true));

        $this->assertTrue($controller->profileAction());
        $this->assertTrue($controller->view->user_id == 1);
    }
}

You can go further making both the tests and the implementation more sophisticated. The main point is that you can build and test a controller in a way that doesn’t require a view script to be written to do so.

Zend Framework Known Issues

By default Zend_Test_PHPUnit_ControllerTestCase sets the redirector exit value to false, leading to unexpected behavior when unit testing your code. For that reason, make sure you always add a return statement after calling a utility method:

class UserController extends Zend_Controller_Action
{
    public function profileAction()
    {
        if (null == $this->_getParam('user_id', null) {
            return $this->_redirect('/');
        }
        return $this->render();
    }
}

If you want the Front Controller to throw exceptions, you have no other choice than to overwrite the dispatch method and pass a boolean TRUE to the throwExceptions() method:

class ControllerTestCase extends Zend_Test_PHPUnit_ControllerTestCase
{
    ...

    public function dispatch($url = null)
    {
        // redirector should not exit
        $redirector = Zend_Controller_Action_HelperBroker::getStaticHelper('redirector');
        $redirector->setExit(false);

        // json helper should not exit
        $json = Zend_Controller_Action_HelperBroker::getStaticHelper('json');
        $json->suppressExit = true;

        $request = $this->getRequest();
        if (null !== $url) {
            $request->setRequestUri($url);
        }
        $request->setPathInfo(null);

        $this->getFrontController()
             ->setRequest($request)
             ->setResponse($this->getResponse())
             ->throwExceptions(true)
             ->returnResponse(false);

        $this->getFrontController()->dispatch();
    }

    ...
}

The Dispatcher not only violates the DRY principle but also suffers from amnesia. The problem is that it doesn’t store the instance of the Action Controller, instead, it destroys it (Zend_Controller_Dispatcher_Standard Line 305). You can easily get around this issue by extending the standard dispatcher and overwriting the dispatch() method:

class Zf_Controller_Dispatcher_Standard extends Zend_Controller_Dispatcher_Standard
{
    ...

    public function dispatch($url = null)
    {
        ...
        Zend_Registry::set('Zend_Controller_Action', $controller);

        // Destroy the page controller instance and reflection objects
        $controller = null;
    }

This will allow you to access the view object after dispatching the request:

class ExampleControllerTest extends ControllerTestCase
{
    public function testDefaultActionRendersViewObject()
    {
        $this->dispatch('/');

        $controller = Zend_Registry::get('Zend_Controller_Action');

        $this->assertEquals('ExampleController', get_class($controller));
        $this->assertTrue(isset($controller->view));
    }

Links

PHPUnit: Testing Zend Framework Controllers
PHPUnit: Mock Objects