ActiveRecord: JavaScript ORM Library

Aptana has just released a beta version of its ActiveRecord.js which is an ORM JavaScript library that implements the ActiveRecord pattern. It works with AIR and other environments:

ActiveRecord.js is a single file, MIT licensed, relies on no external JavaScript libraries, supports automatic table creation, data validation, data synchronization, relationships between models, life cycle callbacks and can use an in memory hash table to store objects if no SQL database is available.

Example:

var User = ActiveRecord.define('users',{
    username: '',
    email: ''
});
User.hasMany('articles');

var ryan = User.create({
    username: 'ryan',
    email: 'rjohnson@aptana.com'
});

var Article = ActiveRecord.define('articles',{
    name: '',
    body: '',
    user_id: 0
});
Article.belongsTo('user');

var a = Article.create({
    name: 'Announcing ActiveRecord.js',
    user_id: ryan.id
});
a.set('name','Announcing ActiveRecord.js!!!');
a.save();

a.getUser() == ryan;
ryan.getArticleList()[0] == a;

Links

Geo Proximity Search: The Haversine Equation

I’m working on a project that requires Geo proximity search. Basically, what I’m doing is plotting a radius around a point on a map, which is defined by the distance between two points on the map given their latitudes and longitudes. To achieve this I’m using the Haversine formula (spherical trigonometry). This equation is important in navigation, it gives great-circle distances between two points on a sphere from their longitudes and latitudes. You can see it in action here: Radius From UK Postcode.

This has already been covered in some blogs, however, I found most of the information to be inaccurate and, in some cases, incorrect. The Haversine equation is very straight forward, so there’s no need to complicate things.

I’ve implemented the solution in SQL, Python and PHP. Use the one that suits you best.

SQL implementation

set @latitude=53.754842;
set @longitude=-2.708077;
set @radius=20;

set @lng_min = @longitude - @radius/abs(cos(radians(@latitude))*69);
set @lng_max = @longitude + @radius/abs(cos(radians(@latitude))*69);
set @lat_min = @latitude - (@radius/69);
set @lat_max = @latitude + (@radius/69);

SELECT * FROM postcode
WHERE (longitude BETWEEN @lng_min AND @lng_max)
AND (latitude BETWEEN @lat_min and @lat_max);

Python implementation

from __future__ import division
import math

longitude = float(-2.708077)
latitude = float(53.754842)
radius = 20

lng_min = longitude - radius / abs(math.cos(math.radians(latitude)) * 69)
lng_max = longitude + radius / abs(math.cos(math.radians(latitude)) * 69)
lat_min = latitude - (radius / 69)
lat_max = latitude + (radius / 69)

print 'lng (min/max): %f %f' % (lng_min, lng_max)
print 'lat (min/max): %f %f' % (lat_min, lat_max)

PHP implementation

$longitude = (float) -2.708077;
$latitude = (float) 53.754842;
$radius = 20; // in miles

$lng_min = $longitude - $radius / abs(cos(deg2rad($latitude)) * 69);
$lng_max = $longitude + $radius / abs(cos(deg2rad($latitude)) * 69);
$lat_min = $latitude - ($radius / 69);
$lat_max = $latitude + ($radius / 69);

echo 'lng (min/max): ' . $lng_min . '/' . $lng_max . PHP_EOL;
echo 'lat (min/max): ' . $lat_min . '/' . $lat_max;

It outputs the same result:

lng (min/max): -3.1983251898421/-2.2178288101579
lat (min/max): 53.464986927536/54.044697072464

That’s it. Happy Geolocating!

Four Great InfoQ Presentations

Hope you like these recommendations and if you know of any other good tech-related video, then please let me know.

1. Developing Expertise: Herding Racehorses, Racing Sheep

One of my favourites. In this presentation Dave Thomas (The Pragmatic Programmer) talks about expanding people’s expertise in their domains of interest by not treating them uniformly as they had the same amount of knowledge and level of experience.

Developing Expertise

2. Real World Web Services

Another good presentation. In this one Scott Davis provides a pragmatic, down-to-earth introduction to Web services as used in the real world by public sites, including SOAP-based and REST examples.

Real World Web Services

3. CouchDB and me

This presentation is different, and that’s why I like it so much. Damien Katz shares his experiences and reminds people how difficult but at the same time gratifying is to be an open source developer. He talks about the history of CouchDB development from a very personal point of view. His inspirations for CouchDB and why he decided to move my wife and kids to a cheaper place and live off savings to build this thing.

CouchDB and me

4. Yahoo! Communities Architectures

In this presentation, Ian Flint tries to explain the infrastructure and architecture employed by Yahoo! to keep going a multitude of servers running of different platforms and offering different services. Very interesting!

Yahoo! Communities Architectures

The Multimaster Replication Problem

Replication has its problems, specially if you have a multimaster replication system. To make matters worse, none of the PHP frameworks support multimaster replication systems nor handle master failover. Symfony uses Propel and only supports master-slave replication systems. When the master fails, it’s true that you have the slaves ready to replace it, but the process of detecting the failure and acting upon it requires human intervention. Zend Framework, on the other hand, doesn’t support replication at all.

I strongly believe that a master failover needs to be handled appropriately on the application side. Of course, you can always use an SQL proxy or any other server-side solution, but they are either limited or unreliable.

From Digg’s Blog:

The Digg database access layer is written in PHP and lives at the level of the application server. Basically, when the application decides it needs to do a read query, it passes off the query with a descriptor to a method that grabs a list of servers for the database pool that can satisfy the query, then picks one at random, submits the query, and returns the results to the calling method.

If the server picked won’t respond in a very small amount of time, the code moves on to the next server in the list. So if MySQL is down on a database machine in one of the pools, the end-user of Digg doesn’t notice. This code is extremely robust and well-tested. We worry neither that shutting down MySQL on a read slave in the Digg cluster, nor a failure in alerting on a DB slave that dies will cause site degradation.

Every few months we consider using a SQL proxy to do this database pooling, failover, and balancing, but it’s a tough sell since the code is simple and works extremely well. Furthermore, the load balancing function itself is spread across all our Apaches. Hence there is no “single point of failure” as there would be if we had a single SQL proxy.

If you are building your own solution and need a sandbox to test it, I recommend using MySQL Sandbox. Also, you might find this script useful: MySQL Master-Master Replication Manager

Full-text searching with MySQL

MySQL’s full-text search functions provide a simple framework for an easily implemented, approximate site search. Many sites, written in an interpreted language and powered by MySQL, can use MySQL’s full-text search to avoid third party dependencies.

The basics of the MySQL full-text search functions are well-documented in the MySQL online documentation. For those lacking patience, here is a quick rundown.

Continue reading

MySQL Master-Master Replication Manager

The MySQL Master-Master replication (often in active-passive mode) is popular pattern used by many companies using MySQL for scale out. Most of the companies would have some internal scripts to handle things as automatic fallback and slave cloning but no Open Source solution was made available.

In 2007, the High Performance Group at MySQL AB was asked to implement such solution for one of their customers and they kindly agreed to let them release the software under GPL2 license.

The MySQL Master-Master Replication Manager (MMM) is a set of flexible scripts to perform monitoring/failover and management of MySQL Master-Master replication configurations (with only one node writable at any time). The toolset also has the ability to read balance standard master/slave configurations with any number of slaves, so you can use it to move virtual IP addresses around a group of servers depending on whether they are behind in replication.

Links

Agile Database Deployment Using Phing

Phing allows you to use SQL to define changes to your database schema, making it possible to use a version control system to keep things synchronized with the actual code.

A common way to automate development and deployment tasks is by writing shell scripts, however, Phing provides some advantages over shell scripts for task automation. Most of the shell scripts I created so far help me automate the most tedious tasks, from configuring, building, testing and documenting applications to synchronizing files and migrating database schemas. But, having a large collection of shell scripts can lead to a maintenance nightmare, reason why I decided to port some of them to PHP as Phing tasks. I chose Phing because it’s simple, powerful and very easy to extend.

DbDeployTask

Phing offers an optional task for making revisions to a database, it’s called DbDeployTask. Dave Marshall wrote a nice article about it. After adding this task to my build script and testing it, I came to the conclusion that DbDeployTask is not very effective if you want to roll out incremental changes to the databases. Also, it introduces new problems to the database migration process.

For example, if you use DbDeployTask, the SQL script names must begin with a number that indicates the order in which they should be run:

basedir/
  database/
    deltas/
      1-create_foo_table.sql
      2-alter_foo_table.sql
      3-add_constraint_id_to_foo.sql
    history/
      up_20080622120000.sql
      down_20080622120000.sql
  public/

But, what happens when 2 or more developers are making changes to the same database design? How do they know which scripts have been executed? What if they don’t use a central development database, how do they merge individual changes? To be honest, I don’t know.

Agile Database Deployment

A more Agile way to manage database changes is for the developers to connect to a central development database. After making any changes to the database, the developers will have to:

  1. Update the local database/ directory before modifying any script.
  2. Copy the changes made to the database to the corresponding SQL file (up and down).
  3. Check the files back into the repository.

Finally, the changes can be propagated to other databases by executing a script or setting up a Cron job.

This is how the database directory can be structured:

basedir/
  database/
    down/
      alter/
      constraints/
      create/
      data/
    up/
      alter/
      constraints/
      create/
      data/
  public/

To put your database definition under version control, you need an off-line representation of that database. The directory up/ is used when migrating to a new version, down/ is used to roll back any changes if needed. These directories contain all the changes made to the database:

up/
  alter/
    foo.table.sql
  constraints/
    foo.table.sql
  create/
    foo.table.sql
  data/

This directory structure is based on the one suggested by Mike Hostetler, and it clearly offers more advantages. After the team has made all necessary changes, the next step is to deploy the schema. If you are responsible for performing that task, you can get the most recent version of the database project from version control, build the deployment script, manually update that script as necessary, and then run the script to deploy the schema changes into development, staging or production.

Learn More

Database Schema Deployment (PDF) – By Lukas Smith