Federico Cargnelutti

Simple is better than complex. Complex is better than complicated. | @fedecarg

Archive for the ‘Databases’ Category

Redis: The Lamborghini of Databases

without comments

Antonio Cangiano wrote:

Redis is a key-value database written in C. It can be used like memcached, in front of a traditional database, or on its own thanks to the fact that the in-memory datasets are not volatile but instead persisted on disk. Redis provides you with the ability to define keys that are more than mere strings, as well as being able to handle multiple databases, lists, sets and even basic master-slave replication. It’s amazingly fast. On an entry level Linux box, Redis has been benchmarked performing 110,000 SET operations, and 81,000 GETs, per second.

Despite being a very young project, it already has client libraries for several languages: Python and PHP, Erlang, and Ruby. Salvatore Sanfilippo, the creator of Redis, has implemented a Twitter clone known as Retwis to showcase how you can use Redis and PHP to build applications without the need for a database like MySQL or any SQL query.

Salvatore will be publishing a beginner’s article based on the PHP Twitter clone he wrote, soon.

Full article: Introducing Redis: a fast key-value database

Written by Federico

March 9, 2009 at 7:22 pm

Database Migrations: The Next Killer PHP Application

with 16 comments

This post, written by Adam Wiggins, got me thinking:

Migrations bother me. On one hand, migrations are the best solution we have for the problem of versioning databases. The scope of that problem includes merging schema changes from different developers, applying schema changes to production data, and creating a DRY representation of the schema. But even though migrations is the best solution we have, it still isn’t a very good one.

Schema synchronization is the killer feature missing in phpMyAdmin!

This feature can be found in some commercial applications, like SQLyog Enterprise for example. SQLyog’s Database Synchronization Tool (DST) is a visual comparison/synchronization tool designed for developers who work between different MySQL servers or need to keep databases between two MySQL servers in sync. This means reporting the differences between tables, indexes, columns of two databases, and generating the necessary “Up” and “Down” scripts to bring them in sync.

However, SQLyog DST is not free and only runs on Windows.

So, if you want to build the next killer PHP application, port SQLyog’s database synchronization tool to PHP and open source it. People will love you for it.

Links

Written by Federico

March 3, 2009 at 1:16 pm

Server-side Marker Clustering with PHP and Google Maps

with one comment

with_cluster2

As maps get busier, marker clustering is likely to be needed. Marker clustering is a technique by which several points of interest can be represented by a single icon when they’re close to one another.

Mika Tuupola wrote a PHP library that divides the map into a given number of cells and represents all the markers present in the same cell by a single icon. This icon shows the number of markers it symbolizes.

He also wrote an excellent post explaining how marker clustering works.

Related Posts

Written by Federico

February 26, 2009 at 3:32 pm

MySQL Split String Function

with 7 comments

MySQL does not include a function to split a delimited string. However, it’s very easy to create your own function.

Create function syntax

A user-defined function is a way to extend MySQL with a new function that works like a native MySQL function.

CREATE [AGGREGATE] FUNCTION function_name
RETURNS {STRING|INTEGER|REAL|DECIMAL}

To create a function, you must have the INSERT privilege for the <mysql> database.

Split delimited strings

The following example function takes 3 parameters, performs an operation using an SQL function, and returns the result.

Function

CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');

Usage

SELECT SPLIT_STR(string, delimiter, position)

Example

SELECT SPLIT_STR('a|bb|ccc|dd', '|', 3) as third;

+-------+
| third |
+-------+
| ccc   |
+-------+

Written by Federico

February 22, 2009 at 12:11 pm

Posted in Databases

ActiveRecord: JavaScript ORM Library

without comments

Aptana has just released a beta version of its ActiveRecord.js which is an ORM JavaScript library that implements the ActiveRecord pattern. It works with AIR and other environments:

ActiveRecord.js is a single file, MIT licensed, relies on no external JavaScript libraries, supports automatic table creation, data validation, data synchronization, relationships between models, life cycle callbacks and can use an in memory hash table to store objects if no SQL database is available.

Example:

var User = ActiveRecord.define('users',{
    username: '',
    email: ''
});
User.hasMany('articles');

var ryan = User.create({
    username: 'ryan',
    email: 'rjohnson@aptana.com'
});

var Article = ActiveRecord.define('articles',{
    name: '',
    body: '',
    user_id: 0
});
Article.belongsTo('user');

var a = Article.create({
    name: 'Announcing ActiveRecord.js',
    user_id: ryan.id
});
a.set('name','Announcing ActiveRecord.js!!!');
a.save();

a.getUser() == ryan;
ryan.getArticleList()[0] == a;

Links

Written by Federico

February 16, 2009 at 8:55 pm

Geo Proximity Search with PHP, Python and SQL

with 10 comments

I’m working on a project that requires Geo proximity search. Basically, what I’m doing is plotting a radius around a point on a map, which is defined by the distance between two points on the map given their latitudes and longitudes. To achieve this I’m using the Haversine formula (spherical trigonometry). This equation is important in navigation, it gives great-circle distances between two points on a sphere from their longitudes and latitudes. You can see it in action here: Radius From UK Postcode.

This has already been covered in some PHP blogs, however, I found most of the information to be inaccurate and, in some cases, incorrect.

Of course, there are always exceptions. In this post, Kevin uses the Haversine equation to develop a postcode radius facility for his website, like the one on freemaptools.com.

He assumes you have the following values:

$lon = (float) -2.708077;
$lat = (float) 53.754842;
$radius = 20; // in miles

And calculates the latitude (min/max) and longitude (min/max) based on that:

(float)$dpmLAT = 1 / 69.1703234283616;

// Latitude calculation
(float)$usrRLAT = $dpmLAT * $radius;
(float)$latMIN = $lat - $usrRLAT;
(float)$latMAX = $lat + $usrRLAT;

// Longitude calculation
(float)$mpdLON = 69.1703234283616 * cos($userLat * (pi/180));

// degrees per mile longintude
(float)$dpmLON = 1 / $mpdLON;
$usrRLON = $dpmLON * $radius;
$lonMIN = $lon - $usrRLON;
$lonMAX = $lon + $usrRLON;

The variable names are a bit confusing, but you can tell straight away he knows what he’s doing. Unfortunately, the script doesn’t work, $userLat is not defined. If you change $userLat to $lat, it works. Also, you can use the deg2rad() function to convert the $lat number to the radian equivalent.

Here is Kevin’s implementation refactored (Codepad)

// Degrees per mile latitude
$dpm_lat = (float) 1 / 69;

// Latitude calculation
$rlat = (float) $dpm_lat * $radius;
$lat_min = (float) $latitude - $rlat;
$lat_max = (float) $latitude + $rlat;

// Longitude calculation
$mpd_lng = (float) abs(cos(deg2rad($latitude)) * 69);
$dpm_lng = (float) 1 / $mpd_lng;

$rlng = $dpm_lng * $radius;
$lng_min = $longitude - $rlng;
$lng_max = $longitude + $rlng; 

echo 'lng (min/max): ' . $lng_min . '/' . $lng_max;
echo 'lat (min/max): ' . $lat_min . '/' . $lat_max;

Outputs:

lng (min/max): -3.1983251898421/-2.2178288101579
lat (min/max): 53.464986927536/54.044697072464

Cool, it works. Now, for the sake of James Inman, lets refactor the script again and simplify the hole thing. The Haversine equation is very straight forward, so there’s no need to complicate things.

I’ve implemented the solution in PHP, Python and SQL. Use the one that suits you best.

PHP implementation (Codepad)

$longitude = (float) -2.708077;
$latitude = (float) 53.754842;
$radius = 20; // in miles

$lng_min = $longitude - $radius / abs(cos(deg2rad($latitude)) * 69);
$lng_max = $longitude + $radius / abs(cos(deg2rad($latitude)) * 69);
$lat_min = $latitude - ($radius / 69);
$lat_max = $latitude + ($radius / 69);

echo 'lng (min/max): ' . $lng_min . '/' . $lng_max . PHP_EOL;
echo 'lat (min/max): ' . $lat_min . '/' . $lat_max;

It outputs the same result:

lng (min/max): -3.1983251898421/-2.2178288101579
lat (min/max): 53.464986927536/54.044697072464

SQL implementation (Codepad)

set @latitude=53.754842;
set @longitude=-2.708077;
set @radius=20;

set @lng_min = @longitude - @radius/abs(cos(radians(@latitude))*69);
set @lng_max = @longitude + @radius/abs(cos(radians(@latitude))*69);
set @lat_min = @latitude - (@radius/69);
set @lat_max = @latitude + (@radius/69);

SELECT * FROM postcode
WHERE (longitude BETWEEN @lng_min AND @lng_max)
AND (latitude BETWEEN @lat_min and @lat_max);

Python implementation (Codepad)

from __future__ import division
import math

longitude = float(-2.708077)
latitude = float(53.754842)
radius = 20

lng_min = longitude - radius / abs(math.cos(math.radians(latitude)) * 69)
lng_max = longitude + radius / abs(math.cos(math.radians(latitude)) * 69)
lat_min = latitude - (radius / 69)
lat_max = latitude + (radius / 69)

print 'lng (min/max): %f %f' % (lng_min, lng_max)
print 'lat (min/max): %f %f' % (lat_min, lat_max)

That’s it. Happy Geolocating!

Written by Federico

February 8, 2009 at 11:48 am

Four Great InfoQ Presentations

with one comment

Hope you like these recommendations and if you know of any other good tech-related video, then please let me know.

1. Developing Expertise: Herding Racehorses, Racing Sheep

One of my favourites. In this presentation Dave Thomas (The Pragmatic Programmer) talks about expanding people’s expertise in their domains of interest by not treating them uniformly as they had the same amount of knowledge and level of experience.

Developing Expertise

2. Real World Web Services

Another good presentation. In this one Scott Davis provides a pragmatic, down-to-earth introduction to Web services as used in the real world by public sites, including SOAP-based and REST examples.

Real World Web Services

3. CouchDB and me

This presentation is different, and that’s why I like it so much. Damien Katz shares his experiences and reminds people how difficult but at the same time gratifying is to be an open source developer. He talks about the history of CouchDB development from a very personal point of view. His inspirations for CouchDB and why he decided to move my wife and kids to a cheaper place and live off savings to build this thing.

CouchDB and me

4. Yahoo! Communities Architectures

In this presentation, Ian Flint tries to explain the infrastructure and architecture employed by Yahoo! to keep going a multitude of servers running of different platforms and offering different services. Very interesting!

Yahoo! Communities Architectures

Written by Federico

February 6, 2009 at 12:02 am

Planet Ubuntu: Mastering phpMyAdmin 2.11

without comments

Matthew Helmke wrote:

I am the sole person responsible for seven websites that use MySQL databases, and have worked on the back end of several others. Over the years I have worked on these databases in many different ways. I’ve done it from the command line, via ssh. I’ve done it using PHP scripts within the software being run on a site. I have also used phpMyAdmin. Each method has applications and moments where I would prefer using it to the others at my disposal.

I have generally used phpMyAdmin only in those moments when I had something quick and simple to do, or when a hosting company did not provide ssh or command line access. This was mainly because I did not realize how powerful and flexible the software can be.

Mastering phpMyAdmin 2.11

Written by Federico

January 19, 2009 at 12:09 am

Sun: MySQL Native Driver for PHP

with one comment

The MySQL native driver for PHP, mysqlnd, is an additional, alternative way to connect from PHP 5 and PHP 6 to the MySQL Server 4.1 or later versions. mysqlnd is a replacement for the MySQL client library, libmysql; and it is tightly integrated into PHP starting with the release of PHP 5.3. Due to the tight integration into PHP 5.3 (and later), mysqlnd eliminates the dependency on the MySQL client programming support when the database extension(s) and the database driver are built with the support for mysqlnd.

Some of the advantages of using mysqlnd are listed below:

  • Easy to compile: no linking against libmysql, and no dependency on the MySQL client programming support.
  • May outperform libmysql in certain cases
  • Persistent connections for ext/mysqli
  • Uses PHP memory management, supports PHP memory limit
  • Reduced memory footprint — keeps every row only once in memory, where as with libmysql you have it twice in memory
  • Keeps a long list of performance related statistics for bottle-neck analysis
  • Client-side result set cache

Sun’s Blog: MySQL Native Driver for PHP

Written by Federico

January 18, 2009 at 6:42 pm

Posted in Databases, Open-source, PHP

The Multimaster Replication Problem

with 7 comments

Replication has its problems, specially if you have a multimaster replication system. To make matters worse, none of the PHP frameworks support multimaster replication systems nor handle master failover. Symfony uses Propel and only supports master-slave replication systems. When the master fails, it’s true that you have the slaves ready to replace it, but the process of detecting the failure and acting upon it requires human intervention. Zend Framework, on the other hand, doesn’t support replication at all.

I strongly believe that a master failover needs to be handled appropriately on the application side. Of course, you can always use an SQL proxy or any other server-side solution, but they are either limited or unreliable.

From Digg’s Blog:

The Digg database access layer is written in PHP and lives at the level of the application server. Basically, when the application decides it needs to do a read query, it passes off the query with a descriptor to a method that grabs a list of servers for the database pool that can satisfy the query, then picks one at random, submits the query, and returns the results to the calling method.

If the server picked won’t respond in a very small amount of time, the code moves on to the next server in the list. So if MySQL is down on a database machine in one of the pools, the end-user of Digg doesn’t notice. This code is extremely robust and well-tested. We worry neither that shutting down MySQL on a read slave in the Digg cluster, nor a failure in alerting on a DB slave that dies will cause site degradation.

Every few months we consider using a SQL proxy to do this database pooling, failover, and balancing, but it’s a tough sell since the code is simple and works extremely well. Furthermore, the load balancing function itself is spread across all our Apaches. Hence there is no “single point of failure” as there would be if we had a single SQL proxy.

If you are building your own solution and need a sandbox to test it, I recommend using MySQL Sandbox. Also, you might find this script useful: MySQL Master-Master Replication Manager

Written by Federico

January 2, 2009 at 1:10 am

Posted in Databases, Frameworks, PHP