Archive for the ‘Databases’ Category
Four Great InfoQ Presentations
Hope you like these recommendations and if you know of any other good tech-related video, then please let me know.
1. Developing Expertise: Herding Racehorses, Racing Sheep
One of my favourites. In this presentation Dave Thomas (The Pragmatic Programmer) talks about expanding people’s expertise in their domains of interest by not treating them uniformly as they had the same amount of knowledge and level of experience.
2. Real World Web Services
Another good presentation. In this one Scott Davis provides a pragmatic, down-to-earth introduction to Web services as used in the real world by public sites, including SOAP-based and REST examples.
3. CouchDB and me
This presentation is different, and that’s why I like it so much. Damien Katz shares his experiences and reminds people how difficult but at the same time gratifying is to be an open source developer. He talks about the history of CouchDB development from a very personal point of view. His inspirations for CouchDB and why he decided to move my wife and kids to a cheaper place and live off savings to build this thing.
4. Yahoo! Communities Architectures
In this presentation, Ian Flint tries to explain the infrastructure and architecture employed by Yahoo! to keep going a multitude of servers running of different platforms and offering different services. Very interesting!
The Multimaster Replication Problem
Replication has its problems, specially if you have a multimaster replication system. To make matters worse, none of the PHP frameworks support multimaster replication systems nor handle master failover. Symfony uses Propel and only supports master-slave replication systems. When the master fails, it’s true that you have the slaves ready to replace it, but the process of detecting the failure and acting upon it requires human intervention. Zend Framework, on the other hand, doesn’t support replication at all.
I strongly believe that a master failover needs to be handled appropriately on the application side. Of course, you can always use an SQL proxy or any other server-side solution, but they are either limited or unreliable.
From Digg’s Blog:
The Digg database access layer is written in PHP and lives at the level of the application server. Basically, when the application decides it needs to do a read query, it passes off the query with a descriptor to a method that grabs a list of servers for the database pool that can satisfy the query, then picks one at random, submits the query, and returns the results to the calling method.
If the server picked won’t respond in a very small amount of time, the code moves on to the next server in the list. So if MySQL is down on a database machine in one of the pools, the end-user of Digg doesn’t notice. This code is extremely robust and well-tested. We worry neither that shutting down MySQL on a read slave in the Digg cluster, nor a failure in alerting on a DB slave that dies will cause site degradation.
Every few months we consider using a SQL proxy to do this database pooling, failover, and balancing, but it’s a tough sell since the code is simple and works extremely well. Furthermore, the load balancing function itself is spread across all our Apaches. Hence there is no “single point of failure” as there would be if we had a single SQL proxy.
If you are building your own solution and need a sandbox to test it, I recommend using MySQL Sandbox. Also, you might find this script useful: MySQL Master-Master Replication Manager
Full-text searching with MySQL
MySQL’s full-text search functions provide a simple framework for an easily implemented, approximate site search. Many sites, written in an interpreted language and powered by MySQL, can use MySQL’s full-text search to avoid third party dependencies.
The basics of the MySQL full-text search functions are well-documented in the MySQL online documentation. For those lacking patience, here is a quick rundown.
Profiling queries with Zend_Db
Eran Galperin wrote:
Database performance is one of the major bottlenecks for most web applications. Most web developers are not database experts (and I’m no exception), there are however several basic methods to analyse and optimize database performance without resorting to expert consultants.
The first step I take when profiling database performance for a web app is to measure the running time of all the queries running in it. Absolute run time of a query is not necessarily a good measure of how optimized it is, since some queries are naturally more complex or pull more data – It will give me a good idea however of where to start improving the response time of the application I’m optimizing. My main goal is not specific query performance, but overall system performance.
Continue reading: Profiling queries with Zend_Db
MySQL Master-Master Replication Manager
The MySQL Master-Master replication (often in active-passive mode) is popular pattern used by many companies using MySQL for scale out. Most of the companies would have some internal scripts to handle things as automatic fallback and slave cloning but no Open Source solution was made available.
In 2007, the High Performance Group at MySQL AB was asked to implement such solution for one of their customers and they kindly agreed to let them release the software under GPL2 license.
The MySQL Master-Master Replication Manager (MMM) is a set of flexible scripts to perform monitoring/failover and management of MySQL Master-Master replication configurations (with only one node writable at any time). The toolset also has the ability to read balance standard master/slave configurations with any number of slaves, so you can use it to move virtual IP addresses around a group of servers depending on whether they are behind in replication.
Links
Solar Framework: MySQL Replication Adapter
As part of the Solar-1.0.0alpha2 release, Paul M. Jones included an SQL adapter that lets you connect to master/slave MySQL installations.
Replication
Replication enables data from one MySQL database server (called the master) to be replicated to one or more MySQL database servers (slaves). Replication is asynchronous – your replication slaves do not need to be connected permanently to receive updates from the master, which means that updates can occur over long-distance connections and even temporary solutions such as a dial-up service. Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a data.
Solar Adapter
When you get into a situation where you need to scale up, you might need a replicated MySQL database setup. In such cases, there is one “master” server that handles reads and writes, and there are one or more “slave” servers that are read-only. This means you need to pick which server you’re going to connect to based on the kind of statement you need to issue (SELECT, INSERT, UPDATE, etc).
The Solar_Sql_Adapter_MysqlReplicated adapter handles all the connections and switching-around for you, and you don’t need change a single line of application code. All you need to do is modify your configuration file to tell it where your master and slave servers are.
More: Adapter for Master/Slave MySQL Setups
Related Links
Kenneth Downs: Why I do not use ORM
A thorough knowledge of database behaviour tends to lead a person away from ORM. First off, the two basic premises of ORM are factually incorrect: One, that there is some native incompatibility between databases and code, and two, that all the world must be handled in objects. These two misconceptions themselves might be excusable if they turned out to be harmless, but they are far from harmless. They promote a wilful ignorance of actual wise database use, and in so doing are bound to generate methods that are inefficient at best and horrible at worst.
Propel 1.3 uses PDO instead of Creole
In a move to increase performance and take advantage and support new PHP features, Propel 1.3 uses the native PDO database abstraction layer. This change has a number of implications, particularly for those who are executing SQL directly. PDO’s API is loosely similar to Creole’s, so this change shouldn’t require any major re-architecture.
New Features
The new features in Propel 1.3 that will affect upgrading are:
- New PHP minimum requirements
- Use of PDO instead of Creole
- New DSN format for build and runtime properties
- New transaction API
- Some method signature changes
- mysqli adapter is obsolete & has been removed
- New SPL autoload integration
- Some API changes for extending classes.
Symfony 1.2
Propel 1.3 will be the default version of Propel bundled with Symfony 1.2. If you can’t switch to Propel 1.3 for whatever reason, you will still be able to use Propel 1.2 by installing the sfPropelPlugin.
DbFinder plugin
The DbFinderPlugin is a symfony plugin that provides an easy API for finding Model objects, whether the underlying ORM is Propel or Doctrine. The idea behind this plugin is to write queries to retrieve model objects through an ORM, but fast. Inspired by Doctrine, Rails has_finder plugin and SQLAlchemy, DbFinder can be seen as “jQuery for symfony’s model layer”. It also aims at putting the things in the right order, meaning that writing a find() query will feel natural for those familiar with SQL.
Related article: The ORM isn’t important anymore
Agile Database Deployment Using Phing
Phing allows you to use SQL to define changes to your database schema, making it possible to use a version control system to keep things synchronized with the actual code.
A common way to automate development and deployment tasks is by writing shell scripts, however, Phing provides some advantages over shell scripts for task automation. Most of the shell scripts I created so far help me automate the most tedious tasks, from configuring, building, testing and documenting applications to synchronizing files and migrating database schemas. But, having a large collection of shell scripts can lead to a maintenance nightmare, reason why I decided to port some of them to PHP as Phing tasks. I chose Phing because it’s simple, powerful and very easy to extend.
DbDeployTask
Phing offers an optional task for making revisions to a database, it’s called DbDeployTask. Dave Marshall wrote a nice article about it. After adding this task to my build script and testing it, I came to the conclusion that DbDeployTask is not very effective if you want to roll out incremental changes to the databases. Also, it introduces new problems to the database migration process.
For example, if you use DbDeployTask, the SQL script names must begin with a number that indicates the order in which they should be run:
basedir/
database/
deltas/
1-create_foo_table.sql
2-alter_foo_table.sql
3-add_constraint_id_to_foo.sql
history/
up_20080622120000.sql
down_20080622120000.sql
public/
But, what happens when 2 or more developers are making changes to the same database design? How do they know which scripts have been executed? What if they don’t use a central development database, how do they merge individual changes? To be honest, I don’t know.
Agile Database Deployment
A more Agile way to manage database changes is for the developers to connect to a central development database. After making any changes to the database, the developers will have to:
- Update the local database/ directory before modifying any script.
- Copy the changes made to the database to the corresponding SQL file (up and down).
- Check the files back into the repository.
Finally, the changes can be propagated to other databases by executing a script or setting up a Cron job.
This is how the database directory can be structured:
basedir/
database/
down/
alter/
constraints/
create/
data/
up/
alter/
constraints/
create/
data/
public/
To put your database definition under version control, you need an off-line representation of that database. The directory up/ is used when migrating to a new version, down/ is used to roll back any changes if needed. These directories contain all the changes made to the database:
up/
alter/
foo.table.sql
constraints/
foo.table.sql
create/
foo.table.sql
data/
This directory structure is based on the one suggested by Mike Hostetler, and it clearly offers more advantages. After the team has made all necessary changes, the next step is to deploy the schema. If you are responsible for performing that task, you can get the most recent version of the database project from version control, build the deployment script, manually update that script as necessary, and then run the script to deploy the schema changes into development, staging or production.
Learn More
Database Schema Deployment (PDF) – By Lukas Smith
10 great articles for optimizing MySQL queries
Optimization is a complex task because ultimately it requires understanding of the entire system to be optimized. Although it may be possible to perform some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you must know about it.
The following 10 articles explain and give some examples of different ways to optimize MySQL:
- Identifying slow MySQL queries
- MySQL & PHP Performance Optimization Tips
- MySQL Presentations: Optimizing MySQL
- MySQL 5 Optimization and Tuning Guide
- Survive heavy traffic with your Web server
- Tools to use for MySQL Performance – Peter Zaitsev
- MySQL Queries Optimization (video) – Peter Zaitsev
- Table Design and MySQL Index Details
- MySQL Reference Manual: Optimization Overview
- MySQL Reference Manual: The MySQL Query Cache