Deployment, Linux, Open-source, Security, Tools

Check whether your web server is correctly configured

Last year Zone-H reported a record 1.5 million website defacements. One million of those websites were running Apache.

When it comes to configuring a web server, some people tend to turn everything on by default. Developers are happy because the functionality that they wanted is available without any extra configuration, and there is a reduction in support calls due to functionality not working out-of-the-box. This has proven to be a major source of problems for security in general. A web server should start off with total restriction and then access rights should be applied appropriately.
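
For example, on Apache (2.2 syntax) a deny-by-default policy can be set at the filesystem root and then relaxed per directory. This is only a rough sketch; the paths and options shown are illustrative:

<Directory />
    Options None
    AllowOverride None
    Order deny,allow
    Deny from all
</Directory>

<Directory /var/www/html>
    Options -Indexes
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>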

You can check whether your web server is correctly configured by using Nikto, a great open-source vulnerability scanner that checks for a large number of web server vulnerabilities. From their site:

“Nikto is an Open Source (GPL) web server scanner which performs comprehensive tests against web servers for multiple items, including over 6400 potentially dangerous files/CGIs, checks for outdated versions of over 1200 servers, and version specific problems on over 270 servers. It also checks for server configuration items such as the presence of multiple index files, HTTP server options, and will attempt to identify installed web servers and software. Scan items and plugins are frequently updated and can be automatically updated.”

I’m going to run a default scan by just supplying the IP of the target:

$ cd nikto-2.1.4
$ ./nikto.pl -h 127.0.0.1

- ***** SSL support not available (see docs for SSL install) *****
- Nikto v2.1.4
---------------------------------------------------------------------------
+ Target IP:          127.0.0.1
+ Target Hostname:    localhost.localdomain
+ Target Port:        80
+ Start Time:         2011-12-12 13:06:59
---------------------------------------------------------------------------
+ Server: Apache
+ No CGI Directories found (use '-C all' to force check all possible dirs)
+ 6448 items checked: 0 error(s) and 0 item(s) reported on remote host
+ End Time:           2011-12-12 13:08:07 (68 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested

By looking at the last section of the Nikto report, I can see that there are no issues that need to be addressed.

Tools like Nikto and Skipfish serve as a foundation for professional web application security assessments. Remember, the more tools you use, the better.

Open-source, Programming, Web Services

JavaScript: Retrieve and paginate JSON-encoded data

I’ve created a jQuery plugin that allows you to retrieve a large data set in JSON format from a server script and load the data into a list or table with client side pagination enabled. To use this plugin you need to:

Include jquery.min.js and jquery.paginate.min.js in your document:

<script type="text/javascript" src="js/jquery.min.js"></script>
<script type="text/javascript" src="js/jquery.paginate.min.js"></script>

Include a small piece of CSS to style the navigation links:

<style type="text/css">
a.disabled {
    text-decoration: none;
    color: black;
    cursor: default;
}
</style>

Define an ID on the element you want to paginate, for example: “listitems”. If you have more than 10 child elements and want to avoid displaying them before the JavaScript is executed, you can set the element as hidden by default:

<ul id="listitems" style="display:none"></ul>

Place a div where you want to display the navigation links:

<div id="listitems-pagination" style="display:none">
    <a id="listitems-previous" href="#" class="disabled">&laquo; Previous</a>
    <a id="listitems-next" href="#">Next &raquo;</a>
</div>

Finally, include an initialization script at the bottom of your page like this:

<script type="text/javascript">
$(document).ready(function() {
    $.getJSON('data.json', function(data) {
        var items = [];
        $.each(data.items, function(i, item) {
            items.push('<li>' + item + '</li>');
        });
        $('#listitems').append(items.join(''));
        $('#listitems').paginate({itemsPerPage: 5});
    });
});
</script>
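
The snippet above assumes that data.json returns an object with an items array, for example:

{
    "items": [
        "First item",
        "Second item",
        "Third item"
    ]
}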

You can fork the code on GitHub or download it.

Frameworks, Open-source, PHP, Software Architecture, Web Services

Building a RESTful Web API with PHP and Apify

Apify is a small and powerful open source library that delivers new levels of developer productivity by simplifying the creation of RESTful architectures. You can see it in action here. Web services are a great way to extend your web application, however, adding a web API to an existing web application can be a tedious and time-consuming task. Apify takes certain common patterns found in most web services and abstracts them so that you can quickly write web APIs without having to write too much code.

Apify exposes an API similar to that of the Zend Framework, so if you are familiar with the Zend Framework, then you already know how to use Apify. Take a look at the UsersController class.

Building a RESTful Web API

In Apify, Controllers handle incoming HTTP requests, interact with the model to get data, and direct domain data to the response object for display. The full request object is injected into the action method and is primarily used to query for request parameters, whether they come from a GET or POST request, or from the URL.

Creating a RESTful Web API with Apify is easy. Each action results in a response, which holds the headers and document to be sent to the user’s browser. You are responsible for generating the response object inside the action method.

class UsersController extends Controller
{
    public function indexAction($request)
    {
        // 200 OK
        return new Response();
    }
}

The response object describes the status code and any headers that are sent. The default response is always 200 OK; however, it is possible to override the default status code and add additional headers:

class UsersController extends Controller
{
    public function indexAction($request)
    {
        $response = new Response();

        // 401 Unauthorized
        $response->setCode(Response::UNAUTHORIZED);

        // Cache-Control header
        $response->setCacheHeader(3600);

        // ETag header
        $response->setEtagHeader(md5($request->getUrlPath()));

        // X-RateLimit header
        $limit = 300;
        $remaining = 280;
        $response->setRateLimitHeader($limit, $remaining);

        // Raw header
        $response->addHeader('Edge-control: no-store');

        return $response;
    }
}

Content Negotiation

Apify supports sending responses in HTML, XML, RSS and JSON. In addition, it supports JSONP, which is JSON wrapped in a custom JavaScript function call. There are three ways to specify the format you want, as shown in the examples below:

  • Appending a format extension to the end of the URL path (.html, .json, .rss or .xml)
  • Specifying the response format in the query string. This means a format=xml or format=json parameter for XML or JSON, respectively, which will override the Accept header if there is one.
  • Sending a standard Accept header in your request (text/html, application/xml or application/json).
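
For example, each of the following requests would produce a JSON response (the /users path matches the examples used later in this post):

GET /users.json
GET /users?format=json
GET /users    (with an Accept: application/json header)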

The acceptContentTypes method indicates that the request only accepts certain content types:

class UsersController extends Controller
{
    public function indexAction($request)
    {
        // only accept JSON and XML
        $request->acceptContentTypes(array('json', 'xml'));

        return new Response();
    }
}

Apify will render the error message according to the format of the request.

class UsersController extends Controller
{
    public function indexAction($request)
    {
        $request->acceptContentTypes(array('json', 'xml'));

        $response = new Response();
        if (! $request->hasParam('api_key')) {
            throw new Exception('Missing parameter: api_key', Response::FORBIDDEN);
        }
        $response->api_key = $request->getParam('api_key');

        return $response;
    }
}

Request

GET /users.json

Response

Status: 403 Forbidden
Content-Type: application/json
{
    "code": 403,
    "error": {
        "message": "Missing parameter: api_key",
        "type": "Exception"
    }
}

Resourceful Routes

Apify supports REST style URL mappings where you can map different HTTP methods, such as GET, POST, PUT and DELETE, to different actions in a controller. This basic REST design principle establishes a one-to-one mapping between create, read, update, and delete (CRUD) operations and HTTP methods:

HTTP Method   URL Path     Action    Used for
GET           /users       index     display a list of all users
GET           /users/:id   show      display a specific user
POST          /users       create    create a new user
PUT           /users/:id   update    update a specific user
DELETE        /users/:id   destroy   delete a specific user

If you wish to enable RESTful mappings, add a call to enableRestfulMapping() in your index.php file:

try {
    $request = new Request();
    $request->enableUrlRewriting();
    $request->enableRestfulMapping();
    $request->dispatch();
} catch (Exception $e) {
    $request->catchException($e);
}

The RESTful UsersController for the above mapping will contain 5 actions as follows:

class UsersController extends Controller
{
    public function indexAction($request) {}
    public function showAction($request) {}
    public function createAction($request) {}
    public function updateAction($request) {}
    public function destroyAction($request) {}
}

By convention, each action should map to a particular CRUD operation in the database.

Building a Web Application

Building a web application can be as simple as adding a few methods to your controller. The only difference is that each method returns a view object.

class PostsController extends Controller
{
    /**
     * route: /posts/:id
     *
     * @param $request Request
     * @return View|null
     */
    public function showAction($request)
    {
        $id = $request->getParam('id');
        $post = $this->getModel('Post')->find($id);
        if (! isset($post->id)) {
            return $request->redirect('/page-not-found');
        }

        $view = $this->initView();
        $view->post = $post;
        $view->user = $request->getSession()->user;

        return $view;
    }

    /**
     * route: /posts/create
     *
     * @param $request Request
     * @return View|null
     */
    public function createAction($request)
    {
        $view = $this->initView();
        if ('POST' !== $request->getMethod()) {
            return $view;
        }

        try {
            $post = new Post(array(
                'title' => $request->getPost('title'),
                'text'  => $request->getPost('text')
            ));
        } catch (ValidationException $e) {
            $view->error = $e->getMessage();
            return $view;
        }

        $id = $this->getModel('Post')->save($post);
        return $request->redirect('/posts/' . $id);
    }
}

The validation is performed inside the Post entity class. An exception is thrown if any given value causes the validation to fail. This allows you to easily implement error handling for the code in your controller.

Entity Class

You can add validation to your entity class to ensure that the values sent by the user are correct before saving them to the database:

class Post extends Entity
{
    protected $id;
    protected $title;
    protected $text;

    // sanitize and validate title (optional)
    public function setTitle($value)
    {
        $value = htmlspecialchars(trim($value), ENT_QUOTES);
        if (empty($value) || strlen($value) < 3) {
            throw new ValidationException('Invalid title');
        }
        $this->title = $value;
    }

    // sanitize text (optional)
    public function setText($value)
    {
        $this->text = htmlspecialchars(strip_tags($value), ENT_QUOTES);
    }
}

Routes

Apify provides a slimmed down version of the Zend Framework router:

$routes[] = new Route('/posts/:id',
    array(
        'controller' => 'posts',
        'action'     => 'show'
    ),
    array(
        'id'         => '\d+'
    )
);
$routes[] = new Route('/posts/create',
    array(
        'controller' => 'posts',
        'action'     => 'create'
    )
);

HTTP Request

GET /posts/1

Incoming requests are dispatched to the controller “Posts” and action “show”.

Feedback

  • If you encounter any problems, please use the issue tracker.
  • For updates follow @fedecarg on Twitter.
  • If you like Apify and use it in the wild, let me know.

Design Patterns, Programming, Software Architecture

JavaScript: Asynchronous Script Loading and Lazy Loading

Most of the time remote scripts are included at the end of an HTML document, right before the closing body tag. This is because browsers are single threaded: when they encounter a script tag, they halt other processing until the script has been downloaded and parsed. By including scripts at the end, you allow the browser to download and render all page elements, style sheets and images without any unnecessary delay. Also, because the browser renders the page before executing any script, you know that all page elements are already available when your scripts run.

However, websites like Facebook for example, use a more advanced technique. They include scripts dynamically via DOM methods. This technique, which I’ll briefly explain here, is known as “Asynchronous Script Loading”.

Let’s take a look at the script that Facebook uses to download its JS library:

(function () {
    var e = document.createElement('script');
    e.src = 'http://connect.facebook.net/en_US/all.js';
    e.async = true;
    document.getElementById('fb-root').appendChild(e);
}());

When you dynamically append a script to a page, the browser does not halt other processes, so it continues rendering page elements and downloading resources. The best place to put this code is right after the opening body tag. This allows Facebook initialization to happen in parallel with the initialization of the rest of the page.

Facebook also makes non-blocking loading of the script easy to use by providing the fbAsyncInit hook. If this global function is defined, it will be executed when the library is loaded.

window.fbAsyncInit = function () {
    FB.init({
        appId: 'YOUR APP ID',
        status: true,
        cookie: true,
        xfbml: true
    });
};

Once the library has loaded, Facebook checks the value of window.fbAsyncInit.hasRun and if it’s false it makes a call to the fbAsyncInit function:

if (window.fbAsyncInit && !window.fbAsyncInit.hasRun) {
    window.fbAsyncInit.hasRun = true;
    fbAsyncInit();
}

Now, what if you want to load multiple files asynchronously, or you need to include a small amount of code at page load and then download other scripts only when needed? Loading scripts on demand is called “Lazy Loading”. There are many libraries that exist specifically for this purpose, however, you only need a few lines of JavaScript to do this.

Here is an example:

// $L loads every script URL in the array c in parallel and calls d once all of them have loaded
$L = function (c, d) {
    for (var b = c.length, e = b, f = function () {
            // load handler: ignore IE's intermediate readyState values
            if (!(this.readyState
                    && this.readyState !== "complete"
                    && this.readyState !== "loaded")) {
                this.onload = this.onreadystatechange = null;
                // decrement the pending counter and fire the callback when it reaches zero
                --e || d()
            }
        }, g = document.getElementsByTagName("head")[0], i = function (h) {
            // create a non-blocking script element for URL h and append it to <head>
            var a = document.createElement("script");
            a.async = true;
            a.src = h;
            a.onload = a.onreadystatechange = f;
            g.appendChild(a)
        }; b;) i(c[--b])
};

The best place to put this code is inside the head tag. You can then use the $L function to asynchronously load your scripts on demand. $L takes two arguments: an array (c) and a callback function (d).

var scripts = [];
scripts[0] = 'http://www.google-analytics.com/ga.js';
scripts[1] = 'http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js';

$L(scripts, function () {
    console.log("ga and jquery scripts loaded");
});

$L(['http://connect.facebook.net/en_US/all.js'], function () {
    console.log("facebook script loaded");
    window.fbAsyncInit.hasRun = true;
    FB.init({
        appId: 'YOUR APP ID',
        status: true,
        cookie: true,
        xfbml: true
    });
});

You can see this script in action here (right click -> view page source).

Software Architecture

Collective Wisdom from the Experts

I’ve finally had a chance to read a book I bought a while ago called “97 Things Every Software Architect Should Know – Collective Wisdom from the Experts”. Not the shortest title for a book, but very descriptive. I bought this book at the OSCON Conference in Portland last year. It’s an interesting book and I’m sure anyone involved in software development would benefit from reading it.

More than 40 architects, including Neal Ford and Michael Nygard, offer advice for communicating with stakeholders, eliminating complexity, empowering developers, and many more practical lessons they’ve learned from years of experience. The book offers valuable information on key development issues that go way beyond technology. Most of the advice given is from personal experience and is good for any project leader involved with software development no matter their job title. However, you have to keep in mind that this is a compilation book, so don’t expect in-depth information or theoretical knowledge about architecture design and software engineering.

Here are some extracts from the book:

Simplify essential complexity; diminish accidental complexity – By Neal Ford

Frameworks that solve specific problems are useful. Over-engineered frameworks add more complexity than they relieve. It’s the duty of the architect to solve the problems inherent in essential complexity without introducing accidental complexity.

Chances are your biggest problem isn’t technical – By Mark Ramm

Most projects are built by people, and those people are the foundation for success and failure. So, it pays to think about what it takes to help make those people successful.

Communication is King – By Mark Richards

Every software architect should know how to communicate the goals and objectives of a software project. The key to effective communication is clarity and leadership.

Keeping developers in the dark about the big picture or why decisions were made is a clear recipe for disaster. Having the developer on your side creates a collaborative environment whereby decisions you make as an architect are validated. In turn, you get buy-in from developers by keeping them involved in the architecture process.

Architecting is about balancing – By Randy Stafford

When we think of architecting software, we tend to think first of classical technical activities, like modularizing systems, defining interfaces, allocating responsibility, applying patterns, and optimizing performance. Architects also need to consider security, usability, supportability, release management, and deployment options, among other things. But these technical and procedural issues must be balanced with the needs of stakeholders and their interests.

Software architecting is about more than just the classical technical activities; it is about balancing technical requirements with the business requirements of stakeholders in the project.

Skyscrapers aren’t scalable – By Michael Nygard

We cannot easily add lanes to roads, but we’ve learned how to easily add features to software. This isn’t a defect of our software processes, but a virtue of the medium in which we work. It’s OK to release an application that only does a few things, as long as users value those things enough to pay for them.

Quantify – By Keith Braithwaite

The next time someone tells you that a system needs to be “scalable” ask them where new users are going to come from and why. Ask how many and by when? Reject “Lots” and “soon” as answers. Uncertain quantitative criteria must be given as a range: the least, the nominal, and the most. If this range cannot be given, then the required behavior is not understood.

Some simple questions to ask: How many? In what period? How often? How soon? Increasing or decreasing? At what rate? If these questions cannot be answered then the need is not understood. The answers should be in the business case for the system and if they are not, then some hard thinking needs to be done.

Architects must be hands on – By John Davies

A good architect should lead by example; he/she should be able to fulfill any of the positions within the team, from wiring the network and configuring the build process to writing the unit tests and running benchmarks. It is perfectly acceptable for team members to have more in-depth knowledge in their specific areas but it’s difficult to imagine how team members can have confidence in their architect if the architect doesn’t understand the technology.

Use uncertainty as a driver – By Kevlin Henney

Confronted with two options, most people think that the most important thing to do is to make a choice between them. In design (software or otherwise), it is not. The presence of two options is an indicator that you need to consider uncertainty in the design. Use the uncertainty as a driver to determine where you can defer commitment to details and where you can partition and abstract to reduce the significance of design decisions.

You can purchase “97 Things Every Software Architect Should Know” from Amazon.

Databases, Open-source, Software Architecture

NoSQL solutions: Membase, Redis, CouchDB and MongoDB

Each database has specific use cases and every solution has a sweet spot in terms of data, hardware, setup and operation. Here are some of the most popular key-value and document data stores:

Key-value

Membase

  • Developed by members of the memcached core team.
  • Simple (key value store), fast (low, predictable latency) and elastic (effortlessly grow or shrink a cluster).
  • Extensions are possible through a plug-in architecture (full-text search, backup, etc).
  • Supports Memcached ASCII and Binary protocols (uses existent Memcached libraries and clients).
  • Guarantees data consistency.
  • High-speed failover (server failures recoverable in under 100ms).
  • User management, alerts, logging and an audit trail.

Redis

  • Developed by Salvatore Sanfilippo, whose work on Redis has been sponsored by VMware since 2010.
  • Very fast. Non-blocking I/O. Single threaded.
  • Data is held in memory but can be persisted by writing it to disk asynchronously.
  • Values can be strings, lists or sets (see the short example after this list).
  • Built-in support for master/slave replication.
  • Distributes the dataset across multiple Redis instances.
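
A quick redis-cli sketch of the string, list and set value types mentioned above (the keys and values are made up for illustration):

redis> SET user:1:name "Alice"
OK
redis> LPUSH user:1:posts 42
(integer) 1
redis> SADD user:1:tags "nosql"
(integer) 1
redis> SMEMBERS user:1:tags
1) "nosql"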

Document-oriented

The major benefit of using a document database comes from the fact that while it has all the benefits of a key/value store, you aren’t limited to just querying by key. However, document-oriented databases and MapReduce aren’t appropriate for every situation.

CouchDB

  • High read performance.
  • Supports bulk inserts.
  • Good for consistent master-master replica databases that are geographically distributed and often offline.
  • Good for intense versioning.
  • Android, MeeGo and WebOS include services for syncing locally stored data with a CouchDB non-relational database in the cloud.
  • Better than MongoDB at durability.
  • Uses REST as its interface to the database. It doesn’t have “queries” but instead uses “views”.
  • Makes heavy use of the file system cache (so more RAM is always better).
  • The database must be compacted periodically.
  • Conflicts on transactions must be handled by the programmer manually (e.g. if someone else has updated the document since it was fetched, then CouchDB relies on the application to resolve versioning issues).
  • Scales through asynchronous replication but lacks an auto-sharding mechanism. Reads are distributed to any server while writes must be propagated to all servers.

MongoDB

  • High write performance. Good for systems with very high update rates.
  • It has the flexibility to replace a relational database in a wider range of scenarios.
  • Supports auto-sharding.
  • More oriented towards master/slave replication.
  • Compaction of the database is not necessary.
  • Both CouchDB and MongoDB support map/reduce operations.
  • Supports dynamic ad hoc queries via a JSON-style query language (see the short example after this list).
  • The pre-filtering provided by the query attribute doesn’t have a direct counterpart in CouchDB. It also allows post-filtering of aggregated values.
  • Relies on language-specific database drivers for access to the database.
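
As a small illustration of the JSON-style query language mentioned above, an ad hoc query in the mongo shell might look like this (the collection and field names are made up):

// find the ten most viewed posts by a given author, returning only title and views
db.posts.find({ author: "fede", views: { $gt: 100 } }, { title: 1, views: 1 })
        .sort({ views: -1 })
        .limit(10);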

Open-source

OSCON 2010, The O’Reilly Open Source Convention

A couple of weeks ago I attended the O’Reilly Open Source Convention (OSCON) in Portland. OSCON has hundreds of sessions and activities focused on all aspects of open source software. I met some great people, the talks were good and I saw some promising ideas and technologies.

Workshops attended

  • Android for Java Developers
    Marko Gargenta (Marakana)
  • Building a NoSQL Data Cloud
    Krishna Sankar (Cisco Systems Inc)
  • Building Native Mobile Apps Using Open Source
    Kevin Whinnery (Appcelerator)

Sessions attended

  • Building Mobile Apps with HTML, CSS, and JavaScript
    Jonathan Stark (Jonathan Stark Consulting)
  • Open Source Tool Chains for Cloud Computing
    Mark Hinkle (Zenoss), John Willis (Opscode, Inc.), Alex Honor
  • Doctor, I Have a Problem with My Innovation.
    Rolf Skyberg (eBay, Inc.)
  • Ingex: Bringing Open Source to the Broadcast Industry
    Brendan Quinn (BBC R&D)
  • membase.org: The Simple, Fast, Elastic NoSQL Database
    Matt Ingenthron (NorthScale, Inc.)
  • Introducing WebM: High Quality, Royalty-Free, Open Source Video
    John Koleszar (Google, Inc.)
  • Whiskey, Tango, Foxtrot: Understanding API Activity
    Clay Loveless (Mashery)
  • Deploying an Open Source Private Cloud On a Shoe String Budget
    Louis Danuser (AT&T Labs, Inc.)
  • Eucalyptus: The Open Source Infrastructure for Cloud Computing
    Shashi Mysore (Eucalyptus Systems Inc.)
  • Hadoop, Pig, and Twitter
    Kevin Weil (Twitter, Inc.)
  • Mahout: Mammoth Scale Machine Learning
    Robin Anil (Apache Software Foundation)
  • BlackBerry development for Web Application Developers
    Kevin Falcone (Best Practical Solutions)
  • Practical Concurrency
    Tim Bray (Google, Inc.)
  • Scribe – Moving Data at Massive Scale
    Robert Johnson (Facebook)
  • Make Open Easy
    Dan Bentley (Google)