Wednesday, April 13, 2011

Writing a Bare-Minimum PHP Application

PHP has a pretty bad rep for being cluttered, inconsistent, and ugly. Why would any language use \ as a namespace delimiter? Or -> for dereferencing object members? Well, because . is reserved strictly for string concatenation… HA! But although PHP has many shortcomings, it remains the most ubiquitous language for web development today. Perhaps one of the most distressing things about PHP is that it makes it very easy to write bad code. The gentle learning curve ensures that PHP is heavily used by beginners, and it is the server-side language of choice for almost every shared web hosting service. It also ensures that there is a lot of bad, bad code out there.

When I first learned PHP, I remember wondering why anyone would want to write a class or a function when you could drop a bunch of PHP inside a bunch of HTML and get instant gratification. I wrote a fan-finder application this way for a fan site that my friends and I ran at the time. Users of our fan site would write a little bio and upload a picture, and they would be added to an interactive world map of everyone who used the service (a few hundred people). There was no database; everything was saved into a pipe-delimited text file, which needed to be read and written by nearly every page in the application. It got completely out of control! The same code for reading and writing the data file existed in at least twelve different places, all written slightly differently. It was riddled with bugs. I barely managed to keep it alive, and at times I actually had to maintain the data file by hand. I learned very quickly how important it was to keep my code minimal and clean. Some languages attempt to force you to do this, to a degree, but PHP made it so easy to do everything the wrong way.

Nowadays I would just use one of the many excellent PHP frameworks out there, such as CodeIgniter. But sometimes it's nice to build everything yourself. So I spent some time designing an ultra-light, barebones PHP app with what I consider the minimum amount of structure a small PHP application needs in order to survive and evolve. This is intended mainly for PHP beginners. My "framework" is by no means an optimal solution, and many other designs would be sufficient too, but here's what I would do:

 

Simple Structure

For reference, this project is going to consist of the following files and folders:

./
    /application
        entry.php
        application.php
        router.php
    /pages
        home.php
        info.php
        notfound.php
    .htaccess

The top-level folder is in a sub-folder of my WAMP installation, so I'll access these files from localhost:8080/qdphpapp/. This example will deal with the base directory not being /, but it wouldn't be an issue if you were building this application at the root directory of a website.

 

.htaccess URL Rewriting

In some shared hosting environments, this feature may not be available – but if it is I highly suggest taking advantage of it. The .htaccess file defines access rules and overrides for the server, which change how a user can interact with your website. In this case I'm using URL rewriting: using a regular expression to match the request URL, and redirecting that request to a different file. This is one method websites can use to achieve pretty URLs: things like http://mywebsite.com/products/page/1 instead of http://mywebsite.com/products.php?page=1. I'm simply going to redirect every request to /application/entry.php:

RewriteEngine on
RewriteRule ^(.*) application/entry.php

Save this file as .htaccess. The first line turns on Apache's URL rewriting module, and the second line defines a rule that says "redirect everything to application/entry.php". With this saved to the top-level directory of your website, any request to that directory (or any subdirectory) will be sent to an entry point script, so we'd better make sure it has a script to execute.

 

Goodbye index.php

index.php as a general-purpose entry point to an application has always bothered me. The fact that it gets executed at all is only incidental to the folder hierarchy of the application and the behaviour of the webserver. In most applications, there is a single, well-defined entry point, which is what I've aimed to accomplish in this project. It is aptly named entry.php and is the only file in the application that executes when loaded:

<?php
// application/entry.php

//All relative URLs will start from the application's top directory
chdir("../");

try
{
    //Include classes that will be used by the application
    include_once("application/router.php");
    include_once("application/application.php");

    //Set up application parameters and begin
    Application::$URLBase = "qdphpapp";
    Application::Run();
}
catch (Exception $e)
{
    print "<pre>{$e}</pre>";
}

The first executable line makes relative file paths start from the top directory of the application. Since the entry script is in the subdirectory "/application", relative paths would start from "/application" by default, and referencing a file from the pages directory would look like "../pages/file.php".

Next we've got a try-catch block. This is the very top level of error handling: any exceptions that slip through the cracks will be caught here, and something useful can be done with them. The application should usually still fail, but this way we can at least show a pretty error page and log the error instead of leaving the user with PHP's uncaught exception message. In the try block we include the application components (in a real-world example, you would probably also include a database class, an authorization class, and possibly others), set up the executing environment for the application, and run it. In the catch block all we do is print out whatever exception we found. This can be better than the default: depending on the display_errors setting, PHP either dumps a raw fatal error or shows the visitor nothing at all.
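As a rough sketch of what "something useful" could look like, the catch block might hand the exception to a small helper that logs the details and returns a generic message. The helper name HandleFatalException is hypothetical and not part of this project; error_log(), headers_sent(), and header() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of the original project): logs the exception
// and returns a generic error page instead of exposing the details.
function HandleFatalException(Exception $e)
{
    // error_log() writes to wherever PHP's error_log setting points
    error_log(get_class($e) . ": " . $e->getMessage()
        . " in " . $e->getFile() . ":" . $e->getLine());

    // Only send the status header if no output has been sent yet
    if (!headers_sent())
        header("HTTP/1.1 500 Internal Server Error");

    return "<h1>Something went wrong</h1><p>Please try again later.</p>";
}

// Usage, mirroring the try/catch in entry.php
try
{
    throw new Exception("demo failure");
}
catch (Exception $e)
{
    print HandleFatalException($e);
}
```

Keeping the details in the log and out of the page is a design choice; during development you may prefer to keep printing the exception as the entry script does.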

Application, as seen inside the try block, is a static class that represents the whole application. The entry point is just a layer that gets things ready for it. In this case we're telling the application that it isn't executing in the top-level directory of its domain (it's in a subdirectory), which we'll see in action in a minute. So what happens when the application is run?

 

The Application Class

Nothing special. The application class is a layer which contains information about the application. In this example we've just got the base URL. Since I've added a separate Router class which takes care of which page to load, all that will happen when the application is run is routing:

<?php
//application/application.php

class Application
{
    public static $URLBase = "";

    public static function Run()
    {
        try
        {
            Router::LoadPage();
        }
        catch (PageNotFoundError $e)
        {
            Router::LoadPage("notfound");
        }
        catch (InvalidURLError $e)
        {
            Router::LoadPage("notfound");
        }
    }
}

You might want to add other things to this class as well: maybe invoke an authorization class to see if the user is or needs to be logged in. You might also add a Debug method which allows other parts of the application to write text to the page or log file only if a debug mode is turned on. But this is a bare minimum example, so I'm just redirecting the user to the requested page. You can also see that we're testing for a couple of exceptions from the Router class: namely a PageNotFoundError and an InvalidURLError. Both of these result in loading a "notfound" page, which will just be a 404 error page. You may want a separate page if the user enters an invalid URL (we'll see what that means in the next section).
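The Debug idea could be sketched roughly as follows. AppDebug, $Enabled, Write, and Messages are hypothetical names chosen for illustration; in the real project this would more likely be a method on Application itself.

```php
<?php
// Hypothetical sketch: AppDebug stands in for a Debug method on Application.
class AppDebug
{
    public static $Enabled = false;
    protected static $Messages = array();

    // Records a message only while debug mode is switched on
    public static function Write($message)
    {
        if (self::$Enabled)
            self::$Messages[] = $message;
    }

    // Returns everything recorded so far
    public static function Messages()
    {
        return self::$Messages;
    }
}

AppDebug::Write("ignored, debug mode is off");
AppDebug::$Enabled = true;
AppDebug::Write("routing to home/index");
print implode("\n", AppDebug::Messages()) . "\n";
```

Collecting messages in an array rather than printing them immediately leaves it up to the application whether they end up on the page or in a log file.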

Obviously I'm not putting additional try blocks inside of the catch blocks, even though I'm calling the same method I was trying in the first place. I don't see the use in doing so: in placing Router::LoadPage("notfound"); inside a catch block, I'm assuming that there won't be any trouble displaying an error page to the user. If there is, then that exception should bubble up to the top and cause the application to fail: obviously something is wrong if we can't display an error page.

So what's happening inside that Router class? Let's take a look…

 

The Router Class

The router class is the crux of this whole project. It allows us to separate our logic into classes and methods rather than files and folders. It makes it so /products/view/1 refers to a call to a method view() of the class products passing 1 as a parameter, instead of /products/view.php?id=1 referring to a file in a folder called with a query string. It allows us to decide how a URL gets mapped to our application, instead of being reliant on the filesystem for our application's structure.

In this demonstration I've only got one simple route: the first URL element is the class name, the second is the method name, and all the rest are passed as arguments to the method. But any number of rules could be used.
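To make the rule concrete before looking at the real implementation, here is a standalone sketch of the same mapping. SplitRoute and Dispatch are hypothetical helper names, the products class is a stand-in for a page, and the is_callable() guard is an addition of mine that the project's router does not have.

```php
<?php
// Hypothetical stand-in for a page class such as pages/products.php
class products
{
    public function view($id = null)
    {
        return "viewing product " . $id;
    }
}

// Splits "products/view/1" into array(page, method, arguments),
// using the same defaults as the article: "home" and "index".
function SplitRoute($path)
{
    $parts = ($path === "") ? array() : explode("/", $path);
    $page = (count($parts) > 0) ? $parts[0] : "home";
    $method = (count($parts) > 1) ? $parts[1] : "index";
    return array($page, $method, array_slice($parts, 2));
}

// Calls the method on a new page object, or returns null if it can't be called
function Dispatch($path)
{
    list($page, $method, $arguments) = SplitRoute($path);
    if (!class_exists($page))
        return null;
    $pageObject = new $page();
    if (!is_callable(array($pageObject, $method)))
        return null;
    return call_user_func_array(array($pageObject, $method), $arguments);
}

print Dispatch("products/view/1") . "\n"; // prints "viewing product 1"
```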

So how is this one rule implemented? Well the first step is the LoadPage() method…

<?php
// application/router.php

class Router
{
    //Executes a method on a Page object based on URL parameters
    public static function LoadPage($URLOverride = null)
    {
        list($page, $method, $arguments) = self::URLElements($URLOverride);
        $pageObject = self::GetPageObject($page);
        call_user_func_array(array($pageObject, $method), $arguments);
    }

}

There are a couple of things going on here. Three variables are extracted from the return value of a method called URLElements(). Then a page object is created by a GetPageObject() method, and one of its methods is called with an array of arguments. The routing magic happens in the URLElements() method:

    
   
    //Gets the elements necessary to load and execute a page from the URL,
    //returning them as a 3 item array: (page, method, arguments).
    protected static function URLElements($URL = null)
    {
        if ($URL === null)
            $URL = $_SERVER["REQUEST_URI"];

        $trimmedURL = trim($URL, "/");

        //If the application isn't located at the top directory, get rid of the
        //base part of the directory for processing. ltrim() can't be used for
        //this: its second argument is a list of characters, not a prefix.
        $base = trim(Application::$URLBase, "/");
        $applicationURL = $trimmedURL;
        if ($base != "" && strpos($trimmedURL."/", $base."/") === 0)
            $applicationURL = trim(substr($trimmedURL, strlen($base)), "/");

        $parts = explode("/", $applicationURL);

        //if the "/" was requested, explode will return a 1 element array
        //with an empty element. Change it to an empty array for the next step
        if (count($parts) == 1 && $parts[0] == "")
            $parts = array();

        self::assertValidURL($parts);

        //Build the array of URL elements
        $page = (count($parts) > 0) ? $parts[0] : "home";
        $method = (count($parts) > 1) ? $parts[1] : "index";
        $arguments = array_slice($parts, 2);

        return array($page, $method, $arguments);
    }

The first two lines of code in the method allow for a path to be specified by the calling code, rather than just grabbing the information from the request URL. We utilize this in application.php when we try to load a page called "notfound" instead of the requested page.

Then we clean up the URL and turn it into something we can use. We get rid of leading and trailing slashes, and then remove the base part of the URL if the application doesn't occupy the root of the website. Next we split it into an array delimited by forward slashes. I also changed it into an empty array if there was only one element which was an empty string – it makes more sense to have 0 elements since there's effectively nothing there (this occurs when "/" is requested of the application).
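One detail is worth knowing when removing the base part of the URL: PHP's ltrim() treats its second argument as a set of characters to strip, not as a prefix. The sketch below shows the effect and one way to remove a whole leading segment instead. StripBase is a hypothetical helper name.

```php
<?php
// ltrim() strips any leading characters found in the list "qdphpapp/",
// so the "p" of "products" is removed as well.
print ltrim("qdphpapp/products", "qdphpapp/") . "\n"; // prints "roducts"

// Hypothetical helper: removes $base only when it is a whole leading segment
function StripBase($path, $base)
{
    $path = trim($path, "/");
    $base = trim($base, "/");
    if ($base === "")
        return $path;
    if ($path === $base)
        return "";
    if (strpos($path, $base . "/") === 0)
        return substr($path, strlen($base) + 1);
    return $path;
}

print StripBase("/qdphpapp/products/view/1", "qdphpapp") . "\n"; // prints "products/view/1"
```

The problem is easy to miss because it only shows up when a page name begins with one of the characters in the base path.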

Finally, we specify the first element as the page with "home" being the default, the second element as the method (to call on the page object) with "index" being the default, and the rest (if any) as arguments to the method. I'll note that the defaults for the page and method can be anything, but in this example they should not be the same. Since $page corresponds to the name of a class and $method to the method that will be called on that class, identical defaults would mean the page object has to contain a method with the same name as its class. PHP 5 treats such a method as a constructor and calls it whenever the object is created.

You can also see an assertValidURL() method just before we return the elements. This is where an "invalid URL" will be detected, and in this case I just used a simple regular expression:

    protected static $URLSegmentPattern = '/^[a-zA-Z0-9_]*$/';

    //Makes sure that a URL matches a pattern, or throws an exception.
    protected static function assertValidURL($parts)
    {
        foreach ($parts as $part)
            if (preg_match(self::$URLSegmentPattern, $part) == 0)
                throw new InvalidURLError();
    }

Like I said, all I'm doing is comparing each element of the URL to a regular expression. This prevents tricky requests like /../../../etc/passwd (not that that would do anything in this case, but you get the idea).
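To see which requests the pattern lets through, here is a small standalone check that uses the same regular expression. SegmentsAreValid is a hypothetical helper that returns a boolean instead of throwing.

```php
<?php
// Same pattern as the article's $URLSegmentPattern
$pattern = '/^[a-zA-Z0-9_]*$/';

// Hypothetical helper: true when every segment matches the pattern
function SegmentsAreValid($parts, $pattern)
{
    foreach ($parts as $part)
        if (preg_match($pattern, $part) !== 1)
            return false;
    return true;
}

var_dump(SegmentsAreValid(explode("/", "info/distance/moon"), $pattern)); // bool(true)
var_dump(SegmentsAreValid(explode("/", "../../etc/passwd"), $pattern));   // bool(false)
```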

The only thing left to examine in this Router class is the part where we get the page object:


   
    //Includes a file and creates an instance of a page object
    protected static function GetPageObject($pageName)
    {
        $pageClassFile = "pages/".$pageName.".php";

        if (!file_exists($pageClassFile))
            throw new PageNotFoundError();

        include_once $pageClassFile;

        if (!class_exists($pageName))
            throw new PageNotFoundError();

        return new $pageName();
    }

Again, this is extremely simple. All we do is check if the file for the page exists, and if so, include it. Then we check if the class for the page exists, and if so, instantiate and return it. In this case our pages will consist of a class in a file of the same name (e.g., the class info will reside in info.php). This looks insecure, but the assertValidURL() method has ensured that $pageName only contains letters, digits, and underscores by this point. If you wanted to allow a wider range of characters into URLs, it would be prudent to check that $pageName is a valid filename in this method.
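If a wider range of characters were allowed, one possible check is to resolve the path and confirm it still sits inside the pages directory. This is only a sketch: ResolvePageFile is a hypothetical helper, and it relies on realpath() returning false for files that don't exist.

```php
<?php
// Hypothetical helper: returns the absolute path of a page file only if it
// really lives inside $pagesDir, otherwise null.
function ResolvePageFile($pagesDir, $pageName)
{
    $root = realpath($pagesDir);
    if ($root === false)
        return null;

    // realpath() resolves ".." and symlinks, and returns false for missing files
    $file = realpath($root . DIRECTORY_SEPARATOR . $pageName . ".php");
    if ($file === false)
        return null;

    // Reject anything that resolved to a location outside the pages directory
    if (strpos($file, $root . DIRECTORY_SEPARATOR) !== 0)
        return null;

    return $file;
}

var_dump(ResolvePageFile("pages", "../application/entry")); // NULL
```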

You can just define the exceptions we used at the bottom of this source file, below the Router class definition:


//Thrown when a requested page cannot be found
class PageNotFoundError extends Exception 
{ }

//Thrown when the user requested a page with an invalid URL
class InvalidURLError extends Exception
{ }

Now we're done with the framework! We can start writing pages that will be accessible from our pretty URLs as soon as they're saved.

 

Adding Pages

The first thing we'll add is the home page. Recall that the default class name for this application is "home" and the default method is "index":

<?php
// pages/home.php

class home
{
    //Default function called when no method is specified
    public function index()
    {
        print "calling index on home.<br />";
    }
}

And when we navigate to it…

[Screenshot: qdphpapp_index]

Awesome! But what if we try to get to /info/distance/moon?

[Screenshot: qdphpapp_uncaughtexception]

Recall that in case a page can't be found, we'll just try and load the "notfound" page. We don't try to catch the exception in the case that the "notfound" page is not found, and lo and behold the exception bubbled all the way up to the application's entry point where we catch and print all exceptions. So we'd better create a not found page:

<?php
// pages/notfound.php

class notfound
{
    public function index()
    {
        print "Page not found<br />";
    }
}

[Screenshot: qdphpapp_notfound]

And while we're at it, we may as well create this "info" class as well!

<?php
// pages/info.php

class info
{
    public function index()
    {
        print "some interesting facts";
    }

    public function distance($to = "moon")
    {
        print "the distance to the ".$to." is... ";

        switch ($to)
        {
        case "moon":
            print "400,000 km!";
            break;
        case "sun":
            print "150,000,000 km!";
            break;
        default:
            print "I have no idea.";
        }
    }
}

[Screenshots: qdphpapp_info, qdphpapp_distance, qdphpapp_moon, qdphpapp_sun, qdphpapp_you]

I think it's a lot more sane to develop even small web applications this way. The code required to get it set up is tiny compared to the extensibility that you gain. This kind of development allows you to assume complete control of your application. Obviously my example is incomplete: there's no support for databases, templates, etc., and it would be a huge pain to try and serve media with this setup. But it's a good starting point. I'll leave it as an exercise for the reader to expand upon this idea and implement the missing parts.

 

Where to now?

There are a few advantages to this setup that I haven't taken advantage of in this example. One obvious one is to expand on the routing a little more. Most MVC frameworks will allow you to define a list of patterns to match requests against, and where those requests will be sent.
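A pattern list of that kind might be sketched like this. The $routes table and the MatchRoute helper are hypothetical and not taken from any particular framework; captured groups in each regular expression become the method arguments.

```php
<?php
// Hypothetical route table: regular expression => array(page, method).
// Captured groups become the method arguments.
$routes = array(
    '#^products/(\d+)$#'      => array("products", "view"),
    '#^info/distance/(\w+)$#' => array("info", "distance"),
);

// Returns array(page, method, arguments) for the first matching pattern,
// or null when nothing matches.
function MatchRoute($routes, $path)
{
    foreach ($routes as $pattern => $target)
    {
        if (preg_match($pattern, $path, $matches) === 1)
        {
            // $matches[0] is the whole match; the rest are the captures
            return array($target[0], $target[1], array_slice($matches, 1));
        }
    }
    return null;
}

print_r(MatchRoute($routes, "products/42"));
```

The return value has the same (page, method, arguments) shape that URLElements() produces, so a router could try the table first and fall back to the default rule when MatchRoute returns null.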

Another would be to create a Page superclass, which all other pages would extend. This would make it infinitely easier to implement pages which are variably private (require some level of authorization to view), as well as to display a header/footer around the page.
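A rough sketch of such a superclass follows. The names Page, IsAuthorized, and Render are illustrative assumptions, as are the members and about pages; a real version would check a session or an authorization class rather than returning a fixed value.

```php
<?php
// Hypothetical base class; the names Page, IsAuthorized, and Render are
// illustrative and not part of the original project.
abstract class Page
{
    // Subclasses override this to require a login or permission
    protected function IsAuthorized()
    {
        return true;
    }

    // Wraps a page body in a shared header and footer
    public function Render($body)
    {
        if (!$this->IsAuthorized())
            return "<p>Access denied</p>";

        return "<header>My Site</header>\n" . $body . "\n<footer>The end</footer>";
    }
}

class members extends Page
{
    protected function IsAuthorized()
    {
        return false; // pretend nobody is logged in
    }
}

class about extends Page { }

$about = new about();
print $about->Render("<p>About us</p>") . "\n";

$members = new members();
print $members->Render("<p>Secret</p>") . "\n"; // prints "<p>Access denied</p>"
```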

Adding database, template, and authorization classes to the application folder would allow a broader range of projects to be made with this example.

These are just a couple of examples of how you might try to build on this project. If you're new to web development and are used to writing applications the way I once did, I encourage you to work with this example and try to build a small project with it. Expand on it as necessary, while trying to keep it in modules. Never write the same code twice. You'll find that the application is easier to develop, maintain, and debug.
