Optimizing ZF for High Volume Traffic

Presenter Notes

Disclaimer

  • This is a high level overview.
  • The things we do may or may not be applicable to your project.
  • I'm a first-time conference speaker, so please be gentle.

Background - Me

  • PHP Developer since 2001
  • Zend Certified Engineer (PHP 5 & PHP 5.3)
  • Worked for NationalGuard.com
  • Work at Discovery Communications, Inc. (Discovery Channel)
  • Work specifically on HowStuffWorks.com.
  • You might know me from IRC/Twitter as mFacenet.

Joind.In Reviews

http://joind.in/3971

About HowStuffWorks.com

  • Founded in 1998 by Marshall Brain.
  • Covers various topics and describes "How They Work"
  • Has won several awards.
  • Offers various podcasts including "Stuff You Should Know"
  • Bought by Discovery Communications in 2007.

Presenter Notes

Webby Awards, Time Magazine's "25 Web Sites We Can't Live Without" and PC Magazine's "Top 100 Web Sites"

HowStuffWorks.com Stuff

Statistics:

  • We serve over 4.5 Million Pages every day.
  • We have 970,000 unique URLs serving approximately 442,000 pieces of content.

Our Stack

  • Apache 2
  • Apache Traffic Server
  • Solr
  • MySQL
  • Memcached
  • PHP 5.3 w/ APC
  • Zend Framework 1.7
  • Zend Framework 2.0 (in pre-release status)
  • MongoDB

    So what do I mean by optimize?

    • Not how to micro-tune your application.
    • Not how to scale to an unlimited number of horizontal nodes.

    These are foolish endeavors most of the time (unless you are Facebook).

    • Get the right mix of scalability, performance and minimization of yaks shorn per feature.

    http://www.flickr.com/photos/theasarya/1879734854/in/photostream/

    Presenter Notes

    My goal today is to show you some of the trade-offs we made to balance the flexibility of ZF against the performance we wanted and the horizontal scalability we needed. I won't be showing a terribly large amount of code, just enough to demonstrate the point.

    Yak shaving: any apparently useless activity which, by allowing you to overcome intermediate difficulties, allows you to solve a larger problem.

    A blurb about Zend Framework

    Zend Framework

    For us, Zend Framework is:

    • A decent library that saves us from writing the system plumbing.
    • Reasonably well performing.
    • Flexible to our needs.
    • Easily extensible so that we can override behavior.
    • We've been using it since 0.1, and it's been in production for us since 0.7, so in some cases we've diverged greatly from the framework.

    What Should Everyone be doing?

    Opcode Caching

    If you are not using opcode caching (APC, Wincache, Zend Optimizer+) you're missing out.

    Without APC: 84 req/sec


    With APC: 331 req/sec

    (This was done last night on my server, just a simple test using "ab")

    Presenter Notes

    How easy is APC to install? Run apt-get install php5-apc on Ubuntu, or pecl install apc.

    Background Processing

    Heavy processing should run in the background. There are several ways to do this; here are some:

    • Gearman.
    • Beanstalkd.
    • PCNTL extension on Unix.
    • Cron Jobs.
    • In house job queues (surprising how many places have these.)
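
As a rough illustration of the last option, here is a minimal sketch of an in-house queue. It is not Gearman or Beanstalkd, and it is not what we run: the function names, the spool-file approach, and the JSON line format are all assumptions made for the example. The web request appends a job and returns; a cron-driven worker drains the file later.

```php
<?php
// Hypothetical in-house job queue backed by a spool file (illustration only).

/** Append one job to the spool file. Called from the web request. */
function enqueue_job($spoolFile, $type, array $payload)
{
    $line = json_encode(array('type' => $type, 'payload' => $payload)) . "\n";
    // LOCK_EX keeps concurrent requests from interleaving their writes.
    return file_put_contents($spoolFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

/** Process every queued job. Called from a cron job or CLI worker. */
function drain_jobs($spoolFile, $handler)
{
    if (!is_file($spoolFile)) {
        return 0; // nothing queued
    }
    // Claim the current batch by renaming it, so new jobs land in a fresh file.
    $batch = $spoolFile . '.' . getmypid() . '.processing';
    if (!rename($spoolFile, $batch)) {
        return 0;
    }
    $processed = 0;
    foreach (file($batch, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $job = json_decode($line, true);
        call_user_func($handler, $job['type'], $job['payload']);
        $processed++;
    }
    unlink($batch);
    return $processed;
}
```

A real queue would also need retries and failure handling, which is a good reason to reach for Gearman or Beanstalkd before growing this much further.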

    Presenter Notes

    Not going into this in depth; entire tutorials are built on this subject. Good candidates for background processing: API calls that register or update information externally, processing of uploads, indexing of content, etc.

    So what are we using right now?

    • Zend Controller
    • Zend FrontController
    • Zend View
    • Zend View Helper
    • Zend Router (more on this shortly)
    • Zend Dispatcher
    • Zend Registry
    • Zend Log
    • Zend Mail
    • Zend\Loader

    And what are we generating with it?

    • Everything but the ads.
    • On article pages, we use Facebook for the comments system.
    • Our services, internal APIs, and AJAX providers are all using ZF.
    • Our CLI scripts mostly use ZF as well.

    But why aren't we using X?

    • Zend Application - It's a bit heavy if you know exactly what you want.
    • Zend Config - Parsing INI Files is slow and requires disk IO.
    • Zend Form - We're better off writing forms by hand.
    • Zend Db - We have our own modeling layer; it's not great, but it's better suited to what we wanted.

    Zend Form

    Matthew Turland wrote an interesting blog post about Zend Framework 1 forms and the Plugin Loader, which will help with forms in larger implementations. His post:

    http://bit.ly/zfforms

    So what have we done?

    Autoloading

    • Probably the biggest change we've made to the architecture since I joined HowStuffWorks.com
    • We use Zend\Loader (from Zend Framework 2).
    • Specifically we are using the Classmap Autoloader.
    • With this autoloader we saw a 15%+ performance increase over using include/include_once.
    • Was stable back in April (according to MWOP)

    So why does this work?

    • No longer searching include paths (stat-ing for files is expensive).
    • Direct include (no *_once variants) and no stat calls to check the file system.
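
To make the idea concrete, here is a minimal sketch of a classmap autoloader built on plain spl_autoload_register. It is not the Zend\Loader implementation, and the class name, file name, and map contents are hypothetical; in a real deployment the map would be generated at deploy time rather than written by hand.

```php
<?php
// Sketch of a classmap autoloader: class name => absolute file path.
// A temp file stands in for a real class file so the example runs on its own.
$classFile = sys_get_temp_dir() . '/ExampleArticle_' . uniqid() . '.php';
file_put_contents(
    $classFile,
    '<?php class ExampleArticle { public function title() { return "Example title"; } }'
);

// Hypothetical map; normally generated during deployment.
$classMap = array(
    'ExampleArticle' => $classFile,
);

spl_autoload_register(function ($class) use ($classMap) {
    // One array lookup and a plain include: no include_path search, no *_once.
    if (isset($classMap[$class])) {
        include $classMap[$class];
    }
});

$article = new ExampleArticle(); // first use triggers the autoloader
echo $article->title(), "\n";
```

Because the map holds absolute paths, PHP never has to walk the include path to resolve a class.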

    Presenter Notes

    The compromise here is that we now have to generate a classmap upon deployment. Remember to strip out the include_once statements in ZF; they pull in every tidbit of the framework with them.

    Stuck on Zend Loader 1?

    I strongly, strongly advise moving to the ZF2 PSR-0 and classmap autoloaders; there are some serious problems with the ZF1 autoloader.

    • It iterates through each include path to try to find a file matching what you're looking for.
    • This is slow because it has to then stat the file system for every include path to see if the file exists.

    Here's what the Zend Framework guide says about optimizing the include path (if you absolutely have to use it):

    • Use absolute paths.
    • Reduce the number of paths you define.
    • Define the library path as early as possible.
    • Define the current directory last, or not at all.
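
Those four points can be sketched in a few lines. The directory layout below is an assumption (a library directory next to the bootstrap file); adjust it to wherever the framework actually lives.

```php
<?php
// Hypothetical layout: the framework sits in a "library" directory beside this file.
$libraryPath = realpath(__DIR__) . DIRECTORY_SEPARATOR . 'library';

$paths = array(
    $libraryPath, // absolute, and first, because most lookups resolve here
    // '.' is deliberately left out: define the current directory last, or not at all
);

// Keep the list as short as possible; every entry is another stat per lookup.
set_include_path(implode(PATH_SEPARATOR, $paths));
```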

    What about require_once?

    • Tends to require files that may not actually be needed.
    • Should be stripped out.

      find . -name '*.php' -not -wholename '*/Loader/Autoloader.php' -not -wholename '*/Application.php' -print0 | xargs -0 sed --regexp-extended --in-place 's/(require_once)/\/\/ \1/g'

    Server Stuff

    • Moved Rewrite rules to Apache Configuration.
    • We have a favicon.ico file in the root, you should too!
    • Disabled AllowOverride (.htaccess).
    • Disabled Apache modules we're not using.

    Presenter Notes

    Check your httpd.conf and see how many modules exist in there that you technically don't need to use.

    Our Configuration System

    Settings

    • Moved configuration to a PHP file that's processed before the requested script executes (using auto_prepend_file).
    • All configuration is written in PHP and set on the $_SERVER superglobal (and the file is subsequently cached in APC).
    • Extended the Request object to act as a gateway to the $_SERVER superglobal and contain default values for common settings.
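
A minimal sketch of that arrangement follows. Everything named here is hypothetical: the APP_ key prefix, the setting names and values, and the ExampleRequest class, which only stands in for an extended request object.

```php
<?php
// --- config.php: the file php.ini would name in auto_prepend_file ---
// Hypothetical keys and values, set directly on $_SERVER.
$_SERVER['APP_SEARCH_HOST'] = 'solr.internal.example';
$_SERVER['APP_CACHE_TTL']   = 300;

// --- Gateway: hypothetical stand-in for the extended request object ---
class ExampleRequest
{
    /** Defaults for common settings, used when the prepend file sets nothing. */
    private $defaults = array('CACHE_TTL' => 60, 'DEBUG' => false);

    /** Look a setting up in $_SERVER first, then fall back to the defaults. */
    public function getSetting($name)
    {
        $key = 'APP_' . $name;
        if (array_key_exists($key, $_SERVER)) {
            return $_SERVER[$key];
        }
        return array_key_exists($name, $this->defaults) ? $this->defaults[$name] : null;
    }
}

$request = new ExampleRequest();
echo $request->getSetting('SEARCH_HOST'), "\n";
```

Since the configuration is ordinary PHP, the opcode cache handles it like any other file, and nothing is parsed from INI or XML per request.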

    Presenter Notes

    This has several benefits. The request object is available in various places, so it's given us a chance to abstract API keys and the like out of code into an object that's generally available. You can't really cache INI files (Zend_Config_Ini uses parse_ini_file for this).

    Zend Config

    Problems with Zend Config

    • Uses parse_ini_file or simplexml_load_file to process the configuration file. (Not Cacheable, stats the filesystem, etc)
    • These have to tokenize the files, not absurdly slow but still additional time.
    • This is done EVERY time you make a request.

    We can do this better

    • Create your configuration in a PHP array.
    • Set via Apache SetEnv.
    • Set via a preprocessed file (what we're doing for now).
    • At the very least, cache your results from INI or XML.

    Zend Config via Array

    // application.php returns a plain PHP array (see "The File" below)
    $config = require APPLICATION_PATH . '/configs/application.php';
    $application = new Zend_Application(
        APPLICATION_ENV,
        $config
    );
    

    The File

    <?php
        return array(
            'key' => 'value'
        );
    

    --or--

    <?php
        $config = array();
        if (APPLICATION_ENV == 'development')
        {
            $config['key'] = 'value';
        }
        return $config;
    

    Presenter Notes

    This is kind of what's going on in ZF2. It's a lot faster since you're not parsing a markup format into PHP structures, and APC will cache the compiled file.

    Zend Config Caching Code

    Replace apc_fetch/apc_store with Memcache::get/Memcache::set, wincache_ucache_get/wincache_ucache_set, etc., as appropriate.

    $config = apc_fetch('my_config');
    if (!is_array($config)) {
        // Cache miss: parse the INI file once, then keep the array for 600 seconds
        require_once 'Zend/Config/Ini.php';
        $section  = APPLICATION_ENV;
        $filename = APPLICATION_PATH . '/configs/application.ini';
        $config   = new Zend_Config_Ini($filename, $section);
        $config   = $config->toArray();
        apc_store('my_config', $config, 600);
    }

    // Create application, bootstrap, and run
    $application = new Zend_Application(
        APPLICATION_ENV,
        $config
    );
    

    Presenter Notes

    I know we don't use Zend Config, but Rob Allen (Akrabat) has given this tip and it's a very valid one.

    Zend Router

    • We completely replaced the Zend Router implementation with our own.
    • Our system searches a MySQL table using b-tree indexes to figure out the route.

    Why?

    • The stock rewrite router iterates through every route until it finds the one it's looking for; by using MySQL we query in place using indexes.
    • Allows us to process the 970,000 URLs in our system, including adding new URLs on the fly.
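
Here is a rough sketch of what an indexed route lookup can look like. It is not our router: the table and column names are invented, and an in-memory SQLite database stands in for MySQL so the example runs on its own (it needs the pdo_sqlite extension).

```php
<?php
// SQLite in memory stands in for MySQL; schema and data are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(
    'CREATE TABLE routes (
        url        VARCHAR(255) PRIMARY KEY,
        controller VARCHAR(64) NOT NULL,
        action     VARCHAR(64) NOT NULL,
        content_id INTEGER
    )'
);
$db->exec("INSERT INTO routes VALUES ('/example-article.htm', 'article', 'view', 42)");

/** Resolve a request path with one indexed query instead of looping over routes. */
function resolve_route(PDO $db, $path)
{
    // The primary key on url makes this an index lookup, not a table scan.
    $stmt = $db->prepare('SELECT controller, action, content_id FROM routes WHERE url = ?');
    $stmt->execute(array($path));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

$route = resolve_route($db, '/example-article.htm');
```

Adding a URL is a single INSERT, which is what makes new routes available on the fly without touching a configuration file.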

    Presenter Notes

    This is where it becomes important to remember that we rewrote this to suit our needs. We have well over 900 thousand URLs in our system; at that scale it would have been a bit ridiculous to update a configuration file with each new URL, and the routes would have taken significant space in memory. Querying the database and waiting for a response was far cheaper than the loop structure that ZF1 currently uses.

    And last but not least - Caching

    Object Caching

    • When using MySQL or external services, you generally want to cache the result where you can.
    • Shared memory caches are good for single-server deployments; if you go larger, Memcache or Redis is a good choice.
    • You can use Zend Cache. We didn't, mostly because it wasn't a component at the time we started using ZF.

    We use Memcache. Some of the things we cache:

    • Routes
    • Database results
    • PHP Sessions (we use Memcache as our Session Save Handler)
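
The usual pattern here is cache-aside: check the cache, and only on a miss do the expensive work and store the result. The sketch below is an illustration, not our code. ArrayCache is a hypothetical in-process stand-in so the example runs without the Memcache extension; its set() signature is simplified (the real Memcache::set takes a flags argument before the expiry) and it ignores TTLs.

```php
<?php
// Hypothetical stand-in for a Memcache client (illustration only).
class ArrayCache
{
    private $items = array();

    /** Returns false on a miss, as Memcache::get() does. */
    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value, $ttl = 0)
    {
        $this->items[$key] = $value; // TTL ignored in this stand-in
        return true;
    }
}

/** Return the cached value for $key, or compute it, store it, and return it. */
function cached($cache, $key, $ttl, $loader)
{
    $value = $cache->get($key);
    if ($value === false) {
        $value = call_user_func($loader); // the expensive part
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache   = new ArrayCache();
$queries = 0;
$loadArticle = function () use (&$queries) {
    $queries++; // stands in for a MySQL query or an external service call
    return array('id' => 42, 'title' => 'Example article');
};

$first  = cached($cache, 'article:42', 300, $loadArticle);
$second = cached($cache, 'article:42', 300, $loadArticle); // served from cache
```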

    CDNs & Reverse Proxying

    • Allows you to generate output and cache it.
    • Works great for us, we have mostly static content and can invalidate content at will.
    • Completely bypasses ZF, PHP, etc.
    • Several providers in the space, Akamai is the one we use.
    • Additional benefits include geographical distribution.
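
One common way to tell a shared cache how long it may keep a page is the Cache-Control response header. The sketch below is an assumption about how that could look from PHP, not a description of our Akamai setup; the function name and TTL values are made up, and whether a given CDN or proxy honors these headers depends on how it is configured.

```php
<?php
/**
 * Build headers that let a shared cache (CDN or reverse proxy) keep a page
 * for $edgeTtl seconds while browsers keep it for $browserTtl seconds.
 * Hypothetical helper, for illustration only.
 */
function edge_cache_headers($browserTtl, $edgeTtl)
{
    return array(
        // s-maxage applies to shared caches only; max-age applies to browsers.
        sprintf('Cache-Control: public, max-age=%d, s-maxage=%d', $browserTtl, $edgeTtl),
    );
}

foreach (edge_cache_headers(60, 3600) as $line) {
    if (!headers_sent()) {
        header($line);
    }
}
```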

    Presenter Notes

    I put this last because it's probably the biggest thing you can do to improve your performance. If you attended Ralph's Session on ZendCode yesterday this is a very similar premise.

    In Closing

    Just a quick summary:

    • Choose what components you need; most are written for flexibility, not performance.
    • Use autoloading; strip the include and require statements everywhere.
    • Use opcode caching; you're a fool if you're not using it in production.
    • If you use Zend Config, cache the results; INI and XML files aren't cached by default.
    • Routing can be slow if you have many routes; try to make these explicit.
    • Cache where possible, especially front end.
    • Serve resources from a CDN.

    Most Importantly

    • Don't shoehorn things into Zend Framework; if it doesn't work for you, don't force it.

    Questions?

    Thank You
