Running a successful website is a constant struggle for performance and speed improvements. Read how we used the Seagull framework to build our portal podcast.de. As a start-up we provide a web-based service to find, comment on, play and recommend audio and video podcasts. At the moment the service is aimed at a German-speaking audience only, but thanks to Seagull we are prepared for internationalisation.
In February 2006 I published the first version of podcast.de, based on an early 0.4 release of Seagull. It took me over fourteen months to modify Seagull to fit my needs. I wrote several new modules and hacked many things, such as clean URLs. Many of these features are now part of the core (though not written by me). One of my own additions did make it into the release: the export module, which generates RSS feeds. RSS feeds are what our system is really about.
Not knowing any better, I built the portal on a MySQL database. An XML database might have been a better choice, at least in some areas. The biggest table holds over 500,000 entries: all the references to podcast episodes and their metadata. In the meantime we upgraded from MySQL 4.x to the latest stable 5.x release. Database performance has been an issue from the start; it just was not noticeable at the beginning. The database schema should have been designed to be less complex. As the amount of data and overall usage grew, we ran into home-grown issues. We got fairly good performance on the database side through indexes, query optimisations, better hardware, newer MySQL releases, a migration to split sequence tables, database caching and MySQL tuning. Most important of all was replacing most of the DataObjects code.
Caching is the one life-saver every webmaster should look out for. Seagull has built-in template caching and, nowadays, library caching support, which is nice. We also used the SGL_Cache class for general-purpose data caching. With a rising number of pages and a growing user base, the file-based cache turned into a bottleneck, as the hard drive had to seek through several thousand files. A different file system, better suited to many small files, might have helped for a while. Instead, we chose to replace file-based caching with memory-based caching. It took just a five-line hack in SGL_Cache and a little wrapper! In our case we decided to use memcache. Memcache(d) is widely used. The PHP extension is installable via PECL (PEAR's extension repository), and with PHP 5.x, which we switched to during our development cycle, you get an OO interface. Shared memory would have been an alternative. At first memcache worked very well, but once a few thousand objects were stored it slowed down. For a while we had to flush the cache every now and then, because we could not find anything on the net that explained this behaviour. Migrating to a new server with a newer Linux and a newer memcached solved the problem. We could not upgrade memcached earlier because of dependency issues.
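To illustrate why such a swap can stay small, here is a minimal sketch of a cache wrapper with a pluggable storage backend. It is written in Python rather than PHP and is not Seagull's SGL_Cache code. The names `CacheBackend`, `InMemoryBackend` and `Cache` are hypothetical, and `InMemoryBackend` is only a dictionary-based stand-in for a real memcached client. Callers only ever see `get`, `save` and `remove`, keyed by an id and a group. Replacing a file backend with a memory backend therefore does not touch the calling code.

```python
import time
from typing import Any, Dict, Optional, Protocol, Tuple


class CacheBackend(Protocol):
    """Minimal interface every (hypothetical) backend must provide."""

    def get(self, key: str) -> Optional[Any]: ...
    def set(self, key: str, value: Any, ttl: int) -> None: ...
    def delete(self, key: str) -> None: ...


class InMemoryBackend:
    """Dictionary-based stand-in for a memcached client, with per-entry expiry."""

    def __init__(self) -> None:
        # key -> (expiry timestamp on the monotonic clock, 0.0 = never expires; value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at and time.monotonic() >= expires_at:
            # Expired entries are evicted lazily when they are read.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: int) -> None:
        # A ttl of 0 or less means "no expiry" in this sketch.
        expires_at = time.monotonic() + ttl if ttl > 0 else 0.0
        self._store[key] = (expires_at, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


class Cache:
    """Backend-agnostic wrapper that namespaces keys as '<group>:<id>'."""

    def __init__(self, backend: CacheBackend, default_ttl: int = 3600) -> None:
        self.backend = backend
        self.default_ttl = default_ttl

    @staticmethod
    def _key(item_id: str, group: str) -> str:
        return f"{group}:{item_id}"

    def get(self, item_id: str, group: str = "default") -> Optional[Any]:
        return self.backend.get(self._key(item_id, group))

    def save(self, data: Any, item_id: str, group: str = "default",
             ttl: Optional[int] = None) -> None:
        self.backend.set(self._key(item_id, group), data,
                         self.default_ttl if ttl is None else ttl)

    def remove(self, item_id: str, group: str = "default") -> None:
        self.backend.delete(self._key(item_id, group))


if __name__ == "__main__":
    cache = Cache(InMemoryBackend())
    cache.save({"title": "Episode 1"}, "42", group="episodes")
    print(cache.get("42", group="episodes"))  # {'title': 'Episode 1'}
```

With a real memcached client, only the backend class would change. Expiry would then be enforced by the memcached server rather than by this dictionary.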
Another performance issue is rendering the navigation. This has improved somewhat over time; again, the solution is to cache most of it. There are still some minor issues with the DB-based navigation (I have not tried the file-based one yet). With DB-based navigation you cannot add an alias containing two slashes. That is the biggest drawback for us at the moment, so I wrote my own alias strategy for this, which works well. Proper I18N might be the next issue once we need it.
I am still not happy with email handling in Seagull. By default Seagull sends HTML mails, and I am not a big fan of that. I want to send combined text and HTML mails. Ages ago I sent in a patch for this, which was not accepted. A couple of months back an almost identical patch was submitted, and that one was not accepted either. Anyhow, we are using this modification successfully.
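For reference, a combined text and HTML mail is a multipart/alternative message carrying both bodies. The following Python sketch builds one with the standard library's `email.message.EmailMessage`. It is not the Seagull patch mentioned above, which is not shown here. The function name `build_mail` and the example addresses are hypothetical.

```python
from email.message import EmailMessage


def build_mail(sender: str, recipient: str, subject: str,
               text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative mail with a plain-text and an HTML version."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The plain-text part goes first: clients pick the last alternative they can
    # display, so HTML-capable clients show the HTML part and others fall back to text.
    msg.set_content(text_body)
    # add_alternative turns the message into multipart/alternative and appends the HTML part.
    msg.add_alternative(html_body, subtype="html")
    return msg


if __name__ == "__main__":
    mail = build_mail("noreply@example.org", "listener@example.org", "New episodes",
                      "Three new episodes are available.",
                      "<p>Three new episodes are available.</p>")
    print(mail.get_content_type())  # multipart/alternative
```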
Now that I have named all the issues with Seagull, let's have a look at the good parts. The overall structure is complex, but fairly clean and intuitive once you start working with it. Progress might be slow, but the framework is steadily improving thanks to Demian, Lakiboy and all the other active developers. The framework is flexible and versatile: we use it with several different interfaces (web, mobile, IPTV, search). I hooked Zend libraries into the system without problems. We have several crawlers running as cron jobs through the CLI interface. Migrating from 0.4 to 0.6 took real effort, but the loosely coupled core made it possible.
Since the last relaunch of podcast.de at the end of September 2007, we have been using AJAX intensively. We moved the database to its own server, and static content is served from another separate server. We tried using lighttpd instead of Apache's HTTP server to reduce the load, but failed: I could not get it to run stably under load.
These days we welcome over 10,000 unique visitors a day. Almost 25,000 people have registered for our service. We serve over two million pages a month with a single Seagull server. We have 18 custom Seagull modules, six different themes (not all in use yet) and three years of development invested in the system. The database now has 112 tables with around 1 GB of data. The next step in performance upgrades will be a second database server acting as a read-only slave. Before that, we have to modify our SQL query execution to distinguish between read and write operations. At the moment there is room for more, so pay us a visit at www.podcast.de and let us know how you like it!
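Here is a minimal sketch of that read/write distinction, assuming a single master and a single read-only slave. Read-only statements go to the slave. Everything else, including every statement inside a transaction, goes to the master. It is Python, not Seagull code. `QueryRouter`, the host names and the keyword list are hypothetical and deliberately simple.

```python
import re
from dataclasses import dataclass

# Hypothetical policy: statements starting with these keywords only read data.
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must run on the master even though they start with SELECT.
_LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


@dataclass
class QueryRouter:
    """Decide whether a SQL statement goes to the master or the read-only slave."""

    master: str
    slave: str
    in_transaction: bool = False

    def target_for(self, sql: str) -> str:
        # Inside a transaction, keep everything on the master so reads see the
        # transaction's own writes and are not affected by replication lag.
        if self.in_transaction:
            return self.master
        if _READ_ONLY.match(sql) and not _LOCKING_READ.search(sql):
            return self.slave
        # INSERT, UPDATE, DELETE, DDL and anything unrecognised go to the master.
        return self.master


if __name__ == "__main__":
    router = QueryRouter(master="db-master", slave="db-slave")
    print(router.target_for("SELECT title FROM episodes WHERE id = 42"))  # db-slave
    print(router.target_for("UPDATE episodes SET title = 'x' WHERE id = 42"))  # db-master
```

A production router would also have to deal with replication lag outside transactions, for example by reading from the master for a short while after a user's own write.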