When I first built the Weekly timetable in My Book It I was particularly proud of the achievement. The page draws a table that changes columns with every load, it automatically updates itself as changes are made to the database. It is probably the single most complicated page in My Book It.
When it was first created the database had very little data in it, maybe 100 classes in a week and a few hundred associations behind the scenes. Due to the small amount of data it was very easy to read from the database often and the site still perform quickly. Over the last two years however these databases have grown significantly in the number of records that are being held.
This created a problem where my once fast running Weekly Timetable now could take over 12 seconds to load for a site that had been running for two years on my hardware. This is a big problem, it was compounded by the possibility of moving the site off my own equipment and onto a VPS. (Original tests of the page took over 30 seconds to load).
With the recent growth of My Book It, a move to a VPS (or similar) was imminent, so this is an issue that had to be addressed and sooner rather then later. It would be a hard sell to say we are moving the site to improve reliability and performance, and by the way your timetables will take 30 seconds to load first time around. (The second time it loaded it was cached in MySQL)
So the first issue was to identify what was causing the slow loading. After watching what was happening behind the scenes while the page loaded I noted that the MySQL process would hit 100% CPU for the duration of the load. This proved that the issue was to do with the database queries and not a limit of PHP or apache. Now that I had the suspected cause, I needed to review the code to try and identify the cause.
The original code basically did the following
## Work out the variables required. ## Get the start times for the week from the classes table ## Work out how many in each class from the assoc table (1 x lookup for each class) ## Draw the timetable ## For each class slot, look up the student information (1 x lookup for each student in each class)
That worked well, but the issue was that in a school of 300 or so students, it could easily look at the database over 1500 times and as the tables grew in size so did the time it took to get the information as some of these queries were very complicated and could join over 3 tables together to get the information it was looking for.
So now that I know the issue, what could be done about it? After much thought about it I came up with a few possible solutions. (and their possible issues)
- Archive old data – I could hold a few months (maybe a years) worth of data in the primary tables. Then on a regular basis archive old data out to another table. Then either build in some smarts to reference both or have an archive set of timetables. The issue with this was that my data is not liner (i.e. the age of the record corresponds to its location in the data).
- Increase resources to the machine. – I could add faster CPU’s, more RAM, increase the cache sizes. All of these would be easy but expensive to do. My Book It is designed to run lean so this was discounted early. Also this isn’t a fix it just delays the same issue to a point where you have spent a lot of money and just delayed the issue.
- Re-write the page. – This was the best option, but it was also the most daunting as the page I had written works, and how could I be sure that my replacement would work or have any difference at all?
After much thought about it all I decided that the right way to go was to re-write the page. This was very daunting as I could spend a lot of time re-writing the page and it may have no difference to the performance.
So I re-wrote the code with the following principals
- Reading data from the database should only happen once, I should never have to touch a record twice. (It can be stored in variables for use later)
- Database reads where expensive and memory is cheap. (i.e. variables are better then reading from the database)
In the end I re-wrote the page to look like
## Get the variables for the week ## Get all class information into an array (1 x query) ## Get all student information into an array (1 x query per class) ## Draw the timetable using the information from the arrays
This worked well it reduced page loads to approx 2 seconds down from 15. (The time was the same on both my hardware and the VPS) This is a fantastic result.
This was a very difficult thing to do. I can understand now why software companies a reluctant to re-write bad code. A lot of work could be poured into re-writing a page with little or no benefit, maybe they are unable to re-think code as this was a challenge to myself (often my re-thinking ended up with the same idea as the original), and with core components, it would be very difficult to not run into unintended consequences in re-writing how software works. (i.e. it will break something that depends on it running the way it did)
Overall, I am really glad that I did re-write this and not do one of the other methods. It was rewarding for myself to be able to rethink how I write my code and for My Book It customers in terms of performance and now reliability.