FC Inter Milan Using MySQL 5.1 Partitioning and Event Scheduler
If you are a football (soccer) fan, then F.C. Internazionale may mean something to you. Or perhaps its more common name, Inter Milan sounds familiar? Next in our Use Case Competition articles, here's Corrado Pandiani's experience, about using MySQL to power inter.it, one of the most popular soccer sites in Europe. Not only that, but inter.it is using the latest MySQL 5.1 technology.
The Inter Milan squad in 2008/09 features stars such as Zlatan Ibrahimović, Luís Figo, Hernán Crespo, Marco Materazzi, Patrick Vieira and Adriano Leite Ribeiro.
At F.C. Internazionale Milano Spa we've used MySQL for eight years, starting from 3.23 and up to 5.0. Recently, we have started using satisfactorily the partitioning and event scheduler features of MySQL 5.1. The tasks that most benefit from the new features are:
- real time logging of page impression requests of all our websites (http://www.inter.it is the main site)
- real time statistics such as: page impressions per site, page impressions per language, page impressions per country (using GEOIP), unique usertrack sessions and others;
- scheduled and on demand stats with more detailed information such as: single page impressions, user session durations, most visited pages, most visited paths on other stuffs we are developing.
Why we need partitioning
In the past, we logged web accesses to a daily log file, which exceeded 2 GB in size. For us, this is a huge amount of data. Now, the data is stored directly in a partitioned MySQL database.
A year ago, when we thought about logging web accesses to the database, we were limited by issues such as:
- using ARCHIVE table to save disk space were limited by the fact that a single ARCHIVE table can't grow more than 2GB. That was in 5.0. Now, with 5.1 the limitation has been lifted, and besides, using partitioning, we don't hit this problem any longer.
- using MERGE tables to keep on line months of logged data was not a great solution primarily for performance problems and the complexity of managing the automatic creation of new tables in the merge set. Also in this case, partitioning turned out to be a more flexible choice.
We found that partitioning in 5.1 was a very useful feature for us. We use it to keep online various months of data and to calculate statistics in an optimized way.
Why we need the MySQL 5.1 Event Scheduler
We have developed (and we plan to further develop) a series of stored procedures to generate advanced statistics daily, hourly and monthly. These SPs are invoked using the event scheduler instead of using the operating system cron.
Our Development Environment
- Operating Systems: Ubuntu Linux / MacOSX
- Development Language: Perl
Our Deployment Environment
- OS: Linux Debian
- Hardware: Pentium III 1GHz , 2GB RAM, 80GB HDD RAID0
Integration with Perl and Apache
We developed a short Perl script to log page referrers of a hidden image from every web page, and to calculate counters in real time.
When the web server starts, the script creates prepared statements with all needed INSERT queries. At every page request, the script generates a set of user variables with all needed values and then EXECUTEs the prepared statements.
The webtraffic script is piped to Customlog configuration of Apache server.
How partitioning meet our needs
Creating partitioned tables is very simple and the MySQL documentation on the partitioning feature is quite clear. The advantage of using partitioned tables is performance: All our scheduled statistics are computed very quickly. The daily statistics finish in less than 3 minutes, and the monthly in less than 10 minutes.
In the past, using a 5.0.xx version and not having partitioned tables, we had to wait a longer time to compute daily stats. In fact, the daily run was so expensive in terms of execution time that we simply gave up on the idea. Thus, we don't have comparison execution time on more complexstatistics.
Now, with partitioning, we run our daily stats with no issues and even expect to be able to further trim our response times. Notice also that our server is not at the moment a very powerful machine, but for our tasks we amazingly discovered it has the needed power.
To show the kind of traffic we deal with, we can see the table of average time of visits and most active countries for all sites (3 september 2008)
or the top pages stats for the most visited portal www (3 september 2008)
or the daily stats for all sites (data referred to 3 september 2008)
or the hourly stats for the most visited portal www (3 september 2008)
Attached to this article you can find a compressed file with some scripts:
- webtraffic.txt: the logger script piped to apache
- apacheconf.txt: configuration of Apache. Piping customlog to webtraffic script
- db_structure.txt: structure and descriptions of the tables
- calcola_stats.sql: stored procedure to calculate daily statistics
- calcola_stats_mese.sql: stored procedure to calculate monthly statistics
- gecountry.txt: stored function to resolve an IP address using GEOIP tables
- event.txt: event scheduler definition