We won the Data Science Institute’s Summer Datachallenge!

Last week, Alvis Tang (another PhD student at the Centre for Complexity Science) and I found out that we had won the Summer Datachallenge, a data science competition hosted by Imperial’s Data Science Institute. Well, we were joint winners with another physicist at Imperial, Jason Cole, but we were still very pleased.

The challenge involved taking a load of raw data about London in 2012, including house prices, tweets, Olympic medals and theatre ticket sales, as well as the data we ended up using: records from London’s transport system. The tube data consisted of the number of entries to and exits from each tube station in 15-minute intervals throughout the whole year, coming to a ~100 MB text file. We began by simply exploring the data to see what patterns emerged.
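For anyone wanting to try something similar, here is a minimal sketch of loading this kind of file in Python. It assumes a hypothetical CSV layout with one row per station per 15-minute interval (columns `station`, `timestamp`, `entries`, `exits`); the real file's format may well differ, and the station names and counts below are invented purely for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical layout: one row per station per 15-minute interval.
# Station names and counts are invented for illustration only.
SAMPLE = """station,timestamp,entries,exits
Station A,2012-06-05T07:45,310,95
Station A,2012-06-05T08:00,355,120
Station A,2012-06-05T18:00,60,240
Station B,2012-06-05T08:00,400,2900
"""


def load_counts(fileobj):
    """Return {(station, date): {time: (entries, exits)}} from a CSV stream."""
    counts = defaultdict(dict)
    for row in csv.DictReader(fileobj):
        ts = datetime.fromisoformat(row["timestamp"])
        counts[(row["station"], ts.date())][ts.time()] = (
            int(row["entries"]),
            int(row["exits"]),
        )
    return counts


def daily_totals(counts):
    """Sum entries and exits over each station-day."""
    return {
        key: (sum(e for e, _ in day.values()), sum(x for _, x in day.values()))
        for key, day in counts.items()
    }


counts = load_counts(io.StringIO(SAMPLE))
totals = daily_totals(counts)
print(totals)
```

Because `csv.DictReader` reads one row at a time, the same function could be given a file opened with `open(path, newline="")` instead of an in-memory string.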
This is the number of people entering and exiting my local station, Kennington, on this day two years ago.


As you might expect, there are two daily peaks, around the morning and evening rush hours (this was a Tuesday). Since Kennington is a largely residential area, more people enter the station in the morning and more exit it in the evening, and the evening peak is more spread out, suggesting that people leave for work at roughly the same time but get home at many different times. We got similar plots for every other tube station in the network, and then decided to look at the Olympic period in the summer of 2012 to see whether there was a significant difference. In particular, we were interested in whether TfL’s ‘Get Ahead of the Games’ scheme, designed to get commuters to work from home, had been successful.
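To put a number on "the evening peak is more spread out", one option is a count-weighted standard deviation of travel times within each half of the day. This is a minimal sketch, assuming a day's counts come as a list of 96 fifteen-minute bins starting at midnight; `peak_and_spread` and the toy profile are hypothetical illustrations, not the analysis behind our plots.

```python
import math

INTERVAL_MIN = 15  # each bin covers 15 minutes; bin 0 starts at midnight
BINS_PER_DAY = 24 * 60 // INTERVAL_MIN  # 96


def peak_and_spread(counts, start_bin, end_bin):
    """Peak time and count-weighted spread of travel times in [start_bin, end_bin).

    Returns (start of the busiest bin in minutes after midnight,
             weighted standard deviation of bin midpoints in minutes).
    """
    window = range(start_bin, end_bin)
    peak_bin = max(window, key=lambda i: counts[i])  # first bin if tied
    total = sum(counts[i] for i in window)
    mid = {i: (i + 0.5) * INTERVAL_MIN for i in window}  # bin midpoints
    mean = sum(counts[i] * mid[i] for i in window) / total
    var = sum(counts[i] * (mid[i] - mean) ** 2 for i in window) / total
    return peak_bin * INTERVAL_MIN, math.sqrt(var)


# Toy profile (invented, not real station data): a sharp entry peak
# around 07:30 and a broad exit peak spread over 17:30-19:45.
entries = [0] * BINS_PER_DAY
entries[29], entries[30], entries[31] = 50, 100, 50
exits = [0] * BINS_PER_DAY
for i in range(70, 79):
    exits[i] = 30

noon = BINS_PER_DAY // 2
morning = peak_and_spread(entries, 0, noon)
evening = peak_and_spread(exits, noon, BINS_PER_DAY)
print(morning, evening)
```

On this toy profile the morning entries have a spread of roughly 11 minutes against roughly 39 minutes for the evening exits, the kind of contrast described above for Kennington.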

So we picked a few stations in business districts and compared their traffic on Olympic days with traffic on a typical summer day.
Panels: Bank, Canary Wharf, London Bridge, Liverpool Street

Here we are looking at the total traffic through each station, i.e. entries + exits. For almost every station the plot looks the same: there is no real difference during peak hours, but outside peak hours there is a small increase in passenger numbers. The one exception is Canary Wharf, where there was a drastic reduction in numbers during peak commuting hours, so it looks like the ‘Get Ahead of the Games’ campaign worked much better there than anywhere else.
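The comparison of Olympic days against a usual summer day can be sketched as a percentage change in total traffic (entries + exits), computed separately for peak and off-peak bins. This assumes two aligned lists of 96 fifteen-minute (entries, exits) pairs, one per day; the peak windows 07:00-10:00 and 16:00-19:00 in the hypothetical `PEAK_HOURS` are an assumption for illustration, not a TfL definition, and the toy numbers are invented.

```python
INTERVAL_MIN = 15  # each bin covers 15 minutes; bin 0 starts at midnight
PEAK_HOURS = [(7, 10), (16, 19)]  # assumed peak windows [start, end) in hours


def is_peak(bin_index):
    """True if the bin starts inside one of the assumed peak windows."""
    hour = bin_index * INTERVAL_MIN // 60
    return any(start <= hour < end for start, end in PEAK_HOURS)


def relative_change(baseline, olympic):
    """Percentage change in total traffic (entries + exits), peak vs off-peak.

    baseline and olympic are equal-length lists of (entries, exits) tuples,
    one per 15-minute bin of the day.
    """
    sums = {"peak": [0, 0], "off_peak": [0, 0]}  # [baseline, olympic]
    for i, ((b_in, b_out), (o_in, o_out)) in enumerate(zip(baseline, olympic)):
        bucket = sums["peak" if is_peak(i) else "off_peak"]
        bucket[0] += b_in + b_out
        bucket[1] += o_in + o_out
    return {
        name: 100.0 * (o - b) / b if b else float("nan")
        for name, (b, o) in sums.items()
    }


# Toy days (invented numbers): traffic halves at peak, rises 10% off-peak.
baseline = [(10, 10)] * 96
olympic = [(5, 5) if is_peak(i) else (11, 11) for i in range(96)]
print(relative_change(baseline, olympic))  # {'peak': -50.0, 'off_peak': 10.0}
```

Splitting into just two buckets keeps the summary to two numbers per station; a Canary Wharf-style pattern would show up as a large negative peak change alongside a small positive off-peak one.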
We guessed that the difference is largely due to HSBC, which reportedly had up to 40% of its workforce working from home, which would more than account for the difference at peak commuting times.
We also looked at the individual entry gates to the station and found that the East entrance accounted for almost all of the drop in traffic, while the West entrance saw little difference; the East entrance is the one nearest to HSBC’s offices.
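Attributing the drop to individual entrances amounts to splitting the station's total change into per-gate changes. This is a minimal sketch, assuming we have total peak-hour traffic per gate for a baseline day and an Olympic day; the function `share_of_change` and the numbers are hypothetical.

```python
def share_of_change(baseline, olympic):
    """Fraction of a station's total change in traffic contributed by each gate.

    baseline and olympic map gate name -> total traffic over the same period.
    Shares sum to 1 (up to rounding); a gate moving against the overall
    trend gets a negative share.
    """
    deltas = {gate: olympic[gate] - baseline[gate] for gate in baseline}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("no net change to attribute")
    return {gate: delta / total for gate, delta in deltas.items()}


# Invented peak-hour totals, for illustration only.
baseline = {"East": 1000, "West": 800}
olympic = {"East": 600, "West": 780}
print(share_of_change(baseline, olympic))  # East ~0.95, West ~0.05
```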
