Fantasy Football and OBIEE Part 1 : You Can Be Statto!
One of the demos that we showed off at the recent UKOUG Conference in Birmingham was a football stats demo build using BI EE. The basic idea here was to upload a few years worth of Premier League statistics, create a semantic model and then use it to answer two question:
-
Picking a particular attribute (score at half time, number of fouls, goals scored etc), how high in the league do you think your team would have reached last year, and
-
Using these stats and some commonly used valuation models, given a theoretical large amount of money, which team would be the best to invest in (comparing Birmingham City, for example, with Chelsea)
The basic idea was to have a bit of fun with some publically available statistics and use them to put together an interesting BI EE demo, but the demo was received quite well and i thought it made sense to blog about the intricacies of what we did, including some nice features especially in the Repository design which i shall be blogging about today. The complete idea for this was conceptualized by our Business Development Manager, Chris Raby who got us all the necessary data to prepare the demo. I shall be covering the details of what we did to accomplish that in the next blog post.
All the data that you will be seeing here are based on actual data that we gathered from multiple websites.
When we started to build this demo we gathered our data from the football stats website here. They have extensive stats of all the matches that were played in English Premier League, Championship etc for the last 5 to 6 years or so. The data is available in the form of CSV files that one can download. The below screenshot basically shows you in what form the data exists.
Basically every match score was recorded in the form of Home Team and Away Team i.e each row in the CSV file had the stats for a single match. It had a set of stats like Number of Shots on Target, Number of Goals etc for both the Home Team as well as the Away Team. This was maintained in multiple CSV files for each year. One of the first things that we did was to load all of these multiple CSV files into a single Oracle Database table (with an extra Year column). I just used external tables for these. This ensured that we had the data in the desired format.
Once we had the data, the first thing that we had to do was to design the repository. If you look at the data that we have, every row contains the stats for 2 teams. Just to illustrate the complexity involved(in the RPD design), lets just take 2 rows of data shown below for the year 2008. Lets assume that only these 2 matches were played in the whole of year 2008
HomeTeam AwayTeam HomeGoals AwayGoals Result
Arsenal Chelsea 0 1 A
Chelsea Arsenal 2 1 H
Now, a common reporting question is how many matches did Arsenal Win, Lost and Drawn in year 2008. If we look at the table above, we can easily say that Arsenal Won none, Lost 2 and Drew none in the year 2008. But from a SQL perspective this can be a little bit tricky. Since each row contains stats for HOME and AWAY teams together, we need to traverse the table twice, one from a Home Team perspective and the other from an Away team perspective.
SELECT Team, Sum(Won) as Won, Sum(Lost) as Lost, Sum(Drawn) as Draw FROM (SELECT HOMETEAM as Team, CASE WHEN RESULT = 'H' THEN 1 ELSE 0 END as Won, CASE WHEN RESULT = 'A' THEN 1 ELSE 0 END as Lost, CASE WHEN RESULT = 'D' THEN 1 ELSE 0 END as Drawn FROM PREMIERSHIP_FOOTBALL_STATS WHERE HOMETEAM = 'Arsenal' AND YEAR = 2008 UNION SELECT AWAYTEAM, CASE WHEN RESULT = 'A' THEN 1 ELSE 0 END, CASE WHEN RESULT = 'H' THEN 1 ELSE 0 END, CASE WHEN RESULT = 'D' THEN 1 ELSE 0 END FROM PREMIERSHIP_FOOTBALL_STATS WHERE AWAYTEAM = 'Arsenal' AND YEAR = 2008) GROUP BY TEAM
Similarly if we need a report showing the number of Home Goals Scored and Away Goals scored by Arsenal, one will have to write a similar query as shown above. This is an excellent example wherein the physical data model is very straightforward, but the actual reporting requirements necessitate quite a bit of long and complex SQL.
You can now see why the BI EE repository design is not straightforward even though we just have a single physical table. So, to model this we start with identifying the possible Dimensions and Facts. From a reporting standpoint, we are doing analysis on Football Matches. So, we start with defining a dimension called Match. The attributes of a Match are
1. A Team & its opponent
2. Match Location
3. Match Date
4. Referee
And the Fact metrics are, Number of Matches Won, Number of Matches Won (by Home Team), Number of Matches Won (by Away Team) etc
Once we have identified this, we need to model the Match dimension conforming to 2 fact tables (Home Team Measures and Away Team Measures). The main reason why we are doing this is, when you analyze a team, it can either be a home team or an away team. We need to make BI EE to fire 2 separate queries (one for the home measures and the other for the away measures) and then join them in its memory(or in database memory through sub-query joins).
Then in the dimension logical table, we map 2 LTS . The first LTS will have Your Team attribute mapped to the HOMETEAM column in the Physical data source. The 2nd LTS will have Your Team attribute mapped to the AWAYTEAM column in the physical data source.
This is the only part which might take a while to understand. But if you can understand this, then creating all the measures will become very straightforward. Our final Presentation layer looked something like this
Using this model, if we wanted to create a sample report containing the number of matches won, Lost and Drawn by Arsenal, BI EE would fire 2 separate queries (2 queries that form the part of UNION in the above SQL) and then will stitch them externally as shown above in the sample query. We can control where the join (with a sub-query join pushed to the database or a BI Server in-memory join) happens.
This model turned out to be extremely flexible. For example, you can pick a team and analyze the stats for the team in isolation. Then you can compare the stats of your favorite team with some of the opponents that you do not like. This can further be opened up to Home and Away performance of each team against each opponent if required. There were some other stats that a lot of people were interested to see. For example, people wanted to know the referee who was responsible for maximum red cards in a season. While betting on a team, we need to analyze the pros and cons of each team by looking at the team stats over a period of time. Also, there are other extraneous factors like biased referees etc. So, using this model one can find out which referee has the potential to be biased against which team.
There were quite a few other things this model supported. For example, currently the points system in the EPL is 3 for win, 0 for Loss and 1 for Draw. People wanted to know if the number of points for a win is reduced to 2 or increased to 2 for a Draw, who would have won the championship. What if, we added a negative point for every red card for a team in a match, who would have won the championship. Some more scenarios like number of matches that a team was winning in Half time but went on to lose were possible as well(Adrian wanted an attribute for this which is the correct way, but again we had so many measures exposed that we were able to derive it out of the existing measures). A sample screenshot of some of the reports that we showed is given below
As you see, as part of the demo, we also did a bit of UI customization to match our company standard. I especially liked the login screen
The entire UI customization that we did was pretty straightforward. All we did was, we created a style so that the necessary dashboards can be altered to our look and feel. Apart from the style, we also created a Skin so that the every aspect of the dashboards/answers were customized as well. Myself and Mark have been discussing about a possibility of creating a small utility to automate the customization of the entire look and feel of BI EE. If we ever manage to create that, as always, we will share them here.