Sunday, March 17, 2013

Should Europe get fewer World Cup spots?

We’ve got two weeks of international football coming our way, so I thought I’d do something on the World Cup and who gets to qualify for it. Specifically, whether certain confederations (to borrow FIFA’s terminology) like UEFA or CONMEBOL (the South American equivalent of UEFA) are over-represented at the World Cup.

(Before I get into this, I want to let everyone know that I don’t really have an agenda, I’m just doing this because it seemed like something worth doing. I mean, the headline could just as well have been something dry like ‘World Cup Qualification 1930 – 2014’.)

I tried to use various criteria to judge how over- or under-represented a confederation is: population size, area of the globe covered, contribution to the world’s GDP, military expenditure and FIFA’s own world rankings. I go into why I selected these factors and how I calculated them in the methodology portion of this post below. (I could have gone into it now but I didn’t want to drive away someone who may be a casual reader and is just looking to play with the Tableau visualization a bit. I want to make sure she or he gets something out of this post too.)

Simply put, what I’m saying is that if, say, the countries in Africa together have 20% of the world's population, they might have grounds for asking for 20% of the 32 places in the World Cup too. (Or rather 20% of 31 places, given that the tournament hosts qualify automatically.)
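
To make that arithmetic concrete, here is a minimal Python sketch of the proportional idea. The function name and the example shares are made up for illustration (they are not figures from my dataset); the only number taken from this post is the 31 slots.

```python
SLOTS = 31  # 32 places minus the automatically qualified host


def proportional_slots(shares, slots=SLOTS):
    """Scale each confederation's share of the world total to a number of slots."""
    total = sum(shares.values())
    return {conf: slots * value / total for conf, value in shares.items()}


# Hypothetical shares of world population, in percent (illustrative only).
example_shares = {"CAF": 20.0, "Rest of world": 80.0}
allocation = proportional_slots(example_shares)
print(allocation["CAF"])  # 20% of 31 slots
```

The result is deliberately left fractional, since turning 6.2 slots into whole places is a separate decision.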

WHAT’S IN IT
So what do you get in the visualization? There are three charts. The one at the top left is a bar chart which shows how many of the 31 available slots should be given to a confederation according to each of the criteria mentioned. Use the drop-down menu to select a confederation, and the bars will resize according to the confederation’s ‘strength’ in the respective field.

If you look to the right of the confederation drop-down menu, you will see an option for ‘Actual no. / percentage’. If you select ‘percentage’, it simply recalculates the figures with a base of 100 instead of the base of 31 used in the ‘Actual no.’ option. Incidentally, changing that will affect both the bar chart and the pie chart to its right. The pie chart is nothing but the distribution of seats among the various confederations according to the criterion you choose in the drop-down menu above it.


Finally, the line chart at the bottom gives us the number of slots awarded to each confederation over the years. So you tick the boxes of the confederation you’re interested in learning about and lines will appear along with a colour legend to let you know what’s what. Again, you can get the actual number of slots competed for or you could get the percentage of slots awarded. In this case, using percentage makes the figures comparable across years, because the base kept changing as the tournament got bigger, from 14 to 22 to 30 and now 31. So using percentage instead of the actual number of slots gives us a truer picture of how confederations have been treated by FIFA over the years.

WHAT TO LOOK AT
If you want some initial advice on what options to choose, I have just two words for you: dig in! You can pretty much guess what most options will result in. (With two exceptions that I will come to later.) For example, if you choose population for the pie chart, you know that it will grant the majority of slots to the Asian confederation because China’s there. Or that if you choose military expenditure, CONCACAF (Confederation of North, Central America and Caribbean Association Football) will get most of the slots because of the US and its huge defense budget.

What I did find interesting was that if you take the UEFA option in the bar chart, the 13 seats Europe has been allocated are more than it deserves according to every single criterion I use. (That should be the default view you’re presented with as the data viz loads.) Apart from Europe, if you click on Oceania and look at the FIFA ranking, Oceania is surprisingly under-represented. Long the forgotten step-child of FIFA, its winner has to compete every 4 years in a playoff with a nation from Asia or South America, depending on which side of the bed Sepp Blatter has gotten up from. Yet it seems its member states have actually been doing well enough for the region to get its own automatic qualifying spot.

THE PLEA
The way things are now, spots at the World Cup are gained and lost through a long attritional process of negotiation and horse-trading, and there is hardly any transparency to the procedure at all. There is no periodic reassessment of the slots a confederation is awarded, in the way UEFA does when it takes a Champions League spot away from Serie A and gives it to the Bundesliga: not because the German FA haggled harder but because German teams performed better and a mutually agreed-upon statistical formula rewarded them for that. I realize that this is an issue that most people aren’t really aware of, but if I’ve made this something that people discuss, or at least think about, then the purpose of this blog post is served.

Things are going to get a bit boring from here on out, so if I’m already testing your patience with this long blog post, you can get on with the rest of your day. Thanks for stopping by!

 ------------------------------------------------------------------------------------------------------------

HOUSE-KEEPING
Ok, now to explain my methods.

All that I did was calculate weights according to how much each confederation contributed to the world’s area, its population, its GDP (PPP) and its military expenditure. I got the figures for the first three criteria from the CIA World Factbook and the military expenditure data from the SIPRI military expenditure database.

Now why did I choose these criteria? I guess I took the ‘world’ in ‘World Cup’ a little too literally, and was determined to find out how much of the world was actually represented at the tournament. So factors like area covered and population size seemed natural indicators to use. GDP (PPP), I guess, was used as some sort of proxy for economic power and military expenditure as a proxy indicator for political power. I still have reservations over using military expenditure but in the absence of another readily-available indicator I could borrow to represent political influence, SIPRI’s data will have to do.

I kind of anticipated the objection people would make that the World Cup is not just about representation but also about merit and about the world’s best teams playing each other. So I used FIFA’s ranking data to arrive at some kind of meritocratic measure.

In order to arrive at the strength of a confederation, I calculated the average number of ranking points of the top 10 nations in each confederation and used that to arrive at a weight. I think that’s a relatively unsophisticated but still reasonable way of going about it, but if anyone has a different and better idea of how it should be done, do let me know at ultimateposeur@gmail.com and I'll make sure to incorporate that method in the next visualization I make (whenever that is). I'd also be interested in seeing your take on this and, in fact, I would welcome it if you could use the dataset provided and make your own graph, chart etc. with your software of choice.
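
For anyone who wants to reproduce that step, here is a rough Python sketch of the top-10 averaging. The function names and the sample ranking points are hypothetical, and normalising the averages so that the weights add up to 1 is an assumption about how an average becomes a weight, not a description of any official formula.

```python
def confederation_strength(points, top_n=10):
    """Average ranking points of the top_n highest-rated nations."""
    best = sorted(points, reverse=True)[:top_n]
    return sum(best) / len(best)


def strength_weights(points_by_conf, top_n=10):
    """Normalise the per-confederation averages so the weights sum to 1."""
    strengths = {
        conf: confederation_strength(points, top_n)
        for conf, points in points_by_conf.items()
    }
    total = sum(strengths.values())
    return {conf: value / total for conf, value in strengths.items()}


# Made-up ranking points for two imaginary confederations, with top_n=2
# to keep the example small.
sample = {"Conf A": [1000, 800, 300], "Conf B": [600, 400, 100]}
weights = strength_weights(sample, top_n=2)
print(weights)
```

Multiplying each weight by 31 then gives the number of slots the ranking criterion would suggest.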

VIEW DATASET

I got the historical data for the line chart from (where else?) RSSSF.com.

FOOTNOTES INDICATOR-WISE
SIPRI military data
--Now all the SIPRI figures are from 2011 and I've taken a few shortcuts that academics might have heart-attacks over, such as using 2009 figures when there are no 2011 figures available for a country. Now this isn't meant for publication in an academic journal, so I think getting some sort of idea is better than having no insight at all.
--This is what I did: I took the figures for Central African Republic from 2010, Benin from 2008, Iceland from 2009, Iran from 2008, Libya from 2008, Luxembourg from 2007, Malawi from 2007, Mauritania from 2009, Sudan from 2006, Qatar from 2008, Tajikistan from 2004, Turkmenistan from 1999, Uzbekistan from 2003 and Yemen from 2008. I put Equatorial Guinea, Myanmar and Somalia at 0 and, reluctantly, North Korea at 0 as well. The countries put at 0 most likely do not actually spend nothing; I did it that way because SIPRI didn't have figures for them.
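
The shortcut described above can be sketched in a few lines of Python. The function name and the sample series are hypothetical, and "take the most recent earlier year" is an assumption about how the substitute year gets picked.

```python
def latest_available(series, target_year=2011):
    """Return (year, value) for the target year, or for the most recent
    earlier year that has a figure; (None, 0.0) if there is none at all."""
    candidates = [
        year for year, value in series.items()
        if year <= target_year and value is not None
    ]
    if not candidates:
        return None, 0.0
    year = max(candidates)
    return year, series[year]


# Made-up spending series (year -> figure, None where no data exists).
print(latest_available({2007: 5.0, 2009: 6.5, 2011: None}))  # (2009, 6.5)
print(latest_available({2011: None}))                        # (None, 0.0)
```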

Population indicator
--Used figures from Gaza and West Bank in CIA Factbook for Palestine
--Used figures from French Polynesia in CIA Factbook for Tahiti

Historical timeline data
--Playoff places are counted as 0.5, which seems to be the best way to deal with that problem
--What created additional problems for me was that FIFA used to adjust qualifying places according to which continent was hosting and whether a country from that continent was the defending champion. For example, for Italia '90, countries from South America were competing for 2.5 places instead of the 4 places on offer for Mexico '86, because Argentina was the defending champion. So the question is whether I should consider CONMEBOL as being up for 2.5 slots, or 3.5 slots including Argentina.
--Up to and including 1982, Oceania didn’t have a separate group of its own but instead was treated as part of Asia. There were quasi-Oceania type groups though in the Asian zone for the 1978 and 1982 World Cups but not before that.
--Africa only got a separate slot of its own from 1970 onwards; there were combined Asia, Africa and Oceania groups before that.
--Just 13 teams were at the 1950 World Cup, but a lot of teams that had otherwise qualified withdrew, so I'm going to treat it as the 16-team tournament it was meant to be
--1938 was meant to be a 16-team tournament but only 15 teams competed
--In the inaugural World Cup, there was no qualification, so I'm going to just count the affiliations of those who were invited
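
The playoff rule and the percentage option in the line chart boil down to something like the sketch below. The function names and the example tournament are made up for illustration; only the 0.5 value for a playoff place comes from the notes above.

```python
def slots_contested(direct, playoff):
    """Count each direct place as 1 and each playoff place as 0.5."""
    return direct + 0.5 * playoff


def share_of_slots(slots_by_conf):
    """Express each confederation's slots as a percentage of all slots on offer."""
    total = sum(slots_by_conf.values())
    return {conf: 100 * slots / total for conf, slots in slots_by_conf.items()}


# An imaginary 31-slot tournament split between three confederations.
tournament = {
    "Conf A": slots_contested(13, 0),
    "Conf B": slots_contested(4, 1),
    "Conf C": slots_contested(13, 1),
}
print(share_of_slots(tournament))
```

Working in percentages is what makes a 14-slot tournament comparable with a 31-slot one.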


COMMENTS INVITED -- BE NICE!
If you have to be critical, be gentle, imagine that I'm a friend standing in front of you and you're trying not to hurt my feelings but still hope to point out where I went wrong. Please don't use the fact you're not with me in person as a license to be mean!

Monday, June 11, 2012

On the Guardian homepage!

Ok, maybe not the Guardian's homepage, but I have gotten on to Guardian football's homepage, where they've listed my post on pre-tournament odds as one of their favourite/favorite things this week. Visual proof below!



Big thank you to Sean Ingle (@seaningle), the sports editor of Guardian online (and their resident betting aficionado!), for putting me on there. Has definitely made my day!

Goals & Assists for 2011-12

The interactive dashboard below helps you explore the 2011-12 season data for goals and assists of players from clubs in the first divisions of 8 leagues: England, Spain, Germany, Italy, Russia, Portugal, Holland and France.

This includes goals and assists for those clubs in all the domestic (Leagues, Cups, Supercups) and European (Champions & Europa League) competitions the clubs were involved in.

You can filter the data by league by ticking or unticking boxes, and you can also build custom lists and compare your favourite players and clubs using the two search boxes.

Imscouting.com, which is where I got the data from, doesn't give you the number of appearances each player made but they do note how many minutes they played. So I used that data to see how effective players were in scoring goals and making assists in a 90-minute period, which would be the duration of a typical match.

The default setting for minutes played in the 2011-12 season is 1000 minutes; you might want to lower or raise that.
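
The per-90-minute measure and the minutes filter amount to the following Python sketch. The function names and the sample player are hypothetical; the 1000-minute default is the dashboard setting mentioned above.

```python
def per_90(count, minutes):
    """Goals (or assists) per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def meets_minimum(minutes, threshold=1000):
    """Mimic the dashboard's minutes-played filter."""
    return minutes >= threshold


# A made-up player: 20 goals in 1800 minutes is one goal per 90 minutes.
print(per_90(20, 1800))     # 1.0
print(meets_minimum(1800))  # True
```

The filter matters because a substitute with one goal in 30 minutes would otherwise top the chart at 3 goals per 90.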

As mentioned before, all of this data has been obtained from the imscouting.com website.


Notes

--Have not taken the Club World Cup into account, which is why Messi is at 71 and not 73 goals in my dashboard.

--Have taken some justifiable liberties, such as restricting players to a single club. So if, like the Austrian player Marc Janko, you transferred from Twente to Porto, only your current affiliation is listed.

--Do note that imscouting's data regarding assists is different from the Opta stats used by whoscored.com etc. For example, Fabregas is at 18 assists according to Opta but he's at 13 according to Imscouting.

Dtd: 2012-05-27 twitter.com/ultimateposeur ultimateposeur@gmail.com

Wednesday, June 6, 2012

How far behind the champions were Arsenal under Wenger?

I created a graphic a month back comparing Arsenal's performances under Wenger to those of the champions of each Premier League season. I kept on hearing from all the pundits that this has been Arsenal's worst season, so I wanted to see what measure would truly tell us which season has been Arsenal's worst.

So what you'll see from the graphic below is the number of points we were behind the title winners each season (the figures in white font with the red bars) and also a truer comparison, the line chart above the bar chart: our points as a percentage of the winning points total. So even though we were 12 points behind in both 2004-05 and 2010-11, you can see that, on a percentage basis, we did worse in 2010-11 (85 per cent of the winner's total compared to 87.5 per cent in 2004-05).

 
The surprising thing from this graph is that, in terms of both percentages and absolute points, 2005-06 was our worst ever season. We were 24 points behind and we were at the lowest percentage, 73.6. (In case you're wondering, in the season that's just gotten over, 2011-12, the percentage was 78.5, which is the 3rd worst.) Of course, some of you might rightly point out that Arsenal got to the Champions League final that year, so that year can't really be termed our worst. And you might be right. In terms of Premier League points though, you can't deny it.
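
The two measures in the graphic can be sketched in Python as follows. The function names are made up, and the points totals are assumptions about the final league tables (67 v 91 for 2005-06, 68 v 80 for 2010-11), picked because they reproduce the gaps and percentages quoted above.

```python
def points_gap(team_points, champion_points):
    """Absolute number of points behind the champions."""
    return champion_points - team_points


def percent_of_champion(team_points, champion_points):
    """Team's points as a percentage of the champions' total, to 1 decimal place."""
    return round(100 * team_points / champion_points, 1)


# (Arsenal points, champions' points); assumed totals, see the note above.
seasons = {"2005-06": (67, 91), "2010-11": (68, 80)}
for season, (arsenal, champions) in seasons.items():
    print(season, points_gap(arsenal, champions), percent_of_champion(arsenal, champions))
```

The percentage is the fairer yardstick because the champions' total changes from season to season, so the same gap can mean different things.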

twitter.com/ultimateposeur ultimateposeur@gmail.com

Tuesday, June 5, 2012

Odds & Performances

Everyone’s already come out with their Euro 2012 visualization, so just wanted to put mine out before the tournament starts. (Shout out to two other visualizations/dashboards I liked - here and here.)

Now the moment the draw is made for the finals of any major football/soccer tournament, there’s always a newspaper article that comes out talking about what the odds are for various teams to win the whole tournament. These odds change over the months before the tournament starts and I wanted to catalogue what the odds were in the week before the tournament, specifically for the 16 countries competing in Euro 2012.

This dashboard is my attempt to catalogue tournament-eve odds from the British bookmaker Ladbrokes for the past nine major tournaments, going back all the way to USA’94!

(Why didn’t I go back any further? Because before then you still have countries like the undivided USSR & Yugoslavia competing, and that raises lots of issues for what I was attempting to do, for example, whether past Soviet performances should be seen as part of Russia’s record or Ukraine’s. This is just a simple data-exploration dashboard and I really didn’t want to overcomplicate matters, so 1994 it is.)

Needless to say, trying to get the odds data was a pain but, I think, what I’ve managed to get -- while sticking to free sources -- should conform to the records that paid services like betbase and mabel’s tables have. (Also, since you ask, the main source for most of my data was the site thefreelibrary.com which hosts old issues going back to 1998 of the British betting news periodical “The Racing Post”.) Do note that the odds data I used wasn’t exact-moment-before-the-tournament-starts odds but some-time-in-the-week-before-the-tournament-starts odds.

(You will note that I’ve used decimal odds instead of the original fractional odds form of the data. Decimal odds are just more easily recognized by graphing applications.)
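
For reference, the conversion is simple: decimal odds are the fractional odds plus 1, because the decimal form includes the returned stake. Here is a small Python sketch; the function name is hypothetical and it only handles the plain 'a/b' form, not wordings like 'evens'.

```python
from fractions import Fraction


def fractional_to_decimal(odds):
    """Convert fractional odds such as '7/2' to decimal odds (stake included)."""
    numerator, denominator = odds.split("/")
    return float(Fraction(int(numerator), int(denominator)) + 1)


print(fractional_to_decimal("7/2"))   # 4.5
print(fractional_to_decimal("10/1"))  # 11.0
```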

Now to the actual dashboard. There are three separate charts in it.


The chart in the top left corner plots the tournament-eve decimal odds for various countries against their performances in a specific tournament. (Choose a year from the drop-down menu.) The top axis lists the various stages a country can reach in either the World Cup or the European Championships while the vertical axis lists the decimal odds in reverse.

As for what you can learn from the graph, a rule of thumb is that if you see colored dots representing countries in the top left of that particular graph, those countries have underperformed while dots that appear in the bottom right of that graph represent overperformance by a country against the odds.

Similarly, the chart in the bottom left corner plots the tournament-eve decimal odds for a specific country in various tournaments. (Choose a country from the drop-down menu.)

The third and last chart plots the odds for one or more countries over several tournaments. This chart was done partly as a concession to the fact that you can’t really see what the exact odds for a country at a specific tournament are from the other two charts. Hopefully, this third chart will make up for that.

Other notes and caveats:
--If you want the original data just send a mail to the address listed below and I’ll forward it to you. If you’re looking to use it in a semi-professional or professional context though, you might be better off paying for the actual data from one of those above-mentioned firms that archive historical betting data.

--The axes resize as you choose different countries and years, so keep that in mind when you’re choosing different options from the drop-down menus. It isn’t really a like-for-like visual comparison. (Resizing axes does make it easier to view the data though.)

--Line breaks in the line chart when choosing particular countries represent periods when they hadn’t qualified for any major competition.

--This is elementary but I guess I do have to mention it. The years in two of the charts have the initials WC and EU next to them, e.g. 1994-WC. WC stands for World Cup while EU stands for European Championship.

--And finally, if you want to know what the latest odds are for Euro 2012, here’s the page from Ladbrokes.

ultimateposeur@gmail.com twitter.com/ultimateposeur