Friday, December 26, 2014

Jammu & Kashmir's near polarised verdict, explained in maps.

The assembly elections in Jammu & Kashmir threw up very interesting results. They could be summarised as reflecting the name of the state: Jammu AND Kashmir. In other words, it was as if the different regions of J&K had completely different political choices in mind and, in effect, voted that way.

I try to show that through three maps here.

The first one is a map of which party won which constituency (click on an individual constituency to see the winner, the runner-up and their respective vote shares). This constituency map was prepared from Election Commission data. (Shoutout to Datameet for helping source this from eci.nic.in.)

Map 1: Who Won Where?




The second map represents the proportion of people adhering to the dominant religions in the state (saffron represents Hinduism, green represents Islam and mild blue mostly Buddhism, among others) across the various tehsils in the state. Data for this map is sourced from Census 2001. (To my knowledge, the religion-wise breakup of Census 2011 for the state is as yet unavailable online. I assume that the proportions haven't changed much since 2001, even if the actual numbers have risen, as is to be expected.)

Map 2: Composition of Jammu & Kashmir by Religion (Hindu/Muslim/Others)



An eyeball comparison of the two maps shows how polarised the election was. The BJP won heavily, by large margins, in all the Hindu-majority constituencies (in the Jammu region), whereas the parties based in the valley won most of the seats with a Muslim majority. The Congress party did quite well in Ladakh and Kargil, where the bulk of the "other religions", Buddhism in particular, is concentrated.

There are, of course, Kishtwar, Doda (and to some extent Bhaderwah), where most tehsils have a higher proportion of Muslims in the population but which were won by the BJP.

A more detailed map that shows how each party performed in each constituency (an intensity map of the vote percentages of the four main parties across all constituencies) shows more fully how clear the regional divide in the state's political choices is. (Use the dropdown at the bottom of the map to choose a party.)

Map 3: Vote Percentages of Respective Parties across Constituencies


 

Sunday, August 24, 2014

Comparing overseas batting records

India's abysmal performance in the recent series in England has been universally panned. While India did manage to win one test and its bowlers performed creditably (relatively speaking), running out of luck with dropped catches galore, it was the all-round failure of the Indian batsmen that caught the eye.

But there isn't much to be surprised about in India's recent record in England. Indian batsmen have traditionally struggled outside the sub-continent, where they encounter faster pitches, better seaming and swinging conditions, or tracks that aren't flat enough. Indian pitches, on the other hand, are relatively more conducive to turn, include a number of flat tracks, and are more difficult for faster bowlers than is the case elsewhere. That is the commonly understood story.

Is there a way to empirically verify this using nifty data visualisation tools? There is!

We set out to find out whether Indian batsmen are relatively worse off on overseas tracks than the average batsman from elsewhere.

What we do here is not just use simple averages to compare batsmen, but use a measure that we call "runs over the average batsman" for our purposes.

It is not enough to simply compare the averages of batsmen on overseas pitches, as this measure cannot compare a player from one era with a player from another. That is because a batsman in a particular era could have faced better (or worse) bowlers than a batsman in another. There are also various rule changes and cricketing conditions to consider (one bouncer per over since the 1990s, or no helmets prior to the mid-1970s, for example). It is therefore simply not accurate to claim that X, with an average of 50 in the 1990s from just 25 innings overseas, is better than Y, with an average of 40 in the 1970s from 60 innings.

Therefore, what we ought to do is find the average number of runs scored by batsmen during the set of years in which a player played, take the difference between the batsman's own average and this era average, and multiply that difference by the number of innings he played. This will be the "overall_value" of the batsman. It is an intuitive idea that is similar to what Australian economist Nicholas Rohde used in his controversial paper to study batting records across times.

To illustrate, take VVS Laxman. He has an overall average of 45.97. His overseas average is 42.64. He has played 225 innings (34 not outs) in his career. How does he compare to someone like Gundappa Viswanath, a similar stylist from the past? During VVS Laxman's career between 1997 and 2012, the average number of runs scored by batsmen was 33.1. In overseas tests, VVS Laxman added a difference of 9.54 (42.64 - 33.1) and therefore contributed 9.54 * 109 (such innings played) ≈ 1040.1 runs as his overall value added as compared to the average batsman of his era. Similarly, Viswanath's overall value added was 312.39 runs over the average player of his era.
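
The value-added arithmetic described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the post: the function name `runs_over_average` and its parameter names are my own assumptions, and the inputs are the rounded figures quoted for VVS Laxman's overseas tests.

```python
def runs_over_average(batsman_average, era_average, innings):
    """Value added over the average batsman of the era.

    Follows the post's arithmetic: the gap between the batsman's
    average and the era average, multiplied by innings played.
    (Function and parameter names are hypothetical.)
    """
    return (batsman_average - era_average) * innings


# Rounded figures quoted in the post for VVS Laxman in overseas tests.
laxman_value = runs_over_average(
    batsman_average=42.64,  # overseas batting average
    era_average=33.1,       # average runs scored by batsmen, 1997-2012
    innings=109,            # overseas innings played
)
print(round(laxman_value, 2))
```

With these rounded inputs the result is about 1039.86, which is close to the 1040.1 reported in the post; the small gap is presumably down to rounding in the quoted averages.
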

We do this exercise for all batsmen who have played test cricket from 1877 to the present, and present the results in a nifty graph as below. (Hover on each cell to see the data. Lighter colours depict a better value and darker colours a lower value for the batsmen. Click on the countries to view country-specific data.)

 

What we notice here is that there is not too much of a difference between the overall overseas records of Indian batsmen and those of the best of the cricketing world. India does have a sizeable number of batsmen with above-average "value added" runs in overseas tests as compared to the top team, Australia. Among Indian batsmen, though (click on India to view more details for Indian batsmen alone), it is evident that the previous generation of batters (Sachin Tendulkar, Rahul Dravid, VVS Laxman and, to a lesser extent, V. Sehwag and S. Ganguly) constituted the best ever core India has had since it entered test cricket. Barring Sunil Gavaskar and Mohinder Amarnath in the earlier generation, no other batsman of any other era has a better overseas record than the aforementioned.

The current generation, meanwhile, has a long way to go to live up to the record of the previous one. Barring A. Rahane to some extent, most other Indian batsmen of the current team have been poor on overseas tours relative to the average batsman of this era.

Saturday, August 16, 2014

The place of Rangana Herath

Rangana Herath just completed 250 wickets after taking a fantastic 9/127 against Pakistan in an ongoing test (as I write this) played at the SSC, Colombo. 

The event prompted me to check where this diminutive, unheralded, unsung and hardworking bowler stands among his tribe of spin bowlers.

I did a simple data comparison. I extracted the top 20 spin bowlers (by wickets taken) from Cricinfo's Statsguru, and then calculated a metric, "Bowl Index" (copied from this source). "Bowl Index" basically takes into account both bowling average and strike rate.

Here's the formula: (runs conceded)^2/(balls bowled * wickets taken)

I then normalised the index to account for total innings bowled (Bowl Index * 1000 / Total Innings Bowled).
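
The two steps above can be sketched as follows. This is a rough illustration under stated assumptions: the function names are mine, and the sample numbers are made up purely to show the arithmetic; they are not any bowler's actual record. One thing worth noting is that the formula exactly as written factors into bowling average (runs per wicket) multiplied by runs conceded per ball.

```python
def bowl_index(runs_conceded, balls_bowled, wickets):
    """Bowl Index as given in the post: runs^2 / (balls * wickets)."""
    return runs_conceded ** 2 / (balls_bowled * wickets)


def normalised_bowl_index(runs_conceded, balls_bowled, wickets, innings_bowled):
    """Bowl Index * 1000 / total innings bowled, as described in the post."""
    return bowl_index(runs_conceded, balls_bowled, wickets) * 1000 / innings_bowled


# Hypothetical career figures, chosen only to make the arithmetic easy to follow.
runs, balls, wickets, innings = 3000, 6000, 100, 50

index = bowl_index(runs, balls, wickets)
normalised = normalised_bowl_index(runs, balls, wickets, innings)

# The same index via its factored form: (runs / wickets) * (runs / balls).
factored = (runs / wickets) * (runs / balls)

print(index, normalised, factored)
```

A lower value is better under this measure, since both a lower average and fewer runs conceded per ball pull the index down.
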

The resulting data is as below: 




A graphical representation of the above data is below:





Rangana Herath, thus far, ranks just below Bishen Singh Bedi and Clarrie Grimmett in the all-time list. Not bad at all for the Lankan spin linchpin whom no one expected to fill the giant shoes of Muttiah Muralitharan.

Tuesday, July 29, 2014

Summary of recent data related pieces written by me.

Over the past year, I have written a number of data journalism pieces. All of them are election-related (it was, of course, a major election year in India). Links to the pieces, with short descriptions, are presented here:

a) The Aam Aadmi Party's win in the Delhi Elections. (Written for the EPW Web Exclusives). The article tries to use GIS tools to understand the reasons favouring the AAP's win in the Delhi assembly elections. It comes up with interesting insights: Link

b) Articles related to the Lok Sabha elections in the EPW: 

i) Explaining the high turnout in the 2014 elections: Link. This article was written while the elections were underway, using preliminary voter turnout information released by the Chief Election Officers of different states (and UTs) in India. It seeks to explain the very high turnout in the 2014 elections and identifies variances and unexplored reasons.

ii) Preliminary statistics from the 2014 election results: Link. This article was written after the Lok Sabha elections and provides visualisations of the results: vote shares, constituency winners and losers, party performance and so on.

c) A case for proportional representation in Uttar Pradesh: (written for the site, kafila.org in 2012 and basically an analysis of the election results in the state then. The slightly misleading headline is not mine). Link

d) Recent pieces in indiatogether.org, written in March-June this year (as part of a Data Journalism fellowship): 

i) On fragmentation in India's political system over the years (since 1977) and regionalisation: Link. I seek to show the emphasis on regionalisation in India since 1977.

ii) What sways the urban voter? Results of a survey conducted by the Association for Democratic Reforms (ADR), among others. Link. This article uses survey data to highlight urban voter choices across the country.

iii) The AAP's performance in Punjab and its salient features. Link. I use polling booth data and other insights to find out whether the AAP's performance in the Punjab LS polls replicated its Delhi victory from last year. I find that the reasons for the Punjab victory were different from those for the Delhi victory.

iv) Explaining the many reasons for the UPA's defeat. Link. Here I used regression techniques to filter out reasons for the UPA's defeat through analysis of available empirical information.

v) Voter turnouts across India and explaining the variance: Link. An extended version of the article written in the EPW on voter turnouts, this time correlating these with survey data on voter preferences.

e) I curated this election special page for the EPW, using data from all Lok Sabha elections since 1977 along with narratives: Link. Please see the bottom of the page for election statistics from 1977 onwards.

f) I managed to scrape data from the Election Commission's live results page and run a live visualisation on maps using Google Fusion Tables during three assembly elections for North East states in 2013. Location: http://epw.in/elections

Forthcoming & Pending in late 2014: 

An article on the West Bengal Lok Sabha election results: an in-depth look at the results using polling booth level data from both the 2014 LS polls and the 2011 assembly elections.

A research article on explaining the presence/absence of the "incumbency effect" over the years using disaggregated survey data provided by the CSDS' Lokniti.