Exposing CBSE and ICSE: Statistical Insights into the True Lies on your Marksheets


What you see above is the score distribution of CBSE English Core scores for the Class 12 examination of 2013. Absolutely nothing can justify or explain the stupendous spike at 95 (and other erratic spikes) which clearly indicates irresponsible levels of arbitrariness in the grading of papers by the board. 
For the last 2 years I have been programmatically scraping ICSE and CBSE results from their websites after their online release, and what I found will confirm the fears of many: board examination evaluation in India is a joke. I have inspected and analyzed over a million records from the CBSE, ICSE and ISC results over the past 2 years.
Addition: I actually analyzed 10 years of data for CBSE, and have exposed their marking scandal in greater detail over here. An inspection of 2014 data has revealed similar score tampering as well.
While it is worrisome that the boards did not seem to care about the privacy of their students, we should be thankful that they were careless enough to leave their sites unguarded, so that we could all get an insight into their non-transparent grading patterns. Apart from being opaque, this grading severely disturbs the relative standings of students, something which apparently matters a lot for admission to the NIT system (on the basis of information which I've received).


Since 2012, I have been analyzing and publishing information and statistics about something which matters a lot to any school student in India: the board examination. Very unfortunately, it does not seem to be taken seriously by those who conduct it. They do not seem to consider it important enough to do the job conscientiously and transparently, as you will soon see.

Everything about those examinations, their evaluation and those answer-scripts is a secret. This post will make it clear in a moment why it is conveniently so. Almost 4 million students complete their schooling and their class 12 examinations. Till last year, these board examination scores mattered mostly for Commerce and Arts students looking for admission in colleges like the ones affiliated to Delhi University. Starting in 2013, each and every mark will also matter to those who aspire to get into undergrad engineering and tech programs. Their national rankings will matter. Every 0.01 percentile point will count.

About Me

I am Prashant Bhattacharji. I graduated from the Indian Institute of Technology, Kharagpur in 2006. I worked for a year at Lehman Brothers in Mumbai and then moved to the United States to work at the Microsoft Corp. Headquarters in Redmond. In 2010, I moved back to India, and have been working remotely for start-ups in the Data Science and Engineering space. My evenings and weekends have been spent working on this portal, and also trying to analyze data related to education in India. The school leaving data I have obtained has shocking discrepancies.

How I got this data

There's no super exciting story here. A regular student goes to the result portal, checks his result, and then goes on to check the results of fifty people before and after him or her. I just threw together a few lines of simple Ruby code to automatically send in numbers and note those results. And then, some data mining with a mix of Ruby and Python to generate aggregate statistics for CBSE, ICSE and ISC results, and plots of those scoring distributions.  Run a bunch of data mining and visualization scripts and hey! You get a bunch of secrets and stats like a rabbit jumping out of a magician's hat.
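The aggregation step is simple enough to sketch. The snippet below is not the original Ruby or Python script; it is a minimal illustration, assuming the marks for one subject have already been collected into a plain list. The function name `score_distribution` and the sample marks are made up for the example.

```python
from collections import Counter


def score_distribution(marks, max_mark=100):
    """Return a list where index m holds the number of students who scored m."""
    counts = Counter(marks)
    return [counts.get(m, 0) for m in range(max_mark + 1)]


# Hypothetical marks for one subject, invented purely for illustration.
sample_marks = [83, 83, 86, 88, 95, 95, 95, 72, 83]
distribution = score_distribution(sample_marks)

# Print only the marks that actually occur, with their frequencies.
for mark, count in enumerate(distribution):
    if count:
        print(mark, count)
```

Plotting `distribution` against the mark (0 to 100) gives the kind of scoring curve discussed in the rest of this post.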

Please notice the shape of these ISC and CBSE Scoring curves below.



To give you an idea of approximately what they should look like:

Probability density function for the normal distribution

Okay, it will never really be so perfectly even and smooth, and depending on how hard the exam is, it will be skewed a bit to the left or right, but you get the point.

The boards might assert that everyone's marks were rounded up to something higher than what they actually were. That is hardly something good. Given that people's relative standings are what determine their admission to a college, why should someone feel happy that he was given 3 extra marks, while his competitor was possibly given 5?

True, people do try to fit their results to standardized curves. However, there seems to be no such standardized or expected curve or shape in the ICSE, ISC or CBSE scores. A few subjects have (presumably) not been tampered with and do resemble the expected bell or skewed-bell curve, but the implication of the rest of these curves is that serious questions need to be asked about the credibility of the main examining school-boards in India.

I am releasing these plots because this kind of analysis ought to be in the public domain: from this year, these marks matter for admission to all reputed universities and engineering colleges.

(I initially downloaded these marks, not out of any interest in exam results, but just to get a practice dataset to "learn" stats and ML with. Unfortunately, the data was useless for that purpose. I made some quick observations and ignored the issue at the time.)

A year ago, I was having this email conversation with someone who was interested in the patterns of school leaving marks.

This was in a dataset of school leaving (ISC-2012) marks which I had analyzed on my site (ICSE and ISC 2012 and 2013 School Wise Result Analysis of Exam Results - the learning point ). 

1) The marks are discretized. You find people scoring 83, 86, 88 in a subject (all subjects). However, you don't find anyone scoring 81, 82, 84 in any subject. What kind of normalization is being done by the board? Shouldn't this normalization formula be known to whoever uses these marks?
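Checking for such gaps takes only a few lines. This is a minimal sketch, not the script actually used for the analysis; `missing_marks` is a hypothetical helper and the sample marks are invented for illustration.

```python
def missing_marks(marks, low, high):
    """List every mark in [low, high] that no student received."""
    observed = set(marks)
    return [m for m in range(low, high + 1) if m not in observed]


# Hypothetical marks for one subject, invented purely for illustration.
sample_marks = [80, 83, 83, 86, 88, 90, 83, 86]
gaps = missing_marks(sample_marks, 80, 90)
print(gaps)
```

With a handful of students, gaps like these mean nothing. With tens of thousands of students per subject, a mark that nobody ever receives is hard to explain as chance.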

This year, someone had the patience to give the issue the importance and coverage it deserved:
Hacking into the Indian Education System by Debarghya Das on On the Stepping Stone

1. AFAIK examiners do give out part marks in every exam. So the claim that certain marks aren't attainable is invalid.
2. Let's assume (giving them the benefit of doubt) that they do do some "rounding up of marks to certain marker points". So, if 30k students get 83 (and no one scores 81 and 82), let's flatten/smoothen out the distribution such that 81, 82, 83 map to 10k each, and draw the curves accordingly. Even then the curves are completely ridiculous as you will soon see.
3. The ONLY expected skew-bell curves in the plots shown below are those of the aggregate total (marked by ATE for ISC 2012) and the Best Four + English (marked as ISH for ISC-12). These are also the curve shapes which you expect for the subject wise scores.
4. The claim that marks are normalized to try to conform to a bell curve is clearly a lie. Almost all of these plots are as random as they can get. 
5. The analysis is based on approximately 139k ICSE and 65k ISC scores. Some records might have been absent in these, but they are quite close to the number of people who appeared for the board examination.
6. If one looks at the scores of 80 and above: 
In no subject in either class 10 or class 12 in 2012 or 2013, has anyone scored:
These are high-score ranges where every single mark could affect a cut-off.

7. CBSE-Class 12-2013 is a completely different story. The graphs have very random structures and completely unexpected shapes. A few of them, like Computer Science, do have the bell shaped curve which is expected. Some of them have very weird triangular shapes with sudden spikes and jagged edges. The Physics, Mathematics and Chemistry scores resemble a terrible mix of triangular, spiky and somewhat-rounded sections. Please note that the CBSE statistics have been generated on the basis of records for 7 lakh students (a total of 9.5 lakh appeared for the examinations). However, that is enough to let us know that CBSE clearly also tampers with its records.

8. Most examinations have scores which end up forming a bell shaped curve. However, in a board examination, to account for varying levels of difficulty, and for the differences between the grading of liberal and strict examiners, it is expected that some kind of normalization or moderation will be done, so that the final distribution of marks resembles a bell shaped or skew-bell shaped curve. In fact, some of the CBSE curves do show this behavior to a limited extent (such as their Computer Science scores). However, the kind of moderation which is being done is NOT any kind of standardized normalization, as you will see from the curves below. What is being done is a very arbitrary inflation of everyone's scores. The reason for this is not hard to understand. Once a student gets higher marks than (s)he deserves, (s)he is unlikely to complain or ask for a re-evaluation. However, the utility of those inflated scores is very little, because the ranking among students becomes arbitrary. X's score was increased by 5. Y's score was increased by 7. Both have better marks than they deserved. But if this process was done without much thought, then X has actually been unfairly disadvantaged and doesn't have any real reason to be happy.

Now, in layman's language, what makes one conclude that CBSE scores of several subjects have been tampered with, or 'moderated' arbitrarily? Basically, a usual distribution of scores resembles a bell-shaped curve. It is not an erratic or jagged looking graph. Of course, it is not as smooth as a perfect bell curve because of the scoring blocks (2 mark questions vs 5 mark questions) but it does resemble that, by and large. Some of the CBSE marking, for subjects like CS, does seem to be decent and resembles the bell shaped curve. However, many subjects, like science and mathematics, clearly do not. Some amount of moderation by the boards is expected, but such moderation would normally increase marks proportionally, so that the frequency distribution still resembles a new bell shaped curve. In the case of CBSE, you see very sudden spikes. Now, if all the people who got 85, 86, 87 are being rounded off to 90 (say), you are being unfair to the ranking of the person who got 87.

9. An ICSE teacher or official has dismissed the scoring pattern, saying that moderation was done to fit a statistical distribution (bell curve). The curves we get to see tell us clearly that this is NOT the case.

10. What has started to matter in the engineering entrance (JEE) is not the raw marks, but the percentiles. So if everyone's scores have been inflated with a fair bit of arbitrariness, their percentiles have been unfairly distorted. Just because everyone got extra marks, it doesn't mean that it will do them a lot of good.
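A toy example makes the percentile point concrete. Everything here is an assumption made for illustration: the marker points, the rule of raising each mark to the next marker point, the seven made-up marks, and the definition of percentile as the share of students scoring strictly below you. None of it is a procedure the boards have published.

```python
from bisect import bisect_left


def round_up_to_markers(mark, markers):
    """Raise a mark to the nearest marker point at or above it (hypothetical rule)."""
    i = bisect_left(markers, mark)
    return markers[i] if i < len(markers) else mark


def percentile(mark, all_marks):
    """Percentage of students scoring strictly below the given mark."""
    below = sum(1 for m in all_marks if m < mark)
    return 100.0 * below / len(all_marks)


# Hypothetical raw marks and marker points, invented purely for illustration.
raw = [78, 81, 85, 86, 87, 90, 92]
markers = [80, 90, 95]  # must be sorted in ascending order
inflated = [round_up_to_markers(m, markers) for m in raw]

for before, after in zip(raw, inflated):
    print(before, round(percentile(before, raw), 1),
          after, round(percentile(after, inflated), 1))
```

In this toy data, the students who scored 81 and 87 both end up at 90. Each of them gained marks, yet the one who scored 87 no longer ranks above the one who scored 81.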

Moral of the story? Arbitrary score inflation should not be allowed to pass off as moderation. A clearly prescribed normalization procedure should be publicly described by the board. I have analyzed and plotted the raw data (marks as they were) and the smoothed data, using the approach in (2) which I mentioned above.

Please note that the smoothed data average out frequencies across missing numbers, as I already explained. The non-smoothed curves are plots of the raw marks.
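Here is a minimal sketch of that smoothing, under the assumption that every run of missing marks was rounded up into the next observed mark, so that mark's count is shared equally with the empty marks just below it. It is an illustration of the idea, not the exact script behind the plots; `smooth_distribution` and the sample counts are hypothetical.

```python
def smooth_distribution(counts):
    """Share each observed count equally with the run of empty marks just below it."""
    smoothed = [0.0] * len(counts)
    observed = [m for m, c in enumerate(counts) if c > 0]
    if not observed:
        return smoothed
    # Marks below the lowest observed mark are left at zero.
    run_start = observed[0]
    for mark in observed:
        width = mark - run_start + 1
        share = counts[mark] / width
        for m in range(run_start, mark + 1):
            smoothed[m] = share
        run_start = mark + 1
    return smoothed


# Hypothetical counts, indexed by mark (0 to 100), invented for illustration.
counts = [0] * 101
counts[80] = 12000
counts[83] = 30000  # nobody scored 81 or 82
counts[86] = 15000  # nobody scored 84 or 85

smoothed = smooth_distribution(counts)
print(smoothed[80:87])
```

The total number of students is unchanged by this step; only the way they are spread across neighbouring marks changes.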

And the fallout and reactions from board officials? Someone trying to justify the random moderation scheme: "They try to ensure the bell curve of the results does not look awkward. If it does, the implication is that the checking has been either too liberal or very strict."

For ICSE-Class 10-2013, ISC-Class 12-2013, ISC-Class 12-2012 and CBSE-Class 12-2013

(These are links to clearer/hi-resolution plots if you're interested)

ICSE - Class 10 - 2013:

At a Glance:

Higher-Resolution and Larger Versions of the Plots:

ISC - Class 12 - 2013:

At a Glance:

Higher-Resolution and Larger Versions of the Plots:

ISC - Class 12 - 2012:

At a Glance:

Higher-Resolution and Larger Versions of the Plots:

CBSE - Class 12 - 2013:

At a Glance:

Higher-Resolution and Larger Versions of the Plots: