Thursday, August 21, 2014

A Chart is not worth 1,000 words

Let’s say someone presents the following graph as “definitive proof that there is no connection between spending and achievement.”  Would you believe them?  Would you look at the lines on the chart and agree that the horizontal lines showing NAEP performance and the lines moving up the page showing per-pupil spending are enough to conclude the two things are not related?

After reading this blog post, I am hoping your answer will be “no.”


In his book “How to Lie with Charts,” Gerald Everett Jones states the following:

“All charts and graphs are forms of data reduction, or summary.  A summary gives an overall picture, or general shape, of the underlying detail, or source data.  So, a summary can be useful for highlighting a valid trend or can be misleading by obscuring exceptional results that might be significant.”

My goal with this blog post is to use the graph presented above to explore some issues around how data is presented, and what to consider when viewing data presented visually in this manner. Let’s start by talking about the data elements presented.

Jones says, “Inadvertently leaving off helpful labels is the quickest way to perjure yourself.  Be sure to document sources of information as chart notes.”  The chart above is guilty of having some pretty vague source notes that leave a lot of questions.

For NAEP, the chart presents 4th grade reading scores.  Why was this particular statistic chosen? The NAEP site offers many statistics that could have been used instead.  Here are some of them, with the Kansas values for 2005 and 2013:

Statistic                       Grade  Subject  2005  2013
Average Score                   4th    Reading  220   223
Average Score                   4th    Math     246   246
Average Score                   8th    Reading  284   290
Average Score                   8th    Math     267   267
Percent at Basic or Above       4th    Reading  88    89
Percent at Basic or Above       4th    Math     66    71
Percent at Basic or Above       8th    Reading  77    79
Percent at Basic or Above       8th    Math     78    78
Percent at Proficient or Above  4th    Reading  47    48
Percent at Proficient or Above  4th    Math     32    38
Percent at Proficient or Above  8th    Reading  34    40
Percent at Proficient or Above  8th    Math     35    36

As can be seen, some of these comparisons show no change (4th math scores, 8th math scores, 8th math percent basic); some show slight improvement (4th reading percent basic, 4th reading percent proficient, 8th math percent proficient); and some show larger improvement (4th reading scores, 8th reading scores, 4th math percent basic, 8th reading percent basic, 4th math percent proficient, 8th reading percent proficient).
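
For readers who want to check that grouping themselves, here is a minimal Python sketch that subtracts the 2005 value from the 2013 value for each row of the table above. The cutoffs in classify() (0 for no change, 1 for slight, 2 or more for larger) are my own assumption, inferred from how the rows are grouped in this post, and they treat scale-score points and percentage points the same way.

```python
# Rows copied from the table above: (statistic, grade, subject, 2005, 2013)
ROWS = [
    ("Average Score", "4th", "Reading", 220, 223),
    ("Average Score", "4th", "Math", 246, 246),
    ("Average Score", "8th", "Reading", 284, 290),
    ("Average Score", "8th", "Math", 267, 267),
    ("Percent at Basic or Above", "4th", "Reading", 88, 89),
    ("Percent at Basic or Above", "4th", "Math", 66, 71),
    ("Percent at Basic or Above", "8th", "Reading", 77, 79),
    ("Percent at Basic or Above", "8th", "Math", 78, 78),
    ("Percent at Proficient or Above", "4th", "Reading", 47, 48),
    ("Percent at Proficient or Above", "4th", "Math", 32, 38),
    ("Percent at Proficient or Above", "8th", "Reading", 34, 40),
    ("Percent at Proficient or Above", "8th", "Math", 35, 36),
]


def classify(change):
    """Label a 2005-to-2013 change. The cutoffs are an assumption, not an official rule."""
    if change < 0:
        return "decline"
    if change == 0:
        return "no change"
    if change == 1:
        return "slight improvement"
    return "larger improvement"


# Group the rows by label so each category can be listed or counted.
groups = {}
for statistic, grade, subject, y2005, y2013 in ROWS:
    label = classify(y2013 - y2005)
    groups.setdefault(label, []).append((grade, subject, statistic, y2013 - y2005))

for label, members in groups.items():
    print(label, len(members))
```

Picking any single row from one of these groups tells a different story than picking a row from another, which is exactly the problem with the chart's choice of one statistic.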

In addition, there is the option of looking at state rankings based on the above, plus the choice of looking at all students, free and reduced-price lunch eligible students, free and reduced-price lunch ineligible students, and a variety of other divisions.

But if the goal is to show overall achievement changes, what if you calculated a composite score across the subjects and grades? Correlation analysis suggests that performance on these two subjects across the two grade levels is highly related, as the following table demonstrates.

Correlations for Combined and Individual Assessments, Percent Basic & Proficient, All Students, NSLP Eligible and Ineligible

NSLP Eligible
NSLP Eligible
4th Math 0.95 0.96 0.93 0.94
4th Reading 0.94 0.94 0.9 0.87
8th Math 0.96 0.96 0.93 0.92
8th Reading 0.96 0.95 0.92 0.9

Though some “noise” is introduced into the comparisons when multiple measures are averaged in this fashion, the highly correlated nature of the measures being combined suggests the impact of this noise should be minimal.

So what do the composite scores look like?

Statistic                       Grade      Subject    2005  2013
Average                         Composite  Composite  254   257
Percent at Basic or Above       Composite  Composite  77    79
Percent at Proficient or Above  Composite  Composite  37    40
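
This post does not spell out how the composite was calculated. As a rough check, the Python sketch below assumes the simplest approach, an unweighted average of the four grade and subject values for each statistic, using the rounded figures from the first table. That averaging method is my assumption for illustration, and the results agree with the composite table only to within rounding, most likely because the published figures came from unrounded inputs.

```python
# Kansas values from the first table, in the order:
# 4th Reading, 4th Math, 8th Reading, 8th Math
TABLE = {
    "Average Score": {2005: [220, 246, 284, 267], 2013: [223, 246, 290, 267]},
    "Percent at Basic or Above": {2005: [88, 66, 77, 78], 2013: [89, 71, 79, 78]},
    "Percent at Proficient or Above": {2005: [47, 32, 34, 35], 2013: [48, 38, 40, 36]},
}


def unweighted_mean(values):
    """Simple average; every grade and subject counts equally (an assumption)."""
    return sum(values) / len(values)


# composites[statistic][year] holds the averaged value.
composites = {
    statistic: {year: unweighted_mean(values) for year, values in by_year.items()}
    for statistic, by_year in TABLE.items()
}

for statistic, by_year in composites.items():
    print(statistic, by_year[2005], by_year[2013])
```

Under this assumption every composite is higher in 2013 than in 2005, which matches the direction shown in the composite table.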

This data suggests that outcomes, as represented by NAEP scores, improved for Kansas students between 2005 and 2013.

The point is that there are many ways to look at NAEP scores, and different statistics yield different answers to the question of whether outcomes improved over time.  Taking an exam in one subject at one grade level and saying it represents overall student achievement is called "overgeneralization."

Plus, NAEP is not the only measure for outcomes.  There are other assessments, such as those administered by the state, ACT, and SAT.  Then there are graduation rates, percent of students needing college remediation, and many other measures of student outcomes.

Next the chart shows “Per Pupil Spending,” both in actual dollars and adjusted for inflation.  

There is no indication of what data element from KSDE’s data was used to represent “per pupil spending” on this chart.  The notes at the bottom of the chart indicate “KPERS added to total aid for 1993 to 2003.”  So, “per pupil spending” is represented by “total aid,” but again there is no definition given for “total aid.”  Further, why was KPERS added for these years and not the others?  What would the lines have looked like if this had not been added?  What other financial data might have been included in this figure?  

Investigation shows that, starting in 2004, a new law routed KPERS amounts through school district budgets. This is a reasonable justification for adding KPERS values to the earlier years, but additional research was required before that fact was apparent.

The chart gives no explanation for how dollars were adjusted for inflation.  Given the fact that the two lines converge at 2013, we can assume that the line is supposed to represent the amounts in 2013 dollars, but what calculation for inflation was used?  The Consumer Price Index, or some other calculation?  Was it a national or regional index figure?  
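
To see why the choice of index matters, and why the two lines would meet at 2013, here is a minimal sketch of the usual way dollars are restated in a base year: multiply the amount by the base year's index value and divide by the index value for the year in question. The chart does not say which method it used, so this is only an assumption about the general approach. The index values and spending amounts below are made up for illustration and are not actual CPI or KSDE figures.

```python
def to_base_year_dollars(nominal, index_for_year, index_for_base_year):
    """Restate a nominal amount in base-year dollars using a price index."""
    return nominal * index_for_base_year / index_for_year


# HYPOTHETICAL price index values and per-pupil amounts, for illustration only.
price_index = {1998: 160.0, 2005: 200.0, 2013: 240.0}
spending = {1998: 6000.0, 2005: 9000.0, 2013: 12000.0}

BASE_YEAR = 2013
adjusted = {
    year: to_base_year_dollars(amount, price_index[year], price_index[BASE_YEAR])
    for year, amount in spending.items()
}

for year in sorted(spending):
    print(year, spending[year], adjusted[year])
```

In the base year the two index values cancel, so the adjusted amount equals the actual amount. Swap in a different index, national or regional, and every earlier point on the adjusted line moves.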

So again, there are a lot of variables that could be presented to represent spending compared to outcome data, and the choice of these variables impacts the story that is told.

Moving on from here, let’s talk about the chart itself.  

Looking at the horizontal, or “x” axis, we see that the scale is supposed to represent years. Are these calendar, fiscal, or school years?  Knowing how the data is reported, I can tell you the data is provided in terms of the fiscal year for the financial data and school year for the NAEP scores. The fiscal and school years both typically run from July 1st to June 30th, so it is reasonable to look at them both on the same horizontal scale.

However, note that the values are as follows:  ‘98, ‘02, ‘03, ‘05, ‘07, ‘09, ‘11, ‘13.  There are four years between the first two values, one year between that and the next, then two years between each of the rest. But the values are presented evenly spaced along the horizontal axis.  This distorts the actual trends because you are looking at longitudinal data on an inconsistent scale.
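
A quick Python sketch makes the distortion concrete. It takes the years from the axis labels (reading ‘98 as 1998), computes the gap between neighboring labels, and compares where each label sits when spaced evenly with where it would sit on a true time scale. The variable names are mine.

```python
# Years as labeled on the chart's horizontal axis
years = [1998, 2002, 2003, 2005, 2007, 2009, 2011, 2013]

# Number of years between each pair of neighboring labels
gaps = [later - earlier for earlier, later in zip(years, years[1:])]

# Position of each label as a fraction of the axis width (0 = left, 1 = right)
even_positions = [i / (len(years) - 1) for i in range(len(years))]
true_positions = [(y - years[0]) / (years[-1] - years[0]) for y in years]

print(gaps)
for year, even, true in zip(years, even_positions, true_positions):
    print(year, round(even, 3), round(true, 3))
```

On the evenly spaced axis, the four years from 1998 to 2002 take up the same width as the single year from 2002 to 2003, so a change that took four years looks just as steep as one that took one year.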

Plus, though NAEP data is only available for every other year, the financial data is available for every year in the chart.  Were values for each year used to draw the line for the financial data, or only values from the years where NAEP data was available?  

As for the vertical scale, the biggest issue is that two different "y" axes are presented on the same graph. Jones indicates in his book that there is nothing inherently dishonest about presenting data this way, but you have to be cautious about the scale you use for each axis to ensure you are comparing accurately.

One possible alternative to presenting data with multiple "y" axes would be to report both NAEP scores and funding in terms of percent change from the previous period.  This would allow for all four lines to follow the same "y" axis.  
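
Here is a minimal sketch of that percent change calculation in Python. The NAEP values are the Kansas 4th grade reading scores for 2005 and 2013 from the table earlier in this post. The spending amounts are hypothetical numbers I made up for illustration.

```python
def percent_change(series):
    """Percent change of each value from the value before it."""
    return [(curr - prev) * 100 / prev for prev, curr in zip(series, series[1:])]


naep_4th_reading = [220, 223]       # Kansas, 2005 and 2013, from the table above
hypothetical_spending = [10000, 11000]  # made-up dollar amounts, illustration only

naep_change = percent_change(naep_4th_reading)
spending_change = percent_change(hypothetical_spending)

print([round(c, 2) for c in naep_change])
print(spending_change)
```

Both results are in the same unit, percent, so they can share one "y" axis. Keep in mind that a percent change in a scale score and a percent change in dollars are still different kinds of quantities, so the comparison needs care even on a shared axis.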

Otherwise, you must be very cautious about the scales used.  Jones says:

“The impression given by an xy or radar chart can be changed dramatically by manipulating the scales of its axes.  You can flatten or exaggerate the spikes in a curve by scaling axis values up or down, or by expanding or contracting the range of axis values.”

The NAEP scores are calculated on a scale of 0 to 500, as the example chart's scale shows.  However, when you look at the average scores across states, you see that the range of actual scores is pretty restricted.  For example, in 2013 the range of average 4th grade reading scores by state was 206 to 232.  Restricting the scale to the minimum and maximum observed scores would have shown more dramatically the increase in scores over time for Kansas, and also shown that the Kansas average score was in the upper half of states’ average scores.
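
To put a number on that, the sketch below computes how far up the vertical axis a score sits, as a fraction of the axis height, under both scales. It uses the Kansas 4th grade reading averages from the table earlier in this post (220 in 2005, 223 in 2013). Applying the 2013 state range of 206 to 232 to the 2005 score as well is a simplification on my part.

```python
def axis_fraction(value, axis_min, axis_max):
    """Where a value falls on an axis: 0.0 at the bottom, 1.0 at the top."""
    return (value - axis_min) / (axis_max - axis_min)


score_2005, score_2013 = 220, 223

# Rise from 2005 to 2013 as a share of the axis height on the full 0-500 scale
rise_full = axis_fraction(score_2013, 0, 500) - axis_fraction(score_2005, 0, 500)

# The same rise when the axis runs from 206 to 232
rise_restricted = (
    axis_fraction(score_2013, 206, 232) - axis_fraction(score_2005, 206, 232)
)

print(round(rise_full, 4), round(rise_restricted, 4))
print(round(rise_restricted / rise_full, 1))
```

The same three-point gain takes up less than one percent of the chart's height on the 0 to 500 scale and more than eleven percent on the restricted scale, roughly nineteen times as much. Neither picture is wrong, but they leave very different impressions.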

The financial data is typically shown on a scale from 0 to somewhere just above the maximum value, so the example chart’s presentation of the per pupil funding is appropriate in this regard.

In conclusion, I hope this blog post has given you some things to think about the next time you are presented with a chart and told it shows you “the truth.”  Make sure you look for the questions that need to be asked and ask them.

In terms of the actual data being presented, KASB will soon be releasing an analysis of the relationship between spending and outcomes in Kansas, and we will do our best to present the data as accurately and completely as we can.  In the meantime, I wanted to share a chart KASB prepared last year that reviews similar data to the example provided above.  As you can see, the picture it paints is very different from the one at the beginning of this post.
