Monday, October 20, 2014

Correlation, Prediction, and Causation

In an article published in Sunday's Capital-Journal, Celia Llopis-Jepsen reports that three well-known researchers in the field of educational outcomes reviewed a KASB report about the relationship between funding and outcomes, and they concluded that the report commits a common fallacy in social research: claiming causation while demonstrating only correlation.

When Celia and I spoke last week and she let me know what the researchers' reaction to the report had been, I was disappointed. I had tried to be careful and avoid language that implied causation, because I knew that the statistics I had used for the analysis (correlation and simple linear regression) cannot demonstrate a causal relationship between one variable and another. At most, I was trying to demonstrate that higher education funding predicts higher student outcomes. Looking at the report now, I suspect that phrases such as “has an impact on” and “accounts for” are what drew this criticism.

Based on this feedback, I thought it was worthwhile to describe the differences, as I see them, between correlation, prediction, and causation.

Correlation analysis yields statistics that indicate the amount to which two variables move together.

Picture I-70 between Topeka and Lawrence at about 4:00 p.m. on a weekday. Several cars are headed eastbound, all at approximately the same speed and in the same direction. You can say that the cars' motion is correlated; they are moving together. But there is no prediction or causation between them. You cannot assert that one car exiting the highway predicts, much less causes, the exiting of any other car. They are all moving in the same direction for different reasons; each driver has an independent and unrelated reason for traveling on that highway in that direction at that time.

Now picture a small fishing stream. We see two fish swimming in the same direction downstream. You can say that the movement of the two fish is correlated because they are moving together, but you cannot say that the movement of either predicts or causes the movement of the other. You could, however, say that they are both traveling downstream because of the same cause: the current of the stream itself. It could also be that they are both traveling downstream for the same reason; they may both be seeking food or heading to some other common location. But you cannot, based on your observations, predict or assert anything about one fish's movements based on the other's, except to say that they seem to be moving in the same direction.

Regression analysis yields statistics that indicate the amount to which the value of one variable predicts the value of another.

The distinction between correlation and regression is difficult to describe, and despite the fact that I have a master's degree in educational psychology and have spent many an hour knee-deep in far more complex statistical analysis, I find it hard to explain the difference in ways that make sense to those who do not spend much time with statistics.

In short, correlation indicates the degree to which two variables move together. Regression indicates the extent to which changes in one variable (the independent variable) can predict changes in another (the dependent variable). The very language used to identify the variables is confusing because it implies causation, even though the statistics themselves offer no proof of causation.

In one of my statistics classes years ago, the instructor tried to explain the difference between prediction and causation with this example. Some insurance companies, the instructor said, charge higher premiums for red cars than for cars of any other color. They do so based on analysis showing that red cars are more likely to be involved in motor vehicle accidents than non-red cars.

Does this mean they think the color of the car actually causes more accidents? Of course not. But they acknowledge the research shows red cars are in more accidents, and that is enough for them to justify charging higher premiums for red cars.

Psychologists would probably argue that people who tend to be more adventurous and reckless are more likely to buy red cars, and that it is this tendency that causes both the car color choice and the higher incidence of accidents. But regardless of the underlying causation, the insurers can argue based on the research that car color (the independent variable) predicts the likelihood of accidents (the dependent variable).

Taking an example from my own life, I remember driving through Kansas on the way home from a weekend trip with friends. One of my friends looked out into a cow pasture and asserted, “It is going to rain.” The rest of us in the car asked her what the heck she was talking about. She pointed at a group of cows, all lying on the ground, and said, “It’s going to rain. The cows are lying down.” I looked up, saw that there wasn't a cloud in the sky, and was certain she was either making this up or crazy.

A couple of hours later, after we had all made it to our respective homes, we experienced torrential downpours that none of us could have predicted from the beautiful blue sky. But the one friend (whom the group later labeled “The Great Cow Oracle”) had accurately predicted the rain based on the behavior of the cows. I’ve told this story to others in the years since, and they have explained that cows sense barometric pressure changes, or something else that tells them rain is coming, and lie down in groups to rest up for the storm, during which they will need to remain standing so as not to get too cold on the wet ground. Again, the cows lying down (the independent variable) does not cause the rain (the dependent variable), but it is a reliable predictor of it.

Causation is causation

Causation should be a pretty clear concept to most people.  In the Capital-Journal article I mentioned at the beginning of this post, I am quoted as saying, “At the level of analysis we’ve done, it definitely does not show causation...  In social research, the best we can hope for are indicators.”

I went on to discuss with Celia what most psychology and sociology classes will teach you: that in social research, it is very difficult to control or manipulate variables in order to find evidence of causation, and to do so would in many cases be unethical. We cannot treat the real world as a controlled experiment in which we take two seemingly equal groups and manipulate a certain variable differently for each group to see what the outcomes are.

Instead, we are limited to looking at the data as it exists and applying statistical methods that allow us to determine which variables are significant predictors of which other variables. And though researchers have developed complex and sophisticated statistical means of doing this, in the end I believe the most we can do is identify the things that most effectively predict the outcomes we seek, and focus our energies on making improvements based on the relationships that seem to exist between them.

In the case of the article in question, the Kansas Association of School Boards and I have a vested interest in determining what will increase positive outcomes for students and schools in Kansas, and we work to let folks know what we find and what steps might be effective in making improvements. That was my goal with this report, and it is unfortunate that others doubt the sincerity of that goal. I will strive to ensure that future articles much more explicitly indicate the absence of causal evidence and illustrate why it is still pertinent to pay attention to the predictors.

For more discussion on cause and effect in social research, take a look at the following links:

Tuesday, October 14, 2014

Manager Ratios

There is a lot of discussion in Kansas these days about the number of administrators versus the number of teachers and other instructional staff in the public school system. Some argue that we are spending too much money on non-instructional staff, and that resources should be shifted so that we have more people interacting with students and fewer people sitting behind desks in offices.

John Heim, KASB's Executive Director, raised the question of what kind of ratios between managers and non-management employees exist in other industries, and how these ratios compare to the ratios between educational administration and instructional staff.

Looking to national numbers, the Bureau of Labor Statistics' "May 2013 National Occupational Employment and Wage Estimates" for the US gives us information on the number of management and non-management employees that we can use to compare.

Overall, the data includes 132,588,810 total employees, of whom 6,542,950 are management employees. Doing the math, that leaves 126,045,860 non-management employees, for an overall national ratio of approximately 19 employees per manager.
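The arithmetic behind that ratio can be checked directly, using the figures as cited from the BLS estimates:

```python
# Figures from the BLS "May 2013 National Occupational Employment and
# Wage Estimates," as cited above
total_employees = 132_588_810
management_employees = 6_542_950

# Everyone who is not a manager
non_management = total_employees - management_employees  # 126,045,860

# National ratio of non-management employees per manager
employees_per_manager = non_management / management_employees

print(f"{non_management:,} non-management employees")
print(f"about {employees_per_manager:.0f} employees per manager")
```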

The dataset does not draw direct connections between the manager and non-manager job classifications it provides, but working from the titles, we can determine ratios for some of the major categories included, as shown in the following table:

As can be seen, this is quite a range: from as few as 11 employees per manager to as many as 88. Further analysis, and potentially discussion with the Department of Labor and the Bureau of Labor Statistics, would be needed to come up with meaningful and consistent manager-to-employee ratios, but the table above at least illustrates that there is wide variation in the number of managers needed to supervise employees, and it further suggests that the nature of the work being done determines (at least in part) the number of supervisors (or administrators) necessary.

Comparing this to what we see in Kansas, the ratio of administrators (superintendents, associate/assistant superintendents, principals, assistant principals, directors/supervisors of special education, health, and vocational education, instructional coordinators/supervisors, and other directors/supervisors) to instructional staff (practical arts/vocational education teachers, special education teachers, pre-kindergarten teachers, kindergarten teachers, and other teachers) was approximately 11 instructional staff members to every educational administrator in 2013-14.

However, what about all of the staff that are neither instructional staff nor administrators?  What about the library media specialists, school counselors, clinical/school psychologists, nurses, speech pathologists, audiologists, social workers, administrative assistants, food service workers, coaching assistants, security officers, clerical staff, and others?  Do they not also require supervision?  If we add them to our equation, we find the ratio of administrators to all staff was approximately 16 staff per administrator in 2013-14.

What can we conclude from all of this? The main thing I would conclude is that more research and evaluation are needed to determine how the management needs and approaches of public schools resemble, and how they differ from, those of other industries. That said, we can also conclude that Kansas public school administrators are responsible, on average, for slightly fewer employees than the national cross-industry average for manager-to-employee ratios.

The following graph and table show the number of Kansas Public School Employees (by FTE) classified as Supervisors/ Managers/ Administrators, instructional staff, and other staff.  


Next, we’ll talk specifically about the ratio of managers to teachers, both nationally and in Kansas. The Bureau’s estimates show the following information related to educators and their managers:

This means that at the primary and secondary levels, the national ratio of teachers to administrators is about 15 to 1. In Kansas, the ratio of instructional staff to administrators was approximately 11 to 1 in 2013-14. This ratio increased gradually from 2001-02 through 2011-12, then declined gradually through 2013-14, as the following graph illustrates.

What can we take from this comparison?  I would say two main things.  

The first is that if we assume the national numbers include only instructional staff, then Kansas has fewer teachers per administrator than the national average; but if we instead assume the national numbers include all staff, then Kansas has, on average, more staff per administrator than the national average. Further analysis of the definitions used by the Department of Labor suggests that more than just certified teachers are included in their counts; however, it is not clear whether their counts include all of the job categories we have included in the “All Staff” group for Kansas.

The second is that, despite claims by some that the number of non-instructional staff in Kansas schools has grown at an inflated rate in recent years, the ratio of management positions to non-management positions has remained substantially consistent over the past decade.

Monday, October 6, 2014

Are we gaining or losing teachers?

Recently, a local district board member asked me, "What about the Governor's claim that we have 680 more educators in Kansas than when he was elected? The Kansas Center for Economic Growth claims that we have lost 650 teachers."

Let's look at the Governor's statement first, which comes from a flier passed out during a campaign event:

Governor Brownback was elected in the fall of 2010 and took office in January of 2011, so looking at the data on certified staff reported to KASB by its members, we find the following:

Based on this information, it is true that we have more educators than when the Governor was elected.  However, this does not speak to the fact that in the two years prior (from 2008-09 to 2010-11), we lost over 1,200 educators.

Note that the first sentence of the statement is referring to "educators."  Comparing the numbers above with the total number of students enrolled in public schools as reported by KSDE, we get this:

This shows that not only did the ratio of students to educators increase in the years when the number of educators decreased, but it also increased between the 2012-13 and 2013-14 school years, showing that the percentage increase in educators was lower than the percentage increase in students.

But this ratio is not the same as the one quoted in the Governor's material: 15.1 to 1. That is partly because the second sentence of the claim refers to "teachers" instead of "educators." Here is the table of possible educator categories as collected by KASB, along with the FTEs for 2009-10 through 2013-14:

And, their corresponding ratios:

So, if we take just the five staff categories that have the word "teacher" in them, we come up with the following ratios:

This ratio is still below 15.1, and further, is not consistent over time as the quote suggests. However, it is important to note that we are using a different data set here than the one cited by Brownback's campaign, and as noted in a previous blog post, the labels used for similar staff positions vary greatly from district to district and the actual differences may not be captured by the broad reporting categories used by KSDE.

As for the Kansas Center for Economic Growth, in their "Quality at Risk" report, they state the following:

Looking strictly at the numbers for the 2008-09 and 2013-14 school years, we see the following:

This indicates that the KASB staff data, paired with the KSDE student data, supports the statement.  However, just looking at the endpoints over time does not show the entire story of what happened during that time. The following graph shows the change by year during this time:

Note that, though the number of teachers decreased each year from 2008-09 through 2011-12, it has actually been increasing since 2011-12 at roughly the same rate as the number of students.

So, to answer the board member's question, I would say both statements are true, and the apparent contradiction is based on the differences in the points in time, definitions, and data sources upon which they are based.