Big Data, Automated Essay Scoring, and Student Diversity

I was extremely excited to draw some connections between two of my classes this week, T510S – Data Science in Education and T509 – Massive: The Future of Learning at Scale.  The key to this link: when technology gets big, how do you think about ensuring fair representation of diverse populations?  I will be the first to admit that this connection seems a bit obscure, but stay with me here and I promise I’ll get you there.

First, let’s talk about T510S.  We had a wonderful HGSE Ed.L.D guest speaker (digitally) join us to discuss ways in which big data has the potential to completely exclude or misrepresent diverse groups.  She provided some awesome examples, including the City of Boston’s Street Bump app.  This innovative app runs on smartphones and collects “bump” data while the user drives to determine which streets need repair.  What a great way to collect data, right?!  Well, as discussed at the end of this article, the app quickly led to many street repair reports coming from wealthier neighborhoods in Boston while less privileged neighborhoods received less attention.  A general assumption was made in the implementation of this program: that everyone in Boston had access to the smart device needed to make this data robust.  In reality, gaps in access literally led to gaps in service.  This problem has since been addressed, but I think it is an important case study in the kind of inclusive mindset that is too often ignored in innovation.

Our T510S guest shared other cases of big data discrimination and noted some overall trends.  As a society highly focused on collecting data and information, it’s important to keep in mind that different groups of people contribute to data in different ways and amounts.  We need to remember that not everyone has contributed equally to the narrative that data tells.  She left us with the advice to actively think about the stories that are missing from a data set as well as the ones that are present.

Given that T510S meets on Monday night and I have been habitually tackling my T509 readings on Tuesday afternoons, these ideas were still very present in my brain when reading about Automated Essay Scoring (AES).  While reading Justin Reich’s Ed Tech Researcher post and Shermin’s research supporting the validity of AES scores, I asked myself, “What story is this data missing?”  I found my answer on page 27 of Shermin’s paper:  “An important aspect of system performance to evaluate before operational use is fairness – whether subgroups of interest are treated differentially by the scoring

When looking at Shermin’s data collection (Table 1, page 31), I noticed that the demographics captured by his data sets represented populations of either majority white or close to 50% white students and at most 46% free or reduced lunch students.  Why does this matter?  AES programs are based on comparing unscored essays to a set of already scored essays and assigning scores based on features that are similar.  So what happens when the scored essays being used as benchmarks for the system represent students whose backgrounds do not necessarily match the backgrounds of students being graded?
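To make that comparison idea concrete for myself, here is a minimal sketch in Python. It is a toy and an assumption on my part, not how any real AES product or the systems in Shermin’s study work: the features are plain word counts, the benchmark essays and scores are made up, and the names `features`, `cosine`, and `predict_score` are hypothetical.

```python
from collections import Counter
import math


def features(text):
    """Bag-of-words counts: a deliberately crude stand-in for real AES features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_score(essay, scored_essays):
    """Return (score, similarity) of the benchmark essay most similar to `essay`."""
    target = features(essay)
    similarities = [
        (cosine(target, features(text)), score) for text, score in scored_essays
    ]
    best_similarity, best_score = max(similarities, key=lambda pair: pair[0])
    return best_score, best_similarity


# Made-up benchmark essays with made-up human scores (illustration only).
benchmarks = [
    ("the author argues that school uniforms improve focus", 4),
    ("uniforms are bad i do not like them", 2),
]

# An essay that shares most of its words with the first benchmark.
score, similarity = predict_score(
    "the author argues uniforms improve student focus", benchmarks
)

# An essay that shares no words with any benchmark still receives a score,
# but the similarity of 0.0 shows the system has nothing to compare it to.
_, unfamiliar_similarity = predict_score(
    "completely different vocabulary here", benchmarks
)
```

Even in this toy version, the score an essay receives depends entirely on which essays are in the benchmark set, which is why the makeup of that set matters so much.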

As an urban educator, I worked at a school where 50% of students are Hispanic, 44% of students are black, and 99% of students receive free or reduced lunch.  When thinking about my kids and the essays they could potentially write to be graded by an AES system, I worry about the potential discrepancy between their writing and the benchmarks a machine is comparing them to.  I started thinking about the effect that factors like cultural differences in language use and English language learning could have on a machine’s perception of their writing.  How can we make sure that AES systems can account for those differences and not leave them out of the picture of what a “good essay” looks like?

I did a quick Google search and came across this annotated bibliography from the Council of Writing Program Administrators on AES systems.  Section 4 of this document covers the role of diversity in AES systems.  I was relieved to see that my question about diversity is being asked by many.  From a quick skim of the different studies, it seems as though results are fairly mixed, but there is evidence suggesting differences in AES scoring across student subgroups.  I think it is so important that this research is continued and expanded on to ensure that AES systems can address these disparities before wide implementation.
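One simple way to look for that kind of difference is to compare machine scores with human scores separately for each subgroup. The sketch below is only an illustration under my own assumptions: the records are invented, the group labels are placeholders, and `score_gap_by_group` is a hypothetical helper rather than a method taken from any of the studies in the bibliography.

```python
from collections import defaultdict
from statistics import mean


def score_gap_by_group(records):
    """Average (machine score - human score) for each subgroup.

    `records` is an iterable of (group, human_score, machine_score) tuples.
    A gap near 0 means the machine agrees with human raters on average;
    a negative gap means the machine scores that group lower than humans do.
    """
    gaps = defaultdict(list)
    for group, human, machine in records:
        gaps[group].append(machine - human)
    return {group: mean(values) for group, values in gaps.items()}


# Invented records for illustration only; these are not real student data.
records = [
    ("group_a", 4, 4),
    ("group_a", 3, 4),
    ("group_b", 4, 3),
    ("group_b", 3, 2),
]

gaps = score_gap_by_group(records)
```

A real fairness analysis would need far more data and a proper statistical test, but even a table of gaps like this one makes the “who is missing?” question visible.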

In summary, I have learned this week that it is very easy for innovation to get lost in the “bigness” of all of the newest data and technologies and forget about the smaller impacts generalized solutions have on specific people.  In education in particular, we are working in a system with a strong history of implementing “solutions” that leave out large populations of students because they don’t behave like the “average.”  I’m definitely interested in looking more into what we can uncover if we continue to look at data and ideas through a lens of “who is missing?”

One thought on “Big Data, Automated Essay Scoring, and Student Diversity”

  1. Your point about the implications of culture in the context of writing is something I did not think about when I was reading the article on AES grading systems. Your blog made me think of my classroom in West Philadelphia and the impact an AES grading system would have on the diverse set of students I had and the contexts they came from. I think that writing is very fluid despite the structures (argumentative, others) present, and cultural context can play a critical role. Thanks for bringing this to my radar.

