Big Data, Automated Essay Scoring, and Student Diversity

I was extremely excited to draw some connections between two of my classes this week, T510S – Data Science in Education and T509 – Massive: The Future of Learning at Scale.  The key to this link: when technology gets big, how do you think about ensuring fair representation of diverse populations?  I will be the first to admit that this connection seems a bit obscure, but stay with me here and I promise I’ll get you there.

First, let’s talk about T510S.  We had a wonderful HGSE Ed.L.D. guest speaker (digitally) join us to discuss the ways in which big data has the potential to completely exclude or misrepresent diverse groups.  She provided some awesome examples, including the City of Boston’s Street Bump app.  This innovative app runs on smartphones and collects “bump” data while the user drives to determine which streets need repair.  What a great way to collect data, right?!  Well, as discussed at the end of this article, the app quickly produced street repair reports concentrated in wealthier neighborhoods of Boston, while less privileged neighborhoods received less attention.  The implementation of this program rested on a general assumption: that everyone in Boston had access to the smart device needed to make this data robust.  In reality, gaps in access literally led to gaps in service.  This problem has since been addressed, but I think it is an important case study in the kind of inclusive mindset that is too often ignored in innovation.

Our T510S guest shared other cases of Big Data discrimination and noted some overall trends.  As a society highly focused on collecting data and information, it is important to keep in mind that different groups of people contribute to data in different ways and in different amounts.  We need to remember that the narrative data tells has not been contributed to equally.  She left us with the advice to actively think about the stories that are missing from a data set as well as the ones that are present.

Given that T510S meets on Monday night and I have been habitually tackling my T509 readings on Tuesday afternoons, these ideas were still very present in my brain when reading about Automated Essay Scoring (AES).  While reading Justin Reich’s Ed Tech Researcher post and Shermis’s research supporting the validity of AES scores, I asked myself, “What story is this data missing?”  I found my answer on page 27 of Shermis’s paper:  “An important aspect of system performance to evaluate before operational use is fairness – whether subgroups of interest are treated differentially by the scoring methodology.”

When looking at Shermis’s data collection (Table 1, page 31), I noticed that the demographics captured by his data sets represented populations that were either majority white or close to 50% white, with at most 46% of students receiving free or reduced lunch.  Why does this matter?  AES programs work by comparing unscored essays to a set of already scored essays and assigning scores based on similar features.  So what happens when the scored essays being used as benchmarks for the system represent students whose backgrounds do not necessarily match the backgrounds of the students being graded?
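To make that concern concrete, here is a minimal sketch of the general feature-based approach, with made-up surface features and a plain linear regression standing in for whatever a real vendor actually uses; it is not the design of any system Shermis evaluated.

```python
# Minimal sketch of feature-based essay scoring (illustrative only; the
# features and model are assumptions, not any vendor's actual design).
import numpy as np
from sklearn.linear_model import LinearRegression

def surface_features(essay):
    """Crude surface features: length, word length, sentences, vocabulary size."""
    words = essay.split()
    sentences = [s for s in essay.replace("?", ".").replace("!", ".").split(".") if s.strip()]
    return [
        len(words),                                     # essay length
        float(np.mean([len(w) for w in words])),        # average word length
        len(sentences),                                 # sentence count
        len({w.lower().strip(".,!?") for w in words}),  # unique vocabulary
    ]

# Benchmark set: essays already scored by human raters (placeholders here).
scored_essays = [
    "The park was quiet. We walked and talked about our summer plans.",
    "Dogs good. I like dogs. Dogs run.",
    "The committee deliberated carefully before announcing its nuanced decision.",
]
human_scores = [3, 1, 5]

X = np.array([surface_features(e) for e in scored_essays])
model = LinearRegression().fit(X, np.array(human_scores))

# A new, unscored essay receives whatever score the benchmark writers'
# patterns predict for its features.
new_essay = "My neighborhood taught me how to hustle and how to listen."
predicted_score = model.predict([surface_features(new_essay)])[0]
print(round(predicted_score, 2))
```

The point of the sketch is that whatever writing patterns the benchmark population shares get baked into the scoring function, and every new essay is measured against those patterns.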

As an urban educator, I worked at a school where 50% of students are Hispanic, 44% of students are Black, and 99% of students receive free or reduced lunch.  When thinking about my kids and the essays they might write to be graded by an AES system, I worry about the potential discrepancy between their writing and the benchmarks a machine is comparing them to.  I started thinking about the role that factors like cultural differences in language use and English language learning could have on a machine’s perception of their writing.  How can we make sure that AES systems account for those differences and do not leave them out of the picture of what a “good essay” looks like?

I did a quick Google search and came across this annotated bibliography from the Council of Writing Program Administrators on AES systems.  Section 4 of this document covers the role of diversity in AES systems.  I was relieved to see that my question about diversity is being asked by many others.  From a quick skim of the different studies, it seems as though results are fairly mixed, but there is evidence suggesting differences in AES scoring across student subgroups.  I think it is so important that this research is continued and expanded to ensure that AES systems can address these disparities before wide implementation.

In summary, I have learned this week that it is very easy for innovators to get lost in the “bigness” of all of the newest data and technologies and forget about the smaller impacts generalized solutions have on specific people.  In education in particular, we are working in a system with a strong history of implementing “solutions” that leave out large populations of students because they don’t behave like the “average.”  I’m definitely interested in looking more into what we can uncover if we continue to look at data and ideas through a lens of “who is missing?”

My Experiences with Khan Academy and IRT in the Classroom

First of all, I absolutely loved the readings, videos, and assignments for this week’s T509 class.  All of the content was really engaging and full of fascinating ideas.

I could not pass up the opportunity to write about some of my experiences with Khan Academy.  I am a huge fan of their system and have used it in my math classroom for the past three years.  I also find myself spending an hour or two here and there practicing my own math skills (kind of really proud of my Profile).

To establish some context, I started my career teaching a 9th grade math intervention course.  I was plopped in a computer lab with absolutely zero curriculum tools, handed the most behind 9th graders, and told to do something about it.  After some testing, I found that my students were, on average, at a 5th grade math level and needed a lot of support if they were going to be successful in Algebra 1.  Khan Academy was one of the first places I went to find solutions to this problem.  Back then, the interface centered on a “knowledge map”:

Students could navigate the “knowledge map” at their own pace.  Blue squares represented mastered skills, green squares were recommended next, and orange marked previously mastered skills that the system believed needed review.  From my understanding, this system was not grounded in IRT at all.  It simply displayed a curriculum map under the assumption that completing one skill meant the student was ready to move on to the next, and that if they struggled with that next skill, they should go back and review the skill that came before it.
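As far as I can tell, the logic was something like this tiny sketch; the skill names and the unlocking rule are my own illustration of the “finish one, unlock the next” idea, not Khan Academy’s actual map.

```python
# Sketch of a prerequisite-map recommender: no ability estimate, just
# "finished this skill, so the next one unlocks."  Skill names are made up.
prerequisites = {
    "single_digit_multiplication": [],
    "multi_digit_multiplication": ["single_digit_multiplication"],
    "long_division": ["multi_digit_multiplication"],
    "intro_to_fractions": ["long_division"],
}

def recommended(mastered):
    """A skill is recommended once every one of its prerequisites is mastered."""
    return [skill for skill, prereqs in prerequisites.items()
            if skill not in mastered and all(p in mastered for p in prereqs)]

print(recommended({"single_digit_multiplication"}))
# -> ['multi_digit_multiplication']
```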

As a teacher, I did like how students could visually see the map, understand how different math skills were related, and track their progress.  I did not like much of anything else.  First and foremost, students and I could never figure out where to start.  The default for my students, of course, was to start with the easy stuff.  Once things got hard, they would jump to a different branch they had not been down yet and start with the easy stuff again.  It was extremely difficult to figure out where students should be working or to have any control over what they were working on.  The system was set up to let users simply choose what they wanted to work on and go from there, not to diagnose what was appropriate for them as learners.

At the start of my third year of teaching, Khan Academy introduced their current IRT-based model.  After a few years of working my butt off to figure out where my students were and what they should be working on, I finally had a system to do that for me!  It was wonderful.  The first time students logged in, they took a pre-test and were instantly directed to practice skills that fell within a pretty accurate difficulty window for each student.  At first this system was a little unruly since ALL math exercises were clumped into one domain (there’s nothing like consoling a very upset overachieving 14-year-old who suddenly has an integral pop up on their screen), but eventually they split their exercises into separate domains based on grade-level Common Core standards and the specific topics we see now.
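To sketch my mental model of how that placement might work, here is a toy version assuming a simple Rasch (one-parameter) IRT model, invented item difficulties, and an arbitrary window width; Khan Academy’s real algorithm is certainly more sophisticated than this.

```python
# Sketch of IRT-style placement under a Rasch (1PL) model.  Item difficulties,
# the pretest data, and the window width are all invented for illustration.
import numpy as np

def p_correct(theta, difficulty):
    """Rasch model: probability of answering an item of given difficulty correctly."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

def estimate_ability(responses):
    """Grid-search maximum-likelihood ability estimate from (difficulty, 0/1) pairs."""
    grid = np.linspace(-4, 4, 801)
    def log_lik(theta):
        return sum(np.log(p_correct(theta, b)) if x else np.log(1 - p_correct(theta, b))
                   for b, x in responses)
    return grid[np.argmax([log_lik(t) for t in grid])]

# Pretest: the student gets the easier items right and the harder ones wrong.
pretest = [(-2.0, 1), (-1.0, 1), (0.0, 1), (1.0, 0), (2.0, 0)]
theta_hat = estimate_ability(pretest)

# Recommend exercises whose difficulty sits near the ability estimate.
exercise_bank = {"adding_fractions": -0.5, "two_step_equations": 0.4,
                 "quadratic_formula": 1.8, "basic_integrals": 3.0}
window = 0.75
recommendations = [name for name, b in exercise_bank.items()
                   if abs(b - theta_hat) <= window]
print(theta_hat, recommendations)
```

The key difference from the old knowledge map is that recommendations come from an estimate of what the student can actually do, not from which boxes they have clicked through.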

So now, let’s talk a little bit about what works well and what does not work well about this IRT system in the classroom:

What I loved:

1.  Students ALWAYS had something to do.  The system never stopped recommending what it thought students should do next.

2.  The gamification elements work wonderfully.  Students love collecting badges, points, and unlocking new characters for their avatar (I loved it a little bit too…).

3.  Mastery Challenges are wonderful.  Students have to constantly demonstrate mastery of old skills and if they slip up a few times, the system is quick to re-recommend that practice session.

4.  There are a TON of teacher resources.  Seriously, they have been really thoughtful about how best to support teachers using this system in their classrooms.  I was constantly finding new ways to re-invest students and keep them moving.

What drives me nuts:

1.  The tips are terrible.  They are based on simply giving the students a procedure to follow, not grounded in content at all.  Using a tip also instantly counts a question as wrong (there is no difference between using one hint and using all of them to have the answer handed to you).  Students NEVER wanted to use tips and rarely found them helpful.  Same thing with the videos.  Although this system is good at placing kids, it kind of stinks at teaching them.  Luckily my students had a great teacher to keep them moving.  Although some of my students would log in at home and play around, very few of them made much progress without a human coach to work with and go over things they had never seen before.

2.  Although the new domain system is wonderful for directing students to the grade level and content they should be working on, the IRT system does not place students across domains.  I had to cobble together my own testing and evaluation system to try to place kids in appropriate grade-level or skill domains.  Nothing breaks your heart more than telling a 9th grader that they should start off in the “Early Math” section.

3.  If the IRT system decided that students did not know how to do something, it took absolutely forever for them to prove otherwise.  Many students got very frustrated working through the same problems over and over again, problems they already knew how to do, just because they slipped up once.  Similarly, students who got lucky and guessed a correct answer would get very frustrated when they encountered topics completely outside of their skill set.  I now know that this comes from the fact that IRT treats each response as all or nothing: a wrong answer counts as zero mastery and a correct answer counts as full mastery (the sketch after this list shows how a single slip or lucky guess can swing the ability estimate).  Although this assumption seems small, I definitely saw it play out in fairly stressful ways for students who somehow got misplaced.

4.  Even though there is a lot of data available for teachers, none of it can really be used to show actual student growth and progress on an absolute scale.  As student data begins to make its way into teacher evaluations, this is a killer.  Teachers need to be able to show student growth, and Khan Academy just does not currently have that capacity.  It is good at placing students in exercises and getting them working, but it cannot yet assess where students are and how much they are growing.
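Building on the little Rasch sketch earlier, here is roughly why point 3 stings: every response enters the model as a bare 0 or 1, so a careless slip or a lucky guess moves the ability estimate just like a “real” answer does. All of the numbers below are invented for illustration.

```python
# Each response is scored as a bare 0 or 1, so a careless slip or a lucky
# guess shifts the ability estimate just like a "real" answer would.
# (Same invented Rasch setup as the placement sketch above.)
import numpy as np

def estimate_ability(responses):
    """Grid-search maximum-likelihood ability estimate from (difficulty, 0/1) pairs."""
    grid = np.linspace(-4, 4, 801)
    log_lik = lambda t: sum(
        np.log(1 / (1 + np.exp(-(t - b)))) if x else np.log(1 - 1 / (1 + np.exp(-(t - b))))
        for b, x in responses)
    return grid[np.argmax([log_lik(t) for t in grid])]

baseline    = [(-2.0, 1), (-1.0, 1), (0.0, 1), (1.0, 0)]            # what the student knows
one_slip    = [(-2.0, 0), (-1.0, 1), (0.0, 1), (1.0, 0)]            # careless miss on the easiest item
lucky_guess = [(-2.0, 1), (-1.0, 1), (0.0, 1), (1.0, 0), (2.5, 1)]  # baseline plus one lucky hit

for label, responses in [("baseline", baseline), ("one slip", one_slip),
                         ("lucky guess", lucky_guess)]:
    print(label, round(float(estimate_ability(responses)), 2))
# The slip drags the estimate well below the baseline, and the guess inflates
# it well above -- the model has no concept of "that was probably an accident."
```

That matches what I saw in my classroom: one misplaced estimate, and a student is stuck grinding their way back.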

Again, I love Khan Academy.  It is one of those free education tools that I truly believe in.  The addition of IRT to their system has helped in a lot of ways, but it definitely has not fixed everything.  While reading and learning about IRT this week, I have come to see where those limitations come from, but not necessarily how to work around them.  In the search for the perfect “Teaching Machine,” I think it is important to remember that the role of IRT is placement, not automated instruction.

Connectivism: I don’t get it.

While reading a few articles on connectivism for tomorrow’s class, I have to admit that I just do not get it.

Essentially, the idea is that learning is not about actually knowing anything, but about building connections and the capacity for future learning. The arguments (at their most extreme) go as far as to claim that education should be restructured so that students study only what they are interested in, pursuing that study by working with peers and taking advantage of open online resources. In a sense, teachers are no longer teaching content. Instead, they are teaching students how to access existing knowledge in order to facilitate their own learning and add their own voice to that body of knowledge.

Now, call me old-fashioned, but I think these ideas are a touch absurd. Yes, in theory everything sounds wonderful and jolly and so much fun. I am having a very hard time, however, imagining what such a system would actually look like. In the end, what are students actually learning? What does success in a connectivist classroom look like? What outcomes do we look for to show that student learning is real? How does connectivism provide students with real future opportunities in actual society?

Not only do these ideas seem unrealistic to me, they also seem biased. The insinuation here is that everyone in society has the capacity to uncover their true interests and study those interests deeply by forming connections with like-minded people in their community and on the web. Yes, these ideas sound like they would work wonderfully in a community full of strong role models and resources for learning. We all know that such communities are far from universal. Not only that, but many students in today’s society lack many of the most basic skills needed to access quality online learning resources and direct their own learning. The connectivist theories I have read about this afternoon seem to apply only to the most privileged in our society and leave the others behind.

In my opinion, content is crucial to learning. If our mission is for students to grow into contributing members of society, there are large bodies of knowledge and skills they need to possess. I agree that good teaching includes growing students’ capacity for learning, but I do not think that goal overshadows the importance of what content students actually understand and can apply.

Fellow T509ers, what are your thoughts? Am I missing something? Has anyone found resources outside of the readings (I even tackled some of the rabbit holes!) that may clarify my confusion? What do you think successful connectivist teaching really looks like?