My first week as a research intern has been quite relaxed so far. As one might expect, it was mostly about orienting the interns (there are 10 of us) to the campus, our projects, and the available resources.
I’ve been assigned a project called MOOC Visualization and Analytics. MOOCs, short for Massive Open Online Courses, are exactly what their name describes: educational courses offered to anyone with access to the internet. I’ve also been assigned a partner to work on this project with. His name is Jason, and he’s a rising senior at Carnegie Mellon University studying Statistics with a minor in Business.
Specifically, our project aims to analyze the user data of a course called Statistics in Medicine, offered by Stanford. An enduring problem with MOOCs is that they have extremely high attrition rates, meaning a high percentage of users do not end up completing the course. The goal of our project is to mine and analyze data from the server logs of the students who accessed this course, in hopes of using machine learning to predict student grades and their risk of dropping the course.
This all sounds pretty technical and exciting, and I’m sure the work Jason and I will produce over the next 9 weeks can be all of that. But for now, I’ve really been spending a lot of time getting oriented with the NumPy and pandas libraries so I can quickly organize and clean data until it’s ready for analysis (I’m coding in Python, by the way). I’ve never actually used NumPy or pandas before, so there’s a learning curve I’m in the process of overcoming. So far, all I can do is pull all the data relevant to a single user by specifying their ID number.
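For anyone curious what that single-user lookup looks like, here is a minimal sketch in pandas. The column names (`user_id`, `event_type`, `timestamp`), the helper `events_for_user`, and the sample rows are all made up for illustration; the real server logs will almost certainly be laid out differently.

```python
import pandas as pd

# Made-up sample rows standing in for the course's server logs.
# The column names here are assumptions, not the real log schema.
logs = pd.DataFrame({
    "user_id": [101, 102, 101, 103, 101],
    "event_type": ["play_video", "login", "problem_check", "login", "pause_video"],
    "timestamp": pd.to_datetime([
        "2014-06-10 09:05",
        "2014-06-10 09:07",
        "2014-06-10 08:30",
        "2014-06-10 10:15",
        "2014-06-10 09:20",
    ]),
})


def events_for_user(df, user_id):
    """Return every row belonging to one user, ordered by time."""
    # Boolean mask: True for rows whose user_id matches the one requested.
    mask = df["user_id"] == user_id
    # Keep the matching rows, sort them chronologically, renumber from 0.
    return df.loc[mask].sort_values("timestamp").reset_index(drop=True)


user_events = events_for_user(logs, 101)
print(user_events)
```

Boolean masking like this is probably the simplest approach; an unknown ID just returns an empty data frame instead of raising an error.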
I’ve also been evaluating what this data means, and how we can interpret it as objectively as possible. I really enjoy coming up with metrics to assess data, so I’m excited for the coming work! The next few days will likely consist of me becoming a more agile user of NumPy and pandas to manage data frames, as well as documenting how we plan to break down the data. We’ve also been assigned a few papers on related research to give us a better idea of how to approach this project.
It will probably be a while before Jason and I do any real machine learning, let alone understand what it really means, with all its nuances and many components. Still, I’d much rather get a good grasp of what our data entails and learn more about what it means to do machine learning before we get knee-deep in a subject we both have little experience in.
We were also encouraged to share in our first REU post what we think it means to do research. I’ve linked here the short reflection that I’ve written, which is my broad take on research.
But until next time, keep it real.