If artificial intelligence is a threat to higher education, the castle has been breached. Students of all ages are using ChatGPT and other AI-based large language models to, at best, aid their writing and, at worst, write a paper for them. These large language models aren’t always accurate, and oftentimes professors are able to distinguish between a student’s voice and an AI-generated one. But as the technology continues to grow and develop, it’s getting more and more difficult to tell.
That raised a question for one Harvard student: Can ChatGPT pass a semester at the Ivy League school? Maya Bodnick is a rising sophomore at Harvard and decided to find out. Spoiler alert: The answer is yes—ChatGPT can pass a semester. Bodnick joined All Things Considered host Arun Rath to discuss her experiment and what it means for the future of education. What follows is a lightly edited transcript.
Arun Rath: First, tell us a bit about why you wanted to do this. Obviously, you were trying to prove that it could pass a semester, but in a deeper sense, what were you trying to accomplish?
Maya Bodnick: I just finished my first year of college, and ChatGPT and AI have been so disruptive this year. In November, OpenAI released ChatGPT, and over the course of my freshman year, the technology has already gotten so much better. It went from [version] 3.5 to 4.
By the end of the year, I was asking myself whether AI would be able to do well in my coursework. So I decided to ask eight Harvard professors to grade essays. All of them were written by ChatGPT, but I told them that half were written by me and half by ChatGPT, and it was really fascinating what happened.
Rath: So, to eliminate bias, the professors thought the essays could have been written either by a chatbot or by you, but they were all written by ChatGPT. You went through a number of classes with this, right?
Bodnick: Yes.
Rath: Take us through them, because some of them were introductory classes, but others were not 101-level.
Bodnick: I had ChatGPT take most of the same classes that I did. My freshman year, I did Microeconomics and Macroeconomics, which are these huge introductory classes taught by Professor Jason Furman, who used to be chair of the Council of Economic Advisers. They have, like, 700 kids ... pretty introductory.
I did Latin American Politics, which is an intermediate political science class; The American Presidency, also an intermediate political science class; a class on negotiation, which students in any year can take; Intermediate Spanish; a freshman seminar on Proust; and finally, a freshman writing class, which, hilariously enough, even though it’s just for freshmen, was actually ChatGPT’s worst grade.
Rath: I was going to ask you about best and worst. Well, let’s start with the worst, then. Why do you think that was, and how badly did it do?
Bodnick: So, to go over my results: I basically found that ChatGPT is a college-level writer, even though it was released nine months ago. ChatGPT got almost all A’s and B’s with these essays that I sent to these professors. It got a GPA of 3.3, and this is definitely lower than the Harvard average, but it’s still well above passing. And ChatGPT is only nine months old.
So, it got A’s in Macroeconomics, The American Presidency, and Conflict Resolution. One of those classes—the economics class—is an intro class, but the other two are not. Its lowest grades were a B- in the intermediate government class on Latin American Politics and a C in the freshman writing class.
And again, it might seem surprising that it got a C in that class, but actually, freshman expository writing is known as a pretty challenging class with pretty specific requirements. So, I actually think it’s impressive that it was able to pass because I did not make it aware of all of the things that you would be expected to do to get an A in that class, which is a very long list.
Rath: Wow. And back on the humanities and history, let’s just dig into a particular essay: the one for The American Presidency. What did ChatGPT write that so impressed the professor?
Bodnick: ChatGPT wrote about Harry Truman, and the prompt was “What were any given modern American president’s three greatest failures and three greatest successes?” So ChatGPT wrote about Truman, and it focused on the Marshall Plan, the desegregation of the armed forces and the NSC as the successes. And the failures it highlighted were the Korean War, the rise of McCarthyism and the atomic bombing of Japan, which is interesting because I feel like that’s pretty controversial.
I think you can make a pretty strong case for all six of these being failures or successes. Because this was the kind of essay where you were meant to make a list and identify three successes and three failures, I think ChatGPT really met that requirement.
I also saw it perform well on more creative prompts. For example, in Conflict Resolution, it was supposed to describe a conflict in its life and explain how to negotiate it, and, of course, it made one up: it concocted this conflict, which had never happened, involving a fictional roommate using an advanced AI system to cheat, which was hilarious.
It was so funny what it said to me. Let me quote: “It feels like a betrayal—not just of the university’s code of academic honesty, but of the unspoken contract between us, of our shared sweat and tears, of the respect for the struggle that is inherent to learning.”
Rath: I’m dying. That’s wonderful.
Bodnick: Yeah, it was hilarious. So, I think it was good at both the more formulaic and the creative stuff. Where it failed was in a class like Latin American Politics, where there’s just a very specific, scholarly answer to the question, and I don’t think it was familiar enough with the course material.
Rath: One thing I know you dealt with in the parameters of this experiment was citations. I bring this up because one of the criticisms of these large language models is that using them for papers is a form of plagiarism. You had the essays written without citations, right?
Bodnick: I did have them written without citations. Some people have asked me whether, if I had included required citations, maybe the papers wouldn’t have done so well, or maybe professors would have been able to tell. I think that’s a fair criticism, but I also know of AI that is getting better at citations and is specifically designed to help students with them.
Also, I think a hypothetical student who wanted to cheat using ChatGPT could probably figure out how to add their own citations and just put a little bit of their own labor in. I definitely think ChatGPT plagiarism is a huge problem, but I imagine that it will get better. I found most of the things that ChatGPT-4 wrote were pretty accurate, so I think citations are probably not a long-term way to check student work. But, it is true that I didn’t think about that in my experiment.