Proctoring exams is boring. You sit in a room, watching students stress about the exam in front of them for two hours. Normally I try to get grading or other work done during a final, but on this exam I allow students to use their notes and view various articles online that we read during the semester. I’ve done this for years, but now there’s a new player on the block that’s making open-note exams difficult - ChatGPT.
If you haven’t used ChatGPT yet, you must. It’s the most advanced AI chatbot ever released to the public. Its abilities are uncanny. ChatGPT can answer questions and come up with jokes and poems and solve riddles and converse almost like a human. I have no idea how it works, and I’ll stay away from whether or not ChatGPT is actually thinking in a philosophical sense, but it does have some amazing capabilities.
For example, I asked ChatGPT whether the University of Wisconsin-Stout would be a good fit to join the Big Ten. This is a ridiculous question to anyone who follows college sports. The Big Ten is made up of public flagship universities (and Northwestern). UW-Stout is a Division III school with only 9,000 students. But I wanted to ask this question because on paper UW-Stout could be a good fit. It’s in the right geographical region, has many sports teams, and is part of the UW system. Additionally, no one on the internet has ever talked about UW-Stout joining the Big Ten. Googling "University of Wisconsin Stout Big Ten" (with the quotes) produces zero results. So any answer that ChatGPT gives can’t just be a copy-paste from another website; that is, ChatGPT will have to combine different sources of information to develop a novel answer. This was the result:
While UW-Stout has a solid reputation in its own right, it may not be considered a strong candidate for joining the Big Ten conference. The Big Ten typically invites universities that have a strong research program, large student enrollment, and a history of success in major sports like football and basketball. UW-Stout, while a respected institution, may not meet all of these criteria.
Dead on. ChatGPT successfully connected two facts: 1) the Big Ten is made up of large universities with big sports programs, and 2) UW-Stout is not a large university with big sports.
That was impressive. It is also bad for teachers and college professors. Any writing prompt or exam question could be fed into ChatGPT. And because the output is newly generated every time, a ChatGPT answer won’t necessarily show up on turnitin.com or other anti-plagiarism websites. Given that my exam was open note and students were allowed to use their computers, this presented a problem. Instead of being productive during the exam, I stood in the back of the room and kept an eye on what students were looking at on their screens. After the exam, I started thinking: how would ChatGPT do on my exam? Some of the questions involve graphs, which are currently beyond ChatGPT’s capabilities, but I fed the rest into the software. Below are some of the results. (Note: skip down past the sample questions and answers to see where things get interesting.)
Q: Say that the state of Oregon adds an additional $5 tax to all packs of cigarettes and finds that one year later cigarettes sales have decreased by 8 percent. Why does the with/without principle state that attributing the 8 percent decrease to the $5 may be inaccurate?
A: The with/without principle, also known as the ceteris paribus principle, states that when analyzing the impact of a specific factor on a system, all other relevant factors should be held constant. In the case of the state of Oregon adding a $5 tax to all packs of cigarettes, there may be other factors that could have contributed to the 8 percent decrease in cigarette sales. For example, there could have been a public health campaign that discouraged smoking, or there could have been an increase in the price of cigarettes from a competitor brand.
Score: 3/3
Q: Many emission-reduction programs are medium-based. Why does this mean that environmental programs can often have their own environmental costs, where one part of the environment is made better but another is made worse?
A: Many emission-reduction programs are medium-based, meaning that they focus on reducing emissions from a specific medium, such as air, water, or soil. While these programs may be effective in reducing emissions from the targeted medium, they can sometimes have unintended environmental consequences in other areas.
For example, an emission-reduction program that focuses on reducing air pollution from a particular industry may lead to increased water pollution if the industry switches to using more water-intensive processes. Similarly, a program that reduces water pollution from agricultural runoff may lead to increased air pollution if farmers switch to using more fertilizers that emit greenhouse gases.
Score: 3/3
Q: Putting a tax on carbon emissions from cars would be nearly impossible. How could the government keep track of hundreds of millions of vehicles? What is the “second-best” tax that the federal government uses today to tax car emissions that doesn’t tax cars directly?
A: Putting a tax on carbon emissions from individual cars may indeed be difficult to implement, as it would require tracking emissions from hundreds of millions of vehicles. However, there are alternative approaches that the government can use to tax car emissions indirectly.
One such approach is to impose a tax on the fuel that cars consume. Fuel taxes effectively increase the price of gasoline and other fuels, which can incentivize drivers to use less fuel and emit fewer carbon emissions. The tax can be levied at the point of sale or at the wholesale level, and can be adjusted to reflect the amount of carbon emissions associated with each gallon of fuel.
Score: 3/3
Q: Why would the FBI have found it difficult to use a probabilistic basis to prosecute Rockwell for their actions at Rocky Flats?
A: The FBI would have found it difficult to use a probabilistic basis to prosecute Rockwell for their actions at Rocky Flats because a probabilistic approach relies on statistical evidence to establish the probability of a particular outcome or event occurring. In the case of Rocky Flats, the alleged wrongdoing by Rockwell involved the mishandling and disposal of hazardous waste, which can have long-term health effects on individuals and the environment.
Proving that specific individuals were harmed by exposure to hazardous waste from Rocky Flats would be challenging, as the effects of exposure may take years or even decades to manifest and can be difficult to attribute to a specific source. Furthermore, the variability in individual sensitivity to toxic substances and the complex nature of environmental exposures can make it difficult to establish a direct causal link between exposure to hazardous waste and adverse health outcomes.
Score: 3/3
ChatGPT was destroying my exam. Perfect answers, supported with examples. But then I asked ChatGPT a second question about Rocky Flats, an atomic bomb plant in Colorado that broke many environmental laws and was the subject of a high-profile grand jury investigation:
Q: What juror should never have been allowed to be on the Rocky Flats grand jury and why?
A: There were several jurors on the Rocky Flats grand jury who had potential conflicts of interest that could have influenced their decision-making. One of the jurors who should not have been allowed to serve on the grand jury was a man named George Vanderwold. Vanderwold was a former employee of Rockwell International, the company that operated the Rocky Flats nuclear weapons plant, and had worked in the same building where some of the alleged illegal activities had taken place.
Vanderwold's potential conflict of interest was known to the prosecuting attorneys, but they did not remove him from the grand jury. During the grand jury proceedings, Vanderwold reportedly made comments that suggested he was sympathetic to Rockwell and the challenges they faced in managing the hazardous waste at the site. He also expressed doubts about the credibility of some of the witnesses who testified against Rockwell.
Score: ???
Well, that is interesting. I’ve never heard of George Vanderwold. The answer I was looking for was Ken Peck, a lawyer and environmental activist who had previous dealings with the Rocky Flats Plant. Clearly not an unbiased juror, and someone who should not have been allowed within 100 miles of the grand jury. Could it be that there was another biased juror I didn’t know about? I Googled “george vanderwold grand jury rocky flats” to find out. Nothing. Google answered, “It looks like there aren't many great matches for your search”.
What was going on? Who is George Vanderwold? ChatGPT nailed the first question about Rocky Flats that involved linking an economic theory to a news event, but I can’t find any information about George Vanderwold. I asked ChatGPT for clarification, and this was the response:
I apologize for the error in my previous response. After reviewing my sources, I cannot find any information that suggests that a person named George Vanderwold was a member of the Rocky Flats grand jury. It appears that I made an error in my response.
I strive to provide accurate information, and I apologize for any confusion that my previous response may have caused. To answer your original question, I do not have a source for George Vanderwold being a juror on the Rocky Flats grand jury, as it appears that this information is not correct.
ChatGPT just made up a person! And did so in a convincing manner. Having an ex-employee of a company under investigation on a grand jury would be bad. But this ex-employee does not exist. Bizarre. I had stumbled on an “AI hallucination”: a fabricated answer not based on any real information.
ChatGPT is a powerful tool. It’s going to be the bane of professors everywhere. I’m sure the technology will improve, but for now, buyer beware. You never know when you’re getting brilliant analysis or a work of fiction.
Great post Professor! I, too, have heard about ChatGPT completely making up answers in a convincing manner. That’s not a small thing, that’s a Huge problem. Especially considering how little people fact-check.
As for how to deal with that for college exams? My opinion is to grade students on their prompting, follow-up prompting, and fact-checking skills. Same as using a graphing calculator for mathematics.
You can’t stop technology, you can only build guardrails around it.
Orwell said the same thing about newspapers (and I mean the people who publish them). They often just make up shit. If "journalism is the first draft of history," ChatGPT knows a lot of bullshit.