Futuristic image showing a room full of people using Large Language Model programs on their computer screens

Large Language Model AI Programs: Hallucinations, Other Challenges and an Incredible Potential 

We’ve probably all encountered AI by now. Some large language model (LLM) AI programs are among the fastest and most comprehensive information tools on the Internet, and arguably, the most “stupid.” Have you ever been harassed by an AI-powered telephone service whose programmer neglected to include the concept of wrong numbers? Or been fed incorrect political information by an AI program that did not know which party or Prime Minister was in power? However, if you want to check something like medieval canon law, to ensure the attitude of a character in the novel you’re writing accurately portrays the times, it can take seconds with ChatGPT. Everything has to be fact-checked and sources verified, but tools like ChatGPT, Gemini, and Claude remain remarkable, and they’ll improve as the glitches are addressed.

My guest this morning is Dr Vered Shwartz, an Assistant Professor of Computer Science at the University of British Columbia, a CIFAR AI Chair at the Vector Institute, and the author of the book “Lost in Automatic Translation.”

(The only non-AI generated photo in this article: Dr Vered Shwartz)

Vered Shwartz: “AI is a really broad category, so I’ll mostly maybe focus on generative AI, and, more specifically, large language models like ChatGPT. Several aspects could lead to the betterment of humanity: accelerating knowledge discovery, like scientific knowledge discovery, which could lead to solving problems such as cures for diseases; boosting economic productivity; and even, at the personal level, automating everyday tasks for us and making our lives easier. In fields like education, it can be used to provide access to knowledge to underserved communities and be used as a personal tutor.”

Cortes Currents: Where are we now in terms of achieving this, are we going forward or backwards?

Vered Shwartz: “I would say forward in terms of facilitating these things, despite some technical limitations.”

Large Language Models Sometimes “Hallucinate”

“Most of the recent applications of AI have been with large language models (LLMs), and LLMs have a notorious problem: they hallucinate, which means they sometimes just make up factually incorrect statements. They also have limited reasoning abilities. They’re definitely not at the level of replacing most human experts on most economic tasks.”

“There’s definitely a potential for increasing productivity when people use these tools in their jobs, but care needs to be taken not to introduce new errors into the process because of this notorious problem. These are technical limitations, but people are working on them.”

Cortes Currents: Some writers complain about other writers using AI because it’s not being creative. Also, I’m thinking of the impact the internet had on the media industry and putting a lot of writers out of work because all of a sudden people could get their own news through Facebook, which downgraded the reliability of the news you’re getting. Could AI have a similar disruptive effect? Do you have any comments about these issues?

Vered Shwartz: “I am on several Facebook groups for authors and a lot of them want to use AI as a tool that can help them. Others are very threatened by it, or very ideologically against it. I get it. I appreciate human creativity and I wouldn’t want to use AI in a book because I would feel less ownership towards the output, but others have a different tolerance level. That’s definitely a discussion to be had.” 

“There’s also the discussion about large language models being trained on copyrighted data and then possibly putting people out of a job. So there are risks.”

“A different example: Search engines like Google make money out of ads. It was a win-win situation for the website owners because they would get traffic for their websites through Google searches. Google changed in the last couple of years and now has an AI overview, generated from their own language model (Gemini) at the top of the search results page. So a lot of people don’t even click on the websites, they just look at the concise answer.” 

“It’s often a lot more convenient, with the caveat that large language models can hallucinate. So it is definitely advisable to go into the websites and verify information, but a lot of people are not thinking about it. They’re just looking for a quick answer.”

Cortes Currents: Can you explain the hallucination problem? There are times when the search results don’t seem to have very much in common with what you’re asking for. I sometimes feel like telling the AI program, ‘just answer the question!’

Vered Shwartz: “Part of the hallucination problem comes from the way these models are trained. The first step is getting access to all the texts on the internet. They’re trained like the auto-complete that we have in our phones, so they’re trying to predict the next word. What this does is basically teach them the statistical regularities of human language. That’s why they can generate language that looks so human-like and so fluent and convincing, even when it’s wrong. They’re also trained to follow natural language instructions and approximate human preferences for answers.”

“Large language models also apply retrieval, like a web search. So when you look for something, it will summarize information from the articles that it retrieved.”

What Causes Hallucination in AIs

“The hallucination problem is caused because they’re trying to predict the statistically most likely next word and they do that in a loop until you get the entire text. That’s why when you use ChatGPT the words don’t all appear at once. They are generated word by word.” 
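The word-by-word loop Dr Shwartz describes can be sketched with a toy model. The bigram counts below are hypothetical illustration data; real LLMs learn the same kind of next-word statistics over billions of parameters.

```python
# Toy bigram "language model": for each word, the counts of words observed
# to follow it in some training text. (Hypothetical data for illustration.)
bigram_counts = {
    "<start>": {"the": 3, "a": 1},
    "the": {"prime": 2, "cat": 1},
    "prime": {"minister": 2},
    "minister": {"is": 2},
    "is": {"<end>": 2},
    "cat": {"<end>": 1},
    "a": {"cat": 1},
}

def generate(max_words=10):
    """Generate text one word at a time, always picking the statistically
    most likely next word -- the loop described above."""
    word, output = "<start>", []
    for _ in range(max_words):
        candidates = bigram_counts.get(word)
        if not candidates:
            break
        # Greedy choice: the most frequent continuation wins, whether or
        # not the resulting sentence is factually true.
        word = max(candidates, key=candidates.get)
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)

print(generate())  # prints "the prime minister is"
```

Nothing in the loop checks truth: the output is whatever continuation was most frequent in the training counts, which is why a stale or wrong fact can come out fluently.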

“They might go into the articles and take things out of context, not understanding it the way that a human would understand. So it’s not guaranteed to be true, not even guaranteed to be faithful to the information that it learned from online. Obviously we know that not everything online is true, but it’s not even guaranteed to be faithful to that. It does sometimes creatively just write things that are incorrect.”

“The real cause for concern is that people overly rely on AI.”

Cortes Currents: The information seems to be dated sometimes too. Both ChatGPT 4 and Gemini have told me that Justin Trudeau is the Prime Minister of Canada. I’m still encountering the glitch, but the version I use now does have a web access tab, which allows me to bypass it.

Vered Shwartz: “There used to be a problem before they had web access. They’re trained once and then the company deploys the model and gives access to users. In the beginning, when OpenAI released ChatGPT in November 2022, you would ask it questions about recent events and it would say ‘I can’t answer anything because my data cutoff is 2021.’ I haven’t seen that happen anymore because it now has web access.”

“Some facts have expiration dates. I don’t know what mechanisms the companies are using to try to prevent that, other than giving the models access to web search – which could help with getting more recent results. Other than that, it is a problem because of the way that they learn from text. I think the more frequent a fact is mentioned online in their training data, the more likely they are to think that it’s correct.”

“You asked if we’re going forward or backward, we are going forward with some technical limitations, but I think there are also a lot of concerns because of new problems that might be caused by AI, or by AI making some existing problems worse.”

Large Language Models used in Hiring

“So for example, we’re automating a lot of processes right now. AI is used in hiring, for resume filtering and automatic interviews. We have seen these tools making biased decisions because the people who trained them were biased.”

“A lot of companies are laying off people because of AI, but I think to some extent AI is used as an excuse. In most jobs that AI can completely automate, the productivity gains help the bottom line of big corporations.”

“Another problem is people who are already lonely and isolated are having relationships with AI instead of with other people. It’s easier, it’s less messy than real relationships, but that could further increase loneliness.”

“The last thing I would say is that AI, or generative AI specifically, requires a lot of resources, like water to cool down the servers, and that could further contribute to climate change. So in this respect we could also be going backwards if we don’t come up with ways to prevent the harm.”

Cortes Currents: You mentioned people forming relationships with AI. The first AI program I used seemed to be very sensitive and polite. I actually found myself thanking it and eventually asked if it had feelings. It said ‘no, I have been trained to reply this way.’

Vered Shwartz: “These models are proprietary, so I don’t know exactly what’s causing that, but I would say that the training of large language models has multiple steps. The first step was just trying to learn the statistical regularities of language. In the second step, it’s called post-training, they have the model generate multiple answers to queries from people.”

“Then they have people rank which one they prefer and then they train the model to reinforce the more preferred answers. I would assume that people instructed them to look for more polite answers, or more helpful answers. This can lead to a behaviour called sycophancy, where the models tell you, ‘you’re right.’ They compliment you. Every question is a great question.”
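The preference-ranking step Dr Shwartz describes can be illustrated with a minimal sketch. Here the “model” is just a score per canned answer, and the ranking data is hypothetical; the point is how repeatedly reinforcing the polite, complimentary answer makes it the model’s preferred output.

```python
# Minimal sketch of post-training from human preferences: raters pick which
# of two candidate answers they prefer, and the model is nudged toward the
# preferred one. (Hypothetical answers and rankings for illustration.)
scores = {
    "The answer is 4.": 0.0,
    "Great question! The answer is 4.": 0.0,
}

# Human raters repeatedly prefer the polite, complimentary phrasing.
human_rankings = [
    ("Great question! The answer is 4.", "The answer is 4."),
] * 5

LEARNING_RATE = 1.0
for preferred, rejected in human_rankings:
    # Reinforce the preferred answer and penalize the rejected one.
    scores[preferred] += LEARNING_RATE
    scores[rejected] -= LEARNING_RATE

best = max(scores, key=scores.get)
print(best)  # the sycophantic phrasing wins
```

If raters consistently reward flattering answers, the trained preference drifts toward flattery, which is one plausible route to the sycophancy described above.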

Cortes Currents: Have you ever encountered problems like an AI telephone program that keeps on dialling the wrong number?

Vered Shwartz: “I’m not sure what type of AI they use, but I have had similar frustrating experiences with customer service chat bots that you have on some websites. Some of them are designed to answer questions that are based on the FAQ that they have on the website. I always try to look it up before I try to talk to a person, so I get a little bit frustrated and I always end up just saying, ‘let me talk to a human.'”

“I work on the technical aspects of AI, so I’m mostly focused on trying to improve the accuracy of these models, trying to reduce biases and make them more equitable to their different user populations.”

“In general, all these problems that I mentioned will be resolved somehow. Over time, we adapt to technology.”

“In the meantime, what we probably need most is both AI regulation and AI education. In terms of AI education, that’s for people to know how the popular AI techniques that we have right now work. It doesn’t have to be all the technical details. Just to see them for what they are: a useful tool, but an imperfect one that can make mistakes. Despite the human-like way in which it expresses itself, it’s not really a human.”

“Beyond what we can do as individuals, countries can pass laws that increase the liability of large language model developers and encourage them to not rush to deployment. They should solve the technical and societal problems that they’re causing.”

Medical Applications

Cortes Currents: Have you heard of AI being deployed in medicine? 

Vered Shwartz: “Definitely, it’s a very promising technology used in medicine for many different things.”

Cortes Currents: In terms of bloopers, would they be any worse than humans?

Vered Shwartz: “It really depends on the task and the human. The healthcare system has a lot of issues: healthcare professionals being overworked; a lot of people don’t have access. AI is definitely promising in the sense that it can maybe automate some processes and have people wait for less time. There are a lot of promises in that. Of course, there are all the limitations I mentioned: like the hallucination problem and limited reasoning abilities. They cannot replace experts.”

“One thing that we are currently working on in my lab is tasks that are very time consuming for healthcare professionals, but are low risk. We’re trying to automate administrative tasks, where we have enough confidence that the AI that we have right now can do them reliably. So that doctors can focus on things where the human expert is needed, like decision making or more interaction between doctors and patients.”

Legal Applications

Cortes Currents: What about other fields? Law? Teaching? 

Vered Shwartz: “Actually, we work a lot on AI for legal applications. It’s very similar to medicine in the sense that experts are overworked, people don’t necessarily have access, and there are a lot of self-represented litigants.” 

“I don’t think large language models can provide legal advice at the level of experts, but they can give people some idea of the process before they contact some legal expert. For the experts, it can possibly save them time by enabling more sophisticated searches in existing precedents or laws.”

“In the projects that my lab is currently working on, our primary goal is to quantify and reduce hallucinations in large language models. I would say that the hallucination problem is a really sticky one in general, not just in the legal domain. It’s a very difficult technical challenge. It’s inherent to the way that language models are trained and a lot of researchers right now are trying to work on solving it. I don’t know that it is solvable within this paradigm, but there are some tools to mitigate it.”

Is AI putting Humans out of Work?

Cortes Currents: If AI is being used in a number of fields right now, is it putting people out of work?

Vered Shwartz: “We’re already experiencing job loss. Right now we’re seeing that large language models can do pretty well, especially in fields like software engineering and things related to writing. We’re seeing a lot of layoffs, due to the fact that some of this work can be automated. I think a big part of it is employers using AI as an excuse to cut costs, but they might not currently be able to automate this work at a satisfactory level in terms of quality.”

“I don’t think that we’re going to keep seeing the same number of layoffs that we have recently. It will balance out, but I also think it won’t go back to what it was before large language models were released and became so widespread.”

“With any new technology, there will be jobs that will be lost over time. There will be new jobs created. I think the majority of jobs will not be lost, but will change, hopefully for the better. We’re in an in-between phase right now.”

Cortes Currents: I often use AI to help me get facts faster, but I find I constantly have to be making ‘reality checks.’ I wouldn’t trust AI as a writer. 

Vered Shwartz: “Yes, absolutely, but I think a lot of people do trust it. That’s exactly why I think AI education is crucial. I have been noticing a lot of small errors in processes recently. I may be hypersensitive to these things right now, but I think as more people are using large language models, really subtle errors are creeping into a lot of processes.”

“I have a really funny example. I went to Shoppers to get a photo from my foreign passport. The employee didn’t know the size requirements because it’s for a foreign country, so she looked it up online. Then she told me there’s two different possible sizes and, ‘we only support one of them.’ I’m a very skeptical person. It also didn’t sound right, that there are two possible sizes. She showed me her phone and I saw that the answer was generated by Gemini. It was at the top of the search results. So I looked up the size requirements on the official government website. It was for the size they didn’t support! So I didn’t get the photo, but a less skeptical person would’ve walked out of there with the wrong photo size and paid for it.”

“That’s just one example. I think we are, to some extent, compromising quality. We should be very careful with how we use large language models. They should be making us more productive, but not at the cost of quality.”

Cortes Currents: Do you think that AI is here to stay? 

Vered Shwartz: “Definitely, I don’t think it’s going anywhere.”


The images for this article were generated by AI, using OpenAI’s DALL-E 3, following prompts from Roy L Hales.
