How to Reduce Wrong Information in Large Language Models

Imagine stepping into a world where your digital assistant can write a poem, help with your math homework, and even paint a portrait for you, all at the command of your voice. Welcome to the fascinating realm of large language models (LLMs), super-smart computer programs trained on a treasure trove of text and images from the internet. (Think of LLMs as super-talented robots that read lots of books and websites to learn how to help us in all sorts of ways!)

These marvelous machines can do everything from answering your curious questions to crafting art. They’re becoming indispensable in fields as diverse as education, healthcare, and journalism. But hold on, they’re not perfect. Sometimes, they get a little too creative and make stuff up—something experts call “hallucination.” (Imagine your robot friend saying something that sounds true but isn’t. Oops!)

While hallucinations might make for an interesting sci-fi plot, they can be problematic when we’re counting on these digital helpers for accurate and trustworthy information. This is especially concerning in crucial areas like your health or what’s happening worldwide. So, as cool as these intelligent machines are, it’s essential to remember that they have their hiccups, particularly when they veer into the realm of fiction instead of sticking to the facts.

Recently, a team of brilliant minds from Meta AI and ETH Zürich unveiled a game-changing approach to a long-standing challenge. In their captivating paper, “Chain-of-Verification Reduces Hallucination in Large Language Models,” released on the academic platform arXiv.org on September 20, 2023, they rolled out a revolutionary technique known as Chain-of-Verification, or CoVe for short. Spearheaded by a dream team of researchers—Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, and Jason Weston—this research could reshape how we think about artificial intelligence.

In this research, they tackle the issue of “hallucination” in large language models (imagine a computer program that sometimes makes things up when it talks to you, like a friend who tells tall tales). The Chain-of-Verification method ensures that these computer programs can fact-check themselves, keeping the output reliable and accurate. Think of it like a detective inside the computer, making sure every piece of information is truthful before sharing it with you!

The CoVe method is like a smart four-step recipe that the LLM (Large Language Model, which is like a big computer brain trained to talk like us) follows to give the best answers. First, the LLM writes a “draft” to answer a question or complete a task. It’s like the LLM’s first try to get it right. Then, to make sure everything is accurate, the LLM writes a set of “verification questions.” These questions are like a mini-quiz to test whether what’s in the draft is true. So, it’s like the LLM is doing its homework and then checking it twice to make sure it’s just right!
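To make the first two steps more concrete, here is a minimal Python sketch of how drafting and quiz-planning might look in code. The `llm(prompt)` helper is a hypothetical stand-in for whatever model API you use, and the prompt wording is a simplified illustration, not the exact templates from the paper.

```python
# Sketch of CoVe steps 1-2: write a draft, then plan verification questions.
# `llm(prompt)` is assumed to be a function that sends a prompt to your model
# of choice and returns its text response; swap in your own API call.

def generate_baseline_draft(llm, user_query: str) -> str:
    # Step 1: the model's first, unverified attempt at an answer.
    return llm(f"Answer the following question:\n{user_query}")


def plan_verification_questions(llm, user_query: str, draft: str) -> list[str]:
    # Step 2: turn the draft into a "mini-quiz" of fact-checking questions.
    prompt = (
        "Below is a question and a draft answer.\n"
        f"Question: {user_query}\n"
        f"Draft answer: {draft}\n"
        "Write one short verification question for each factual claim "
        "in the draft, one per line."
    )
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]
```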

Next, the LLM answers these verification questions all by itself, without peeking at its first draft or looking anywhere else for clues. This is important because it makes sure the quiz is fair and only based on what the LLM knows.
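Continuing the sketch, step 3 sends each verification question to the model in its own, separate prompt so it cannot peek at the draft while answering. This mirrors the independent-answering idea described above; again, the prompt text is illustrative rather than taken from the paper.

```python
def answer_verification_questions(llm, questions: list[str]) -> list[tuple[str, str]]:
    # Step 3: answer each verification question in a fresh, separate prompt.
    # The original draft is deliberately NOT shown to the model here, so each
    # answer rests only on what the model knows, not on its earlier claims.
    answered = []
    for question in questions:
        answer = llm(f"Answer concisely and factually:\n{question}")
        answered.append((question, answer))
    return answered
```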

Lastly, the LLM goes back and fixes its first draft using the answers from its mini-quiz. This way, it can correct any mistakes or make-believe stuff that might have slipped into the first answer.

It’s like the LLM writes a first draft, gives itself a quiz to double-check, and makes the answer even better! All to ensure that you get the most accurate and helpful information.
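Putting it all together, step 4 shows the model its original draft alongside the quiz results and asks for a corrected final answer. The sketch below reuses the hypothetical helpers from the earlier snippets and is only one plausible way to wire the four steps together, not the authors’ exact implementation.

```python
def revise_with_verification(llm, user_query: str, draft: str,
                             qa_pairs: list[tuple[str, str]]) -> str:
    # Step 4: rewrite the draft, keeping only claims consistent with the quiz answers.
    quiz = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    prompt = (
        f"Question: {user_query}\n"
        f"Draft answer: {draft}\n"
        f"Verification Q&A:\n{quiz}\n"
        "Write a final answer that keeps only facts supported by the Q&A above "
        "and corrects or drops anything that conflicts with them."
    )
    return llm(prompt)


def chain_of_verification(llm, user_query: str) -> str:
    # Full pipeline: draft -> plan the quiz -> answer it independently -> revise.
    draft = generate_baseline_draft(llm, user_query)
    questions = plan_verification_questions(llm, user_query, draft)
    qa_pairs = answer_verification_questions(llm, questions)
    return revise_with_verification(llm, user_query, draft, qa_pairs)
```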

The researchers showed that using CoVe helps cut down on mistakes, like imagining things that aren’t true, in different kinds of tasks. In the paper’s running example, an LLM without CoVe was asked to name politicians born in New York City and confidently listed people who were actually born elsewhere. With CoVe, the verification questions caught those errors, and the corrected answer left the wrongly included names out.

The people who made CoVe also compared it with older methods designed to make LLMs less prone to mistakes. They argued that CoVe is special because it doesn’t need to look at other sources or ask people for help; it can just rely on its own “thinking” to check facts. This makes CoVe flexible and easy to adapt to lots of different questions and tasks.

CoVe is like a built-in “fact-checker” for the LLM, making sure it gives you the most accurate and reliable answers, all by itself!

The research paper puts CoVe to the test by measuring how well it does using different measures. These measures, or “metrics,” include things like accuracy (how often it’s right), precision (how many of the facts it states are actually true), recall (how many of the true facts it manages to include), F1-score (a mix of precision and recall), and ROUGE-L (how closely its text matches a reference answer).
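For readers curious about the definitions behind these metrics, here is a small, textbook-style Python sketch of how precision, recall, and F1 can be computed when a model’s stated facts are compared against a reference list. The paper’s exact scoring pipeline may differ, so treat this as an illustration of the ideas rather than a reproduction of its evaluation code.

```python
def precision_recall_f1(predicted_facts: set[str], reference_facts: set[str]):
    # Precision: what fraction of the facts the model stated are actually correct.
    # Recall: what fraction of the true facts the model managed to include.
    # F1: harmonic mean of the two, so both must be good for a high score.
    true_positives = len(predicted_facts & reference_facts)
    precision = true_positives / len(predicted_facts) if predicted_facts else 0.0
    recall = true_positives / len(reference_facts) if reference_facts else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Example: the model states 4 facts, 3 of which appear in a 5-fact reference list.
print(precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))
# -> (0.75, 0.6, ~0.67)
```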

For example, when answering questions from Wikidata, CoVe boosts accuracy from 59% to 82%, precision from 66% to 91%, recall from 64% to 85%, F1-score from 65% to 88%, and ROUGE-L from 69% to 86%. When answering questions from MultiSpanQA, the improvements are from 18% to 38% in accuracy, from 30% to 54% in precision, from 23% to 43% in recall, from 26% to 48% in F1-score, and from 33% to 54% in ROUGE-L. When writing biographies, CoVe lifts accuracy from 55% to 72%, precision from 62% to 80%, recall from 60% to 77%, F1-score from 61% to 78%, and ROUGE-L from 67% to 81%.

The research paper is a big deal in the world of natural language processing and artificial intelligence. It shows that we can make LLMs, like our big computer brains, smarter and more trustworthy by having them double-check their work. This is good news for many areas where LLMs help us, like writing text or creating images.

But the paper is also honest about some hiccups with CoVe. For example, CoVe might struggle with tricky or unclear questions and answers. It may also miss tiny mistakes or inconsistencies in the first draft. Sometimes, it might not know how to even start checking certain types of questions or tasks. And in trying to make things better, CoVe might accidentally add new mistakes or biases.

So, the paper says we must keep studying CoVe and other ways to make LLMs smarter and more reliable. It also says we should be more open about how these computer brains are made and used.

If you want to read the full paper, you can find it here.

https://arxiv.org/pdf/2309.11495.pdf
