On a Harvard Business Review podcast last week, Socher said we can improve large language models by forcing them to respond to certain prompts by writing code.
Currently, large language models simply predict the next token given the previous set of tokens, Socher said – tokens being the smallest units of data that carry meaning in AI systems. So even though LLMs demonstrate impressive reading comprehension and coding skills and can ace tough exams, they still tend to hallucinate – a phenomenon in which they convincingly present factual errors as truth.
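As a rough illustration of next-token prediction (a toy sketch with a hypothetical hand-built probability table, not any real model's internals), a language model repeatedly picks a likely continuation given the tokens so far:

```python
# Toy next-token predictor: a lookup table of continuation probabilities.
# Real LLMs learn these distributions over vocabularies of tens of
# thousands of tokens; this table is a made-up illustration.
next_token_probs = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "is": 0.1},
    ("cat", "sat"): {"on": 0.8, "down": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def generate(tokens: list[str], steps: int) -> list[str]:
    """Greedily extend the sequence, using the last two tokens as context."""
    for _ in range(steps):
        context = tuple(tokens[-2:])
        if context not in next_token_probs:
            break
        # Pick the highest-probability continuation (greedy decoding).
        candidates = next_token_probs[context]
        tokens.append(max(candidates, key=candidates.get))
    return tokens

print(generate(["the", "cat"], 3))  # extends the text one token at a time
```

The point of the sketch is that the model never "checks" anything: each token is chosen purely from statistical likelihood, which is why plausible-sounding errors slip through.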
And it’s especially problematic when they’re asked complex math questions, Socher said.
He gave an example of a question a large language model might fumble: “If I gave a baby $5,000 at birth to invest in a no-load stock index fund, and I assume a certain percentage of average annual returns, how much will they have at the age of two? At the age of five?”
A large language model, he said, would simply start generating text based on similar questions it had been exposed to in the past. It doesn’t actually say, “Well, this requires me to think very carefully, do some real calculations, and then give the answer,” he explained.
But if you can force the model to translate that question into computer code and generate an answer based on the result of that code, you’re more likely to get an accurate answer, he said.
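For the investment question above, the code the model would be asked to generate could be as simple as a compound-interest calculation (a minimal sketch; the 7% return figure is an assumed value, since Socher only says “a certain percentage of average annual returns”):

```python
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Compound a lump-sum investment at a fixed average annual return."""
    return principal * (1 + annual_return) ** years

# $5,000 invested at birth, assuming a 7% average annual return.
for age in (2, 5):
    print(f"Value at age {age}: ${future_value(5000, 0.07, age):,.2f}")
```

Executing the formula yields an exact figure, whereas free-form text generation only yields something that looks like an answer.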
Socher did not provide details on the process, but said that at You.com they were able to translate such questions into Python. Broadly speaking, programming will give these models a lot more fuel over the next few years in terms of what they can do, he added.
Socher’s comments come as a growing list of large language models struggles to outdo OpenAI’s GPT-4. Gemini, Google’s most powerful AI model to date, barely outperforms GPT-4 on key benchmarks like MMLU, one of the most popular methods for assessing the knowledge and problem-solving skills of AI models. And while the favored approach has simply been to scale these models up in terms of data and computing power, Socher suggests that this path could lead to a dead end.
There is only so much additional data the model can train on, he said.