Lateral thinking in LLMs
My previous blog, 'Humans as a bottleneck', segues into this: how would LLMs perform in visual connect quizzes? For the uninitiated, these quizzes present a set of pictures and the ask is to connect them, i.e. find the common theme across them. That requires two things: knowledge of the pictures themselves, and then interpreting them across multiple dimensions to find the common theme.

Interestingly, there was a recent post by Ingram (a European AI R&D lab) on evaluating GPT-5 on exactly this. Link here. They used 'Only Connect', a British game show, with the focus on evaluating reasoning capabilities rather than knowledge recall. Since the test ran on GPT-5, it's likely the model has already consumed 'Only Connect' answers, which are freely available on the web, as part of its training. What would be interesting is to see how it performs on the most recent, entirely new set of questions; that remains open as of now. The results also did not show a clear comparison between an average human score and LLM performance; they primarily stack-ranked different LLMs by answer accuracy. Unsurprisingly, the longer the models reasoned, the more correct their answers got: the additional reasoning time let them think through multiple dimensions, and they consumed more tokens along the way.
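To make that setup concrete, here is a minimal sketch of the kind of evaluation loop such a benchmark implies: ask the same connections-style question at increasing reasoning-effort settings and tally accuracy against tokens consumed. Everything here is a placeholder of my own, not Ingram's actual harness: the `call_model` helper, the effort levels, and the sample questions are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical helper: stands in for an actual LLM API call. The signature
# and the "effort" knob are assumptions, not a real library's API; wire up
# the client of your choice here.
def call_model(question: str, effort: str) -> tuple[str, int]:
    """Return (answer_text, tokens_used) for one question."""
    raise NotImplementedError("plug in your LLM client here")

# Toy 'Only Connect' style questions: four clues, one hidden link.
QUESTIONS = [
    {"clues": ["Mercury", "Mars", "Neptune", "Apollo"], "link": "roman gods"},
    {"clues": ["Baker", "Fleet", "Oxford", "Bond"], "link": "london streets"},
]

def evaluate(effort_levels=("low", "medium", "high")):
    results = defaultdict(lambda: {"correct": 0, "tokens": 0})
    for effort in effort_levels:
        for q in QUESTIONS:
            prompt = "What connects: " + ", ".join(q["clues"]) + "?"
            answer, tokens = call_model(prompt, effort)
            # Naive grading: check the reference link appears in the answer.
            # A real harness would need a judge model or human grading.
            if q["link"] in answer.lower():
                results[effort]["correct"] += 1
            results[effort]["tokens"] += tokens
    for effort, stats in results.items():
        acc = stats["correct"] / len(QUESTIONS)
        print(f"{effort:>6}: accuracy={acc:.0%}, tokens={stats['tokens']}")
```

The grading step is the hard part in practice: connections answers are free-form, so a substring match like the one above would undercount correct answers phrased differently, which is presumably why published evaluations lean on stricter answer formats or judge models.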
Whether LLMs, with their current transformer-based architecture, can help us discover novel ideas through lateral thinking is still a very nascent space. It will be interesting to see how this pans out.
Some interesting links to check:
- Hacker News discussion on Ingram's findings: https://news.ycombinator.com/item?id=44876205
- Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game: https://arxiv.org/html/2406.11012v1
- Solving NYTimes Connections with LLMs: https://wandb.ai/llm-finetuning-course/connections/reports/Solving-NYTimes-Connections-with-LLMs--Vmlldzo5MDc3ODcz
- Only Connect questions database: https://ocdb.cc/