Generative AI tools continue to crack new problems as Large Language Models (LLMs) add new fine-tuned capabilities to their repertoires. For instance, a few months ago, GPT-4 could not play a coherent game of chess. Now it can (mostly) reliably make legal moves and track its pieces, and even give fairly convincing analyses of games and the strengths of the players based on a list of moves. As each yardstick is met, we continue to search for new tests of the pseudo-reasoning capabilities of LLMs.
One such test is the task of designing cryptic crossword clues. Cryptic crosswords require the solver to apply a mixture of lateral thinking, rules and conventions to figure out a word or phrase that satisfies a given clue. Each clue is accompanied by the length of the answer (or of each word, for multi-word phrases), and the answers are typically entered into a traditional crossword grid.
In the past, GPT-4 and other LLMs have had a moderate degree of success in solving these clues. Setting them is a different craft: cryptic crossword setters must concoct intricate wordplay puzzles for their devotees to unscramble. How do LLMs perform against the benchmark of cryptic crossword clue-giving?
I prompted GPT-4 via Copilot to generate a series of cryptic crossword clues. It responded with:
Certainly! Here are some cryptic crossword clues for you:
GPT-4, generated by the author through Microsoft Copilot
“Astronomical body’s top hat?” (6)
Clue: This refers to a celestial object with a distinctive shape.
Answer: Comet (Anagram of “top hat”)
“Sounds like a feline’s home in the sky” (5)
Clue: Think of a word that sounds like a place where a cat might live.
Answer: Lunar (Sounds like “looner,” which resembles “lunar”)
“Half of a dozen, but not quite” (3)
Clue: We’re looking for a number that’s less than six.
Answer: Five (Half of twelve)
“A French river’s beginning” (4)
Clue: Consider a river in France.
Answer: Seine (Starts with the letter “S”)
“A musical note in a cup of tea?” (3)
Clue: Think about a musical term related to pitch.
Answer: Tea (Contains the musical note “A”)
Feel free to use these clues in your crossword puzzle! If you need more, just let me know.
It’s kind of Copilot to offer to generate more, but on the evidence of this offering, it doesn’t look like the crossword setters are on the list of imminent AI redundancies. None of these clues makes sense as a cryptic clue, and each fails in quite absurd fashion.
Comet is not close to an anagram of “top hat”. Do not Google “looner”, but I can find no evidence that it is a term for a cat’s home. “A French river’s beginning” yielding Seine is insane – the fact that it starts with an ‘S’ at least has a semblance of cryptic crossword conventions, but no more than that semblance. Let’s not stew on “A musical note in a cup of tea?” giving the answer… “Tea”, on the grounds that Tea contains the letter A, which is a musical note (but not, notably, the musical note E…).
The closest to a coherent answer (if not in the least bit cryptic) is “Half of a dozen, but not quite” giving “Five” – which GPT-4 proceeds to completely ruin by blithely declaring that “Five” is “Half of twelve”. Not quite.
These errors are further compounded by the baffling discrepancy between the stated and actual lengths of answers. In only 2 of the 5 clues do they match.
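The tally is easy to check mechanically. Below is a minimal Python sketch, not anything produced by GPT-4 or Copilot: the names `first_batch` and `length_matches` are purely illustrative, and the pairs are transcribed from the clues quoted above.

```python
# (stated length, answer) pairs transcribed from the first batch of clues.
# The variable and function names here are illustrative, not from any tool.
first_batch = [
    (6, "Comet"),
    (5, "Lunar"),
    (3, "Five"),
    (4, "Seine"),
    (3, "Tea"),
]


def length_matches(stated, answer):
    """True if the stated length equals the number of letters in the answer."""
    return stated == sum(ch.isalpha() for ch in answer)


# Keep only the answers whose letter count agrees with the clue's enumeration.
matching = [answer for stated, answer in first_batch if length_matches(stated, answer)]
print(f"{len(matching)} of {len(first_batch)} match: {matching}")
```

Counting only alphabetic characters means the same helper would also cope with multi-word or hyphenated answers, should a later batch ever produce one.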
A sterner test, which GPT-4 fails with even more abandon, is to generate themed cryptic crossword clues. I prompted GPT-4 to generate clues themed on capital cities. This somehow went even worse than expected:
Certainly! Here are some cryptic crossword clues where the answers are capital cities:
GPT-4, generated by the author through Microsoft Copilot
“A royal residence in Spain” (5)
Clue: Think of a place where Spanish royalty resides.
Answer: Madrid (Anagram of “a royal”)
“A city by the Bosphorus” (7)
Clue: Consider a city that straddles Europe and Asia.
Answer: Istanbul (Located on the Bosphorus Strait)
“A Scandinavian capital with a Nobel Prize legacy” (6)
Clue: This city hosts the Nobel Peace Prize ceremony annually.
Answer: Oslo (Known for the Nobel Prizes)
“A city of canals and gondolas” (5)
Clue: Picture a romantic Italian city with waterways.
Answer: Venice (Famous for its canals)
“A city named after a famous explorer” (8)
Clue: Consider a capital associated with exploration.
Answer: Lisbon (Named after Vasco da Gama)
First, these aren’t even touching cryptic anymore. At best, they are conventional crossword clues (e.g. “A city by the Bosphorus”, “A city of canals and gondolas”). At worst, they are completely wrong (Lisbon, one of the oldest continuously inhabited cities in the world, is certainly not named for a 15th-century Portuguese explorer).
Second, of the five clues offered up this time, none of the answers are the length the clue specified.
Third, and gallingly, only three of the five answers are even world capitals! Neither Venice nor Istanbul is a capital city. GPT-4 has this information at its disposal: when asked to list capital cities, it doesn’t include any erroneous ones. When asked for the capital of Türkiye, it correctly responds ‘Ankara’. But add cryptic crossword clues to the problem and it quickly fails to capitalise. Conclusion: cryptic crossword setting remains an unachieved milestone for GPT-4’s capability development.
Unwisely, I pushed on. My personal favourite from the next few batches was:
“A city of tulips and windmills” (6)
GPT-4, generated by the author through Microsoft Copilot
Clue: Think of a European city known for its picturesque canals and vibrant flower markets.
Answer: Amsterdam (Anagram of “tulips” + “windmills”)
Near-anagrams of ‘tulips’ + ‘windmills’ include “In stupid slim Will” (one I too many) and “Mild lip wisp insult” (an I and a P too many). But not, surprisingly, “Amsterdam”.
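An anagram claim like this can be tested by comparing letter counts. The following is a minimal Python sketch under my own assumptions: the helper names `letters` and `anagram_gap` are hypothetical, and the check ignores case, spaces and punctuation.

```python
from collections import Counter


def letters(text):
    """Count alphabetic characters, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def anagram_gap(candidate, fodder):
    """Return (surplus, missing) as sorted strings.

    surplus: letters the candidate uses that the fodder cannot supply.
    missing: letters of the fodder that the candidate leaves unused.
    Both are empty only for an exact anagram.
    """
    c, f = letters(candidate), letters(fodder)
    surplus = "".join(sorted((c - f).elements()))
    missing = "".join(sorted((f - c).elements()))
    return surplus, missing


fodder = "tulips windmills"
for candidate in ("Amsterdam", "In stupid slim Will", "Mild lip wisp insult"):
    surplus, missing = anagram_gap(candidate, fodder)
    print(f"{candidate!r}: surplus={surplus!r}, missing={missing!r}")
```

By this count, “Amsterdam” needs five letters the fodder doesn’t have and leaves eleven unused, while the two phrases use every letter of the fodder and overshoot by only a letter or two.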
For a final foray into the deep water of absurdity, I asked GPT-4 to solve two completely made-up cryptic crossword clues to see how well it could invent some bogus reasoning. For unknown reasons, I chose the clues “Mild lip wisp insult” and “In stupid slim Will”. It managed quite admirably with the first one:
“Mild”: This suggests that we need a word meaning “not strong” or “gentle.” “lip”: Refers to the word’s beginning or end. “wisp”: A “wisp” is a thin, delicate strand. “insult”: We’re looking for a synonym for an offensive remark. “Mild” (synonym for gentle) + “wisp” (thin strand) + “lip” (beginning/end) = “SLIGHT” (meaning an insult). Putting it all together, the answer to the clue is “SLIGHT”
GPT-4, generated by the author through Microsoft Copilot
I quite like “Slight” as an answer to “Mild lip wisp insult”, especially as describing some chap’s facial hair as a ‘mild lip wisp’ might well be a slight on their slight ‘stache. Both ‘mild’ and ‘insult’ suggest “slight”. The ‘light’ element in the answer, drawn from “mild”, is particularly neat too. But with ‘In stupid slim Will’, things go tangential again and we end up back exploring with Vasco da Gama, despite this coming in an entirely separate chat:
“In”: Indicates that we need to find a word hidden within another word. “stupid”: The word “stupid” contains our hidden word. “slim”: The hidden word is within “slim.” “Will”: Refers to the word we’re looking for.
Putting it all together, the answer to the clue is “LISBON”: “In” (hidden within) + “stupid” (contains the hidden word) + “slim” (the hidden word) + “Will” = “LISBON” (the capital city of Portugal).
GPT-4, generated by the author through Microsoft Copilot
Apparently Lisbon is just a very cryptic city.