Six researchers at Apple (Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar) have found ample evidence that Large Language Models are incapable of logical reasoning:
Our findings support the hypothesis that current LLMs are not capable of performing formal mathematical reasoning
LLMs pattern match, and they do so at a scale that surpasses any single human brain's innate ability to pattern match. This energy-intensive parlour trick has done a pretty good job of passing for intelligence across a range of domains and legitimate tests aimed at measuring cognitive ability. But while these models may fool us into believing they can think, the evidence says otherwise: in the Apple team's GSM-Symbolic benchmark, merely changing the names and numbers in grade-school math problems, or adding an irrelevant clause, was enough to cause significant drops in accuracy, which is exactly what you would expect from a pattern matcher rather than a reasoner.
This impressive ability to pattern match at immense scale simultaneously serves as the Achilles' heel of modern LLMs. The massive data sets these models have been trained on have not been, and indeed cannot be, vetted for accuracy by domain experts in any reasonable human-scale timeline. A significant amount of noise exists in the training data, and the signal-to-noise ratio is liable to get much worse given the rate at which LLM-generated text, imagery, and video are flooding the pool of human knowledge. Feeding ever-bigger models more of this data is not the path forward. The next big breakthrough in silicon mimicking human intelligence will come from discrete, highly specialized convolutional neural networks running alongside infallible logic systems (like calculators), with a silicon-based prefrontal cortex orchestrating and carrying out tasks across them.
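To make that last claim concrete, here is a minimal sketch in Python of what such an orchestration loop might look like. It is an illustration of the idea, not anything from the paper: the arithmetic is delegated to a small deterministic evaluator (the "calculator"), while `plan()` is a hard-coded placeholder standing in for whatever specialized networks would decompose the problem.

```python
import ast
import operator

# Exact arithmetic "tool": a tiny expression evaluator standing in for the
# infallible logic systems (calculators) the orchestrator should delegate to.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression deterministically."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval"))


def plan(task: str) -> list[str]:
    """Placeholder planner: stands in for the specialized networks that would
    decompose a word problem into arithmetic sub-steps. Hard-coded here."""
    return ["3 * 14", "3 * 14 + 8"]


def orchestrate(task: str) -> float:
    """The 'prefrontal cortex' loop: plan the sub-steps, then hand every
    numeric step to the calculator instead of letting a model guess digits."""
    steps = plan(task)
    return calculate(steps[-1])


if __name__ == "__main__":
    problem = "Ava buys 3 boxes of 14 pencils and finds 8 more. How many pencils?"
    print(orchestrate(problem))  # 50.0, exact regardless of which numbers appear
```

The point of the division of labour is that the numeric step is exact no matter which names or numbers appear in the problem, which is precisely the property the GSM-Symbolic results show LLMs alone do not have.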