Psycholinguistic experiments reveal that the efficiency of human language use rests on predictions at both the syntactic and lexical levels. Previous models of human prediction based on LLMs have used an information-theoretic measure called surprisal, with success on naturalistic text in a wide variety of languages but under-performance on challenging text such as garden path sentences. This paper introduces a novel framework that combines the lexical predictions of an LLM with the syntactic structures provided by a dependency parser. The framework is based on sheaf theory, a branch of mathematics developed to study the differences between local and global consistency in data, which gives rise to an incompatibility fraction. We tested our framework using this fraction on two garden path datasets. The results correlated well with human reading times, distinguished between easy and hard garden path sentences, and outperformed surprisal. These findings could pave the way for LLMs that are not only more precise but also more intuitive in understanding language at a human level, transforming how we interact with AI technologies.
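The surprisal baseline mentioned above is standard information theory: the surprisal of a word is the negative log probability a language model assigns to it in context. A minimal sketch, using made-up probabilities rather than actual LLM outputs, illustrates why garden path disambiguators are hard for this measure:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal (in bits) of an event with probability prob: -log2(p)."""
    return -math.log2(prob)

# Hypothetical next-word distribution after "The old man the ..."
# (illustrative numbers only, not drawn from any real model).
next_word_probs = {"boats": 0.02, "house": 0.40}

# The disambiguating continuation "boats" is low probability, so its
# surprisal is high; the locally plausible "house" has low surprisal.
print(round(surprisal(next_word_probs["boats"]), 2))   # high surprisal
print(round(surprisal(next_word_probs["house"]), 2))   # low surprisal
```

In practice the probabilities would come from an LLM's next-token distribution; the point of the framework described here is to supplement this lexical signal with syntactic structure from a dependency parse.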
Workshop III: Naturalistic Approaches to Artificial Intelligence