U of T study uses algorithms to predict how word meanings change over time
Using the Historical Thesaurus of English, a dataset that dates back nearly 1,000 years to the Old English period, researchers at the University of California, Berkeley, Lehigh University and the University of Toronto have developed an algorithm to demonstrate how words evolve.
“If you look at the history of a word, meanings of that word tend to shift or extend over time,” says Yang Xu, an assistant professor in the department of computer science and University College’s cognitive science program. “The question is: Why is this happening? How is it happening? And whether there are computational algorithms we can leverage to make predictions about the historical development of word meanings.”
The study is published in the Proceedings of the National Academy of Sciences (PNAS).
As Xu explains, the word “face” was first used to refer to a body part. It then extended to include senses of facial expressions, such as “smiley face” or “funny face”. Later, its meaning came to cover novel senses such as the surface of an object, as in “face of a table” or “face of a cube”. But it doesn’t end there. “Face” can also be used in more remote contexts, such as “face danger” or “face risks”, resulting in a web or network of meanings.
“The [algorithm’s] prediction is that a word should connect closely to related meanings in the space available, similar to finding nearest neighbours in semantic space, resulting in a chain that efficiently links novel meanings to the existing meanings of a word.”
“[What] we didn't know from the past is how this chaining process can be implemented computationally and tested at a broad scale.”
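To make the idea of chaining concrete, here is a toy sketch of nearest-neighbour chaining. It is not the study's implementation: the two-dimensional "semantic space" and the coordinates assigned to each sense of “face” are invented for illustration, and the function name `chain_senses` is hypothetical. Starting from the body-part sense, each step attaches whichever remaining sense lies closest to any sense the word already has.

```python
import math

# Hypothetical 2-D "semantic space": these coordinates are invented for
# illustration only and do not come from the study or its dataset.
SENSES = {
    "body part": (0.0, 0.0),
    "facial expression": (1.0, 0.5),
    "surface of an object": (0.5, 2.0),
    "confront (face danger)": (2.5, 2.5),
}


def chain_senses(senses, seed):
    """Grow a chain of senses from `seed` by nearest-neighbour chaining.

    At each step, the remaining sense closest (Euclidean distance) to
    *any* sense already attached to the word is attached next, linked
    to that nearest existing sense. Returns the (existing, new) links
    in the order they are added.
    """
    attached = [seed]
    remaining = [s for s in senses if s != seed]
    links = []
    while remaining:
        # Pick the (existing, candidate) pair with the smallest distance.
        parent, child = min(
            ((a, c) for a in attached for c in remaining),
            key=lambda pair: math.dist(senses[pair[0]], senses[pair[1]]),
        )
        links.append((parent, child))
        attached.append(child)
        remaining.remove(child)
    return links


for parent, child in chain_senses(SENSES, "body part"):
    print(f"{parent} -> {child}")
```

With these made-up coordinates, the sketch links body part to facial expression, then facial expression to surface of an object, then surface of an object to “face danger”, mirroring the progression Xu describes. Because each step greedily takes the shortest available link, this particular procedure is Prim's algorithm, so the links form a minimum spanning tree over the senses; the study may define and evaluate chaining differently.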
Xu says an ongoing study will explore the basis of chaining across a diverse array of languages, to see whether it can explain some recurring patterns, such as why many languages use the same word to describe “fire” and “flame”, and to leverage current digital resources to predict word usage over time. The work could have further implications for natural language processing, helping train computers to interpret novel word usages accurately.
“A potential research direction is the machine interpretation of novel word usages such as those in non-literal expressions,” says Xu. “If I say ‘grasp’, I can refer to ‘grasping an object’ versus ‘grasping an idea’. Humans understand this usage fairly quickly, even though it appears novel. Understanding these phenomena would require the development of computational tools that would go beyond the algorithms of chaining.”
U of T’s undergraduate cognitive science program, sponsored by University College, merges cross-disciplinary studies in computer science, linguistics, philosophy, psychology and neuroscience, and challenges students with questions about the human mind and how its workings can apply to machines. Xu, whose earlier graduate work looked at machine learning applications in cognitive neuroscience, is the program’s first joint research appointment, strengthening ties between computer science and cognitive science.
He joins core faculty Ana Pérez-Leroux, professor of Spanish and linguistics and the program’s director; John Vervaeke, a lecturer in the department of psychology; and Assistant Professor James John of the department of philosophy. Xu plans to teach a course on data science in the cognitive sciences.
Future research questions for Xu include how children learn language.
“I'm curious about the parallels between historical language change and child language acquisition,” says Xu. “For example, a child might use ‘bus’ to refer to a lot of things that move on the road, so I can imagine some sort of chaining mechanism going on there.”
“Whether that's necessarily the same as chaining observed in historical language change, we don't know. But I think it opens up new questions that we can explore.”