A team of researchers at Apple recently published a paper introducing a new AI model called ReALM (Reference Resolution As Language Modeling), designed to help voice assistants like Siri by supplying context and resolving ambiguous references.
ReALM is able to understand more domain-specific questions and can "substantially outperform" OpenAI's GPT-3.5 and GPT-4 large language models (LLMs). It aims to improve on existing approaches, making reference resolution faster, smarter, and more efficient.
The paper sought to demonstrate "how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality."
ReALM works by reconstructing the screen as text, labeling each on-screen entity and its location so the model has the context it needs to interpret a user's request. Converting this contextual information into plain text keeps the task simple for the language model, making it more efficient and less resource-intensive, which could translate into faster and more accurate responses from voice assistants like Siri.
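The general idea can be illustrated with a short sketch: on-screen entities are flattened into a labeled, position-ordered list of text lines, and that text is handed to the language model alongside the user's request. The entity fields, ordering, and prompt wording below are illustrative assumptions for demonstration, not the exact encoding described in the paper.

```python
from dataclasses import dataclass

# Illustrative sketch of turning on-screen entities into a text prompt for an
# LLM-based reference resolver. Fields and prompt format are assumptions,
# not the encoding Apple describes in the ReALM paper.

@dataclass
class ScreenEntity:
    entity_id: int
    kind: str          # e.g. "phone_number", "address", "button"
    text: str          # the visible text of the entity
    top: float         # vertical position, 0.0 = top of screen
    left: float        # horizontal position, 0.0 = left edge

def screen_to_text(entities: list[ScreenEntity]) -> str:
    """Serialize entities in reading order (top-to-bottom, left-to-right)."""
    ordered = sorted(entities, key=lambda e: (e.top, e.left))
    return "\n".join(f"[{e.entity_id}] {e.kind}: {e.text}" for e in ordered)

def build_prompt(entities: list[ScreenEntity], request: str) -> str:
    """Combine the textual screen and the user request into one LM prompt."""
    return (
        "Entities visible on screen:\n"
        f"{screen_to_text(entities)}\n\n"
        f"User request: {request}\n"
        "Which entity id does the request refer to?"
    )

if __name__ == "__main__":
    screen = [
        ScreenEntity(1, "business_name", "Joe's Pharmacy", top=0.1, left=0.1),
        ScreenEntity(2, "phone_number", "555-0134", top=0.2, left=0.1),
        ScreenEntity(3, "phone_number", "555-0179", top=0.8, left=0.1),
    ]
    print(build_prompt(screen, "Call the bottom one"))
```

Because the screen is reduced to ordinary text like this, a reference such as "the bottom one" can be resolved with language modeling alone, without feeding the model any image data.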
The paper calls ReALM "an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance," adding, "We find that due to finetuning on user requests, ReaLM is able to understand more domain-specific questions."