What is it about?
This paper discusses how social robots can better understand and facilitate conversations among multiple people. Current systems identify who is speaking from signals such as voice pitch and eye gaze, which requires extra sensors or complex processing. The paper instead proposes identifying speakers directly from text transcribed by automatic speech recognition (ASR). The researchers prompted ChatGPT, a large language model, to assign speaker labels to these transcripts and achieved high accuracy. The study shows how language models like ChatGPT can be used in social robots and embodied conversational agents to facilitate better interactions between humans and machines.
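As a rough illustration of the idea, one can format unlabeled ASR utterances into a prompt asking a language model to tag each turn with a speaker, then parse the labels out of the reply. This is a minimal sketch under assumed conventions (the prompt wording, the "A/B/C" label scheme, and the reply format are illustrative choices, not the paper's exact setup):

```python
# Hypothetical sketch of LLM-based speaker labeling from an ASR transcript.
# Prompt wording and the expected reply format are assumptions for illustration.

def build_diarization_prompt(utterances):
    """Format unlabeled ASR utterances into a speaker-labeling prompt."""
    lines = "\n".join(f"{i + 1}. {u}" for i, u in enumerate(utterances))
    return (
        "The following is a transcript of a multiparty conversation.\n"
        "Label each numbered utterance with a speaker (A, B, C, ...), "
        "using the same label whenever the same person speaks.\n\n" + lines
    )

def parse_labels(response):
    """Parse reply lines like '1. A: Hello there' into a list of speaker labels."""
    labels = []
    for line in response.strip().splitlines():
        _, rest = line.split(". ", 1)       # drop the utterance number
        labels.append(rest.split(":", 1)[0].strip())
    return labels

prompt = build_diarization_prompt([
    "Hi, how was your weekend?",
    "Pretty good, we went hiking.",
    "Oh nice, where did you go?",
])

# A model reply might look like this (here hard-coded instead of calling an API):
reply = ("1. A: Hi, how was your weekend?\n"
         "2. B: Pretty good, we went hiking.\n"
         "3. A: Oh nice, where did you go?")
print(parse_labels(reply))  # → ['A', 'B', 'A']
```

In practice the prompt would be sent to a chat model and the parsed labels would let the robot track turn-taking without extra sensors.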
Why is it important?
Our paper demonstrates the potential of large language models (LLMs) such as ChatGPT to improve speaker diarization in social robots and embodied conversational agents. Unlike current methods, this approach needs no additional sensors or complex signal processing: feeding ASR transcriptions to an LLM is enough to identify who is speaking with high accuracy. As interest in social robots continues to grow, our findings offer a timely contribution to human-computer interaction (HCI) and natural language processing (NLP), with implications for the design of future systems, and should be of interest to researchers and practitioners in robotics, AI, NLP, and HCI.
Read the Original
This page is a summary of: Improving Multiparty Interactions with a Robot Using Large Language Models, April 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3544549.3585602.