Socially assistive robots have been increasingly discussed as solutions for care and domestic use in support of older adults; however, only a few of them (e.g., Jack and Sophie) include automatic emotion recognition. The recognized emotions could be used by a formal caretaker (e.g., a nurse) or an informal caretaker (e.g., a family member) to support the senior in their everyday life.
For example, the senior could feel safer at home knowing that their caretakers can provide emotional support when needed, and the caretaker can in turn be reassured about the senior's state via the Guardian robot. The goal is to help seniors maintain their independence in their own homes; the emotional state is therefore an essential indicator of day-to-day well-being. The companion robot should be able to determine the emotions of a senior and react accordingly. For instance, the robot could change its behavior through its communication modalities (i.e., eye color and shape, voice tone, and head position) based on the detected emotions, as sketched below. In this way, the robot would act more as a companion than as an assistive machine.
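As an illustration only, the adaptation of communication modalities to the detected emotion could take the form of a simple lookup from emotion label to modality settings. The labels and settings below are assumptions for the sketch, not the behaviors currently implemented on the Guardian robot.

```python
# Illustrative sketch: a possible mapping from detected emotion labels to the
# communication modalities mentioned above (eye color and shape, voice tone,
# head position). Labels and values are placeholders, not project settings.
BEHAVIOR_BY_EMOTION = {
    "sad":     {"eye_color": "soft blue",   "eye_shape": "round",
                "voice_tone": "calm",       "head_position": "tilted"},
    "angry":   {"eye_color": "neutral",     "eye_shape": "narrow",
                "voice_tone": "soothing",   "head_position": "lowered"},
    "happy":   {"eye_color": "warm yellow", "eye_shape": "wide",
                "voice_tone": "cheerful",   "head_position": "raised"},
    "neutral": {"eye_color": "white",       "eye_shape": "default",
                "voice_tone": "neutral",    "head_position": "centered"},
}


def select_behavior(emotion: str) -> dict:
    """Return the modality settings for a label, falling back to neutral."""
    return BEHAVIOR_BY_EMOTION.get(emotion, BEHAVIOR_BY_EMOTION["neutral"])
```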
The Guardian consortium selected the Misty II robot because of its sensors and development platform. The robot has multiple microphones and cameras. Speech emotion recognition (SER) was selected instead of an image-based process [1] due to its smaller footprint (i.e., data acquisition, processing, and transfer from the robot) and its acceptability by the end-user population [2]. Seniors preferred microphones over cameras [3] to sense their state, because cameras have a higher impact on privacy.
The system design is simple: each time Misty records a sound identified as a human voice, the audio is sent to our emotion recognition system. The recognition model is regularly updated with better-performing versions. The emotional labels are then displayed on an online dashboard for the caretakers. In the future, we wish to enable automatic decision-making based on the emotion label, so that Misty would change its appearance and speech to help the senior.
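A minimal sketch of the server-side part of this pipeline is given below, assuming the robot uploads each recorded utterance as a WAV file over HTTP. The endpoint name, dashboard URL, model file, and MFCC-based classifier are illustrative placeholders, not the Guardian project's actual interfaces or model.

```python
# Minimal sketch of the server-side SER pipeline described above.
# Assumptions: the robot posts each voice recording as a WAV file; a
# pre-trained classifier (e.g., scikit-learn) is stored on disk; the
# dashboard accepts emotion labels via a JSON POST. All names are placeholders.
import numpy as np
import librosa
import joblib
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

MODEL_PATH = "ser_model.joblib"  # hypothetical pre-trained SER classifier
DASHBOARD_URL = "https://example.org/guardian/dashboard/emotions"  # placeholder


def extract_features(wav_path: str) -> np.ndarray:
    """Average MFCCs over time: a common, lightweight SER feature set."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
    return mfcc.mean(axis=1).reshape(1, -1)


@app.route("/emotion", methods=["POST"])
def classify_emotion():
    # The robot posts the recorded utterance as multipart form data.
    audio_file = request.files["audio"]
    tmp_path = "/tmp/utterance.wav"
    audio_file.save(tmp_path)

    # Reload the classifier on each request so a better-performing model
    # can be swapped in without restarting the service.
    model = joblib.load(MODEL_PATH)
    label = str(model.predict(extract_features(tmp_path))[0])

    # Forward only the label (not the audio) to the caretakers' dashboard.
    requests.post(DASHBOARD_URL, json={"emotion": label}, timeout=5)
    return jsonify({"emotion": label})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Keeping the heavier processing off the robot and transmitting only the emotion label is consistent with the small-footprint and privacy rationale outlined above.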
[1] M. B. Akçay and K. Oğuz, “Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers,” Speech Commun., vol. 116, pp. 56–76, Jan. 2020, doi: 10.1016/j.specom.2019.12.001.
[2] F. Portet, M. Vacher, C. Golanski, C. Roux, and B. Meillon, “Design and evaluation of a smart home voice interface for the elderly: acceptability and objection aspects,” Pers. Ubiquitous Comput., vol. 17, no. 1, pp. 127–144, Jan. 2013, doi: 10.1007/s00779-011-0470-5.
[3] M. Ziefle, S. Himmel, and W. Wilkowska, “When Your Living Space Knows What You Do: Acceptance of Medical Home Monitoring by Different Technologies,” in Information Quality in e-Health, Berlin, Heidelberg, 2011, pp. 607–624, doi: 10.1007/978-3-642-25364-5_43.