Intelligent Systems Lab Project: The Multilingual CITEC-Receptionist

Participants

Konstantin Buschmeier
Hendrik ter Horst
Marc Otto

Supervisors

Prof. Dr. Philipp Cimiano
Judith Gaspers
apl. Prof. Dr.-Ing. Stefan Kopp
Dr. Christina Unger
Sebastian Walter
Dr. Herwin van Welbergen

Motivation

Finding one's way in an unfamiliar environment is a common everyday problem. When searching for a room in a new building, one can ask someone who is familiar with the building for help. However, when no one is around to help (a hardware store is another example of this situation), an embodied conversational agent can step in. This work presents a multilingual receptionist, MULIREC, whose task is to help visitors of the Interactive Intelligent Systems research building find their destination.

Application Scenario

The user enters an unfamiliar environment and is looking for a person or a room. To get help, the user asks the multilingual receptionist for directions.
For example, the user needs to go to the support office but does not know how to get there, and therefore asks MULIREC: "Wo ist das Büro des Supports?" (Where is the support office?)
One possible answer is: "Das Büro des Supports ist in M3-100." (The support office is in M3-100.)

The user is then distracted by an English-speaking friend and therefore asks the agent, in English, to repeat the answer: "Could you please repeat that?"
The agent replies: "I said the support office is in M3-100."
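
The language switch in this scenario suggests that the agent remembers what it said rather than the exact wording. A minimal sketch of that idea follows, assuming the last answer is stored as a language-independent fact and rendered again on request. The names `Fact`, `DialogueMemory`, `SUBJECTS` and `TEMPLATES` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicon and templates, for illustration only.
SUBJECTS = {
    "support_office": {"de": "Das Büro des Supports", "en": "The support office"},
}
TEMPLATES = {"de": "{subject} ist in {room}.", "en": "{subject} is in {room}."}
REPEAT_PREFIX = {"de": "Ich sagte, ", "en": "I said "}
NOTHING_TO_REPEAT = {
    "de": "Ich habe noch nichts gesagt.",
    "en": "I have not said anything yet.",
}


@dataclass(frozen=True)
class Fact:
    """Language-independent content of an answer."""
    subject: str  # key into SUBJECTS
    room: str


def render(fact: Fact, lang: str) -> str:
    """Turn a fact into a sentence in the requested language."""
    return TEMPLATES[lang].format(subject=SUBJECTS[fact.subject][lang], room=fact.room)


class DialogueMemory:
    """Remembers the last fact so it can be repeated in any supported language."""

    def __init__(self) -> None:
        self.last_fact: Optional[Fact] = None

    def answer(self, fact: Fact, lang: str) -> str:
        self.last_fact = fact
        return render(fact, lang)

    def repeat(self, lang: str) -> str:
        if self.last_fact is None:
            return NOTHING_TO_REPEAT[lang]
        sentence = render(self.last_fact, lang)
        # Lower-case the first letter so the sentence reads naturally after the prefix.
        return REPEAT_PREFIX[lang] + sentence[0].lower() + sentence[1:]
```

With this sketch, answering `Fact("support_office", "M3-100")` in German and then calling `repeat("en")` yields the English repetition from the scenario.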

Objectives

The project goals are:

Integrating multilingual speech recognition
Developing a grammar and an interpreter
Setting up a dialogue model for this specific domain
Developing an interface for requesting data about scientific staff and places
Integrating all components with an appealing virtual agent (capable of facial expressions, gestures and multilingual natural language generation) by means of the middleware IPAACA (Incremental Processing Architecture for Artificial Conversational Agents)

Description

The multilingual receptionist:

Understands spoken language (German and English)
Understands several request types for obtaining information about:
- room numbers
- phone numbers
- email addresses
- special purpose rooms (currently implemented: toilets)
Prompts the user for more precise information when needed
Retrieves needed information from the database of the CITEC staff
Produces spoken language (German and English)

The prototype of the multilingual receptionist uses the following components:
- Windows Speech Recognition by Microsoft
- Lightweight Directory Access Protocol (LDAP) database of the CITEC staff
- Incremental Processing Architecture for Artificial Conversational Agents (IPAACA)
- Articulated Social Agents Platform (ASAP Realizer)
- Behavior Markup Language (BML)
- Modular Architecture for Research on speech sYnthesis (MARY TTS)
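
The request handling described above (a request type, a lookup in the staff data, and a clarification prompt when the request is not precise enough) could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: an in-memory list stands in for the LDAP database, all entries are invented, and `StaffEntry` and `handle_request` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class StaffEntry:
    name: str
    room: str
    phone: str
    email: str


# Invented example data; the real system queries the CITEC staff database.
DIRECTORY = [
    StaffEntry("Anna Example", "R1-001", "0000-1001", "anna@example.org"),
    StaffEntry("Anton Example", "R1-002", "0000-1002", "anton@example.org"),
    StaffEntry("Berta Sample", "R2-010", "0000-2010", "berta@example.org"),
]

# Request types from the description, mapped to the field that answers them.
REQUEST_FIELDS = {"room": "room", "phone": "phone", "email": "email"}

Result = Tuple[str, Optional[Union[str, List[str]]]]


def handle_request(request_type: str, name_query: str,
                   directory: List[StaffEntry] = DIRECTORY) -> Result:
    """Return ("answer", value), ("clarify", candidate names) or ("not_found", None)."""
    if request_type not in REQUEST_FIELDS:
        raise ValueError(f"unsupported request type: {request_type}")
    # Case-insensitive substring match on the person's name.
    matches = [e for e in directory if name_query.lower() in e.name.lower()]
    if not matches:
        return ("not_found", None)
    if len(matches) > 1:
        # Several people match: ask the user to be more precise.
        return ("clarify", sorted(e.name for e in matches))
    return ("answer", getattr(matches[0], REQUEST_FIELDS[request_type]))
```

For example, `handle_request("room", "Berta")` returns `("answer", "R2-010")`, while a query for `"Example"` matches two entries and returns a `"clarify"` result listing both names.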

Results

A prototype of the multilingual receptionist has been implemented using the described components
The agent can provide information about room numbers, phone numbers, email addresses and toilets
The agent forms an appropriate answer and generates natural language

The video shows a typical conversation with the virtual agent, demonstrating a variety of user requests and the agent's reactions.

Discussion and Conclusion

A functioning prototype that meets the project's goals has been implemented
During the implementation phase it became clear that the robustness of the speech recognition is critical to the overall system; testing and comparing different speech recognition systems would therefore be useful
Windows Speech Recognition runs only on Windows, whereas the other parts of the system run on Linux; two computers or software virtualization are therefore needed
To handle multilingual input, it is currently necessary to run two speech recognizers in parallel, which is probably not the most elegant solution
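
To illustrate the last point, the following sketch runs one recognizer per language on the same audio and keeps the hypothesis with the highest confidence. How the prototype actually chooses between the two recognizers is not described here, so the confidence-based selection is an assumption, as are the names `Hypothesis` and `recognize_multilingual`. The two stub recognizers stand in for real speech recognition engines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]


Recognizer = Callable[[bytes], Hypothesis]


def recognize_multilingual(audio: bytes,
                           recognizers: Dict[str, Recognizer]) -> Tuple[str, Hypothesis]:
    """Run all recognizers in parallel and return (language, best hypothesis)."""
    if not recognizers:
        raise ValueError("at least one recognizer is required")
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = {lang: pool.submit(rec, audio) for lang, rec in recognizers.items()}
        results = {lang: future.result() for lang, future in futures.items()}
    # Assumption: confidence scores of the recognizers are comparable.
    best_lang = max(results, key=lambda lang: results[lang].confidence)
    return best_lang, results[best_lang]


# Stub recognizers with fixed output, for demonstration only.
def fake_german_recognizer(audio: bytes) -> Hypothesis:
    return Hypothesis("kuh du bitte ritt", 0.35)


def fake_english_recognizer(audio: bytes) -> Hypothesis:
    return Hypothesis("could you please repeat that", 0.90)


lang, hypothesis = recognize_multilingual(
    b"", {"de": fake_german_recognizer, "en": fake_english_recognizer}
)
```

In practice, confidence values from different engines or language models are not necessarily on the same scale, which is one reason a dedicated language identification step may be preferable.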

Outlook

The current system could be further improved as follows:

Enhance the robustness of the speech recognition
Improve reliability of language identification
Extend the grammar to cover a wider range of natural language and improve the robustness of the interpreter
Implement additional request types
Improve and extend facial expressions, gestures and natural language generation features
Implement the ability to localise destinations on a map and give turn-by-turn directions
Add small talk capabilities
Enable the agent to recognise potential dialogue partners in sight (using appropriate hardware, e.g. Kinect)
Make the agent proactive, i.e. the agent tries to initiate conversations
Test the agent in a real-world context

References

Articulated Social Agents Platform (ASAP Realizer)
Behavior Markup Language (BML)
Incremental Processing Architecture for Artificial Conversational Agents (IPAACA)
Modular Architecture for Research on speech sYnthesis (MARY TTS)
Sociable Agents Group
