Intelligent Systems Lab Project: The Multilingual CITEC-Receptionist
Participants
- Konstantin Buschmeier
- Hendrik ter Horst
- Marc Otto
Supervisors
- Prof. Dr. Philipp Cimiano
- Judith Gaspers
- apl. Prof. Dr.-Ing. Stefan Kopp
- Dr. Christina Unger
- Sebastian Walter
- Dr. Herwin van Welbergen
Motivation
Finding one's way in an unfamiliar environment is a common problem in everyday life. When searching for a room in a new building, one can ask someone who is familiar with the building for help. However, when no one is around to help (a similar situation can arise in a hardware store), an embodied conversational agent can take over this role. The current work presents a multilingual receptionist, MULIREC, whose task is to help visitors of the Interactive Intelligent Systems research building find their destination.
Application Scenario
The user enters an unfamiliar environment and searches for a person or a room. To get help, the user asks the multilingual receptionist for directions.
For example, he needs to reach the support office but does not know how to get there, so he asks MULIREC: "Wo ist das Büro des Supports?" (Where is the support office?)
One possible answer could be: "Das Büro des Supports ist in M3-100." (The support office is in M3-100.)
The user is then distracted by his English-speaking friend and therefore asks the agent in English to repeat the answer: "Could you please repeat that?"
The agent replies: "I said the support office is in M3-100."
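The language switch in this scenario suggests keeping the content of the last answer separate from its wording, so that it can be rendered again in whichever language the user speaks next. The following is a minimal sketch of that idea, not the project's actual dialogue model; the names `TEMPLATES`, `REPEAT_PREFIX` and `DialogueMemory` are hypothetical, and the German repeat phrasing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical answer templates, one per supported language.
TEMPLATES = {
    "de": "Das Büro des Supports ist in {room}.",
    "en": "The support office is in {room}.",
}
# Hypothetical prefixes used when the user asks for a repetition.
REPEAT_PREFIX = {"de": "Ich sagte, ", "en": "I said "}


@dataclass
class DialogueMemory:
    # Language-independent content of the most recent answer.
    last_fact: Optional[dict] = None

    def answer(self, room: str, lang: str) -> str:
        """Store the fact and render it in the requested language."""
        self.last_fact = {"room": room}
        return TEMPLATES[lang].format(**self.last_fact)

    def repeat(self, lang: str) -> str:
        """Render the stored fact again, possibly in a different language."""
        if self.last_fact is None:
            raise ValueError("nothing to repeat yet")
        sentence = TEMPLATES[lang].format(**self.last_fact)
        # Lower-case the first letter so the sentence reads well after the prefix.
        return REPEAT_PREFIX[lang] + sentence[0].lower() + sentence[1:]


memory = DialogueMemory()
print(memory.answer("M3-100", "de"))  # Das Büro des Supports ist in M3-100.
print(memory.repeat("en"))            # I said the support office is in M3-100.
```

Storing the fact rather than the generated sentence means a repetition never depends on the language of the original answer.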
Objectives
The project goals are:
- Integrating multilingual speech recognition
- Developing a grammar and an interpreter
- Setting up a dialogue model for this specific domain
- Developing an interface for requesting data about scientific staff and places
- Integrating all components with an appealing virtual agent (capable of facial expressions, gestures and multilingual natural language generation) by means of the middleware IPAACA (Incremental Processing Architecture for Artificial Conversational Agents)
Description
The multilingual receptionist:
- Understands spoken language (German and English)
- Understands several request types for obtaining information about:
  - room numbers
  - phone numbers
  - email addresses
  - special purpose rooms (currently implemented: toilets)
- Prompts the user for more precise information when needed
- Retrieves the needed information from the database of the CITEC staff
- Produces spoken language (German and English)
The prototype of the multilingual receptionist uses the following components:
- Windows Speech Recognition by Microsoft
- Lightweight Directory Access Protocol (LDAP) database of the CITEC staff
- Incremental Processing Architecture for Artificial Conversational Agents (IPAACA)
- Articulated Social Agents Platform (ASAP Realizer)
- Behavior Markup Language (BML)
- Modular Architecture for Research on speech sYnthesis (MARY TTS)
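The request handling described above (recognising a request type, asking for clarification when information is missing, then consulting the staff database) can be illustrated with a small sketch. It is only an assumption about how such an interpreter might look: the keyword patterns stand in for the project's grammar, the in-memory `DIRECTORY` stands in for the LDAP database, and all names are hypothetical. The only entry is the support office from the application scenario.

```python
import re

# Hypothetical keyword patterns per request type (German and English).
REQUEST_PATTERNS = {
    "phone": re.compile(r"\b(telefon\w*|phone)\b"),
    "email": re.compile(r"\b(e-?mail\w*)\b"),
    "toilet": re.compile(r"\b(toiletten?|wc|toilets?|restroom)\b"),
    "room": re.compile(r"\b(büro|raum|office|room)\b"),
}
# Request types that only make sense for a specific person or office.
NEEDS_SUBJECT = {"room", "phone", "email"}
# In-memory stand-in for the staff database.
DIRECTORY = {"support": {"room": "M3-100"}}


def interpret(utterance):
    """Return (request_type, subject); either may be None if not recognised."""
    text = utterance.lower()
    request = next((name for name, pattern in REQUEST_PATTERNS.items()
                    if pattern.search(text)), None)
    subject = next((key for key in DIRECTORY if key in text), None)
    return request, subject


def next_action(utterance):
    """Decide whether to answer, ask the user for more detail, or look up a place."""
    request, subject = interpret(utterance)
    if request is None:
        return ("ask_user", "request_type")
    if request in NEEDS_SUBJECT:
        if subject is None:
            return ("ask_user", "subject")
        value = DIRECTORY[subject].get(request)
        return ("answer", value) if value is not None else ("not_found", request)
    # Special purpose rooms need no subject; their locations are not modelled here.
    return ("lookup_place", request)


print(next_action("Wo ist das Büro des Supports?"))  # ('answer', 'M3-100')
print(next_action("Where is the office?"))           # ('ask_user', 'subject')
```

A real grammar would cover far more phrasings than these keyword patterns; the sketch only shows where the clarification prompt fits into the flow.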
Results
- A prototype of the multilingual receptionist has been implemented using the described components
- The agent can deliver information about: room numbers, phone numbers, email addresses and toilets
- The agent forms an appropriate answer and generates natural language
The video shows a typical conversation with the virtual agent, demonstrating a variety of user requests and the agent's reaction.
Discussion and Conclusion
- A functioning prototype that meets the project's goals has been implemented
- During the implementation phase it became clear that the robustness of the speech recognition is critical to the project; it might therefore be useful to test and compare different speech recognition systems
- Windows Speech Recognition runs only on Windows systems, whereas the other parts of our system run on Linux; therefore, two computers or software virtualization are needed
- To handle multilingual input, it is currently necessary to run two speech recognizers in parallel, which may not be the most elegant solution
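Running two recognizers in parallel raises the question of which result to trust. One simple option is sketched below, under the assumption (not stated for the actual system) that each recognizer reports a confidence score between 0 and 1 that is comparable across recognizers. The `Hypothesis` type, the thresholds and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    lang: str          # language of the recognizer that produced the result
    text: str          # recognized text
    confidence: float  # assumed to lie in [0, 1]


def pick_hypothesis(hypotheses: List[Hypothesis],
                    min_confidence: float = 0.5,
                    min_margin: float = 0.1) -> Optional[Hypothesis]:
    """Return the winning hypothesis, or None if the result is too uncertain."""
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    if not ranked or ranked[0].confidence < min_confidence:
        return None  # nothing was recognized reliably
    if len(ranked) > 1 and ranked[0].confidence - ranked[1].confidence < min_margin:
        return None  # the two languages are too close to call
    return ranked[0]


german = Hypothesis("de", "wo ist das büro des supports", 0.82)
english = Hypothesis("en", "<low-confidence english result>", 0.31)
best = pick_hypothesis([german, english])
print(best.lang if best else "ask the user to repeat")  # de
```

Returning `None` for close calls lets the dialogue model ask the user to repeat instead of committing to the wrong language.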
Outlook
The current system could be further improved as follows:
- Enhance the robustness of the speech recognition
- Improve reliability of language identification
- Extend the grammar to cover a wider range of natural language and improve the robustness of the interpreter
- Implement additional request types
- Improve and extend facial expressions, gestures and natural language generation features
- Implement the ability to localise destinations on a map and give turn-by-turn directions
- Add small talk capabilities
- Enable the agent to recognise potential dialogue partners in sight (using appropriate hardware, e.g. Kinect)
- Make the agent proactive, i.e. the agent tries to initiate conversations
- Test the agent in a real world context
References
- Articulated Social Agents Platform (ASAP Realizer)
- Behavior Markup Language (BML)
- Incremental Processing Architecture for Artificial Conversational Agents (IPAACA)
- Modular Architecture for Research on speech sYnthesis (MARY TTS)
- Sociable Agents Group