Technology

The Facial Animation Engine (FAE) is a software library for real-time animation of 3D virtual characters and is the core of EPTAMEDIA's technology. It can create animations at an extremely low bitrate: 2 kbit per second at 25 frames per second (about 80 bits per frame), independently of model complexity or image resolution (see fig. 1). Animations are driven by the facial animation parameters recently standardized by MPEG-4, or by any subset of them.


Fig.1 The Facial Animation Engine is software for real-time animation of 3D virtual characters

A block diagram of the Facial Animation Engine

The core of the Facial Animation Engine (see fig. 2) is the Animation Module, which converts the semantic information associated with a 3D model into animation rules. Each animation rule is a function of a Facial Animation Parameter (FAP) defined by MPEG-4 and is created according to the standard specifications (Simple FA Profile).
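To make the idea of an animation rule as a function of a FAP more concrete, here is a minimal sketch in C. It assumes the simplest possible rule: each affected vertex is moved along a fixed direction by an amount proportional to the FAP value. The names AnimationRule and apply_rule, and the per-vertex weights, are hypothetical and are not taken from the FAE; the actual rules built by the Animation Module may be more elaborate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these names come from the FAE. */
typedef struct { float x, y, z; } Vec3;

typedef struct {
    int           fap_id;     /* number of the FAP this rule responds to */
    float         fapu;       /* model-specific FAP unit, in model coordinates */
    Vec3          direction;  /* unit direction of the displacement */
    size_t        count;      /* number of affected vertices */
    const size_t *vertex;     /* indices into the mesh vertex array */
    const float  *weight;     /* per-vertex weight, assumed to lie in [0, 1] */
} AnimationRule;

/* Add the displacement produced by one FAP value to the affected vertices.
   The caller is expected to reset `mesh` to the neutral face before
   applying the rules for a new frame. */
void apply_rule(const AnimationRule *rule, int fap_value,
                Vec3 *mesh, size_t n_vertices)
{
    size_t i;
    float amount = (float)fap_value * rule->fapu;

    for (i = 0; i < rule->count; ++i) {
        size_t v = rule->vertex[i];
        float s;
        assert(v < n_vertices);
        s = amount * rule->weight[i];
        mesh[v].x += s * rule->direction.x;
        mesh[v].y += s * rule->direction.y;
        mesh[v].z += s * rule->direction.z;
    }
}
```

In this sketch the bitstream only has to carry the integer FAP value for each frame; everything that depends on the model (vertex indices, weights, units) stays on the receiving side, which is consistent with the bitrate being independent of model complexity.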


Fig.2 A block diagram of the Facial Animation Engine Software

The virtual faces are represented as triangular meshes, which can be created with any 3D authoring tool. A semantic description is then associated with the geometric description and provided to the animation software, which uses it to create the animation rules. Almost any model, either human or cartoon-like, can be animated. The semantic description is very high-level, since it contains just indices of vertices. This means that animating a virtual face is now as simple as labeling vertices of a 3D model, and requires almost no expertise in facial animation.
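As an illustration of how far a list of vertex indices can go, the sketch below derives a model-specific scaling unit from two labeled vertices. MPEG-4 expresses most FAPs in units (FAPUs) defined as fractions of distances measured on the neutral face; for example, the mouth-width unit is the neutral mouth width divided by 1024. The structure SemanticDescription and its field names are hypothetical, and the code assumes that the model is in its neutral pose with the x axis running across the face.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical semantic description: nothing but labeled vertex indices. */
typedef struct {
    size_t left_mouth_corner;
    size_t right_mouth_corner;
} SemanticDescription;

/* Mouth-width unit: horizontal distance between the two labeled mouth
   corners on the neutral face, divided by 1024. */
float mouth_width_unit(const Vec3 *vertices, size_t n_vertices,
                       const SemanticDescription *sd)
{
    float width;

    assert(sd->left_mouth_corner < n_vertices);
    assert(sd->right_mouth_corner < n_vertices);

    width = vertices[sd->right_mouth_corner].x
          - vertices[sd->left_mouth_corner].x;
    if (width < 0.0f) {
        width = -width;   /* the result does not depend on label order */
    }
    return width / 1024.0f;
}
```

Because the unit is measured on the model itself, the same FAP value produces a proportionate movement on a small cartoon head and on a large realistic one.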

MPEG-4 codec, audio codec and the Facial Animation Engine

Although the core of the FAE can be fully compliant with the MPEG-4 Simple FA Profile specifications, the software is provided with a proprietary Animation Parameter Decoder. On Windows (TM) platforms the FAE uses the TrueSpeech 8.5 (TM) audio decoder, for voice compression at 8 kbps. On other platforms, efficient audio encoding techniques can be supported but are not provided. Contact us if you are interested in using the core of the FAE to develop MPEG-4 compliant products. Through a fruitful collaboration between EPTAMEDIA and another Italian company, bSoft, we can provide support and products covering several aspects of the MPEG-4 standard.

Creation of Animation Sequences

There are several ways to create animation sequences. Figure 3 shows three different approaches.


Fig.3 Creation of animation sequences

In the first case (top) a Text-to-Speech synthesizer (TTS) is used to create synthetic audio from plain text. Together with the audio samples, the TTS provides the sequence of pronounced phonemes and their duration. This information is used by a proprietary phoneme-to-FAP converter to infer the mouth movements corresponding to the pronounced text.
In the second case (middle), natural audio is used as input. By processing the audio with a phoneme recognizer, the sequence of pronounced phonemes can be obtained and, from that, the mouth movements.
The third case (bottom) makes use of dedicated tracking hardware, capable of capturing the audio and the facial movements of a real actor. In this case, the captured facial movements are encoded into animation parameters and then used to drive the virtual face. The quality of this last approach is higher than that of the other two; in addition, facial expressions are also captured and synthesized, unlike in the former cases, where only mouth movements can be synthesized.
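In the first two approaches, the input to the converter is a sequence of phonemes with their durations, and the output is one set of parameters per video frame. The following sketch in C shows only the timing part of that step, at the 25 frames per second quoted earlier: each frame simply receives the mouth shape (viseme) of the phoneme being pronounced at that instant. The type TimedPhoneme and the function phonemes_to_frames are hypothetical, the phoneme-to-viseme table is left out, and the proprietary phoneme-to-FAP converter presumably does more than this, for example smoothing the transitions between consecutive mouth shapes.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_RATE 25                  /* frames per second */
#define FRAME_MS   (1000 / FRAME_RATE) /* 40 ms between frames */

/* Hypothetical input record, one per pronounced phoneme. */
typedef struct {
    int viseme;       /* mouth shape index for this phoneme (table not shown) */
    int duration_ms;  /* duration reported by the TTS or the recognizer */
} TimedPhoneme;

/* Fill frames[] with the viseme active at each frame time.
   Returns the number of frames written (at most max_frames). */
size_t phonemes_to_frames(const TimedPhoneme *ph, size_t n_ph,
                          int *frames, size_t max_frames)
{
    size_t n = 0;      /* index of the next frame to write */
    size_t i;
    long start = 0;    /* start time of the current phoneme, in ms */

    for (i = 0; i < n_ph; ++i) {
        long end;
        assert(ph[i].duration_ms >= 0);
        end = start + ph[i].duration_ms;
        /* Frame n is shown at time n * FRAME_MS; it belongs to this
           phoneme while that time is earlier than the phoneme's end. */
        while (n < max_frames && (long)n * FRAME_MS < end) {
            frames[n++] = ph[i].viseme;
        }
        start = end;
    }
    return n;
}
```

For example, a 100 ms phoneme followed by a 60 ms phoneme covers the frames at 0, 40, 80 and 120 ms: the first three frames take the first mouth shape and the fourth takes the second.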

Code Implementation

The core of the FAE is implemented in ANSI C and uses OpenGL as its graphics interface. This ensures that the code is portable across several different platforms. As long as a platform has a 3D graphics library and audio libraries, the FAE can reasonably be ported to that platform.