Facial Animation Implementation And Performance Computer Science Essay

Facial animation is primarily an area of computer graphics that encapsulates models and techniques for generating and animating images of the human head and face (Wiki, 2010).

Facial animation has been a research topic for more than 25 years, aiming at building models of human faces which can be animated and used to produce facial expressions reflecting emotions and mouth movements for spoken text (F. Parke, 1974). The first effort to animate a face using a computer was made more than 25 years ago. Driven by today's search for new ways to use computers to convey information to humans, the field has grown greatly in the past few years, as fast graphics workstations have made the modelling and real-time animation of thousands of polygons affordable and almost commonplace. The human face is a complex communication channel and thus hard to model; it is a very familiar and sensitive object of human perception.

It is only now that the research is paying off and animated faces actually begin to resemble human ones. With the constant increase in computing power and more advanced hardware emerging, realistic facial animation can now be seen in games and films such as The Matrix, World of Warcraft and The Lord of the Rings.

2. History of Facial Animation

The first computer-generated images of faces were produced by Parke as part of Ivan Sutherland's computer graphics course at the University of Utah in the early 1970s (E. McCracken, 1995). In the beginning the face was animated very crudely, using a polygonal representation of the head. A few years later, with the polygon shading techniques just emerging, he managed to create a somewhat realistic animation. He did this by collecting data from photographs, transferring their features to his polygon model and then interpolating between these expressions (F. Parke, 1996).

In 1971, Chernoff first published his work using computer-generated two-dimensional face drawings to represent points in a k-dimensional space (Journal of American, 1973). By using a simple graphical representation of the face, an elaborate encoding scheme was derived.

In 1976, Paul Ekman and Wallace Friesen developed the Facial Action Coding System (FACS), which defines 46 basic facial Action Units (AUs).

In 1980, Platt at the University of Pennsylvania developed the first physically-based, muscle-controlled facial expression model (E. Chuang, 2004), and Brennan developed techniques for computer-produced two-dimensional facial caricatures. There was also concurrent research in the field of 2D facial animation.

In 1985, the animated short film "Tony De Peltrie", produced by Bergeron and Lachapelle, was a landmark for facial animation (P. DeWitt, 1985): they used 3D facial animation with synchronized speech. For the first time, computer facial expression and speech animation were a fundamental part of telling the story.

In 1987, Waters developed a new muscle-based approach to facial expression animation that allowed a wide range of expressions to be created by controlling the underlying muscles.

In 1988, Pixar won an Academy Award for its short film "Tin Toy", featuring a computer-animated baby whose face was depicted using facial animation (F. Parke, 1996). The computer game "The Sims", developed by Maxis Software, is also considered a milestone for facial animation.

The emergence of optical range scanners in the early nineties provided a much easier way to gather data for facial animation; until then the most common way of gathering facial data had simply been photographs. With advances in computer image processing, more powerful computers and graphics cards, facial animation is today closing in on realistic results. There has also been great interest within computer image processing research in tracking facial features, that is, extracting the facial expression and position of a human head through image processing. This data can then be used to control a facial model (F. Parke, 1996).

In the 2000s, another milestone in facial animation was reached by the film "The Lord of the Rings", produced by Peter Jackson, for which a character-specific shape-based system was developed.

3. Facial Parameterisation

Directly linked with the actual animation is the parameterisation of a face: a way of describing a face using parameters to control it, so that different systems can exchange information about animated faces. The first person to systematically describe human faces was Charles Darwin in 1872 with the book "The Expression of the Emotions in Man and Animals", in which he tried to categorise human expressions (F. Parke, 1996).

There has also been extensive research in this area dating back to the 1970s, when the FACS system was developed. FACS was intended as a way of scoring human expressions, but its systematic approach also appealed to facial animation researchers. The FACS method defines 46 action units, or basic facial movements, on a face (I. Pandzic, 2002).

Several models have been presented to date, using direct parameterisation, pseudo-muscle-based, muscle-based or interpolation approaches. The ideal parameterisation should of course be able to express every possible expression of a human face with the model. This is not achievable, as the number of parameters required would be tremendous.

4. Facial Modelling and Animation Techniques

4.1 Facial Modelling

Face modelling can be considered for two purposes: to create a new face or to clone a real face. In the first case, an artist will design the face, because there is no data available for a face that does not exist. The designer can start from a generic three-dimensional face and use modelling software to modify it.

In the case of cloning a real face, different techniques are available, which can be split into two categories according to the data input used: three-dimensional and two-dimensional input.

Most facial modelling systems describe facial actions with either muscle notation or FACS (P. Ekman, 1978). FACS is based on anatomical studies and denotes any visible movement.

4.2 Facial Animation

Facial animation is a complex task, because the real structure of the face is composed of muscles, bones and skin. The motion of this complex structure is difficult to simulate, because small changes produce different expressions and humans are used to reading them intuitively.

Different facial animation techniques use a scheme describing the relation between muscle actions and their effects on the facial expression. FACS describes the AUs, the smallest visible changes on the human face, in order to associate them with the muscles responsible for these changes.

4.3 Model Based Coding

When using model-based coding, a face is constructed according to a specific model. The features of a face are converted to a set of parameters, chosen so that the face can be recreated from them. The obvious gain of model-based coding compared to sending pre-rendered images comes when communicating over low bit-rate networks. The stream of parameters can also be compressed to save even more bandwidth. The parameters are then sent over the network to the receiving software, which reconstructs the face using the same coding scheme (I. Pandzic, 2002).

Another advantage is that the receiver is given the liberty to choose whichever representation is wanted. If the presented representation is not satisfactory, it can simply be changed, as long as the desired face complies with the coding model. Of course, both the coder and the decoder of the parameters must use the same scheme. The only industry-standard coding scheme today is MPEG-4 (Moving Pictures Group, 2002).
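As a rough illustration of the idea (not the actual MPEG-4 bitstream format), the sketch below quantises a few hypothetical facial parameters, packs them into a compact byte frame for transmission and restores them on the receiving side, assuming both ends share the same parameter order and quantisation.

```python
import struct

# Hypothetical facial parameters, normalised to [-1, 1] (names are illustrative).
params = {"jaw_open": 0.42, "lip_corner_left": -0.10, "lip_corner_right": -0.08}

def encode(values, order):
    # Quantise each parameter to a signed 16-bit integer and pack the frame.
    return struct.pack(f"<{len(order)}h", *(int(values[name] * 32767) for name in order))

def decode(frame, order):
    # Reverse the quantisation; the decoder must use the same parameter order.
    ints = struct.unpack(f"<{len(order)}h", frame)
    return {name: value / 32767 for name, value in zip(order, ints)}

order = sorted(params)            # shared coding scheme: agreed parameter order
frame = encode(params, order)     # 6 bytes per frame instead of a rendered image
print(decode(frame, order))
```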

4.4 Morph Targets

Morph targets are a very common way of describing the different expressions of a parameterised face. Just defining the parameters on a face is not enough; it also has to be known how they move the face. A morph target describes how the face looks in a specific position. If a parameter is the corner of the lip, it has to be known how the face looks when this parameter is at its minimum and maximum position. New faces can then be produced from these two morph targets by morphing them together, creating an animation of the face moving from one expression to another.
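A minimal sketch of this idea, assuming the face is stored as an array of vertex positions and using illustrative neutral and smile targets:

```python
import numpy as np

# A face as an (N, 3) array of vertices, morphing from a neutral pose to a
# single morph target by linear interpolation (vertex data is illustrative).
neutral = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 1.0, 0.0]])
smile   = np.array([[0.1, 0.1, 0.0], [0.9, 0.1, 0.0], [0.5, 1.0, 0.0]])

def morph(base, target, t):
    """Blend two morph targets; t=0 gives the base, t=1 gives the target."""
    return (1.0 - t) * base + t * target

for t in (0.0, 0.5, 1.0):          # three frames of the animation
    print(morph(neutral, smile, t))
```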

There are two major ways of morphing: weighted morphing and segmented morphing.

Weighted morphing is when a base face is morphed with two or more targets and the results are combined (B. Flemming, 1999). In this way, for example, a face pronouncing a vowel and a frowning face can be morphed into one, creating a frowning face pronouncing a vowel. The morph targets can be assigned weights determining how much each should affect the outcome.
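A small sketch of weighted morphing under these assumptions, with the targets stored as displacements from the base face and the target names purely illustrative:

```python
import numpy as np

# Each target contributes its displacement from the base face, scaled by a weight.
base = np.zeros((3, 3))
targets = {
    "vowel_a": np.array([[0, -0.2, 0], [0, -0.2, 0], [0, 0, 0]], dtype=float),
    "frown":   np.array([[0, 0, 0], [0, 0, 0], [-0.1, -0.1, 0]], dtype=float),
}
weights = {"vowel_a": 1.0, "frown": 0.6}   # relative influence of each target

face = base + sum(weights[name] * (targets[name] - base) for name in targets)
print(face)   # a frowning face pronouncing a vowel
```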

Segmented morphing is when separate areas of the face are morphed individually (B. Flemming, 1999). The power of this technique is total control of the separate parts: motion in one group does not affect the others. First, separate areas of the face have to be defined; then the different morph targets affecting each area have to be modelled. This is often a very tedious task for the artists. There is really no limit to how many morph targets can be created; only the artist's level of perfection sets the boundaries.
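A corresponding sketch of segmented morphing, with invented vertex groups and per-region targets:

```python
import numpy as np

# Vertex indices are grouped into facial regions and each region is morphed
# with its own target and weight, independently of the other regions.
base = np.zeros((6, 3))
regions = {"mouth": [0, 1, 2], "brows": [3, 4, 5]}            # illustrative groups
region_targets = {"mouth": np.full((3, 3), 0.2), "brows": np.full((3, 3), -0.1)}
region_weights = {"mouth": 1.0, "brows": 0.5}

face = base.copy()
for name, indices in regions.items():
    # Motion in one region does not affect vertices belonging to the others.
    w = region_weights[name]
    face[indices] = (1 - w) * base[indices] + w * region_targets[name]
print(face)
```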

Both techniques can be used in facial animation, though the latter one is more widely used as it coincides more with the method of a parameterised face.

4.5 Keyframes

This technique is the same as the one used to make conventional animations, where the animator creates key poses of the model and the animation system interpolates the in-between frames from the keyframes. Keyframes are defined as positions (expressions) in the animation timeline, and an algorithm calculates the frames between the keyframes. The keyframes can be built by an artist or obtained from motion capture.

This method is used in a lot of animations and gives good results, but it has some disadvantages. The first is that the range of new animations is restricted by the number of existing keyframes. The second, when this technique is used for facial animation, is the unrealistic displacement of the vertices between two keyframes, because every point moves with the same motion. However, what is sought is not a physically correct simulation but a good visual rendering. Linear interpolation is not the only option; better results can be achieved using Bezier or B-spline curves.
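The following sketch illustrates the basic keyframe scheme with linear interpolation between illustrative (time, expression) pairs; a Bezier or B-spline evaluation could replace the linear blend:

```python
import numpy as np

# Keyframes are (time, expression-vector) pairs; in-between frames are computed
# by linear interpolation between the two surrounding keyframes.
keyframes = [(0.0, np.array([0.0, 0.0])),     # neutral
             (1.0, np.array([1.0, 0.2])),     # smile
             (2.0, np.array([0.3, 0.8]))]     # surprise

def sample(t):
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return (1 - u) * p0 + u * p1      # a spline could replace this blend
    return keyframes[-1][1]

print(sample(0.5), sample(1.5))
```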

4.6 Parametric Control

This technique uses a small set of parameters to control a facial mesh (F. Parke, 1982). These parameters are connected to a particular facial geometry and are only loosely based on the dynamics of the face. The disadvantage of this technique is that the parameters are connected directly to the mesh, so the set of parameters must be redefined for any new mesh.
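A toy sketch of direct parametric control; the vertex indices and deformation rules are invented for illustration, which also shows why the parameter set is tied to one particular mesh:

```python
import numpy as np

# Each parameter drives a hand-written deformation of specific vertices, so the
# mapping must be redefined for any mesh with a different topology.
mesh = np.zeros((4, 3))
MOUTH = [0, 1]          # vertex indices hard-coded for this mesh (illustrative)
BROWS = [2, 3]

def apply_parameters(mesh, mouth_open, brow_raise):
    out = mesh.copy()
    out[MOUTH, 1] -= mouth_open * 0.5      # lower the mouth vertices
    out[BROWS, 1] += brow_raise * 0.3      # raise the brow vertices
    return out

print(apply_parameters(mesh, mouth_open=1.0, brow_raise=0.4))
```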

4.7 Physical Based Model

The physically based model implemented by Waters simulates the skin structure with three levels: cutaneous tissue, the subcutaneous tissue layer, and muscles (K. Waters, 1987). Each level is represented by mass-spring models, and the spring stiffnesses simulate the muscle actions. The disadvantage of this technique is that it is computationally expensive and difficult to control with force-based functions.
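A deliberately minimal mass-spring sketch (two point masses and one spring, far simpler than Waters' layered tissue model) showing the kind of simulation involved:

```python
import numpy as np

# Two point masses joined by a damped spring, integrated with explicit Euler.
x = np.array([0.0, 1.2])        # positions of the two masses
v = np.zeros(2)                 # velocities
rest, k, mass, damping, dt = 1.0, 20.0, 1.0, 5.0, 0.01

for _ in range(200):
    stretch = (x[1] - x[0]) - rest
    force = k * stretch                       # spring force, Hooke's law
    acc = np.array([force, -force]) / mass - damping * v
    v += acc * dt
    x += v * dt

print(x[1] - x[0])   # the spring settles near its rest length
```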

4.8 Muscle Based Model

The abstract muscle-based model was developed by Waters in 1987. It models the action of the facial muscles at a simplified level. The deformation of the polygon mesh is achieved through two types of muscle:

1. The linear muscle, which pulls the mesh, is represented by a point of attachment and a vector.

2. The sphincter muscle, which squeezes, is represented by an ellipse.

The advantage of this technique is that the mesh modification is independent of the topology of the face; neither muscle type is connected to the polygon mesh, and the muscles map directly onto muscle-based coding systems. The muscle-based system has also been extended to B-spline patches by Carol Wang (C. Wang, 1994).
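A simplified sketch in the spirit of Waters' linear muscle, using an assumed cosine falloff rather than the exact published formula:

```python
import numpy as np

# Vertices inside the muscle's zone of influence are pulled along the muscle
# vector, with the pull falling off with distance from the insertion point.
def apply_linear_muscle(vertices, attachment, insertion, contraction, radius):
    pulled = vertices.copy()
    direction = attachment - insertion
    for i, v in enumerate(vertices):
        d = np.linalg.norm(v - insertion)
        if d < radius:
            falloff = np.cos(d / radius * np.pi / 2)   # smooth radial falloff
            pulled[i] = v + contraction * falloff * direction
    return pulled

verts = np.array([[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [2.0, 2.0, 0.0]])
print(apply_linear_muscle(verts, attachment=np.array([0.0, 1.0, 0.0]),
                          insertion=np.array([0.0, 0.0, 0.0]),
                          contraction=0.3, radius=1.0))
```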

4.9 Motion Capture

This technique tracks reflective markers fixed on the performer's face and applies the digitised performance to the facial model. The reflective markers should be visible to the camera at all times. A video stream can also be used to track the motion of pixels from frame to frame in order to extract the facial position and expression. This enables the recognition of very small changes in the face.

5. MPEG-4 Facial Animation

The international standard MPEG-4, ratified in 1999, includes definitions for the coding of parameters for Facial Animation. The MPEG-4 facial animation is mainly aimed at the animation of virtual faces, defining how to control a face’s shape and movements. However, MPEG-4 facial animation can also support model-based coding when used for the extraction of parameters from images depicting real faces (I. Pandzic, 2003).

Figure 1: MPEG-4 Feature Points. (I. Pandzic, 2003)

Figure 2: Reference distances considered by the MPEG-4 Facial Animation standard in order to define the FAPUs. (I. Pandzic, 2003)

The standard defines a model of the generic human face in its neutral state. On this model, located in key positions, are 84 Feature Points (FPs) which make up the shape of the face. The FPs are arranged in groups and are described by their three-dimensional coordinates, see Figure 1. The disposition of the FPs (and thus the animation of the face model) is controlled by a set of Facial Animation Parameters (FAPs), each corresponding to a specific facial movement. The 68 FAPs, again divided into groups, describe basic movements of the face, allowing the morphing of the model and thus the representation of a complete set of expressions and "visemes". A viseme represents a basic unit of speech in the visual domain, describing the facial movements related to the corresponding "phoneme", the basic unit of speech in the acoustic domain. Several phonemes can share the same viseme. Knowing the model and the intensities of the FAPs, a generic face can be reshaped to perform a particular expression or movement (I. Pandzic, 2003).

To be able to animate faces of heterogeneous size and proportions, the FAPs are model-independent: their values are measured relative to the neutral face's proportions. In fact, the FAPs are expressed in Facial Animation Parameter Units (FAPUs), defined as fractions of the distances between certain key facial features. Figure 2 shows these reference distances (I. Pandzic, 2003).
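For example, assuming the common definition of a distance-based FAPU as the corresponding neutral-face distance divided by 1024, converting a FAP value to a model-space displacement is a simple scaling (the parameter name and measurements below are illustrative):

```python
# A FAP value is expressed in FAPUs, so the same value produces proportionally
# the same motion on faces of different sizes.
mouth_width = 60.0                 # model units, measured on this neutral face
MW = mouth_width / 1024.0          # mouth-width FAPU, assumed to be width/1024

fap_stretch_cornerlip = 200        # incoming FAP value (in FAPU units)
displacement = fap_stretch_cornerlip * MW
print(displacement)                # model-space displacement for this face
```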

5.1 Facial Animation Parameters

The feature points themselves only define the vertices that must be known on a face; they cannot by themselves produce animated results. The feature points have to be controlled somehow, so that at a certain point in time they are positioned to form the facial expression desired by the artist. This gives rise to the need for defining morph targets for the face. These morph targets define how the face is deformed when a feature point moves.

The MPEG-4 standard defines a way to control these 84 feature points using the notion of Facial Animation Parameters (FAPs). The standard defines 66 low-level FAPs and 2 high-level FAPs (Moving Pictures Group, 2002). The low-level FAPs are defined on the positions of the 84 feature points, as shown in Figure 1, and are closely related to the movement of facial muscles. Each of the 66 low-level FAPs defines a movement for a specified set of feature points; for example, FAP number 46 denotes raise_tongue. When FAP 46 is issued, the feature points associated with this FAP are morphed in some manner from the current look of the face to the new look. In this way an animation of the face is achieved.

In MPEG-4 the FAPs are stored in a compressed stream and then fed to the facial animation system, which decodes them and uses the FAP values to position the feature points of the face. With such a stream, a flowing animation of several sequences can be achieved as the decoder animates between the key frames of the face.

There are also two high-level FAPs: one for expressions and one for visemes. The expression parameter can contain at most two out of a list of six pre-defined expressions such as anger, fear and surprise. The viseme parameter can contain at most two out of a list of 14 pre-defined visemes. These expressions and visemes are also fed to the animation system, which adds their influence to the feature points. In this way the same talking sequence can be made to look angry, surprised, terrified and so on.

The actual animation is produced by an animating system decoding the MPEG-4 stream. It receives frames consisting of different FAPs and morphs the face between frames to animate it.
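A toy sketch of this player-side step, with invented FAP numbers, feature points and displacement directions, showing how decoded FAP values displace feature points and how in-between poses can be blended:

```python
import numpy as np

# Each decoded frame maps FAP numbers to values; the values displace the feature
# points they control, and the face is blended between successive frames.
feature_points = {"lip_corner_l": np.array([1.0, 0.0, 0.0])}
fap_to_fp = {53: ("lip_corner_l", np.array([0.01, 0.0, 0.0]))}  # FAP -> (FP, axis)

def apply_frame(fps, frame):
    posed = {name: p.copy() for name, p in fps.items()}
    for fap, value in frame.items():
        name, axis = fap_to_fp[fap]
        posed[name] += value * axis
    return posed

frame_a, frame_b = {53: 0.0}, {53: 20.0}
for t in (0.0, 0.5, 1.0):   # in-between poses generated by the animation system
    blended = {fap: (1 - t) * frame_a[fap] + t * frame_b[fap] for fap in frame_a}
    print(apply_frame(feature_points, blended))
```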

5.2 Facial Motion Cloning

Even if artists adopt MPEG-4 stream-enabled facial animation, it is still necessary to model every single FAP that MPEG-4 defines. This work is tedious and often fairly similar from face to face. Would it not be convenient if the artist could simply copy these expressions from an already defined face onto a new one, only having to fine-tune the new model? A method to create all of these morph targets, called facial motion cloning, has been proposed.

Although not stated in the MPEG-4 standard, the FAPs could be regarded as morph targets, since they define the face in a certain key position, a position in which the artist has to model the face. Facial motion cloning (FMC) copies the low- and high-level FAPs from one model to another. By defining a subset of MPEG-4 feature points, the motion of one defined face is copied to a new static face. The role of the feature points is to define the correspondence between the two faces. With the feature points defined, the FMC software knows, for example, where the nose is on both faces. This information is then used to copy the facial movements (I. Pandzic, 2001).
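A very rough sketch of the underlying idea, transferring feature-point displacements from a source face to a target face through named correspondences (real FMC also interpolates the motion over all vertices; the scale factor here is an assumption):

```python
import numpy as np

# Displacements measured on a source face's feature points are rescaled and
# applied to the corresponding feature points of a new static face.
source_neutral = {"nose_tip": np.array([0.0, 0.0, 0.0]),
                  "lip_corner": np.array([1.0, -1.0, 0.0])}
source_posed   = {"nose_tip": np.array([0.0, 0.0, 0.0]),
                  "lip_corner": np.array([1.1, -0.8, 0.0])}
target_neutral = {"nose_tip": np.array([0.0, 0.0, 0.0]),
                  "lip_corner": np.array([2.0, -2.0, 0.0])}

scale = 2.0   # assumed ratio of the two faces' sizes (e.g. from mouth widths)
cloned = {name: target_neutral[name]
          + scale * (source_posed[name] - source_neutral[name])
          for name in target_neutral}
print(cloned)   # the cloned morph target for the new face
```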

6. Use of Facial Animation Today

Today, facial animation is widely used in films, computer games and medical equipment. The latest addition to the facial animation family is the use of interactive figures on the web (I. Pandzic, 2001). Adding an animated face can make the user experience more natural than using only a keyboard or a mouse. There is also a vast field of application for physically disabled people who are unable to use today's means of interacting with a computer (I. Pandzic, 1994).

The aim for facial animation today is of course real-time rendering. In web applications the interactive face must give immediate feedback to the user; a face turning into an hourglass while generating an answer would not be an acceptable solution.

7. Conclusion and Recommendations

Computer facial animation is now being used in a multitude of important fields. It brings a more human, social and dramatic reality to computer games, films and interactive multimedia, and is growing in both use and importance. Authoring computer facial animation with complex and subtle expressions is still difficult and fraught with problems. It is currently mostly authored using generalized computer animation techniques, which often limit the quality and quantity of facial animation production. Given additional computer power, facial understanding and software sophistication, new face-centric methods are emerging but typically are ad-hoc in nature.

For many years the facial animation field has not been uniform; there have been several camps, none eager to communicate with the others. The first camp consists of researchers interested in the actual modelling of a face: how it is best parameterised, coded and reconstructed. Achieving realistic results is of course their main goal, but collaboration with the end users, the facial artists producing animations and modelling faces, has not been good. The task of actually constructing an animatable face was often not considered, frequently leaving the facial artists with the tremendous task of designing the faces.

The facial artists, on the other hand, were only interested in modelling the faces and animating them. There was often a new solution every time a face was to be animated, and of course a lot of time was spent reinventing the wheel on every occasion. They did not care much about how the face was parameterised or coded, only that the appearance of the face was realistic.

The third camp is the image coding researchers. Model-based image coding research focuses, among other things, on how to recognise objects in still pictures; for example, to monitor traffic, moving cars must be recognised in images. This is applicable to facial animation as well: the parameters can be extracted from moving images and then used to drive a facial animation. In this way an actor can be recorded, his features captured from video and transferred to an animated face. This gives higher realism, since the animated face behaves exactly as a human does. There has been some collaboration between the image coding camp and the facial animation camp.