Paper reading report: motion synthesis from annotations

Published: November 9, 2015 Words: 1206

1. Abstract

For every new motion, recording movement and translating it onto a digital model is an expensive process. How can new natural-looking motion be achieved cost-effectively? The Motion Synthesis from Annotations paper [Arikan et al., 2003] describes a framework that lets a user synthesize human motion by combining motion sequences already available in a database, without disrupting their natural appearance. First, the motions in the database are annotated with a chosen vocabulary such as walk, run, or jump. When a new motion is required, the user paints a timeline with annotations from that vocabulary. The system finds the sequence of motion frames that best matches the user's query and stitches them together into a smooth, natural-looking motion. To avoid the tedious process of manually annotating the whole database, the paper proposes an automatic classifier, a support vector machine (SVM). For efficient searching, the motions in the database are clustered, and dynamic programming is used to obtain a solution efficiently.

2. Introduction

Motion capture seems to be the most suitable way to bring realistic, natural motion into the computer. Its limitation is that most motion capture systems are very expensive to use, and the captured data lacks flexibility and re-usability. It would therefore be best if a new desired motion could be generated by combining already captured motions in a database. A number of algorithms for motion synthesis from motion data already exist. The paper [Arikan et al., 2003] describes a practical algorithm for this kind of synthesis. The rest of this report describes the synthesis process proposed by the paper.

3. Choosing Vocabulary

The first step is to choose a vocabulary. The chosen vocabulary reflects the motion database and defines the level of control over the synthesis. There is no restriction on the choice of vocabulary, but it should give the user intuitive control over the synthesis.

4. Annotate motions in database

After choosing a suitable vocabulary, every motion in the database must be annotated. Annotating all the motions manually is tedious work, so the system uses an automatic classifier, a support vector machine (SVM). For each action, every motion frame in the database is assigned to one of two groups: a positive group (+1), meaning the frame performs the action, and a negative group (-1), meaning it does not. The user only needs to annotate a small fraction of the database manually; the SVM classifier completes the annotation process.

In order to use a support vector machine, every frame in the database first needs to be transformed into a torso coordinate system. The attributes of a frame are the coordinates of the 30 joints (each joint has a discrete 3D trajectory in space) over one second of motion centred at the frame being classified. To assign a new frame to one of the two groups, a radial basis function kernel is used. After the SVM has classified the new frames, the user can verify the results and make corrections if necessary.
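As a rough sketch of this classification step (the toy features, dimensions, and labels below are placeholders, not the paper's actual 30-joint torso-coordinate features; scikit-learn's RBF-kernel SVC stands in for the paper's SVM):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder frame features: in the paper each frame is described by the
# 3D trajectories of 30 joints over one second, in torso coordinates.
# Here we fake two well-separated clusters in an 8-dimensional space.
frames_doing_action = rng.normal(2.0, 0.3, size=(20, 8))   # label +1
frames_not_doing_it = rng.normal(0.0, 0.3, size=(20, 8))   # label -1

X = np.vstack([frames_doing_action, frames_not_doing_it])
y = np.array([1] * 20 + [-1] * 20)

# One radial-basis-function SVM per vocabulary word
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Classify an unseen frame near the "performing the action" cluster
new_frame = rng.normal(2.0, 0.3, size=(1, 8))
label = int(clf.predict(new_frame)[0])
```

In practice one such binary classifier is trained per vocabulary word, so each frame ends up with a full ±1 annotation vector.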

5. User Input

The user specifies the desired motion by painting along the timeline at the bottom of the editor [Fig 1]. By painting a green bar on the timeline, the user directs the character to perform a specific action. The user can also give a negative annotation, meaning "do not perform this action", by painting a blue bar. A green triangle marks a geometric constraint that points the character in a given direction. Often the user would like the motion to arrive at a particular spot in a particular orientation; in that case position constraints are used, and a second arrow in the editor directs the composed motion to arrive at the desired state. A frame constraint specifies a desired pose of the character (shown on the right of the screen): the user can drag the position constraint onto the character to pose a specific configuration at a specific frame.

The user's query is stored as a vector for every frame. If a frame has a positive annotation for an action, the corresponding entry is 1; if it has a negative annotation, the entry is -1; otherwise the entry is 0, meaning "don't care".

6. Searching for the desired motion (Optimization)

Once all motion frames in the database are annotated, the system can search for the user's desired frame sequence. Let f1, …, fT denote all of the frames in the database (T is the total number of frames). The system needs to find a sequence of frames ƒ1, …, ƒn, where each ƒi is one of the database frames f1, …, fT. The following objective function describes the desired motion sequence.

min over ƒ1, …, ƒn of  [ α · Σ_{i=1..n} D(i, A(ƒi)) + (1 − α) · Σ_{i=1..n-1} C(ƒi, ƒi+1) ]   (1)

The function has two main parts: the first is the summation of the annotation-matching term D(i, A(ƒi)), and the second is the continuity term C(ƒi, ƒi+1). The weight α specifies the relative priority between these two terms.

6.1 Distance Function

D(i, A(ƒi)) measures the mismatch between the annotation vector A(ƒ) of a database frame ƒ and the annotations the user requested for frame i of the motion to synthesize. The function D is defined as follows.

D(i, A(f)) = − Σ_{j=1..m} Qi[j] × A(f)[j]   (2)

where m is the size of the annotation vocabulary and Qi is the user's query vector for frame i.

The best match is the one with the minimum distance. For example, assume there are 3 annotations (run, jump and wave) in the annotation vector A(f). Consider two sample frames in the database: the first frame is annotated (1, -1, 1), meaning the character is running, not jumping, and waving; the second frame is annotated (1, 1, -1), meaning running, jumping, but not waving. Now suppose the user wants the character to run and jump and does not care whether it waves, so the user query is (1, 1, 0). The distance function applied to the first frame with this query is as follows.

D(i, A(ƒ1)) = −[ (1, 1, 0) · (1, −1, 1) ] = −(1 − 1 + 0) = 0

Then the distance function is applied to the second frame with the same query.

D(i, A(ƒ2)) = −[ (1, 1, 0) · (1, 1, −1) ] = −(1 + 1 + 0) = −2

The second frame is the better match because it yields the minimum value.
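The worked example above can be checked with a few lines (a direct transcription of equation (2); the function name is mine):

```python
import numpy as np

def annotation_distance(query, annotation):
    """D(i, A(f)) = -sum_j Qi[j] * A(f)[j]  -- lower means a better match."""
    return -np.dot(query, annotation)

query  = np.array([1, 1, 0])    # run, jump, don't care about wave
frame1 = np.array([1, -1, 1])   # running, not jumping, waving
frame2 = np.array([1, 1, -1])   # running, jumping, not waving

print(annotation_distance(query, frame1))  # 0
print(annotation_distance(query, frame2))  # -2
```

The "don't care" entries contribute nothing to the dot product, so they neither reward nor penalize any database frame.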

6.2 Continuity Function

The second part of the objective function (1) computes the continuity between ƒi and ƒj; more precisely, it measures how well frame ƒj can follow frame ƒi by comparing ƒj against ƒi+1, the frame that follows ƒi in the original data. Storing the distance between every pair of frames in the database would cost O(T²). The system therefore uses a feature vector F(f) for each frame f, consisting of the position, velocity and acceleration of every joint. For two frames ƒi and ƒj the function C is defined as:

C(ƒi, ƒj) = ‖ F(ƒi+1) − F(ƒj) ‖   (3)
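A minimal sketch of equation (3), assuming a motion stored as an array of per-frame joint positions and using finite differences for velocity and acceleration (the exact feature layout is my assumption, not the paper's):

```python
import numpy as np

def feature_vector(motion, i):
    """F(f_i): joint positions plus finite-difference velocity and acceleration."""
    pos = motion[i]
    vel = motion[i] - motion[i - 1]
    acc = motion[i + 1] - 2 * motion[i] + motion[i - 1]
    return np.concatenate([pos, vel, acc])

def continuity(motion, i, j):
    """C(f_i, f_j) = ||F(f_{i+1}) - F(f_j)||: cost of following f_i with f_j."""
    return np.linalg.norm(feature_vector(motion, i + 1) - feature_vector(motion, j))

# Toy motion: 10 frames, 3 "joints" moving smoothly
motion = np.sin(np.linspace(0, 1, 10))[:, None] * np.ones((10, 3))

# Following frame 3 by its natural successor, frame 4, costs exactly zero
print(continuity(motion, 3, 4))  # 0.0
```

The zero-cost case shows why the definition compares F(ƒi+1) against F(ƒj): transitions that already exist in the captured data are free, and dissimilar jumps are penalized.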

In the objective function (1), both functions D and C act locally, measuring the goodness of individual frames. Dynamic programming, which solves a complex problem by breaking it down into simpler sub-problems, can be used to minimize the objective. But if dynamic programming is applied to every frame, the total cost is O(n × T²), which is too expensive for practical use. The next section describes a practical algorithm that searches over blocks of frames instead of single frames; this makes the algorithm efficient enough for an interactive program and feasible for real-time applications.
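To make the dynamic program concrete, here is a small Viterbi-style sketch over precomputed cost matrices (the matrices, weight, and function names are toy placeholders; the paper runs the same kind of recurrence over blocks rather than raw frames):

```python
import numpy as np

def synthesize(D, C, alpha=0.5):
    """Minimize alpha * sum_i D[i, f_i] + (1-alpha) * sum_i C[f_i, f_{i+1}] by DP.

    D: (n, T) annotation-mismatch cost of using database frame f at output slot i
    C: (T, T) continuity cost C[g, f] of following frame g with frame f
    Returns the optimal frame sequence and its total cost.
    """
    n, T = D.shape
    cost = np.empty((n, T))
    back = np.zeros((n, T), dtype=int)
    cost[0] = alpha * D[0]
    for t in range(1, n):
        # trans[g, f]: cost of reaching frame f at slot t via frame g at slot t-1
        trans = cost[t - 1][:, None] + (1 - alpha) * C
        back[t] = trans.argmin(axis=0)
        cost[t] = alpha * D[t] + trans.min(axis=0)
    path = [int(cost[-1].argmin())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(cost[-1].min())

rng = np.random.default_rng(1)
D = rng.random((4, 5))   # 4 output slots, 5 database frames
C = rng.random((5, 5))
path, total = synthesize(D, C)
```

Each DP step considers all T predecessors for each of the T current frames, which is where the O(n × T²) cost comes from.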

7. Practical Algorithm

Instead of considering each frame individually, the algorithm operates on blocks of frames (according to the authors' observation, annotations usually run in sequences). The system uses blocks of 32 frames, so the dynamic programming is performed on 32-frame blocks.
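A sketch of the resulting state-space reduction (the block size is from the paper; the frame count is an arbitrary example):

```python
import numpy as np

BLOCK = 32                 # block size used by the paper
T = 1000                   # example: a database of 1000 frames
frame_ids = np.arange(T)

# Group consecutive frames into blocks of 32 (the last block may be shorter)
blocks = [frame_ids[k:k + BLOCK] for k in range(0, T, BLOCK)]

# The dynamic program now searches over len(blocks) states instead of T
print(len(blocks))  # 32
```

Shrinking the state space from T frames to roughly T/32 blocks cuts the O(n × T²) DP cost by about three orders of magnitude, which is what makes interactive use feasible.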