DOI: 10.1145/2683483.2683530

3D Visual Speech Animation from Image Sequences

Published: 14 December 2014

ABSTRACT

In this paper we describe an early version of our system, which synthesizes 3D visual speech, including the tongue and teeth, from frontal facial image sequences. The system performs 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. Its prime motivation is to obtain a 3D VSA system from a limited amount of training data, compared with the amount required to build a conventional corpus-based 3D VSA system. It consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. This yields the added advantages of 3D visual speech: the face can be rendered under different poses and illumination conditions, and the visual information of the tongue and teeth is enhanced. The first module, for 3D shape estimation, is based on the detection of facial landmarks in images and uses a prior 3D Morphable Model (3D-MM) trained on 3D facial data. For the time being, the system is person-specific, i.e., the 3D-MM and the 2D facial landmark detector are trained on data from a single person and tested on data from that same person. The estimated 3D shape sequences are provided as input to the second module along with the phonetic segmentation. For any particular 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and by animating a rigid 3D tongue through keyframe interpolation.
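The landmark-based 3D-MM fitting in the first module can be illustrated with a minimal sketch, not the authors' implementation. Assuming a PCA-based morphable model and a known scaled-orthographic camera, the shape coefficients that best explain the detected 2D landmarks have a regularised least-squares solution; all names and array shapes below (`mean_shape`, `basis`, etc.) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): estimating 3D-MM shape
# coefficients from detected 2D facial landmarks, assuming a PCA-based
# morphable model and a known scaled-orthographic camera.
import numpy as np

def fit_shape_coeffs(landmarks_2d, mean_shape, basis, scale=1.0, reg=1e-2):
    """Least-squares fit of shape coefficients so that the projected
    model landmarks match the detected 2D landmarks.

    landmarks_2d : (L, 2) detected image landmarks
    mean_shape   : (L, 3) 3D positions of the model's landmark vertices
    basis        : (L, 3, K) PCA basis restricted to the landmark vertices
    """
    L, _, K = basis.shape
    # Scaled-orthographic projection keeps only the x and y coordinates.
    P = np.array([[scale, 0.0, 0.0],
                  [0.0, scale, 0.0]])                       # (2, 3)
    # Project each basis vector: (L, 2, K) -> stacked system (2L, K).
    A = np.einsum('ij,ljk->lik', P, basis).reshape(2 * L, K)
    # Residual between detected landmarks and the projected mean shape.
    b = (landmarks_2d - mean_shape @ P.T).reshape(2 * L)
    # Tikhonov-regularised normal equations keep coefficients plausible.
    coeffs = np.linalg.solve(A.T @ A + reg * np.eye(K), A.T @ b)
    return coeffs
```

In a full pipeline one would alternate this step with a pose (rotation/scale) update, which is consistent with the iterative estimation the abstract describes.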
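The second module's rigid-tongue animation can likewise be sketched as keyframe interpolation over the phonetic segmentation. The segment timings, per-phoneme keyframes, and pose parametrisation below are hypothetical placeholders, not the paper's data.

```python
# Illustrative sketch (not the paper's implementation): animating a rigid
# 3D tongue by linearly interpolating per-phoneme keyframe poses over the
# phonetic segmentation. Each keyframe pose is assumed to be reached at
# the midpoint of its phone's segment.
import numpy as np

def tongue_pose_at(t, segments, keyframes):
    """Return the interpolated tongue pose vector at time t.

    segments  : list of (start, end, phoneme) covering the utterance
    keyframes : dict mapping phoneme -> pose vector (e.g. a translation
                plus a jaw-rotation angle)
    """
    mids = [((s + e) / 2.0, keyframes[p]) for s, e, p in segments]
    if t <= mids[0][0]:
        return np.asarray(mids[0][1], dtype=float)
    if t >= mids[-1][0]:
        return np.asarray(mids[-1][1], dtype=float)
    for (t0, p0), (t1, p1) in zip(mids, mids[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1 - w) * np.asarray(p0, float) + w * np.asarray(p1, float)
```

Linear interpolation is the simplest choice here; smoother splines or a coarticulation model (e.g. Cohen-Massaro dominance functions) could be substituted without changing the interface.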


Published in
      ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision, Graphics and Image Processing
      December 2014
      692 pages
      ISBN: 9781450330619
      DOI: 10.1145/2683483

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 95 of 286 submissions (33%)
