Object tracking: A survey

Published: 25 December 2006

Abstract

The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

  162. Zhu, S. and Yuille, A. 1996. Region competition: unifying snakes, region growing, and bayes/mdl for multiband image segmentation. IEEE Trans. Patt. Analy. Mach. Intell. 18, 9, 884--900.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

      Reviews

      Sebastien Lefevre

      Object tracking is one of the major steps toward understanding video content. Its goal is to give object positions in the successive frames of a video sequence. This spatio-temporal information can then be used to analyze the actions or behavior of the tracked objects. Object tracking is a mandatory step in many video-based applications, such as surveillance, traffic monitoring, sport event analysis, active vision and robotics, and medical image sequence analysis. Thus, there has been a lot of research in this field over the last 20 years, and it is quite difficult to determine which method to use for a particular video application.

      The survey proposed by Yilmaz, Javed, and Shah intends to point out the key aspects and to describe the major (context-free) approaches for object tracking in color video sequences. An entire book could be devoted to this subject. In this 46-page paper, the authors have decided to present most of the main elements in object tracking rather than trying to give an exhaustive view of some object tracking-related problems. The paper is aimed at the image processing engineer or scientist.

      This comprehensive and well-illustrated survey contains several parts, each dedicated to one of the main elements involved in object tracking. The paper answers the following questions. How is the object to be tracked modeled? How are the object model and the image data associated? How is the object extracted from the sequence? How is the tracking process performed?

      The authors describe some applications where object tracking is necessary and explain why this can be a particularly difficult task. They then describe the different shape and appearance models that can be associated with an object and present the image features to be used in object tracking. Next, they tackle the problem of object detection, which often must be solved before dealing with tracking itself.

      The core part of this paper, Section 5 on object tracking, presents the main existing approaches, grouped into three classes: point, kernel, and silhouette tracking. The paper ends by tackling some related issues, such as occlusion (when an object is temporarily hidden by another one) and multiple camera tracking (particularly useful for video surveillance in large and complex environments). Yilmaz, Javed, and Shah assert, finally, that a generic tracking system can be reached only if it involves contextual information in some way.

      The authors' attempt to give an overview of object tracking is nearly successful. Trying to deal with all aspects of object tracking in a single paper (even of 46 pages) leads to some omissions. In particular, the reader will not find in this survey descriptions of the different motion models, computational complexities, and parameter settings. Moreover, the case of very small objects (a few pixels) is not considered. To cover all of this material, though, the authors would have had to consider only one aspect of tracking, and the reader would not receive a global presentation of the object tracking problem.

      • Published in

        ACM Computing Surveys  Volume 38, Issue 4
        2006
        153 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/1177352

        Copyright © 2006 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
