research-article

Multi-View Low-Rank Analysis with Applications to Outlier Detection

Published: 23 March 2018

Abstract

Detecting outliers or anomalies is a fundamental problem in various machine learning and data mining applications. Conventional outlier detection algorithms are mainly designed for single-view data. Nowadays, data can be easily collected from multiple views, and many learning tasks such as clustering and classification have benefited from multi-view data. However, outlier detection from multi-view data is still a very challenging problem, as the data in multiple views usually have more complicated distributions and exhibit inconsistent behaviors. To address this problem, we propose a multi-view low-rank analysis (MLRA) framework for outlier detection in this article. MLRA pursues outliers from a new perspective: robust data representation. It contains two major components. First, cross-view low-rank coding is performed to reveal the intrinsic structures of the data. In particular, we formulate a regularized rank-minimization problem, which is solved by an efficient optimization algorithm. Second, the outliers are identified through an outlier score estimation procedure. Unlike existing multi-view outlier detection methods, MLRA is able to detect two different types of outliers from multiple views simultaneously. To this end, we design a criterion that estimates the outlier scores by analyzing the obtained representation coefficients. Moreover, we extend MLRA to tackle the multi-view group outlier detection problem. Extensive evaluations on seven UCI datasets and on the MovieLens, USPS-MNIST, and WebKB datasets demonstrate that our approach outperforms several state-of-the-art outlier detection methods.
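The abstract does not give the MLRA objective or its scoring criterion, so the following is only a generic illustration of the two-stage idea it describes (low-rank modeling first, score estimation second), not the authors' algorithm. It assumes samples are stored as columns, uses singular value thresholding (the proximal operator of the nuclear norm, a common convex surrogate for rank) to obtain a low-rank approximation of each view, and scores each sample by its summed per-view residual norm. The function names `svt`, `residual_scores`, `toy_multiview_scores`, and `make_toy_views`, and the threshold `tau`, are hypothetical choices made for this sketch.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: shrink every singular value of M by tau.

    This is the proximal operator of tau * (nuclear norm) and is a standard
    building block in nuclear-norm-regularized problems.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


def residual_scores(X, tau):
    """Per-sample residual norm after a low-rank approximation of X.

    Assumption for this sketch: each column of X is one sample.
    """
    low_rank = svt(X, tau)
    return np.linalg.norm(X - low_rank, axis=0)


def toy_multiview_scores(views, tau):
    """Sum the per-view residual scores (a simplification, not the MLRA criterion)."""
    return sum(residual_scores(X, tau) for X in views)


def make_toy_views():
    """Two rank-one views of 20 samples; sample 7 is corrupted in both views."""
    view1 = np.outer(np.ones(5), np.ones(20))
    view1[:, 7] = [3.0, -3.0, 0.0, 0.0, 0.0]
    view2 = 2.0 * np.outer(np.ones(4), np.ones(20))
    view2[:, 7] = [2.0, -2.0, 0.0, 0.0]
    return [view1, view2]


if __name__ == "__main__":
    scores = toy_multiview_scores(make_toy_views(), tau=1.0)
    print("most anomalous sample:", int(np.argmax(scores)))
```

In this toy setup the corrupted sample (column 7) receives the largest score because it lies outside the dominant rank-one structure of both views. A residual-based score of this kind only captures samples that deviate within each view; it does not model inconsistency across views or use representation coefficients, both of which the abstract attributes to MLRA.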


      • Published in

        ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 3
        June 2018
        360 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/3178546

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 March 2018
        • Accepted: 1 November 2017
        • Revised: 1 April 2017
        • Received: 1 September 2016


        Qualifiers

        • research-article
        • Research
        • Refereed
