Abstract
Detecting outliers or anomalies is a fundamental problem in various machine learning and data mining applications. Conventional outlier detection algorithms are mainly designed for single-view data. Nowadays, data can be easily collected from multiple views, and many learning tasks such as clustering and classification have benefited from multi-view data. However, outlier detection from multi-view data remains a very challenging problem, because data in multiple views usually have more complicated distributions and exhibit inconsistent behaviors. To address this problem, we propose a multi-view low-rank analysis (MLRA) framework for outlier detection in this article. MLRA pursues outliers from a new perspective: robust data representation. It contains two major components. First, cross-view low-rank coding is performed to reveal the intrinsic structures of the data. In particular, we formulate a regularized rank-minimization problem, which is solved by an efficient optimization algorithm. Second, the outliers are identified through an outlier score estimation procedure. Unlike existing multi-view outlier detection methods, MLRA is able to detect two different types of outliers from multiple views simultaneously. To this end, we design a criterion that estimates the outlier scores by analyzing the obtained representation coefficients. Moreover, we extend MLRA to tackle the multi-view group outlier detection problem. Extensive evaluations on seven UCI datasets and on the MovieLens, USPS-MNIST, and WebKB datasets demonstrate that our approach outperforms several state-of-the-art outlier detection methods.
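The abstract does not spell out the optimization algorithm, so the following is only a hedged illustration of a generic building block. Rank-minimization problems are commonly relaxed to nuclear-norm minimization, and solvers for that relaxation repeatedly apply singular value thresholding. The sketch below shows that single step in NumPy. The function name `singular_value_threshold` and the threshold `tau` are assumptions made for illustration, and this is not claimed to be the solver used by MLRA.

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Shrink the singular values of M by tau and clip them at zero.

    This is the proximal operator of tau * (nuclear norm), a standard step in
    nuclear-norm relaxations of rank minimization. Name and signature are
    hypothetical, chosen only for this sketch.
    """
    # Thin SVD: M = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Soft-threshold the singular values; small ones become exactly zero.
    s_shrunk = np.maximum(s - tau, 0.0)
    # Rebuild the matrix from the shrunken spectrum.
    return (U * s_shrunk) @ Vt


# Toy input (made up for illustration): singular values 3, 1, and 0.5.
M = np.diag([3.0, 1.0, 0.5])
low_rank = singular_value_threshold(M, tau=1.0)
rank_after = np.linalg.matrix_rank(low_rank)
```

With `tau=1.0`, only the largest singular value survives the shrinkage in this toy case, so the thresholded matrix has lower rank than the input. In an iterative solver, a step like this would typically alternate with updates of the other variables.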
Index Terms
- Multi-View Low-Rank Analysis with Applications to Outlier Detection