ABSTRACT
Protein subcellular localization prediction is the problem of predicting where a protein functions within a living cell. In this paper, we apply associative classification algorithms (CMAR and CPAR) and multi-class Support Vector Machines (SVMs) to the problem of protein subcellular localization prediction. We use classification features generated from a protein's SwissProt annotation record, and we visualize the applied classification rules in an explain graph that domain experts can interpret. We compare the performance of our approaches to that of Proteome Analyst 3.0, using the same set of classification features, and find that all three classification algorithms outperform Proteome Analyst. Multi-class SVM achieves overall F-measures of [0.934 ~ 0.991], while CPAR and CMAR achieve overall F-measures of [0.922 ~ 0.989] and [0.880 ~ 0.989], respectively. Our results show that although multi-class SVM remains the most accurate prediction algorithm by overall F-measure, CPAR and CMAR achieve very similar accuracy. In most cases, CPAR outperforms CMAR, especially when the feature space is large. Our results indicate that associative classification algorithms, especially CPAR, are a good alternative to SVM, offering similar accuracy with much better transparency in the classification models.
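The overall F-measures quoted above are computed per localization class from precision and recall. As a minimal sketch of that metric (using synthetic labels, not the paper's data or code), the per-class F-measure can be derived from true-positive, false-positive, and false-negative counts:

```python
def per_class_f_measure(y_true, y_pred):
    """Compute the per-class F-measure from paired true/predicted label lists.

    F = 2 * precision * recall / (precision + recall), per class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores

# Hypothetical localization labels for illustration only:
scores = per_class_f_measure(
    ["nucleus", "cytoplasm", "nucleus", "membrane"],
    ["nucleus", "cytoplasm", "cytoplasm", "membrane"],
)
# 'membrane' is perfectly predicted here, so its F-measure is 1.0
```

An "overall" figure can then be obtained by averaging these per-class scores, weighted by class frequency or unweighted, depending on the evaluation protocol.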
REFERENCES
- The LUCS-KDD software library. http://www.csc.liv.ac.uk/~frans/kdd/software/.
- The Proteome Analyst 3.0 dataset. http://webdocs.cs.ualberta.ca/~bioinfo/pa/datasets.html.
- UniProt knowledge base. http://www.uniprot.org.
- S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25:3389--3402, 1997.
- C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- F. Thabtah. A review of associative classification mining. The Knowledge Engineering Review, 22(1):37--65, 2007.
- W. Li, J. Han, and J. Pei. CMAR: accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), pages 369--376, 2001.
- Z. Lu, D. Szafron, R. Greiner, P. Lu, D. S. Wishart, B. Poulin, C. Macdonell, and R. Eisner. Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics, 20(4):547--556, 2004.
- X. Yin and J. Han. CPAR: classification based on predictive association rules. In Proceedings of the SIAM International Conference on Data Mining (SDM'03), 2003.
Index Terms
- Protein subcellular localization prediction with associative classification and multi-class SVM