poster

Protein subcellular localization prediction with associative classification and multi-class SVM

Published: 01 August 2011

ABSTRACT

Protein subcellular localization prediction is the problem of predicting where a protein functions within a living cell. In this paper, we apply two associative classification algorithms (CMAR and CPAR) and multi-class Support Vector Machines (SVM) to this problem. We use classification features generated from a protein's SwissProt annotation record, and we visualize the applied classification rules in an explain graph for domain experts to interpret. We compare the performance of our approaches with that of Proteome Analyst 3.0, using the same set of classification features, and find that all three classification algorithms outperform Proteome Analyst. Multi-class SVM achieves overall F-measures in the range 0.934 to 0.991, while CPAR and CMAR achieve overall F-measures in the ranges 0.922 to 0.989 and 0.880 to 0.989, respectively. Our results show that although multi-class SVM remains the most accurate algorithm in terms of overall F-measure, CPAR and CMAR achieve very similar accuracy. In most cases, CPAR outperforms CMAR, especially when the feature space is large. Our results indicate that associative classification algorithms, especially CPAR, are a good alternative to SVM, offering similar accuracy but much more transparent classification models.
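The results above are reported as overall F-measures. As a reminder of what that metric measures, here is a minimal sketch of per-class and averaged F-measure for single-label predictions. The abstract does not say how the overall figure is aggregated, so the macro-averaging used here is an assumption, and the function name and toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter

def f_measures(true_labels, predicted_labels):
    """Return ({label: F1}, macro-averaged F1) for single-label predictions.

    Macro-averaging is an assumption; the paper may aggregate differently.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    per_class = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro

# Toy labels invented for illustration only (not data from the paper).
truth = ["nucleus", "nucleus", "cytoplasm", "membrane"]
guess = ["nucleus", "cytoplasm", "cytoplasm", "membrane"]
per_class, macro = f_measures(truth, guess)
```

A single misclassification lowers the F-measure of both the true class (through recall) and the predicted class (through precision), which is why the metric suits multi-class problems with uneven class sizes.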


      Published in

      BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
      August 2011
      688 pages
      ISBN: 9781450307963
      DOI: 10.1145/2147805
      General Chairs: Robert Grossman, Andrey Rzhetsky
      Program Chairs: Sun Kim, Wei Wang

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • poster

      Acceptance Rates

      Overall Acceptance Rate: 254 of 885 submissions, 29%
