DOI: 10.1145/3194480.3194501

FlexSLiM: a Novel Approach for Short Linear Motif Discovery in Protein Sequences

Published: 12 March 2018

ABSTRACT

Short linear motifs are peptide patterns 3 to 11 amino acids long that play important regulatory roles in modulating protein activity. Although they are abundant in proteins, they are often difficult to discover experimentally because they bind their partners with low affinity and interact with them only transiently. Moreover, available computational methods cannot predict short linear motifs effectively because the motifs are short and degenerate. Here we present FlexSLiM, a novel approach for reliable discovery of short linear motifs in protein sequences. In tests on simulated data and benchmark experimental data, FlexSLiM identified short linear motifs more effectively than existing methods. FlexSLiM is a general tool that will advance the understanding of short linear motifs and facilitate research on protein targeting signals, protein post-translational modifications, and many other areas.
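
To make the idea of a short, degenerate motif concrete, the following minimal sketch scans a protein sequence with a regular expression. It is not FlexSLiM and does not reflect its algorithm. The pattern (the classic N-glycosylation sequon, written N-{P}-[ST] in PROSITE-style notation), the helper name `find_motif`, and the toy sequence are all assumptions chosen only for illustration.

```python
import re

# Illustrative pattern only (not taken from the paper): the classic
# N-glycosylation sequon, Asn, then any residue except Pro, then Ser or Thr.
N_GLYC_SEQUON = r"N[^P][ST]"


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MKNGSAANPTLLNNTSQ"  # made-up sequence for the example
    for start, hit in find_motif(toy_sequence, N_GLYC_SEQUON):
        print(start, hit)
```

A three-residue pattern of this kind matches by chance in many unrelated proteins, so plain scanning cannot separate functional sites from background. This is the difficulty the abstract attributes to the short and degenerate nature of these motifs.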

Published in

ICBCB 2018: Proceedings of the 2018 6th International Conference on Bioinformatics and Computational Biology
March 2018
174 pages
ISBN: 9781450363488
DOI: 10.1145/3194480

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited