ABSTRACT
Short linear motifs are peptide patterns, 3 to 11 amino acids long, that play important regulatory roles in modulating protein activities. Although they are abundant in proteins, they are often difficult to discover experimentally because they bind their partners with low affinity and interact with them only transiently. Moreover, available computational methods cannot predict short linear motifs effectively, owing to their short and degenerate nature. Here we developed a novel approach, FlexSLiM, for reliable discovery of short linear motifs in protein sequences. By testing on simulated data and benchmark experimental data, we demonstrated that FlexSLiM identifies short linear motifs more effectively than existing methods. We provide a general tool that will advance the understanding of short linear motifs and facilitate research on protein targeting signals, protein post-translational modifications, and many other topics.