The structural variation in glycans is fundamental to their biological activity. One of the most powerful tools for glycan structure determination is tandem mass spectrometry. Interpreting the tandem mass spectra of glycopeptides with a de novo approach is essential for determining novel glycan structures. In this work, we examine the glycan de novo sequencing problem.
We use a labelled tree to represent a glycan structure. Let S be the alphabet of simple sugars. A glycan tree T is an unordered rooted tree whose nodes are labelled by letters from S and whose degree is bounded by 4. The root of T is linked to a peptide.
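As an illustration of this representation, the following is a minimal Python sketch of a labelled, unordered, rooted tree. Several details are assumptions made for the example and are not taken from the work itself: the class name `GlycanNode`, the four-letter alphabet in `RESIDUE_MASS` (standard monoisotopic residue masses of common monosaccharides), and the reading of the degree bound as "at most 4 children per node".

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative alphabet S: monoisotopic residue masses in Da (assumed example set).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579, "NeuAc": 291.0954}
MAX_CHILDREN = 4  # assumption: the degree bound is read as a cap on children per node


@dataclass
class GlycanNode:
    label: str
    children: List["GlycanNode"] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the alphabet and the degree bound at construction time.
        if self.label not in RESIDUE_MASS:
            raise ValueError(f"unknown sugar: {self.label}")
        if len(self.children) > MAX_CHILDREN:
            raise ValueError("degree bound exceeded")

    def mass(self) -> float:
        # Mass of the subtree rooted here: own residue plus all descendants.
        return RESIDUE_MASS[self.label] + sum(c.mass() for c in self.children)

    def canonical(self) -> str:
        # Children are sorted so that two orderings of the same
        # unordered tree produce the same string.
        inner = "".join(sorted(c.canonical() for c in self.children))
        return f"{self.label}({inner})"
```

For example, `GlycanNode("Hex", [GlycanNode("Fuc"), GlycanNode("HexNAc")])` and the same node with its children swapped have identical `canonical()` strings, which is one simple way to treat the tree as unordered.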
We define glycan de novo sequencing as follows. Let M = {(m_i, I_i) | 1 ≤ i ≤ n} be a spectrum of a glycopeptide, where m_i is the mass and I_i is the intensity of the i-th peak. For each mass value m, a score function f(m) can be defined from the intensity of the peak near m. Let T be a glycan tree. The score of T, S(T), is then defined as the sum of f(m) over all mass values m of the fragment ions of T. The glycan de novo sequencing problem is to find a tree T such that the mass of T is equal to a given value M and S(T) is maximized.
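The definition of S(T) can be made concrete with a small Python sketch. The text does not fix f or the fragmentation model, so both are assumptions here: f(m) is taken to be the intensity of the strongest peak within a tolerance of m (0 if none), and the fragment ions are taken to be the peptide-bearing pieces left after cutting a single glycosidic bond, with charge ignored. Trees are written as nested tuples `(label, (child, ...))`, and the residue masses are an illustrative alphabet.

```python
# Illustrative alphabet: monoisotopic residue masses in Da (assumed example set).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579}


def tree_mass(tree):
    """Mass of a tree given as (label, (child, child, ...))."""
    label, children = tree
    return RESIDUE_MASS[label] + sum(tree_mass(c) for c in children)


def fragment_masses(tree, peptide_mass):
    """Distinct masses left on the peptide after cutting the bond above one node.

    Assumption: single-cleavage fragments only, neutral masses, no charge.
    """
    precursor = peptide_mass + tree_mass(tree)
    masses = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        # Cutting above `node` removes its whole subtree from the precursor.
        masses.add(round(precursor - tree_mass(node), 2))
        stack.extend(node[1])
    return masses


def f(m, spectrum, tol=0.02):
    """Assumed score function: strongest peak intensity within tol of m, else 0."""
    return max((inten for mz, inten in spectrum if abs(mz - m) <= tol), default=0.0)


def score(tree, peptide_mass, spectrum, tol=0.02):
    """S(T): sum of f(m) over the distinct fragment masses of T."""
    return sum(f(m, spectrum, tol) for m in fragment_masses(tree, peptide_mass))
```

With a hypothetical peptide of mass 1000.0 and the chain `("HexNAc", (("HexNAc", (("Hex", ()),)),))`, the fragment masses are 1000.0, 1203.08 and 1406.16, so only peaks near those values contribute to the score.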
We proved that glycan de novo sequencing is NP-hard for an arbitrary score function, and then developed a heuristic algorithm for the problem. The algorithm first generates many acceptable small subtrees, which are then joined together in an iterative process to obtain larger suboptimal subtrees until the desired mass is reached. At each subtree size, only a limited number of subtrees are kept for later use.
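The join-and-prune idea can be sketched as a beam search in Python. This is not the authors' implementation: the join rule (attach one kept subtree as a new child of another kept subtree's root), the measure of size (node count), the beam width, the single-cleavage scoring, and every name below are assumptions chosen to keep the example short. Trees are nested tuples `(label, (child, ...))`.

```python
from itertools import product

# Illustrative alphabet: monoisotopic residue masses in Da (assumed example set).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579}
MAX_CHILDREN = 4  # assumption: degree bound read as a cap on children per node


def mass(t):
    return RESIDUE_MASS[t[0]] + sum(mass(c) for c in t[1])


def canon(t):
    """Sort children recursively so unordered trees compare equal."""
    return (t[0], tuple(sorted(canon(c) for c in t[1])))


def fragments(t, precursor):
    """Distinct peptide-bearing masses after cutting above each node (assumed model)."""
    out = {round(precursor - mass(t), 2)}
    for c in t[1]:
        out |= fragments(c, precursor)
    return out


def score(t, precursor, spectrum, tol):
    def f(m):  # assumed f: strongest peak within tol of m, else 0
        return max((i for mz, i in spectrum if abs(mz - m) <= tol), default=0.0)
    return sum(f(m) for m in fragments(t, precursor))


def sequence(spectrum, peptide_mass, glycan_mass, beam=20, tol=0.02):
    precursor = peptide_mass + glycan_mass

    def rank(t):  # best score first, ties broken deterministically
        return (-score(t, precursor, spectrum, tol), t)

    # Size 1: every single-sugar subtree is an acceptable starting point.
    kept = {1: sorted(((lab, ()) for lab in RESIDUE_MASS), key=rank)[:beam]}
    max_size = int(glycan_mass // min(RESIDUE_MASS.values()))
    for size in range(2, max_size + 1):
        candidates = set()
        for a in range(1, size):
            # Join: hang a kept subtree of size (size - a) under a kept root of size a.
            for parent, child in product(kept.get(a, []), kept.get(size - a, [])):
                if len(parent[1]) >= MAX_CHILDREN:
                    continue
                joined = canon((parent[0], parent[1] + (child,)))
                if mass(joined) <= glycan_mass + tol:
                    candidates.add(joined)
        kept[size] = sorted(candidates, key=rank)[:beam]  # prune to the beam
    complete = [t for ts in kept.values() for t in ts
                if abs(mass(t) - glycan_mass) <= tol]
    return min(complete, key=rank, default=None)
```

Because a subtree's fragment masses under this model depend only on the subtree and the precursor mass, partial subtrees can be scored before the rest of the tree is known, which is what makes bottom-up joining workable in the sketch. Pruning to `beam` candidates per size means the result is not guaranteed to be optimal.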
Experiments on real MS/MS data of glycopeptides from the cationic isozyme peanut peroxidase showed that the heuristic algorithm can determine glycan structures accurately.
Keywords: glycomics, proteomics, tandem mass spectrometry, glycan, glycoprotein, de novo sequencing.