ABSTRACT
Accurately identifying disease-associated alleles from large sequencing experiments remains challenging. During this tutorial, participants will learn how to use a new variant annotation and filtering web app called Bystro (https://bystro.io/) to analyze sequencing experiments. Bystro is the first online, cloud-based application that makes variant annotation and filtering accessible to all researchers for even the largest, terabyte-sized whole-genome experiments containing thousands of samples. Using its general-purpose, natural-language filtering engine, attendees will be shown how to perform quality control and identify alleles of interest. They will then be guided in exporting those variants and using them in two settings: a regression context, by performing rare-variant association tests in R, and a classification context, by training new machine learning models with Python's scikit-learn library.
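To make the rare-variant association step concrete, the following is a minimal sketch of a burden-style test with a permutation p-value, written in plain Python. It is not the R workflow taught in the tutorial and does not use Bystro's export format: the genotype matrix, the allele frequencies, the 1% frequency cutoff, and the function names `burden_scores` and `permutation_pvalue` are all assumptions made for illustration.

```python
import random


def burden_scores(genotypes, allele_freqs, maf_cutoff=0.01):
    """Count the rare alternate alleles carried by each sample.

    genotypes: one row per sample, one column per variant, values 0/1/2.
    allele_freqs: alternate allele frequency for each variant column.
    maf_cutoff: hypothetical rarity threshold (an assumption, not from the source).
    """
    rare = [j for j, af in enumerate(allele_freqs) if af < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """One-sided permutation test: do cases carry more rare alleles than controls?"""
    rng = random.Random(seed)

    def mean_diff(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between phenotype and genotype
        if mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): 6 samples x 3 variants; the third variant is common.
genotypes = [
    [1, 0, 2],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 2],
    [0, 0, 0],
]
allele_freqs = [0.004, 0.002, 0.30]
is_case = [True, True, True, False, False, False]

scores = burden_scores(genotypes, allele_freqs)
p_value = permutation_pvalue(scores, is_case)
print("burden scores:", scores)
print("permutation p-value:", p_value)
```

Collapsing rare alleles into a single per-sample score is the simplest way to pool signal across variants that are individually too rare to test. Kernel-based tests relax the assumption, implicit here, that every rare allele shifts risk in the same direction.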
Index Terms
- Tutorial: Rapidly Identifying Disease-associated Rare Variants using Annotation and Machine Learning at Whole-genome Scale Online