Analyzing Next-Generation Sequencing Data
Modern biomedical research is increasingly making use of genome-scale data from next-generation sequencing platforms, including Illumina HiSeq and MiSeq machines and Pacific Biosciences SMRT. These platforms make it possible for individual labs to quickly and cheaply generate vast amounts of genomic and transcriptomic data from de novo sequencing, resequencing, ChIP-seq, mRNA-seq, and allelotyping experiments.
Despite this ability to quickly generate large data sets, biologists are rarely trained in the computational and statistical techniques necessary to make sense of this data. Thus, many researchers must rely on others – often computational scientists with little biological training – to design and implement appropriate data reduction and data mining techniques. Moreover, most institutions do not have access to the substantial computational resources necessary to run these analyses.
To bridge this gap, Dr. Brown runs a two-week intensive summer course that teaches biomedical researchers to (1) run analyses on remote UNIX servers hosted in the Amazon Web Services “cloud”; (2) perform mapping and assembly on large short-read data sets; (3) tackle specific biological problems with existing short-read data; and (4) design computational pipelines capable of addressing their own research questions. This is accomplished through in-depth, hands-on practical training in the relevant techniques. Our experience, confirmed by assessment, is that this practical training substantially improves the basic computational sophistication of participants. Dr. Brown believes that, in the long term, our cadre of participants and those of other courses will contribute to a significant improvement in the general area of data-driven biology.