Mo Rahman

Bioinformatics, it’s a thing.


Understanding Translational Effects of Variants With SnpEff

Once we have assembled the genomes of our subject(s)[1][2], generated a list of variants, and annotated those variants against relevant databases (e.g. dbSNP)[3], we may be interested in investigating the structural and translational effects of genomic variants on proteins.
Interacting with protein structures in VMD

If the interest in structural effects is well-intentioned, then it behooves us to use SnpEff, which adheres to both the VCF 4.1 standard and GATK best practices. As with many downstream processes, we must make an initial investment by choosing a reference build, which for human samples is, at the moment, either GRCh37 or hg19. Install the necessary reference library and run SnpEff:

$ java -Xmx[allocate memory] -jar snpEff.jar download [reference library]
$ java -Xmx[allocate memory] -jar snpEff.jar eff -v -onlyCoding true -i vcf -o vcf [reference library] [input].vcf >
...



Exome Sequence Assembly Utilizing Bowtie & Samtools

At the end of all the wet chemistry for a genome sequencing project, we are left with raw data in the form of FASTQ files. This post documents how to process those raw files into assembled genomes using Bowtie and Samtools.

Fig. 1: Raw data is split into approximately 20-30 FASTQ files per individual.

Each of these raw files, once uncompressed, contains roughly 1 gigabyte of nucleotide, machine, and quality information. The contents follow the FASTQ format and look very similar to the record below: a header line carrying the machine information, the nucleotide sequence itself (the run of A, T, G, and C), a "+" separator, and a line of per-base quality characters.

@HWI-ST1027:182:D1H4LACXX:5:2306:21024:142455 1:N:0:ACATTG
GATTTGAATGGCACTGAATATACAGATCAACTTGAAGATAACTGATATCTAAACTATGCTGAGTCTTCTAATTCATGAACACAGTACATTTCTATTTAGG
+
@?<DFEDEHHFHDHEEGGECHHIIIIIGIGIIFGIBGHGBHGIE9>GIIIIIIIIIIIFGEII@DCHIIIIIIGHHIIFEGHBHECHEHFEDFDFDCEE>
...
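
To make the record layout concrete, here is a minimal Python sketch of how such a file could be read. It is not part of the pipeline described in this post. It assumes the common four-lines-per-record layout (no wrapped sequences) and a Phred+33 quality encoding, which seems likely for the record above since its quality line contains characters below "@". The names FastqRecord, read_fastq, and phred_scores, and the short example record, are made up for illustration.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # machine/run information, without the leading '@'
    sequence: str  # nucleotide calls (A, T, G, C, N)
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    Assumes exactly four lines per record. Reading in fixed groups of
    four matters because a quality line may itself start with '@'.
    Stops at end of input or at the first blank line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header)
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# A shortened, made-up record in the same shape as the one above.
EXAMPLE = "@read1 1:N:0:ACATTG\nGATTTGAATG\n+\n@?<DFEDEHH\n"

for record in read_fastq(io.StringIO(EXAMPLE)):
    print(record.header, record.sequence, phred_scores(record.quality))
```

Note that the quality line in both the real record and the example begins with "@", so scanning for lines that start with "@" is not a safe way to find record boundaries. Counting lines in groups of four avoids that trap.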
