How to get from FASTQ files to BAM files. Quick walk-through.

1. Install software

   On Ubuntu: Simply install the following packages in the Software Center:
   tophat2, samtools, igv

2. Get reference

   Look at http://www.ensembl.org/ and find the "FTP Download" page.
   Download the FASTA file (toplevel, not the repeat-masked "rm" version).
   Download the GTF file.
   Unpack both with gunzip.

3. Build index

   bowtie2-build Drosophila_melanogaster.BDGP5.75.dna.toplevel.fa Dmel_5.75

4. Align with TopHat

   The command below looks for the Bowtie 2 index under the prefix
   index/Dmel_5.75, so the files built in step 3 have to sit in a
   subdirectory index/ (or adjust the path).

   tophat2 -G Drosophila_melanogaster.BDGP5.75.gtf --transcriptome-index Dmel_5.75tr index/Dmel_5.75 fastq/SRR031714_1.fastq fastq/SRR031714_2.fastq

5. Index the BAM file

   Link the alignment under a shorter name, then index it with samtools:

   ln -s ../tophat_out_prepared/accepted_hits.bam sample1.bam
   samtools index sample1.bam

6. Inspect with IGV

7. Count the features

   In R:

   library( GenomicFeatures )
   library( GenomicAlignments )

   # Newer GenomicFeatures releases call this function makeTxDbFromGFF
   tdb <- makeTranscriptDbFromGFF( "downloads/Drosophila_melanogaster.BDGP5.75.gtf.gz", format="gtf" )
   exonsByGene <- exonsBy( tdb, by="gene" )

   # Count the reads overlapping the exons of each gene
   se <- summarizeOverlaps( exonsByGene, BamFileList( "bam/sample1.bam" ), ignore.strand=TRUE )
   head( assay(se) )
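
With more than one sample, the alignment and BAM indexing steps above can be wrapped in a small shell function. What follows is only a minimal sketch, not a tested pipeline. It assumes the directory layout suggested by the commands above (index/, fastq/, bam/). The names run, align_sample and DRY_RUN are made up for this illustration. It also uses TopHat's -o option to give each sample its own output directory, which the walk-through itself does not do.

```shell
#!/bin/sh
# Sketch only: paths, variable names and helper names are assumptions.

GTF=Drosophila_melanogaster.BDGP5.75.gtf
INDEX=index/Dmel_5.75        # Bowtie 2 index prefix
TR_INDEX=Dmel_5.75tr         # transcriptome index prefix, as in the tophat2 call above
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands, 0 = also run them

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Align one paired-end sample and index the resulting BAM file.
# $1 = sample name; expects fastq/<name>_1.fastq and fastq/<name>_2.fastq
align_sample() {
    name=$1
    run tophat2 -G "$GTF" --transcriptome-index "$TR_INDEX" \
        -o "tophat_out_$name" "$INDEX" \
        "fastq/${name}_1.fastq" "fastq/${name}_2.fastq" || return 1
    # Link target is relative to bam/, where the link is created
    run ln -s "../tophat_out_$name/accepted_hits.bam" "bam/$name.bam" || return 1
    run samtools index "bam/$name.bam"
}
```

Calling align_sample SRR031714 with the default DRY_RUN=1 only prints the three commands, so the paths can be checked first. Setting DRY_RUN=0 executes them.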