This document provides a quality assessment of Genome Analyzer results. The assessment is meant to complement, rather than replace, quality assessment available from the Genome Analyzer and its documentation. The narrative interpretation is based on the experience of the package maintainer. It is applicable to results from the 'Genome Analyzer' hardware single-end module, configured to scan 300 tiles per lane. The 'control' results referred to below are from analysis of PhiX-174 sequence provided by Illumina.
Subsequent sections of the report use the following to identify figures and other information.
Key | |
wgEncodeUwTfbsAg04449CtcfStdAlnRep1.bam | 1 |
wgEncodeUwTfbsAg04450CtcfStdAlnRep1.bam | 2 |
wgEncodeUwTfbsAg09309CtcfStdAlnRep1.bam | 3 |
wgEncodeUwTfbsAg09319CtcfStdAlnRep1.bam | 4 |
wgEncodeUwTfbsAg10803CtcfStdAlnRep1.bam | 5 |
wgEncodeUwTfbsAoafCtcfStdAlnRep1.bam | 6 |
wgEncodeUwTfbsHaspCtcfStdAlnRep1.bam | 7 |
wgEncodeUwTfbsHbmecCtcfStdAlnRep1.bam | 8 |
wgEncodeUwTfbsHcfaaCtcfStdAlnRep1.bam | 9 |
wgEncodeUwTfbsHcpeCtcfStdAlnRep1.bam | 10 |
wgEncodeUwTfbsHeeCtcfStdAlnRep1.bam | 11 |
wgEncodeUwTfbsHmfCtcfStdAlnRep1.bam | 12 |
wgEncodeUwTfbsHpafCtcfStdAlnRep1.bam | 13 |
wgEncodeUwTfbsHpfCtcfStdAlnRep1.bam | 14 |
wgEncodeUwTfbsHrpeCtcfStdAlnRep1.bam | 15 |
wgEncodeUwTfbsAg04449CtcfStdAlnRep2.bam | 16 |
wgEncodeUwTfbsAg09309CtcfStdAlnRep2.bam | 17 |
wgEncodeUwTfbsAg09319CtcfStdAlnRep2.bam | 18 |
wgEncodeUwTfbsAg10803CtcfStdAlnRep2.bam | 19 |
wgEncodeUwTfbsAoafCtcfStdAlnRep2.bam | 20 |
wgEncodeUwTfbsHbmecCtcfStdAlnRep2.bam | 21 |
wgEncodeUwTfbsHcpeCtcfStdAlnRep2.bam | 22 |
wgEncodeUwTfbsHeeCtcfStdAlnRep2.bam | 23 |
wgEncodeUwTfbsHmfCtcfStdAlnRep2.bam | 24 |
wgEncodeUwTfbsHpafCtcfStdAlnRep2.bam | 25 |
wgEncodeUwTfbsHpfCtcfStdAlnRep2.bam | 26 |
wgEncodeUwTfbsAg04449InputStdAlnRep1.bam | 27 |
wgEncodeUwTfbsAg04450InputStdAlnRep1.bam | 28 |
wgEncodeUwTfbsAg09309InputStdAlnRep1.bam | 29 |
wgEncodeUwTfbsAg09319InputStdAlnRep1.bam | 30 |
wgEncodeUwTfbsAg10803InputStdAlnRep1.bam | 31 |
wgEncodeUwTfbsAoafInputStdAlnRep1.bam | 32 |
wgEncodeUwTfbsHaspInputStdAlnRep1.bam | 33 |
wgEncodeUwTfbsHbmecInputStdAlnRep1.bam | 34 |
wgEncodeUwTfbsHcfaaInputStdAlnRep1.bam | 35 |
wgEncodeUwTfbsHcpeInputStdAlnRep1.bam | 36 |
wgEncodeUwTfbsHeeInputStdAlnRep1.bam | 37 |
wgEncodeUwTfbsHmfInputStdAlnRep1.bam | 38 |
wgEncodeUwTfbsHpafInputStdAlnRep1.bam | 39 |
wgEncodeUwTfbsHpfInputStdAlnRep1.bam | 40 |
wgEncodeUwTfbsHrpeInputStdAlnRep1.bam | 41 |
Read counts. Filtered and aligned read counts are reported relative to the total number of reads (clusters; if only filtered or aligned reads are available, total read count is reported). Consult Genome Analyzer documentation for official guidelines. From experience, very good runs of the Genome Analyzer 'control' lane result in 25-30 million reads, with up to 95% passing pre-defined filters.
ShortRead:::.ppnCount(qa[["readCounts"]])
Key | read | filter | aligned |
1 | 9952444 | ||
2 | 21170101 | ||
3 | 14311099 | ||
4 | 22451182 | ||
5 | 26964677 | ||
6 | 9317234 | ||
7 | 14968206 | ||
8 | 23428973 | ||
9 | 20244846 | ||
10 | 20606447 | ||
11 | 23118965 | ||
12 | 9764834 | ||
13 | 27021265 | ||
14 | 21074336 | ||
15 | 25683072 | ||
16 | 23572200 | ||
17 | 10263622 | ||
18 | 25700109 | ||
19 | 29559218 | ||
20 | 29974058 | ||
21 | 15647982 | ||
22 | 31968244 | ||
23 | 10628790 | ||
24 | 21481465 | ||
25 | 27813170 | ||
26 | 24317687 | ||
27 | 16096148 | ||
28 | 18400427 | ||
29 | 7925518 | ||
30 | 23348186 | ||
31 | 19776716 | ||
32 | 21424353 | ||
33 | 26952024 | ||
34 | 21100439 | ||
35 | 23650231 | ||
36 | 20835811 | ||
37 | 12764649 | ||
38 | 8720617 | ||
39 | 20412883 | ||
40 | 18540990 | ||
41 | 18348314 | ||
ShortRead:::.plotReadCount(qa)
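As a rough illustration of how the read counts might be screened, the sketch below uses base R only and a few totals copied from the table above. The 25 million threshold is an assumption borrowed from the 'control' lane guideline quoted earlier; it is not an official cutoff and may not suit other sample types.

```r
# Total read counts for a few lanes, copied from the table above.
reads <- c(`1` = 9952444, `5` = 26964677, `22` = 31968244, `29` = 7925518)

# Hypothetical threshold: low end of the 25-30 million range quoted
# for very good 'control' lanes.
threshold <- 25e6

summary_df <- data.frame(
    lane = names(reads),
    millions = round(reads / 1e6, 1),
    below_threshold = reads < threshold,
    row.names = NULL
)
print(summary_df)
```

Lanes flagged here are only candidates for a closer look; lower yield can also reflect the amount of material loaded.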
Base call frequency over all reads. Base frequencies should accurately reflect the frequencies of the regions sequenced.
ShortRead:::.plotNucleotideCount(qa)
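A minimal base R sketch of how overall base call frequencies can be tabulated. It uses two of the reads listed in the duplicate-read table of this report as toy input; the report's own counts are presumably computed over all reads in each lane.

```r
# Two reads taken from the duplicate-read table of this report.
reads <- c(
    "AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAACAAAA",
    "TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT"
)

# Split every read into single bases and tabulate them together.
bases <- unlist(strsplit(reads, ""))
counts <- table(factor(bases, levels = c("A", "C", "G", "T", "N")))

# Convert counts to frequencies that sum to one.
freq <- counts / sum(counts)
print(round(freq, 3))
```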
Overall read quality. Lanes with consistently good quality reads have strong peaks at the right of the panel.
df <- qa[["readQualityScore"]]
ShortRead:::.plotReadQuality(df[df$type=="read",])
These curves show how coverage is distributed amongst reads. Ideally, the cumulative proportion of reads will transition sharply from low to high.
Portions to the left of the transition correspond to reads that are represented relatively infrequently, and might roughly reflect sequencing or sample processing errors. 10-15% of reads in a typical Genome Analyzer 'control' lane fall in this category.
Portions to the right of the transition represent reads that are over-represented compared to expectation. These might include inadvertently sequenced primer or adapter sequences, sequencing or base calling artifacts (e.g., poly-A reads), or features of the sample DNA (highly repeated regions) not adequately removed during sample preparation. About 5% of Genome Analyzer 'control' lane reads fall in this category.
Broad transitions from low to high cumulative proportion of reads may reflect sequencing bias or (perhaps intentional) features of sample preparation resulting in non-uniform coverage. The transition is about 5 times as wide as expected from uniform sampling across the Genome Analyzer 'control' lane.
df <- qa[["sequenceDistribution"]]
ShortRead:::.plotReadOccurrences(df[df$type=="read",], cex=.5)
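The cumulative curve described above can be sketched in base R as follows. The toy reads and the column names `nOccurrences`, `nReads` and `cumulative` are assumptions chosen for illustration; the plotting function used in this report may compute the curve differently.

```r
# Toy reads: "ACGT" seen three times, "GGCC" twice, the others once.
reads <- c("ACGT", "ACGT", "ACGT", "GGCC", "GGCC", "TTAA", "CATG")

# Number of times each distinct read occurs.
occurrences <- table(reads)

# For each occurrence level, the number of distinct reads at that level.
n_reads <- table(as.integer(occurrences))
dist <- data.frame(
    nOccurrences = as.integer(names(n_reads)),
    nReads = as.integer(n_reads)
)

# Cumulative proportion of all reads, weighting each level by
# (occurrences x distinct reads at that level).
weighted <- dist$nOccurrences * dist$nReads
dist$cumulative <- cumsum(weighted) / sum(weighted)
print(dist)
```

In this toy input, two of the seven reads are singletons, so the curve starts at 2/7 and reaches 1 at three occurrences.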
Common duplicate reads might provide clues to the source of over-represented sequences. Some of these reads are filtered by the alignment algorithms; other duplicate reads might point to sample preparation issues.
ShortRead:::.freqSequences(qa, "read")
sequence | count | lane |
AAAAAAAAAAAAAAAGAAAAAAAAAAAAAAACAAAA | 1458 | 23 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 1329 | 21 |
GCTTCTCCAAGGGCAGAGCCAGAGTCCTCTTTTGCC | 1070 | 21 |
AAAAAAAAAAAAAAAAGGAAAAAAAAAAAAAAAAAA | 895 | 22 |
AAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAA | 665 | 23 |
AAAAAAAAAAAAAAAAGTAAAAAAAAAAAAAAAAAA | 535 | 22 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 475 | 2 |
AAAAAAAAGAAAAAAAAAAAAAANAAAAAAAAAAGA | 460 | 22 |
AAAAAAAATAAAAAAAAAATAAAAAAAAAAAAAAAA | 455 | 22 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 442 | 9 |
AAAAAACAAAAAAAAACAAAAAAAACAAAACAANAA | 439 | 22 |
GCTTCTCCAAGGGCAGAGCCAGAGTCCTCTTTTGCC | 408 | 2 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 367 | 15 |
GCTTCTCCAAGGGCAGAGCCAGAGTCCTCTTTTGCC | 358 | 9 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 355 | 25 |
GCTTCTCCAAGGGCAGAGCCAGAGTCCTCTTTTGCC | 343 | 15 |
AAAAAAAGAAAACAAAAAACAAAAAAAAGAAAAAAA | 308 | 22 |
TTGTTCACTATGGAGTTGCGGTTAAAAGTAGGCCCT | 295 | 20 |
GCTTCTCCAAGGGCAGAGCCAGAGTCCTCTTTTGCC | 293 | 25 |
AAAAAAAATAAAAAAAAAAAAAANAAAAAAAAAAGA | 291 | 22 |
Common duplicate reads after filtering
ShortRead:::.freqSequences(qa, "filtered")
NA
Common aligned duplicate reads
ShortRead:::.freqSequences(qa, "aligned")
NA
Per-cycle base calls should usually be approximately uniform across cycles. Genome Analyzer 'control' lane results often show a decline in A and increase in T as cycles progress. This is likely an artifact of the underlying technology.
perCycle <- qa[["perCycle"]]
ShortRead:::.plotCycleBaseCall(perCycle$baseCall)
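To make the per-cycle tabulation concrete, here is a small base R sketch on made-up reads of equal length, where each string position stands for one sequencing cycle. It only illustrates the idea and is not the code behind the plot.

```r
# Toy reads of equal length; each string position is one sequencing cycle.
reads <- c("ACGT", "ACGA", "TCGA", "ACTA")
bases <- c("A", "C", "G", "T")

# Matrix with one row per read and one column per cycle.
base_matrix <- do.call(rbind, strsplit(reads, ""))

# Count each base within every cycle (rows: bases, columns: cycles).
per_cycle <- vapply(seq_len(ncol(base_matrix)), function(i) {
    as.integer(table(factor(base_matrix[, i], levels = bases)))
}, integer(length(bases)))
rownames(per_cycle) <- bases
colnames(per_cycle) <- paste0("cycle", seq_len(ncol(per_cycle)))
print(per_cycle)
```

A drift in one row across the columns is the kind of pattern the narrative above describes for A and T.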
Per-cycle quality score. Reported quality scores are 'calibrated', i.e., incorporating phred-like adjustments following sequence alignment. These typically decline with cycle, in an accelerating manner. Abrupt transitions in quality between cycles toward the end of the read might result when only some of the cycles are used for alignment: the cycles included in the alignment are calibrated more effectively than the cycles excluded from the alignment.
The reddish lines are quartiles (solid: median, dotted: 25, 75), the green line is the mean. Shading is proportional to number of reads.
perCycle <- qa[["perCycle"]]
ShortRead:::.plotCycleQuality(perCycle$quality)
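The summary lines in the quality plot (quartiles and mean per cycle) can be reproduced in outline with base R. The quality matrix below is invented for illustration, with one row per read and one column per cycle.

```r
# Toy quality scores: one row per read, one column per cycle.
# Values are made up for illustration and decline with cycle.
quality <- rbind(
    c(38, 36, 30, 22),
    c(40, 35, 28, 18),
    c(39, 37, 31, 25),
    c(37, 34, 27, 15)
)

# Quartiles and mean for each cycle (column).
per_cycle <- data.frame(
    cycle = seq_len(ncol(quality)),
    q25 = apply(quality, 2, quantile, probs = 0.25),
    median = apply(quality, 2, median),
    q75 = apply(quality, 2, quantile, probs = 0.75),
    mean = colMeans(quality)
)
print(per_cycle)
```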
The number of times the aligned reads overlap a given sequence position.
ShortRead:::.plotDepthOfCoverage(qa[["depthOfCoverage"]])
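Depth of coverage can be illustrated with a short base R sketch. The alignment starts, widths and reference length are invented; a real analysis would take them from the aligned reads.

```r
# Toy alignments on one reference: 1-based start and read width.
start <- c(1, 3, 3, 8)
width <- c(5, 5, 5, 3)
ref_length <- 12

# Add one to every position each read covers.
depth <- integer(ref_length)
for (i in seq_along(start)) {
    covered <- seq(start[i], length.out = width[i])
    depth[covered] <- depth[covered] + 1L
}
print(depth)

# Number of positions observed at each depth.
print(table(depth))
```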
Adapter contamination is defined here as non-genetic sequences attached at either or both ends of the reads. The 'contamination' measure is the number of reads with a right or left match to the adapter sequence over the total number of reads. Mismatch rates are 10% on the left and 20% on the right with a minimum overlap of 10 nt.
ShortRead:::.ppnCount(qa[["adapterContamination"]])
Not available.
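The contamination measure defined above can be approximated with a simplified base R sketch. It checks the right end only, using the 20% mismatch rate and 10 nt minimum overlap stated above. The helper `has_right_adapter`, the adapter sequence and the reads are all assumptions for illustration, and the package's own matching may differ in detail.

```r
# Does the right end of `read` match the start of `adapter`?
# Hypothetical helper: tries each overlap length from `min_overlap` up to
# the adapter length and allows floor(rate * overlap) mismatches.
has_right_adapter <- function(read, adapter, rate = 0.2, min_overlap = 10) {
    max_k <- min(nchar(read), nchar(adapter))
    if (max_k < min_overlap) return(FALSE)
    for (k in seq(min_overlap, max_k)) {
        tail_bases <- strsplit(substring(read, nchar(read) - k + 1), "")[[1]]
        head_bases <- strsplit(substring(adapter, 1, k), "")[[1]]
        if (sum(tail_bases != head_bases) <= floor(rate * k)) return(TRUE)
    }
    FALSE
}

adapter <- "AGATCGGAAGAGC"  # assumed adapter; substitute the one actually used
reads <- c(
    "TTGACCTGAAGATCGGAAGAGC",  # ends with the full adapter
    "TTGACCTGAAGATCGGTAGAG",   # ends with 12 adapter bases, one mismatch
    "TTGACCTGACCGTTAGGCATCA"   # unrelated sequence
)

# Flag each read, then report flagged reads over total reads.
flagged <- vapply(reads, has_right_adapter, logical(1), adapter = adapter,
                  USE.NAMES = FALSE)
contamination <- mean(flagged)
print(contamination)
```

A left-end check would mirror this with the 10% rate, comparing the start of the read with the end of the adapter.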
Tue Oct 11 17:41:36 2011; ShortRead v. 1.11.42
Report template: Martin Morgan