Coursera Week 1
Does amalgamation of the reference human genome from various individuals cause problems? Can’t such amalgamation produce a phenotype that does not occur naturally?
How does the repeated triplet "CAG" affect the severity of Huntington's disease?
Would it be better to use multiple reference genomes instead of a single reference genome?
What is the point of appending the "$" sign to Text when we construct SuffixTrie(Text)?
Why do we construct the trie of all suffixes and not the trie of all prefixes for pattern matching?
Why do we need an edge from the root to the leaf in the suffix tree (labeled by the "$" sign) if this edge is never traversed during pattern matching?
What are the edge labels in the suffix tree for "panamabananas$"?
How does storing SuffixTree(Text) require memory on the order of 20·|Text| if the number of nodes in the suffix tree does not exceed 2·|Text|?
How can I construct a suffix tree in linear time?
Why does the suffix tree for the 3-billion nucleotide human require about 60 GB of memory?
How do we construct the suffix tree for a human genome with 23 chromosomes?
Can I see an example of how PatternMatchingWithSuffixArray works?
Why does the suffix array for the 3-billion nucleotide human require 12 GB of memory? Is about 3 GB not sufficient?
Why do we need the LCP array to transform a suffix array into a suffix tree?
Coursera Week 2
Is it possible to construct the Burrows-Wheeler Transform in linear time and space?
What is special about the final column of the Burrows-Wheeler matrix? Why not work with some other column?
Doesn't the Last-to-First mapping require a lot of memory?
Why does FirstColumn appear among the arguments in BWMatching if it is never used in the BWMatching pseudocode?
How does the array FirstOccurrence reduce memory if we still need the larger array FirstColumn to construct FirstOccurrence in the first place?
Why do the first and last occurrences of symbol in the range of positions from top to bottom in LastColumn have respective ranks Countsymbol(top, LastColumn)+1 and Countsymbol(bottom+1, LastColumn)?
Would it be possible to match a pattern using the Burrows-Wheeler Transform by moving forward through the pattern instead of moving backward?
Is BetterBWMatching guaranteed to terminate?
Would BetterBWMatching work properly if Pattern contains symbols that do not appear in Text?
In the main text, you illustrated walking backward in BetterBWMatching with the pattern "ana", which is a palindrome. How can we match a non-palindromic pattern?
How do we select the constant K for constructing the partial suffix array?
If we have to construct the entire suffix array to build the partial suffix array, how can it save memory?
How do we select the constant C when constructing checkpoint arrays?
What modifications of BetterBWMatching are needed to make it work with checkpoint arrays instead of count arrays?
How do biologists determine the maximum allowable number of mismatches while mapping reads to the human genome?
Can reads that "fall off the edges of the text" form approximate matches?
What is the running time of approximate pattern matching using the Burrows Wheeler Transform?
How does seed extension work?
How does BLAST find all k-mers that have scores above a given threshold when scored against a k-mer in the query string?
How does BLAST extend the seeds that it identifies? Does it not require constructing an optimal alignment, thus significantly slowing down the algorithm?
After finding a seed, BLAST does construct an alignment but imposes a limit on the number of insertions and deletions in this alignment. For example, after finding the two amino acid-long seed CG below, we can find the best local alignment starting at the “end” of this seed (shown by the blue rectangle shown below) and another local alignment ending at the “beginning” of this seed (the corresponding rectangle is not shown). The resulting local alignments extending this seed from both sides form a full-length alignment.


How can I modify the approximate pattern matching with the Burrows-Wheeler transform to account for patterns whose last symbols do not appear in Text?
When we approximately match patterns with the Burrows-Wheeler transform, we consider possibilities of mismatches in all positions but the first one. Wouldn't this strategy fail to match a read with an error at the first position?
Why do we add both the # and $ symbols to the texts when solving the Longest Shared Substring Problem?
How should I modify BinarySearch if I want to determine how many times a key is present in an array with duplicates?
while maxIndex ≥ minIndex
midIndex = (minIndex + maxIndex)/2
if Array(midIndex) ≥ key
maxIndex = mid-1
else
minIndex = mid+1
return minIndex
while maxIndex ≥ minIndex
midIndex = (minIndex + maxIndex)/2
if Array(midIndex) > key
maxIndex = mid-1
else
minIndex = mid+1
return minIndex