Coursera Week 4
Since humans did not descend directly from mice, why do we nevertheless analyze a rearrangement scenario transforming mouse into human?
Why does the Random Breakage Model result in an exponential distribution of synteny blocks?
What is a permutation?
How do we compare genomes with different numbers of synteny blocks, such as (+a +b +c) and (+c -b +a -d)?
The easiest way to deal with synteny blocks that appear in one genome and not another is to ignore them and consider only those blocks common to both genomes, e.g., in this case to compare (+a +b +c) with (+c -b +a). It is also possible to incorporate insertions and deletions into genome rearrangement studies, providing some penalty for the insertion/deletion of a single block, or a penalty for the insertion/deletion of a series of contiguous blocks. Various research papers have attempted to expand genome rearrangement metrics to account for insertions and deletions.
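The first strategy, which keeps only the blocks common to both genomes, can be sketched in a few lines of Python. This is a minimal illustration that assumes each block is written as a sign followed by a name (e.g. "+a", "-d"). The function names are hypothetical.

```python
def block_name(block):
    # "+a" -> "a", "-d" -> "d": the sign encodes orientation, the rest identifies the block
    return block[1:]

def restrict_to_common_blocks(p, q):
    """Drop every block that does not occur (in either orientation) in both genomes."""
    common = {block_name(b) for b in p} & {block_name(b) for b in q}
    p_common = [b for b in p if block_name(b) in common]
    q_common = [b for b in q if block_name(b) in common]
    return p_common, q_common

p = ["+a", "+b", "+c"]
q = ["+c", "-b", "+a", "-d"]
print(restrict_to_common_blocks(p, q))  # (['+a', '+b', '+c'], ['+c', '-b', '+a'])
```

The signs are kept on purpose: only the block identity is used to decide what is shared, so the orientation information needed for rearrangement analysis survives.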
How can we conclude that there are 1,070 different seven-step scenarios to transform the mouse X chromosome into the human X chromosome by reversals?
Why does the pair (+4 +3) form a breakpoint but the pair (-4 -3) does not?
The pair (+4 +3) forms a breakpoint because, unlike (-4 -3), it cannot be transformed into the adjacency (+3 +4), which is what we want when sorting by reversals, with a single reversal. For example, reversing the segment (+4 +3) in
(+1 +2 +4 +3 +5 +6)
transforms this permutation into
(+1 +2 -3 -4 +5 +6),
but reversing the segment (-4 -3) in
(+1 +2 -4 -3 +5 +6)
transforms it into the identity permutation
(+1 +2 +3 +4 +5 +6).
To better understand why (+4 +3) is a breakpoint, try sorting the permutation (+6 +5 +4 +3 +2 +1) – you will see that it requires many reversals!
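The example above can be checked with a short Python sketch. It assumes signed permutations of 1..n are stored as lists of integers and follows the usual convention of framing the permutation with 0 and n + 1, so that a consecutive pair (x, y) is an adjacency when y − x = 1 and a breakpoint otherwise. The function names are illustrative.

```python
def reversal(p, i, j):
    """Reverse the segment p[i..j] (inclusive, 0-based) and flip the sign of each block."""
    return p[:i] + [-x for x in reversed(p[i:j + 1])] + p[j + 1:]

def count_breakpoints(p):
    """Count breakpoints in a signed permutation of 1..n framed by 0 and n + 1."""
    framed = [0] + list(p) + [len(p) + 1]
    # (x, y) is an adjacency exactly when y - x == 1; anything else is a breakpoint
    return sum(1 for x, y in zip(framed, framed[1:]) if y - x != 1)

a = [1, 2, 4, 3, 5, 6]
b = [1, 2, -4, -3, 5, 6]
print(count_breakpoints(a), reversal(a, 2, 3), count_breakpoints(reversal(a, 2, 3)))
# 3 [1, 2, -3, -4, 5, 6] 3
print(count_breakpoints(b), reversal(b, 2, 3), count_breakpoints(reversal(b, 2, 3)))
# 2 [1, 2, 3, 4, 5, 6] 0
print(count_breakpoints([6, 5, 4, 3, 2, 1]))
# 7
```

Reversing (+4 +3) leaves three breakpoints in place, while reversing (-4 -3) removes both remaining ones. In (+6 +5 +4 +3 +2 +1) every one of the seven framed pairs is a breakpoint, and since a single reversal can remove at most two breakpoints, at least four reversals are needed.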
Are there other types of genome rearrangements other than reversals, translocations, fusions, and fissions?
Yes! A transposition, for example, moves a segment from one location in the genome to another. A single transposition that exchanges the adjacent segments (+2 +3 +4) and (+5 +6) of the chromosome (+1 +2 +3 +4 +5 +6 +7) yields (+1 +5 +6 +2 +3 +4 +7). However, transpositions are rarer than reversals and the other rearrangements discussed in the chapter.
Transpositions are an example of a 3-break, a rearrangement that requires three breaks rather than two (between +1 and +2, between +4 and +5, and between +6 and +7). Since 3-breaks are rare compared to 2-breaks, we can obtain reasonable distance functions without them, and so 3-breaks are not covered in this chapter.
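To see the three breaks concretely, here is a minimal Python sketch of a transposition as an exchange of two adjacent segments. It uses unsigned integers because a transposition does not change block orientation; the function names and the cut-point convention (slice boundaries x, y, z) are assumptions made for illustration.

```python
def transposition(chrom, x, y, z):
    """Exchange the adjacent segments chrom[x:y] and chrom[y:z].

    The cut points x, y, z are the three breaks of the 3-break.
    """
    return chrom[:x] + chrom[y:z] + chrom[x:y] + chrom[z:]

def broken_adjacencies(before, after):
    """Return the consecutive pairs of `before` that no longer appear in `after`."""
    old = set(zip(before, before[1:]))
    new = set(zip(after, after[1:]))
    return sorted(old - new)

chrom = [1, 2, 3, 4, 5, 6, 7]
moved = transposition(chrom, 1, 4, 6)
print(moved)                             # [1, 5, 6, 2, 3, 4, 7]
print(broken_adjacencies(chrom, moved))  # [(1, 2), (4, 5), (6, 7)]
```

The three destroyed adjacencies are exactly the three breaks listed above, whereas a reversal or another 2-break destroys at most two.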
Coursera Week 5
The Exercise Break in the section “Breakpoint Graphs” suggests that the only genome that forms a trivial breakpoint graph with a genome P is P itself. But I can find another genome satisfying this condition!
Can you give an example of how GraphToGenome should work?
Consider the colored edges (2,4), (3,6), (5,1), (7,9), (10,12), (11,8), and recall the conventions:
- every even number 2x represents the head of synteny block x, and every odd number 2x−1 represents its tail;
- every colored edge consists of two numbers, each representing a block head or tail.
Following the colored edges one after another reveals the first cycle:
- (2,4) ends with 4 (even), and the only edge that starts with 4−1=3 is (3,6);
- (3,6) ends with 6 (even), and the only edge that starts with 6−1=5 is (5,1);
- (5,1) ends with 1 (odd), and the only edge that starts with 1+1=2 is (2,4), which brings us back to where we started, thus forming a cycle.
The remaining edges form a second cycle:
- (7,9) ends with 9 (odd), and the only edge that starts with 9+1=10 is (10,12);
- (10,12) ends with 12 (even), and the only edge that starts with 12−1=11 is (11,8);
- (11,8) ends with 8 (even), and the only edge that starts with 8−1=7 is (7,9), which brings us back to where we started, thus forming a cycle.
Each cycle corresponds to one circular chromosome, so these colored edges encode the genome (+1 -2 -3)(-4 +5 -6).
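The walk described above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: it assumes, as in the list, that block k has tail 2k−1 and head 2k, and it reads each cycle starting from the first colored edge (in input order) that belongs to it. Starting elsewhere would give a rotation of the same circular chromosome. The function names are illustrative.

```python
def black_partner(node):
    # Block k has tail 2k - 1 and head 2k, joined by a black edge.
    return node + 1 if node % 2 == 1 else node - 1

def graph_to_genome(colored_edges):
    """Turn a list of colored edges into a list of circular chromosomes."""
    color = {}
    for a, b in colored_edges:
        color[a] = b
        color[b] = a

    visited = set()
    genome = []
    for a, _ in colored_edges:
        if a in visited:
            continue  # this edge belongs to a cycle we already walked
        # Start at the far end of the black edge that leads into node a.
        start = black_partner(a)
        nodes = []
        node = start
        while True:
            partner = black_partner(node)
            nodes += [node, partner]  # traverse one black edge (one block)
            visited.update((node, partner))
            node = color[partner]     # then follow the colored edge
            if node == start:
                break
        # Each node pair is one block: tail -> head gives +k, head -> tail gives -k.
        chromosome = []
        for i in range(0, len(nodes), 2):
            first, second = nodes[i], nodes[i + 1]
            chromosome.append(second // 2 if first < second else -(first // 2))
        genome.append(chromosome)
    return genome

def format_genome(genome):
    return "".join("(" + " ".join(f"{x:+d}" for x in chrom) + ")" for chrom in genome)

edges = [(2, 4), (3, 6), (5, 1), (7, 9), (10, 12), (11, 8)]
print(format_genome(graph_to_genome(edges)))  # (+1 -2 -3)(-4 +5 -6)
```

For the second cycle the walk starts at node 8 (the black partner of 7), visiting 8, 7, 9, 10, 12, 11, which is read as -4, +5, -6.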
We refuted the Random Breakage Model by assuming that human and mouse genomes have circular chromosomes. But don't these genomes have linear chromosomes?
Why do we ignore small diagonals when constructing synteny blocks?
How can we account for mutations when constructing synteny blocks?
Our algorithm for constructing synteny blocks, which is based on shared k-mers, does account for mutations. For example, even though the two "genes" ACTGAGTTC and ACTGGGTTC differ from each other by a mutation (A -> G), the genomic dot-plot with k = 3 will reveal that they form a single synteny block; construct this dot-plot and see for yourself!
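The dot-plot for these two "genes" can be checked with a small Python sketch. For simplicity it assumes that only identical k-mers count as shared, ignoring reverse complements; the function name `shared_kmers` is illustrative.

```python
from collections import defaultdict

def shared_kmers(a, b, k):
    """Return the (i, j) positions where a[i:i+k] == b[j:j+k] (identical k-mers only)."""
    index = defaultdict(list)  # k-mer of b -> list of its start positions
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]

points = shared_kmers("ACTGAGTTC", "ACTGGGTTC", 3)
print(points)  # [(0, 0), (1, 1), (5, 5), (6, 6)]
```

Every point lies on the main diagonal (i = j); the gap at i = 2, 3, 4 corresponds to the three 3-mers that overlap the mutated position, so the two diagonal pieces merge into one block once short gaps are tolerated. Counting reverse complements as well would add one off-diagonal point, (4, 0), because AGT in the first string is the reverse complement of ACT in the second.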
Modern programs for constructing synteny blocks use dot-plots representing all local alignments (with scores exceeding a threshold) rather than all shared k-mers between the two genomes. However, constructing all such local alignments for long genomes is time-consuming.
How do reverse palindromes affect the construction of genomic dot-plots?
Can we use sorting to solve the Shared k-mers Problem?
In "DETOUR: Sorting Linear Permutations by Reversals", why do we need the complex formula for reversal distance if we have a simpler formula for the 2-break distance for linear chromosomes?
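2-breaks include reversals, but not every 2-break is a reversal. For example, one 2-break on the linear chromosome (+a +b +c +d +e) may yield a fission operation, resulting in the linear chromosome (+a +b +e) and the circular chromosome (+c +d). Because 2-breaks form a larger set of operations, the 2-break distance between two linear genomes can be smaller than their reversal distance, so the simpler 2-break formula does not give the reversal distance.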
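Both operations sketched below break the same two adjacencies, (+b +c) and (+d +e), and differ only in how the four loose ends are rejoined. This is a minimal illustration with hypothetical function names; blocks are written as sign-prefixed strings.

```python
def flip(block):
    # "+c" -> "-c", "-c" -> "+c"
    return ("-" if block[0] == "+" else "+") + block[1:]

def two_break_as_reversal(chrom, i, j):
    """Cut after positions i and j, then rejoin so that chrom[i+1..j] is reversed."""
    return chrom[:i + 1] + [flip(b) for b in reversed(chrom[i + 1:j + 1])] + chrom[j + 1:]

def two_break_as_fission(chrom, i, j):
    """Cut after positions i and j, then rejoin so that chrom[i+1..j] closes into a circle."""
    linear = chrom[:i + 1] + chrom[j + 1:]
    circular = chrom[i + 1:j + 1]
    return linear, circular

chrom = ["+a", "+b", "+c", "+d", "+e"]
print(two_break_as_reversal(chrom, 1, 3))  # ['+a', '+b', '-d', '-c', '+e']
print(two_break_as_fission(chrom, 1, 3))   # (['+a', '+b', '+e'], ['+c', '+d'])
```

Only the first rejoining is a reversal, which is why a distance that allows every 2-break can undercount the number of reversals required.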