Bioinformatics Algorithms FAQ: Chapter 6

Indeed, mouse and human share a common ancestor from which both evolved. However, when we construct a scenario of n rearrangements transforming the mouse genome into the human genome, we can view the first x rearrangements as transforming the mouse genome into the ancestor genome (going back in time) and the last n-x rearrangements as transforming the ancestor genome into the human genome. This relies on the fact that the rearrangements we consider are invertible, e.g., the inverse operation of a reversal is a reversal.

In a Poisson distribution, we assume that some event happens on average λ times within an interval of fixed length, and that occurrences are independent of one another. That is, if we look at a given interval, we will see λ occurrences on average, but in practice any finite number of occurrences is possible. If the Random Breakage Model is true, then the Poisson distribution offers a good model for the number of breakpoints that occur in a given interval of the genome.

Notation adapted from http://stats.stackexchange.com/questions/2092/relationship-between-poisson-and-exponential-distribution
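The link between randomly placed breakpoints and exponentially distributed block lengths can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the genome is a continuous interval, breakpoints are drawn uniformly and independently, and the function name `block_lengths` and all parameter values are hypothetical. For an exponential distribution, the fraction of values exceeding the mean is e^-1 (about 0.368), so the simulated fraction should land close to that number.

```python
import math
import random

def block_lengths(genome_length, num_breakpoints, rng):
    """Place breakpoints uniformly at random and return the gaps between neighbors."""
    cuts = sorted(rng.uniform(0, genome_length) for _ in range(num_breakpoints))
    return [b - a for a, b in zip(cuts, cuts[1:])]

# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
lengths = block_lengths(1_000_000, 20_000, rng)

mean_length = sum(lengths) / len(lengths)
# Fraction of blocks longer than the mean; an exponential distribution predicts e^-1.
frac_longer = sum(1 for x in lengths if x > mean_length) / len(lengths)

print(frac_longer, math.exp(-1))
```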

A permutation is a specific ordering of the positive integers from 1 to n, where each element is used exactly once. For example, there are six permutations of length 3:

(1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1)

In this book, we often use the term "permutation" as shorthand for a signed permutation, in which each element has a sign, or orientation (represented as a "+" or "-"). You can verify that there are 48 signed permutations of length 3: each of the 3! = 6 orderings admits 2^3 = 8 sign assignments.
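The counts above can be confirmed by brute-force enumeration. This is a minimal sketch; the helper name `signed_permutations` is hypothetical, and signed elements are represented as positive or negative integers.

```python
from itertools import permutations, product

def signed_permutations(n):
    """Yield every signed permutation of length n as a tuple of nonzero integers."""
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * x for s, x in zip(signs, order))

unsigned = list(permutations(range(1, 4)))
signed = list(signed_permutations(3))

print(len(unsigned))  # 6
print(len(signed))    # 48
```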

You can label repeated elements in the first genome using subscripts so that each synteny block appears just once, e.g., (+a +b1 +c +b2). You can then label the second genome either as (+a -b1 +b2 -c) or as (+a -b2 +b1 -c) and compute the 2-break distance from (+a +b1 +c +b2) to each of the two resulting genomes, selecting the one that results in the minimum 2-break distance as the best labeling.

The problem with this approach is that the number of re-labelings of a permutation with duplicated elements may grow very quickly. Furthermore, this approach only works when each synteny block has the same number of copies in both genomes.
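The labeling approach described above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: each genome is treated as a single circular chromosome, blocks are strings such as "+b1", and the 2-break distance is computed as the number of blocks minus the number of alternating cycles in the breakpoint graph. The function names `colored_edges`, `two_break_distance`, and `labelings` are hypothetical.

```python
from itertools import permutations, product

def colored_edges(genome):
    """Map each block end to the block end adjacent to it on a circular chromosome."""
    partner = {}
    n = len(genome)
    for i in range(n):
        x, y = genome[i], genome[(i + 1) % n]
        # A "+" block is read tail-to-head, a "-" block head-to-tail.
        out_x = (x[1:], "h") if x[0] == "+" else (x[1:], "t")
        in_y = (y[1:], "t") if y[0] == "+" else (y[1:], "h")
        partner[out_x] = in_y
        partner[in_y] = out_x
    return partner

def two_break_distance(p, q):
    """Blocks minus alternating cycles, for two genomes on the same set of blocks."""
    ep, eq = colored_edges(p), colored_edges(q)
    seen, cycles = set(), 0
    for start in ep:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.add(node)
            mid = ep[node]       # follow an edge of p
            seen.add(mid)
            node = eq[mid]       # then an edge of q
    return len(p) - cycles

def labelings(genome, copies):
    """Yield every way to attach copy labels to the repeated blocks of genome."""
    names = list(copies)
    slots = {name: [i for i, blk in enumerate(genome) if blk[1:] == name]
             for name in names}
    for choice in product(*(permutations(copies[name]) for name in names)):
        labeled = list(genome)
        for name, labels in zip(names, choice):
            for pos, label in zip(slots[name], labels):
                labeled[pos] = genome[pos][0] + label
        yield labeled

first = ["+a", "+b1", "+c", "+b2"]
second = ["+a", "-b", "+b", "-c"]
results = [(two_break_distance(first, q), q)
           for q in labelings(second, {"b": ["b1", "b2"]})]
best_distance, best_labeling = min(results, key=lambda r: r[0])
print(results)
print(best_distance, best_labeling)
```

The number of labelings is a product of factorials of the copy counts, which is why this brute-force enumeration becomes impractical as the number of duplicated blocks grows.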

The easiest way to deal with synteny blocks that appear in one genome and not another is to ignore them and consider only those blocks common to both genomes, e.g., in this case to compare (+a +b +c) with (+c -b +a). It is also possible to incorporate insertions and deletions into genome rearrangement studies, providing some penalty for the insertion/deletion of a single block, or a penalty for the insertion/deletion of a series of contiguous blocks. Various research papers have attempted to expand genome rearrangement metrics to account for insertions and deletions.
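The simplest strategy above, dropping blocks that are not shared, can be sketched in a few lines. Blocks are represented as strings such as "+a", and the helper name `common_blocks_only` is hypothetical.

```python
def common_blocks_only(p, q):
    """Return copies of p and q restricted to the blocks present in both genomes."""
    shared = {blk[1:] for blk in p} & {blk[1:] for blk in q}

    def keep(genome):
        return [blk for blk in genome if blk[1:] in shared]

    return keep(p), keep(q)

p, q = common_blocks_only(["+a", "+b", "+c"], ["+c", "-b", "+a", "-d"])
print(p)  # ['+a', '+b', '+c']
print(q)  # ['+c', '-b', '+a']
```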

Given a permutation P and a reversal ρ, we denote the genome resulting from applying ρ to P as P*ρ. A reversal ρ is called P-valid if the reversal distance of P*ρ is smaller than the reversal distance of P. The following recurrence relation computes NumberOfScenarios(P), the number of different reversal scenarios that transform a genome P into the identity permutation using the minimum number of reversals, with NumberOfScenarios(P) = 1 when P is the identity permutation:

$$\text{NumberOfScenarios}(P) = \sum_{\text{all } P\text{-valid reversals } \rho} \text{NumberOfScenarios}(P * \rho)$$
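For very small permutations, the recurrence can be evaluated by brute force. The sketch below assumes linear signed permutations stored as tuples of integers and obtains reversal distances by breadth-first search from the identity, which is valid because every reversal is its own inverse. The function names are hypothetical, and the search is only feasible for a handful of elements, so it does not reproduce the mouse and human X chromosome count.

```python
from collections import deque
from functools import lru_cache

def reversals(p):
    """Yield every permutation obtained from the tuple p by a single reversal."""
    n = len(p)
    for i in range(n):
        for j in range(i, n):
            yield p[:i] + tuple(-x for x in reversed(p[i:j + 1])) + p[j + 1:]

def reversal_distances(n):
    """Breadth-first search from the identity; only feasible for small n."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for q in reversals(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist

def number_of_scenarios(p):
    """Count shortest reversal scenarios from p to the identity via the recurrence."""
    p = tuple(p)
    dist = reversal_distances(len(p))

    @lru_cache(maxsize=None)
    def count(q):
        if dist[q] == 0:
            return 1  # base case: the identity permutation
        # Sum over the q-valid reversals, i.e. those that reduce the distance.
        return sum(count(r) for r in reversals(q) if dist[r] < dist[q])

    return count(p)

print(number_of_scenarios((-1, -2)))  # 2: flip either element first
```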

The pair (+4 +3) forms a breakpoint because, in contrast to (-4 -3), it cannot be transformed into (+3 +4), a desirable pair when sorting by reversals, by a single reversal. For example, applying a reversal to

(+1 +2 +4 +3 +5 +6)

transforms this permutation into

(+1 +2 -3 -4 +5 +6),

but applying a reversal to

(+1 +2 -4 -3 +5 +6)

transforms it into the identity permutation

(+1 +2 +3 +4 +5 +6).

To better understand why (+4 +3) is a breakpoint, try sorting the permutation (+6 +5 +4 +3 +2 +1) – you will see that it requires many reversals!
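The examples above can be reproduced with a short sketch. It assumes the usual convention of extending a permutation with 0 on the left and n+1 on the right, and of counting a consecutive pair as a breakpoint whenever the second element minus the first is not 1. The function names `apply_reversal` and `breakpoints` are hypothetical.

```python
def apply_reversal(p, i, j):
    """Reverse positions i..j (0-based, inclusive) of the list p and flip their signs."""
    return p[:i] + [-x for x in reversed(p[i:j + 1])] + p[j + 1:]

def breakpoints(p):
    """Count consecutive pairs (a, b) of the extended permutation with b - a != 1."""
    extended = [0] + list(p) + [len(p) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)

print(apply_reversal([1, 2, 4, 3, 5, 6], 2, 3))    # [1, 2, -3, -4, 5, 6]
print(apply_reversal([1, 2, -4, -3, 5, 6], 2, 3))  # [1, 2, 3, 4, 5, 6]
print(breakpoints([1, 2, 4, 3, 5, 6]))             # 3
print(breakpoints([1, 2, -4, -3, 5, 6]))           # 2
print(breakpoints([6, 5, 4, 3, 2, 1]))             # 7
```

In the last permutation every consecutive pair is a breakpoint, which is consistent with it needing many reversals to sort.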

Yes! For example, a transposition moves a segment from one location in the genome to another. One transposition applied to the chromosome (+1 +2 +3 +4 +5 +6 +7) yields (+1 +5 +6 +2 +3 +4 +7) by moving the segment (+2 +3 +4) past the segment (+5 +6). However, transpositions are rarer than reversals and the other rearrangements discussed in the chapter.

Transpositions represent an example of a 3-break, a rearrangement that requires 3 rather than 2 breaks (here, between +1 and +2, between +4 and +5, and between +6 and +7). Since 3-breaks are rare compared to 2-breaks, we can obtain reasonable distance functions without them, and so 3-breaks are not covered in this chapter.
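A transposition can be viewed as swapping two adjacent segments, which makes the three breaks explicit. The sketch below uses a hypothetical helper `transpose` with 0-based, half-open indices i < j < k.

```python
def transpose(p, i, j, k):
    """Swap the adjacent segments p[i:j] and p[j:k] of the list p."""
    return p[:i] + p[j:k] + p[i:j] + p[k:]

def broken_adjacencies(p, i, j, k):
    """Return the three adjacent pairs that the transposition cuts."""
    return [(p[i - 1], p[i]), (p[j - 1], p[j]), (p[k - 1], p[k])]

chromosome = [1, 2, 3, 4, 5, 6, 7]
print(transpose(chromosome, 1, 4, 6))           # [1, 5, 6, 2, 3, 4, 7]
print(broken_adjacencies(chromosome, 1, 4, 6))  # [(1, 2), (4, 5), (6, 7)]
```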

Since humans did not descend directly from mice, why do we nevertheless analyze a rearrangement scenario transforming mouse into human?

Why does the Random Breakage Model result in an exponential distribution of synteny blocks?

What is a permutation?

How do we compare genomes where some synteny blocks appear in multiple copies, such as (+a +b +c +b) and (+a -b +b -c)?

How do we compare genomes with different numbers of synteny blocks, such as (+a +b +c) and (+c -b +a -d)?

How can we conclude that there are 1,070 different seven-step scenarios to transform the mouse X chromosome into the human X chromosome by reversals?

Why does the pair (+4 +3) form a breakpoint but the pair (-4 -3) does not?

Are there other types of genome rearrangements other than reversals, translocations, fusions, and fissions?

Are There Fragile Regions in the Human Genome?