Coursera Week 3
How do I expand the list of candidate peptides in CyclopeptideSequencing?
I've noticed a discrepancy between the mass of an amino acid cited in the book and in other sources. Why is this?
Coursera Week 4
How can I improve the performance of LeaderboardCyclopeptideSequencing?
How can I trim the peptide leaderboard without sorting?
To trim a peptide leaderboard without using sorting, we will first compute an array ScoreHistogram, where ScoreHistogram(i) holds the number of peptides in Leaderboard with score i. For example, if we are trimming the leaderboard from Charging Station: Trimming the Peptide Leaderboard to N = 5 peptides (including ties), then ScoreHistogram = (0, 0, 2, 1, 3, 2, 2).
As a result, the 2 + 2 + 3 = 7 peptides with score at least 4 will be retained, and the remaining 0 + 0 + 2 + 1 = 3 peptides will be trimmed. The minimum score that a peptide can have without being cut is denoted ScoreThresholdN(Spectrum); in this example, it is equal to 4.
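The histogram itself can be built in a single pass over the peptide scores. The following is a minimal Python sketch, not the book's code: the function name `score_histogram`, the `max_score` parameter, and the list `example_scores` are all assumptions made for illustration, and the scores are hypothetical values chosen only so that they reproduce the histogram given above.

```python
def score_histogram(scores, max_score):
    """Count how many peptides have each score from 0 to max_score."""
    histogram = [0] * (max_score + 1)
    for score in scores:
        histogram[score] += 1
    return histogram


# Hypothetical scores, chosen only to reproduce the histogram in the text.
example_scores = [6, 6, 5, 5, 4, 4, 4, 3, 2, 2]
example_histogram = score_histogram(example_scores, max_score=6)
print(example_histogram)  # [0, 0, 2, 1, 3, 2, 2]

# Peptides with score at least 4 are retained in the running example.
print(sum(example_histogram[4:]))  # 7
```

Because each score is used directly as an array index, building the histogram takes time proportional to the number of peptides plus the maximum score, with no comparisons between peptides.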
Assuming that N is smaller than the number of peptides in Leaderboard, note that the number of peptides cut is at most |Leaderboard| - N. To compute ScoreThresholdN(Spectrum), we need to find the index i (with indices starting at 0) such that the sum of the first i elements in ScoreHistogram is at most |Leaderboard| - N and the sum of the first i + 1 elements in ScoreHistogram exceeds |Leaderboard| - N. To find this index, we will compute CumulativeHistogram, where CumulativeHistogram(i) holds the number of peptides in Leaderboard with score at most i.
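The threshold search described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the book's pseudocode: the function name `score_threshold` is hypothetical, the running sums are produced with the standard library's `itertools.accumulate`, and returning 0 when N is at least the leaderboard size (so that nothing is trimmed) is a choice made here for the case the text sets aside.

```python
from itertools import accumulate


def score_threshold(histogram, n):
    """Return the smallest score whose cumulative count exceeds |Leaderboard| - n."""
    if n <= 0:
        raise ValueError("n must be positive")
    total = sum(histogram)  # |Leaderboard|
    if n >= total:
        return 0  # Assumption: keep every peptide when nothing needs cutting.
    max_cut = total - n  # At most this many peptides may be trimmed.
    # accumulate yields the number of peptides with score at most i.
    for score, count_at_most in enumerate(accumulate(histogram)):
        if count_at_most > max_cut:
            return score


example_histogram = [0, 0, 2, 1, 3, 2, 2]
threshold = score_threshold(example_histogram, n=5)
print(threshold)  # 4
print(sum(example_histogram[threshold:]))  # 7 peptides retained, ties included
```

Once the threshold is known, trimming reduces to one more pass over Leaderboard that keeps every peptide whose score is at least the threshold.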
For our ongoing example, CumulativeHistogram = (0, 0, 2, 3, 6, 8, 10). This leads us to the following pseudocode.