Randomized Motif Search Input: Integers k and t, followed by a collection of strings Dna. Output:...

Question

Question

Randomized Motif Search Input: Integers k and t, followed by a collection of strings Dna. Output:...

Randomized Motif Search

Input: Integers k and t, followed by a collection of strings Dna.

Output: A collection BestMotifs resulting from running RandomizedMotifSearch(Dna, k, t)

1,000 times. Remember to use pseudocounts!

Pseudocode

RandomizedMotifSearch(Dna, k, t):

Motifs ← empty list

for each sequence seq in Dna

add randomly selected k-mer from seq to Motifs

BestMotifs ← Motifs

while forever

Profile ← Profile(Motifs)

Motifs ← Motifs(Profile, Dna)

if Score(Motifs) < Score(BestMotifs)

BestMotifs ← Motifs

else

return BestMotifs

SAMPLE DATASET:

Input:

8 5

CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA

GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG

TAGTACCGAGACCGAAAGAAGTATACAGGCGT

TAGATCAAGTTTCAGGTGCACGTCGGTGAACC

AATCCACCAGCTCCACGTGCAATGTTGGCCTA

Output:

TCTCGGGG

CCAAGGTG

TACAGGCG

TTCAGGTG

TCCACGTG

The sample dataset is not actually run on your code.

TEST DATASET 1:

Input:

6 8

AATTGGCACATCATTATCGATAACGATTCGCCGCATTGCC

GGTTAACATCGAATAACTGACACCTGCTCTGGCACCGCTC

AATTGGCGGCGGTATAGCCAGATAGTGCCAATAATTTCCT

GGTTAATGGTGAAGTGTGGGTTATGGGGAAAGGCAGACTG

AATTGGACGGCAACTACGGTTACAACGCAGCAAGAATATT

GGTTAACTGTTGTTGCTAACACCGTTAAGCGACGGCAACT

AATTGGCCAACGTAGGCGCGGCTTGGCATCTCGGTGTGTG

GGTTAAAAGGCGCATCTTACTCTTTTCGCTTTCAAAAAAA

Output:

CGATAA

GGTTAA

GGTATA

GGTTAA

GGTTAC

GGTTAA

GGCCAA

GGTTAA

This dataset checks if your code has an off-by-one error at the beginning of each

sequence of Dna. Notice that the some of the motifs of the solution occur at the beginning of

their respective sequences in Dna, so if your code did not check the first k-mer in each sequence

of Dna, it would not find these sequences.

TEST DATASET 2:

Input:

6 8

GCACATCATTAAACGATTCGCCGCATTGCCTCGATTAACC

TCATAACTGACACCTGCTCTGGCACCGCTCATCCAAGGCC

AAGCGGGTATAGCCAGATAGTGCCAATAATTTCCTTAACC

AGTCGGTGGTGAAGTGTGGGTTATGGGGAAAGGCAAGGCC

AACCGGACGGCAACTACGGTTACAACGCAGCAAGTTAACC

AGGCGTCTGTTGTTGCTAACACCGTTAAGCGACGAAGGCC

AAGCTTCCAACATCGTCTTGGCATCTCGGTGTGTTTAACC

AATTGAACATCTTACTCTTTTCGCTTTCAAAAAAAAGGCC

Output:

TTAACC

ATAACT

TTAACC

TGAAGT

TTAACC

TTAAGC

TTAACC

TGAACA

This dataset checks if your code has an off-by-one error at the end of each sequence of

Dna. Notice that the some of the motifs of the solution occur at the end of their respective

sequences in Dna, so if your code did not check the last k-mer in each sequence of Dna, it would

not find these sequences.

In Java

Engineering Computer-Science

0 0

Add a comment Transcribed image text

Answer 1

Answer #1

I followed the pseudocode to the best of my knowledge. I hope this is code will help you in solving this task. Here is my code:

def BuildProfileMatrix(dnamatrix):
    ProfileMatrix = [[1 for x in xrange(len(dnamatrix[0]))] for x in xrange(4)]
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    for seq in dnamatrix:
    for i in xrange(len(dnamatrix[0])):            
        ProfileMatrix[indices[seq[i]]][i] += 1
    ProbMatrix = [[float(x)/sum(zip(*ProfileMatrix)[0]) for x in y] for y in ProfileMatrix]
    return ProbMatrix
def ProfileRandomGenerator(profile, dna, k, i):
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    score_list = []
    for x in xrange(len(dna[i]) - k + 1):
        probability = 1
        window = dna[i][x : k + x]
    for y in xrange(k):
        probability *= profile[indices[window[y]]][y]
    score_list.append(probability)
    rnd = uniform(0, sum(score_list))
    current = 0
    for z, bias in enumerate(score_list):
        current += bias
        if rnd <= current:
            return dna[i][z : k + z]
def score(motifs):
    ProfileMatrix = [[0 for x in xrange(len(motifs[0]))] for x in xrange(4)]
    indices = {'A':0, 'C':1, 'G': 2, 'T':3}
    for seq in motifs:
        for i in xrange(len(motifs[0])):            
            ProfileMatrix[indices[seq[i]]][i] += 1
    score = len(motifs)*len(motifs[0]) - sum([max(x) for x in zip(*ProfileMatrix)])
    return score
from random import randint, uniform    
def GibbsSampler(k, t, N):
     dna = ['CGCCCCTCTCGGGGGTGTTCAGTAACCGGCCA',
    'GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG',
    'TAGTACCGAGACCGAAAGAAGTATACAGGCGT',
    'TAGATCAAGTTTCAGGTGCACGTCGGTGAACC',
    'AATCCACCAGCTCCACGTGCAATGTTGGCCTA']
    Motifs = []
    for i in [randint(0, len(dna[0])-k) for x in range(len(dna))]:
        j = 0
        kmer = dna[j][i : k+i]
        j += 1
        Motifs.append(kmer)
    BestMotifs = []
    s_best = float('inf')
    for i in xrange(N):
        x = randint(0, t-1)
    Motifs.pop(x)
    profile = BuildProfileMatrix(Motifs)
    Motif = ProfileRandomGenerator(profile, dna, k, x)
    Motifs.append(Motif)
    s_motifs = score(Motifs)
    if s_motifs < s_best:
        s_best = s_motifs
        BestMotifs = Motifs
return [s_best, BestMotifs]

k, t, N =8, 5, 100            
best_motifs = [float('inf'), None]

# Repeat the Gibbs sampler search 20 times.
for repeat in xrange(20):
    current_motifs = GibbsSampler(k, t, N)
    if current_motifs[0] < best_motifs[0]:
        best_motifs = current_motifs
# Print and save the answer.
print '\n'.join(best_motifs[1])

0 0

Add a comment

Randomized Motif Search Input: Integers k and t, followed by a collection of strings Dna. Output:...

Homework Answers

Post as a guest

Earn Coins

Not the answer you're looking for?

Similar Questions

Need Online Homework Help?

Active Questions