Question

2. You are working to sequence and annotate a new species of bacteria recently discovered in a soil sample. After the latest round of assembly, you are specifically looking to annotate contig #18:

>Contig18
GTTAAATATTCTATAGCTAATTAAACCTAACAACTATGGTTTCCCCTACAACACCAATATCGTATACGTT
ATTACCAGATTTTTTCCACCCATTTTCAAGTTTAACCTCTTTGTCATATAGTCTGTAATTTCTGGAAAAC
ACATTTCTTTGCATTAACACCTCTGACCACATCCAATCATTGTTAATAATGCGTGGTATTAACTCTCTCA
TTAAAGGATGCTTTATTACTATGTTTTCATTTATTGGTGCATACGGTTCTGTGCCAATGAATTTTATATT
TTTTTTGTCTCTTCCAAATCCAAGATAATCTATGTCTTGAGATATTCTATTTACAATGCTTTCCTCAAGC
TGAAACTGTGCATTTATGGCATTGTAAGCACCATAAGAAAATATTGTTGATATTAAAAGAATAAAAGAAA
AATATATTCTTGATATTAACTGTTTATCTTCAAAAGCATAGAATACGCATAGGCAACAAAAAAACATAAA
GCCACCCATACCAATCAATACCCTCGGTGCGTATATTGGTGATTTTAGAAAAATCATTGGTCCAATGATG
AAGAACATTGATGCTAATAAAATTAAAACTACTAGCAATAACTTTGTTTTCTTATTTTCATCTCTTTTGA
TTGCTTTTAAAACTATGACTATCAAAGAAATGATTAGCGCAAAGAATAGCGAGTAGTAGATTAAGTAATT
ATCGCCATTCAAGATCGTGCTAAACATTCTATAAAATGATAAGACGTTAGAAATTATCCCTTCAAATAAA
CTTGAGTTTATCTCTATAATCTTACTATGTTCGATATTGTAAGAACCTGTTACAAGTCTTTTTGCAATAA
AGTAAGAATAGGCAAAATATCCTACTATTAAACCAGCGACAGAAGATGCTGTATTTTTTGTGATATTTGA
AATTGAGTTTTTCTTAACCACATCTGAAATTATAAAGGCCAACAAGAATATTGCGTAAGTATTCAGCGCA
GCCTGATAAAGACTAAGGAATGCAATGGTTAAAATGGATGATATTATGATATTTATAGGCTTGTATTGAT
AAGCGACATACGATGAGATAATAGATATTGCCACACTCATGCACATTGTTAATGAATCATATCTATATGA
TAGATTTTCAATAAAGAATGGGTTTGCCAAAATCATCATAAAACAAAGAGATGCTGTGATGTAGTCATCT
CCAAACAGCTTTTCCCTGATGCAGGATAGTGCCAATGCTAAAATAACTATCCCTAGCATTAAAGGTAGCG
GAGAAGCATCTATAATTGGGGTTCCAAAATTAATGATATAGAAAATAAAGTCGGAAAGTGGGCGACCATT
GCCTGACCAACCCAACCCGCCATATAAAGACCTACCCAAGTCATCAACGAAAAATGATTGATGTGTCAAT
AAAGGAAATGTATATATAATCGCCAATCCAAGAAAGATTGATATAAATATCCTGTCATTACTATTAAATT
TCACTTTTAAAACCCTTACGCTTTAATATGTATTTAGGCCGCTGTTTGGTTTCTATGTAAATTCTACCAA
TATATTCTCCAAGAATACCTATTCCTATCAATTGAACGCCACCCAGGAAAAGAACAGAAACAAGAAGAGA
CGGGTAGCCAGGAACATTATTTCCAAATATTAATTTATCAATAATCATCCATGCACCGTAAAGGAATGAC
ATACCTGCAATAAACAATCCAATGTAAGTCCATATGCGGAGCGGAAATGTTGAGAAAGAAGTTATTCCCT
CCAGCGCCAGGTTCCATAATTTCCAGCCGTTGAATTTCGAATCACCGGCCACGCGTTCGGCACGGGCATA
TTTAACAACATCCGTTTTTCCGCCAACCCAACTGAGCACACCCTTCATAAACAAGTTGCGTTCTGGCATT
TGTTTGATGTTCTCGACAACCGCACGGCTCATTAACCGAAAGTCGCCAACATTTTCTTCGATTTTTGGAT
TGCTGATTTTATTGTGCAGCTTATAAAACCACTCAGCTGTCTTACGCTTCATGCGCCCGTCAGTTGAGCG
GTCTGAGCGCTTAGCCAGCACCATATCCGCGCCAGCCTGCCACTTCTCAATGAGATGAGGGATAACTTCT
ATCGGATCCTGTAAATCGACATCAATAGGAATGACCGCATCCCCGGTTGCATGGTCGAGACCCGCGAAAA
GAGCAGGTTCTTTACCGAAGTTTCGCGTAAACGAAAGCGGAATAACGAGCGGATCAGATGCAGCTATTTT
GTTAATTATTGATTCAGTCGCATCTTTACTACCATCATTAATAAAAACGATCTCAATTTCATATTCTTTT
AGCTCATTAAACTCACGTACCGTTTTATAGAAAATCGGTATCGTGTCTTCTTCGTTAAAAACTGGAACGA
CAAGAGAGATTTTCATCTTATATCCCTGAAAACAATGAATCTGGAATAGATAAAGCCGCATACCAGGCTA
ATTGCCGAGAAAGTGATAAGGGTAATCAATGGTGGCAAGGAACATTGGTCAGCCATCCAGCCAACAACAG
CGCTCAGTGTTCCCATGAATCCCACATACATCATGTAGCGAAGCGTGGTGGTGGTGGCATTAAAGGTGAA
ACGCGCATTGGCATAGAAGCTGAACGATACGGCGATAACAAAACCGGAAAAGTTCGCCAGCGCCTGATGC
GTATGCATCCCATACACACAAAAAGCAAATACGCCCCAATGAATAAGCGTGTTAAGAACACCGATCGATG
TGTACTTAGCGAATAACTTCAACATTATGAAAATCAGCGGATTCGGAAAGGTCTGAAGTGTAGCACTACA
AATTGTTTTGATCGATACAAGCGATCAATAATGTATAATTTGATAGTTTTTATCTATATAATGCATGTTA
ATTGATCGTTGTTACCGATCAATTTTTATTGCTGATTGCTAAGTGGTTTGGGACAAAAATGGGACATACA
AATCTTTGCATCGGTTTGCAAGGCTTTGCATGTCTTTCGAAGATGGGACGTGTGAGCGCAGGTATGACGT
GGTATGTTGTTGACTTAAAAGGTAGTTCTTATAATTCGTAATGCGAAGGTCGTAGGTTCGACTCCTATTA
TCGGCACCAGTTAAATCAAATACTTACGTATTATTCGTGCCTTCCTTATTTTTACTGTGGGACATATTTG
GGACAGAAGTACCAAAAATCGAGTCAATTTGTCGAGCATGTTCAGTCAGGTGATTTGGTGCCAGATGAGC
ATATCGGCGAACCATTTCGATAGACTCCCAGCCACCCATTTCCTGCAATACCGAAATCGGAACGCCAGCC
TGAACTAACCAACTTGCCCACGTGTGCCTCAGGTCATGAAAACGGAAGTCTTCAATGCCCGCTCGTTTTA
ATGCTGCCCTCCATGCAGTATTAGCGTCATAGCGCATCTTCCTCACTACAGGTGATTTAGTTCCGTCTGG
TTTGGTGCTGCTTTCCTTGTAGACGAACACCCATTTGTGATGATTGCCGATTTGCTTTTTCAGCACCCGG
CAAGCGGTATCATTCAGCGCCACTCCAATGGCATGATTAGACTTGCTTTGTTCCGGGTGTATCCATGCCA
CCTTTCGTTGCATGTCTATCTGCTGCCACTCCAGATTGATAATGTTAGACCGCCTTAAGCCAGTAGAAAG
CGCAAACTCTACGACTGACTTTAGCGGTTCCTGGCATTCATCAATCAACCTTTTTGCCTCGTGAGGCTCA
AGCCAGCGGATACGCTTATTTTTCGGCTGAGGAACTTTGATGATCGGAGCCTTATCCAGCATCTTCCATT
CGCGTTCAGCAGCCCGGAGGAGTGCCTTAATGAATGAAAGGTGAGTTGCTTTTGTAGCTACTGCTGCCGG
CTTAGGCTTGAATACCGGAGGCTGCTTCCCATTCTTCCTGCAAGCTTCATCCATTAACTTCCAGTTTTCC
TCATGCCGCCGATTAGTTATCTTCTGGATGGCGGAGTAAATCTTCGTCTCGGTAATATCCTTCAACTGCA
TTCCTGCAAAATGCTGGAGCCAGAATCCTATCCGACTCTTGTCATCATCCAGCGACTTCTTATGCGCCTT
CTCCTCTAACCACCTGACACAGGCCCCCTCAAAAGTCATGTCAGGCGTCTCTCCTAATTTACTTACCCTC
CATGCTTCTGCCTTCAGCTTGTCATGAAGCTCTGTGGCCTGCCTTTTGTCCTTTGTCCCAAGAGACTGCT
TAAATCTTTTGCCGTTCGGCAATGTGAAACTGGCGTACCAGGTTTCACCTCTGCGGAATAGTGACATTTC
AGTTCCTCTGTTATGTCATCACCCGCGCTCACCTGGACAGTATGCAGCGGAGATTGAAGTGCCGCAATGC
AGGCTTGTCGTGTGGTGAGGTAAGGGGATTTCGGTTTGGTGGGGTCTTTACGTGTTGCCTGTAGTCGGCC
TGTGCGAATCCAGTTGGTGGCGGTAGGTCTGGATATCTTGAGAAATGCACAGGCCTCATCAAGTGTGAGG
CTGTGTGATTCCATGTTTACTCCGCTGTTTCTTCTTCGTCTTCTTTTGCGTTAGGCATGTCGTAGAATTG
CCCGTAAGTTATTTTCTTGAATGCATCAGGGATAACAACTACGCCATGCCTTTCTTCTTTGTTATTTGGT
ATTGCAAAAATAAGACAATCATCGCGTTGCGGGTGCTTGCCGCCGTACGTTGATAACATAACGAAACCAA
AGCCACGGCCCGATTGACCACCAATTCCTGTACGCATAATCCCGTAGTGGTTGGTTATGTAGTAATTCCA
TTCAGGCAAGGATTTTAGCTTGGCGTTAGCGTTATGCATGATTGCATCCAGCTCTTTGTTGTATGCGCGG
CCTTCCTTTGTGTTTCCCTTTCCTCGCGCTATCACAACTCTCTTCCCGTCCAAAAAATCCTCGCGTTTGA
TTGTTATCTGGCATGGGAATTCATATCCTTTTTCCCAAACGAAGCTTTGTAGAAGTCCGCCTTCTCCACC
CCAGCTACGGGCTGTAGTCCATGCGATTGCGCCAACCTTTTCAGCGGCTGTGGTTAGGATTGAATTTCGT
TGATCGTTAATGGTGTCGTATGACTGGATAAGCTCCTTAACATCCTCACCCTCTACCATGTAGTAGTCGT
AATATTTGCTCTGGCCTGACATTTATTGTCTCCAATAAAAAACCGCCATCAGGCGGCTTGGTGTTCTTTC
AGTTCTTCAATTCGAATATTGGTTACATTGTTTTCATATATGAATAAATAAATTAGCTTTTTTCGTTGCC
TTTGCGTTCCTTATTAATTCTGACAAACTCGTTTTTACCACGCTCTCCAAATGCGTCTTTAGAGTCGTTG
TATCCGCAATCGCAGCACACATAATCATCAGACCATCCACGCATTGTTTTTTCTTTTGCAATATTTCCAG
AACCGCATTTTGGACAAGACATGTCACTACCTCCAAAGCATGAGTGAGATGACAACGTAACATTGATTGG
AGATTAACAATAGATTGCTGATGTAAAAGATATGTATAAGCTTCGCTATCAAAGGGGAGGATCTGGTAGC
TGCATCCAGTGGCTTACACCGATAATTTCCATACCCTCCCAATAGTCAAAGAACCCATCATCGTCGTATG
TAGCAACGAACATCCCCTGACCCAGACATTTTCCGGTAAAAATTGCGATGGGTTTAGATTCATCATTATC
TGGCATTCGCTCACTACAGCTTATCCAGCCACCCGGAATTACCGGAGAGTTGCCTGCCAGCGCTTCTTGC
AGTCGTTCAAGCTTCACGTATTCCTGAACGCAAGTTCCCGAGTAGTTATTTATCCAGATGGTCGCCTTTT
CTGGGTCAGGCGTATAAGTAACCACTTCTCCCGACTGCCAGCACAGGGCGTACAGGTCAGCGACTGCCTT
AACCTGTGCGTATGGCAACTCGGCAGGGCATTCCTCCGGCACTACCGACTCTGGCTGCTCTTTGATATGC
AGTCGTGGCTCGCCGTCTTTCGGTTCAGGCCACTGGCGTGTTTTGTTTATCTCCAGTTTTTCAATCATCG
CCCTCGTAATGAATTCGTCGGAAATTCCCATGCGCCGCTGGGCATCCCACAATAAAAACTGCATATCAGC
CCATTCAAGCGGGTCTGATGGGTCGGCAGCGGCTTCCAGCGCTTCTTTCGAAAGATGTTTAAGCGGGCCG
ACTGGACCAACATCGCCGAACGTCTTATCTGACCACTCGGCGTGCTCACGGCGAATACGTTCGCGTTCTG
GCACTGGCGGGGCGGCGTAGAGCGGTATATACACGGCAACATCATCAGCAGCATTTGGCTGCTGCTCTAA
TGTCACGCATGTACCGGAAAATTTATTCAGGTATCGCACAGTCTCAGCATCCAGCGATGCCAGCGCAATC
CGTGCCAGCTCCATTTGTTCGCCACGGGTAAGCCCGTTTTCAAGCGGCGATTTAACGAACAATTCAATAC
GTTCTTTGGTAATAGTGGTCATGGGTTAGTCCTCAACGCTGATATCAACGGCCACTTTCATTCTCCCGGC
AGAAACTTCAAAACCGGTGACATCCGCATTAAGCATGTATTCAGATATAACCAGCGCAAGAAGTTTTAAC
TTCGCGTCTGTATCGTTACCGTTCAATTCTTCAAGAAGCTCAATAACTGGCTTCATATGTTCACCAATCT
TCATGCTCATTCCCCCTTAACCTTGATGCCAGCGCGTGTGCTATATGCAGACATGCACTGCGTGAACCCG
GATTGGTCATCTGTCTGCCCATAACTGAACCCTGCTTTCAGGCCGTCACGGAATGCGCCATCCTGCAACT
TGTCGTCAGTTTCAAGCTTCGCCTCCAGTTCTTGAATACGCTGGCGTAACGCTGTAATTTCCACCTCAGC
AGCGTCTGCGTAATGAACGTTTTCATGCTCAAGTGGTGGTAAATCCGGCGTAATGACACCAAAAAGTTTT
GCCAGCGCCCGGTAATTCAGTTCGCTGTGATAACGACCTTTGCAGCGAACCAGTTTTTCAGCAGCAGCTA
CAATCGCGCTTTGTTCTGTCATGCGCTTTTTTGCTGCTTCCAGCTCAACTCGCAGCTTCCCAACCGTAAG
CGCAATATCCTCGTCCTCCTGGTCGCGGAGTTTGATGTATTGCTGTTTTTTATCCAGCTCATCCAGCAGC
GCCTCTGCGGCGATATAAATAACCTGGCGTGCACGGTCTGCTGGGTCGCTGTAATGGTCTTGCATGTACT
GGAATTCTTCACGCAGCGCCTGTTTGTCGATGTTGCTCATTGGGCTGTCTCCGGTGGATAACAAATATCG
TCGAAATATTTTTCTGCAACGCACATGTTGAAGTGATCGAGATTCATCTCCTCCACCTGGAGTTTTGCCC
CAACAATACCCGTGCATCGATTGACGTAATCCCGATTTTCTGGGGATTCCGCTACCCACTCCATAAGGTC
TTCGGTGACACGTTTTAAGCAACGTAAGGCGCAGTCCATATCAGTAAAATGCTGAGCATCTGTGATGCAG
GAGACGACATAATACGTGGTGATTTTTGGCCCTTCAGCGCGTCGCTTAAGTTCTCTCTCTATATCTCTTT
TCAGATCAACCAGTTCATGGTCATTGAGTTTGTCGATGTTGCTCATTGGGCTGGCCCTCGCATTTGTGAT
TTTCTGGATCATCGGCTTTGAAATAACCGCCGCAGATTTTGCAGGGTATCGTCGGCACTTCGTCGTAATT
TGAGGTTCCCGTAATCATGACTGCACTCCTTTGCGAAGCTGGGCGGCGATATCTTCGATAACGCCATCGG
CGAATGAGCGATCAAAATCGCCTTCCGGTGCATCAGCCATAAATTCTGTGGAGGTCAGTATCATTCGTGC
GATGTCCGCAGCGTTCTTTGCTGTGTCGTCGATAAATCCTGCATCCCATGCGGCCAGCATTCGGTTAGCA
ACAAAGTAAGCGCCTTCCTTGTGAGCCTGCGCCCGCATCTCCGCCAGAAAGGCGTCTGTCGCCGGGGTCT
CAGTGAAATCGTCTACCCACGTATCGCCAACGTCCTCGCACTCGTGACGACAATAATCGTTGAATTCGAC
CTCTGATTTTTTCAGTGCCCCATTCTCCACCACCAGTGCCGCGCATTTCGCCTCCAGTTCCCGATAATCC
GACGCCAGCACCAGATCCACACAGAACGATTCAGACTGTACCGGTGGGGATAAATCTGATGGGGATGCGG
TGTAAATTTTTACCTGTTGCATTTATCTTTCCTCAGTATCGCATTCAAATATTTATTCTCGTTAATAGAA
GGAAATGAATTGCGCTGCAATAATTCTTCGCGTGTAGGCATTGGTTTAATTTTGTGCCTAATAATAAGTT
CGGCTGGTAGAATGTCGGGATTGTATGCAAGTCCTCTCATCGTAAACTCCTCAGTTATTGCTGATAGCTC
CGTAACGCGAACGGTAATCACGAAGACGCGGGTCTATTTCAATGAATTTGGTGTAAGTGGCTTTGCGGAA
TGGCCGGATGGCTGTCTGGTAAATTCGCTCGCGTTCTTCTTTCTCTGCAAGCCATATACAGTGGCGAAAT
TCCTTTTCCTCTTTCGTTTCCTGCGGTAGTGACATTATCAGGTCGTAGTTTTTTCTGAATTTATCCAGCA
CCTCCGATACGGAATTGCCGGAACAGCGGCGCGGGTCATCCGCACCATACAAAGGCGCTGGCATAATTTA
CTCCAGGGTAGGTTATCCGAATAATGTGGTACGTATAGGGTTATTTCTTTCGTAAACGTGATAGCCTGCT
TTTTACCGACTCTTCACTTCGCCCGAGAATTTTTGCTACATTTCTTTGTGTATAGCCTGATGAGATAAGC
GTCTGCATTCTTTTGTCTTCGTCGTCGCTCCATCTTGGCTTAATGAATGCCGTTTTTAATGACAGTTTTT
TTGCTATGTAATAAAACTGATTTATGTTTAGGCCCAGATGTTCTGCTGCACGGCAAGCTACCATGCGACC
GCAAACTGACTCCATCTCCGCTGTAGTTATGTTTAATCTTCTCATTAAGCCACCTGTTTAAGCTCATTTA
TTCTGATATTCATTACCTGAACGCATTTTGTCTGCTCATCATCGTGACCAGTCAATAATTGCCAGTCGTG
CTGGTATCTCTCAATTAGCTTTTTCTTGTCCGTTTCTGTTGCTGCATATTCACTGAATGCCTTAAGAACC
TGTTCAGGAGCAGGGGAGGAAGGCGATGATTTAGTTTGCTTAGCAGGTGCTGCATTCTGCTGCTGTTTGT
GCTCCTCAGTATCAGCGTCTTTGGCGTCGTCGATACCAAACAAACCGTTAAGGCAATATTTGCGAGCGTA
AGAGCTTGTAGCGCCCGTTACCTGAGCTGCATCCATTCCCTTCTTGTTTTCTTCTTCTCGCGCTATAGCG

As a first pass annotation, you plan to consider all open reading frames (ORFs).

(2pt) (a) Describe the pattern that an ORF finder looks for in bacterial sequences. For full credit, describe how many frames must be considered in your search.

(2pt) (b) Using NCBI's ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/), copy and paste your DNA sequence in FASTA format into the search box. Set the minimal ORF length to 30 a.a., the genetic code to “standard”, the start codon to ‘ATG only’, and ignore nested ORFs. Under these settings, how many ORFs are predicted to reside in Contig18?

(2pt) (c) What is the reasoning behind ignoring nested ORFs? [Note that you can select individual ORFs from either the list on the right or by clicking directly on the red/pink boxes in the sequence browser at the top.]

(2pt) (d) What is the longest ORF on the negative strand? (report frame, start, stop and length in amino acids)

(2pt) (e) What is the longest ORF on the positive strand? (report frame, start, stop and length in amino acids)

Homework Answers

Answer #1

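(a) An ORF finder looks for a start codon (here ATG) followed by an uninterrupted run of in-frame codons that ends at a stop codon (TAA, TAG, or TGA in the standard genetic code). Because a gene can lie on either strand, six reading frames must be considered: three on the forward strand and three on the reverse complement.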
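The search pattern in (a) can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: the function names (`reverse_complement`, `orfs_in_frame`, `find_orfs`) are made up for this example, only complete ATG-to-stop ORFs are reported, and coordinates are 0-based positions on whichever strand is being scanned, so the output is not expected to match ORFfinder's list exactly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}          # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_aa=30):
    """Yield (start, end, aa_len) for each ATG...stop ORF in one reading frame.

    start/end are 0-based, end-exclusive, and include the stop codon.
    Only the first ATG after the previous stop is used, so each stop codon
    yields at most one (the longest) ORF in this frame.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            aa_len = (i - start) // 3        # codons before the stop
            if aa_len >= min_aa:
                yield (start, i + 3, aa_len)
            start = None


def find_orfs(seq, min_aa=30):
    """Scan all six frames: three on the given strand, three on its reverse complement."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end, aa_len in orfs_in_frame(s, frame, min_aa):
                results.append((strand, frame + 1, start, end, aa_len))
    return results


# Toy example: one 4-codon ORF in frame 3 of the forward strand.
print(find_orfs("CCATGAAAAAAAAATAAG", min_aa=2))  # [('+', 3, 2, 17, 4)]
```

To try it on the contig, join the sequence lines under `>Contig18` into one string and pass it to `find_orfs` with the default `min_aa=30`.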

(b) 37 ORFs are predicted to reside in Contig18.

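(c) A nested ORF lies completely within another, longer ORF. Ignoring such ORFs reduces spurious predictions: the short internal ORF is much less likely to be a real gene than the longer ORF that contains it, and bacterial genes only rarely sit entirely inside one another.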
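As a rough illustration of what "ignoring nested ORFs" means, the sketch below drops every interval that lies completely inside another one. It assumes ORFs are given as plain `(start, end)` pairs with `start < end` on the contig, regardless of strand or frame; whether ORFfinder applies exactly this rule is not stated in the question, and `drop_nested` is a hypothetical helper name.

```python
def drop_nested(orfs):
    """Return the ORFs that are not fully contained in a different, larger interval.

    orfs: list of (start, end) tuples with start < end.
    Identical intervals are not treated as nested in each other.
    """
    kept = []
    for i, (s, e) in enumerate(orfs):
        nested = any(
            j != i and o_start <= s and e <= o_end and (o_start, o_end) != (s, e)
            for j, (o_start, o_end) in enumerate(orfs)
        )
        if not nested:
            kept.append((s, e))
    return kept


# (10, 40) sits inside (1, 100) and is dropped; (90, 150) only overlaps, so it stays.
print(drop_nested([(1, 100), (10, 40), (90, 150)]))  # [(1, 100), (90, 150)]
```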

(d) ORF No. 28 is the longest ORF on the negative strand: frame 2, start 1249, stop 17, length 410 amino acids.

(e) ORF No. 10 is the longest ORF on the positive strand: frame 2, start 8342, stop 8569, length 75 amino acids.
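The coordinates and lengths reported in (d) and (e) can be cross-checked with a little arithmetic. The sketch below assumes the start and stop values are 1-based, inclusive nucleotide positions and that the span includes the stop codon, which is not translated; `orf_length_aa` is a hypothetical helper name.

```python
def orf_length_aa(start, stop):
    """Protein length in amino acids from 1-based inclusive ORF coordinates.

    Works for either strand, since only the distance between the ends matters.
    Assumes the span includes the stop codon, which adds no amino acid.
    """
    span = abs(stop - start) + 1             # nucleotides covered by the ORF
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1                     # subtract the stop codon


print(orf_length_aa(1249, 17))    # 410, matches the length given in (d)
print(orf_length_aa(8342, 8569))  # 75, matches the length given in (e)
```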
