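Fix the following issues in the program:
1) Infinite loop: once the while loop in the Find_it method finds a match, it searches the same unchanged string on every pass and finds the same match again, so only the count > 25 guard ever stops it.
2) Incorrect regular expression: backward_pattern, re.compile(r'AAAA'), matches only a run of four A's. Unlike forward_pattern, it contains no start or stop codons, so it cannot find reading frames in a DNA sequence.
3) Ineffective string copying: in the while loop, sequence = sequence[:] just produces a string equal to the original, and next_position is computed but never used, so the search never moves forward.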
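A quick way to see issues 1 and 3 together: slicing with [:] gives back an equal string, so the next search starts from the beginning again and returns the identical match. The sequence used here is a made-up uppercase DNA string chosen only for illustration.

```python
import re

# forward_pattern exactly as it appears in the program
pattern = re.compile(r'ATG(...)*(TGA|TAA|TAG)')

# Hypothetical test sequence (uppercase DNA), not taken from the assignment
sequence = "GGATGAAATAGCCCATGTTTTGA"

first = pattern.search(sequence)
sequence = sequence[:]             # what the loop body does: an equal string
second = pattern.search(sequence)

print(sequence == "GGATGAAATAGCCCATGTTTTGA")  # True: nothing was removed
print(first.span(), second.span())           # (2, 23) (2, 23): the loop never advances
```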
import re
import sys


class FindSomething(object):
    forward_pattern = re.compile(r'ATG(...)*(TGA|TAA|TAG)')
    backward_pattern = re.compile(r'AAAA')  # hmmm, this doesn't seem like it could be right

    def __init__(self):
        pass

    def Find_it(self, sequence, mode=True):
        found_things = []
        if mode:
            pattern = FindSomething.forward_pattern
        else:
            pattern = FindSomething.backward_pattern
        count = 0
        match = pattern.search(sequence)
        while match:
            count += 1
            if count > 25:  # This is here just to avoid inadvertent infinite loops while you are testing
                print('Quitting... too many times through the while loop!')  # and getting things figured out!
                print("Here's what I found before quitting:", found_things)
                sys.exit()
            found_things.append(match.group())
            next_position = match.start() + 1
            sequence = sequence[:]  # Maybe something simple is missing here??
            match = pattern.search(sequence)
        return found_things


def main():
    a = FindSomething()
    print(a.Find_it('ggg aug aaa ugu ucc cgg uaa aug aau gcc cgg gaa auu uag ccu gac aug a', mode=True))
    print(a.Find_it('ggg aug aaa ugu ucc cgg uaa aug aau gcc cgg gaa auu uag ccu gac aug a', mode=False))


if __name__ == "__main__":
    main()
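One thing worth checking before running: main() passes lowercase RNA text with spaces, while both patterns are written in uppercase DNA letters, so neither pattern can match and Find_it returns an empty list for both calls. The looping behaviour described in 3F seems to assume uppercase DNA input. The sketch below shows the mismatch and one possible workaround; normalize is a hypothetical helper, and mapping U to T assumes the input is an mRNA-sense sequence.

```python
import re

forward_pattern = re.compile(r'ATG(...)*(TGA|TAA|TAG)')

# The first test string from main()
raw = 'ggg aug aaa ugu ucc cgg uaa aug aau gcc cgg gaa auu uag ccu gac aug a'


def normalize(rna):
    """Hypothetical helper: uppercase, drop spaces, and map U (RNA) to T (DNA)."""
    return rna.upper().replace(" ", "").replace("U", "T")


print(forward_pattern.search(raw))   # None: no uppercase A, T or G to match
dna = normalize(raw)
match = forward_pattern.search(dna)
print(match.start(), match.group())  # the first hit now begins at the ATG after GGG
```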
3F) Assuming that you have fixed the above problems, try running the program. Does it behave the way you expected? You are probably now getting a message that it is quitting because it has looped too many times. However, look (very carefully!) to see whether it (repeatedly) found the matches you had predicted from your biological analysis of the regular expressions. If not, describe exactly how the sequence you observe differs from your expectations.
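For comparison with your prediction, here is the unchanged forward_pattern applied to a hypothetical sequence that contains two short reading frames in the same frame, ATG AAA TAG and ATG TTT TGA, separated by CCC.

```python
import re

forward_pattern = re.compile(r'ATG(...)*(TGA|TAA|TAG)')  # unchanged, greedy

# Hypothetical sequence: GG | ATG AAA TAG | CCC | ATG TTT TGA
seq = "GGATGAAATAGCCCATGTTTTGA"

match = forward_pattern.search(seq)
print(match.group())   # ATGAAATAGCCCATGTTTTGA
print(match.group(2))  # TGA
```

The greedy `(...)*` first grabs as many codons as it can and then gives them back only until a stop codon fits, so the match runs from the first ATG to the last in-frame stop (TGA) and passes straight over the in-frame TAG.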
3G) Armed with your knowledge of how the regular expression engine behaves, and of the regular expression mini-language, can you suggest a simple fix to the forward_pattern regular expression that will make the result agree more closely with your biological expectations? If so, make the change, and explain to me here the Pythonic significance of the change you made. Further, tell me how that Pythonic change corresponds to the biological issue you identified in question 3F above.
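One possible fix, sketched below, is to make the codon repetition lazy by writing `*?` instead of `*`. The engine then tries the fewest codons first and stops at the first in-frame stop codon, which is where translation would actually end. The test sequence is hypothetical.

```python
import re

# Lazy quantifier: fewest codons first, so the first in-frame stop wins
forward_pattern = re.compile(r'ATG(...)*?(TGA|TAA|TAG)')

# Hypothetical sequence: GG | ATG AAA TAG | CCC | ATG TTT TGA
seq = "GGATGAAATAGCCCATGTTTTGA"

print(forward_pattern.search(seq).group())  # ATGAAATAG
```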
3H) Our program is still just repeatedly finding a single match before bombing out of the while loop. However, the code in the while loop strongly hints at an approach that would allow multiple hits to be identified (why else would we use a loop unless we intended to find multiple things?). Indeed, it turns out that just a small problem in the implementation is preventing multiple hits from being found. Look closely, then describe here the approach you think the code is suggesting but not quite pulling off...
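Stripped of the regex details, the control flow the loop seems to be reaching for looks like the scan below: record a hit, then resume searching one character past where that hit began. This sketch uses plain str.find on a fixed motif instead of a regular expression, purely to show the idea; find_all_starts is a hypothetical name.

```python
def find_all_starts(text, motif):
    """Hypothetical helper: every start index of motif in text, overlaps included.

    After each hit, the next search resumes one character past where that hit
    began, the same "match.start() + 1" idea that Find_it computes.
    """
    starts = []
    pos = text.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = text.find(motif, pos + 1)  # resume just past the previous start
    return starts


print(find_all_starts("AAAAAA", "AAAA"))  # [0, 1, 2]
```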
3I) Can you spot and fix the issue with the implementation of the idea above? If so, incorporate it into your code and describe it for me here. What results are you getting now when you run it? Describe them, especially with respect to their number and nature.
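A minimal sketch of one fix, assuming the lazy pattern from 3G: slice the string at next_position so each search starts just past the previous hit. It is written as a standalone function, find_it (a hypothetical name), rather than as the Find_it method, and the input sequence is made up.

```python
import re

forward_pattern = re.compile(r'ATG(...)*?(TGA|TAA|TAG)')  # lazy version


def find_it(sequence, pattern):
    """Find_it's loop with the search actually moving forward (slicing approach)."""
    found_things = []
    match = pattern.search(sequence)
    while match:
        found_things.append(match.group())
        next_position = match.start() + 1
        # Drop everything up to and including the first character of this
        # match, so the next search cannot return the same hit again.
        sequence = sequence[next_position:]
        match = pattern.search(sequence)
    return found_things


# Hypothetical sequence: GG | ATG AAA TAG | CCC | ATG TTT TGA
print(find_it("GGATGAAATAGCCCATGTTTTGA", forward_pattern))
# ['ATGAAATAG', 'ATGTTTTGA']
```

After slicing, match.start() is relative to the shortened string, so it no longer gives a hit's position in the original sequence.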
3J) If you're still hanging in there, ponder the significance of the mode variable. Given what we have by now grokked about forward_pattern, with respect to both biology and Python, and keeping in mind the nature of DNA, what behaviour do you think mode is seeking to control? Describe the relationship between mode and backward_pattern.
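DNA is double-stranded, and a gene can sit on either strand. If mode is indeed choosing between the top strand (forward_pattern) and the complementary strand (backward_pattern), then backward_pattern has to recognize codons as they appear when the other strand is read backwards on the top strand, that is, as reverse complements. The helper below, reverse_complement (a hypothetical name built on the standard str.maketrans and str.translate), shows what the start and stop codons turn into.

```python
# Map each base to its Watson-Crick partner
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


# Start and stop codons of a reverse-strand gene, as seen on the top strand
for codon in ("ATG", "TAA", "TAG", "TGA"):
    print(codon, "->", reverse_complement(codon))
# ATG -> CAT
# TAA -> TTA
# TAG -> CTA
# TGA -> TCA
```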
3K) Modify backward_pattern so that it does what you think it should (reason by analogy here). Once you are content with your efforts, run the program one last time and describe for me what you think it is now doing.
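Reasoning by analogy from the lazy forward pattern, one candidate backward_pattern reads a reverse-complemented stop codon, then whole codons, then CAT (the reverse complement of ATG). This is a sketch, not a complete reverse-strand ORF finder: the lazy match stops at the first in-frame CAT, but unlike the forward version it does not rule out an in-frame stop codon lying between the stop and that CAT. The test input is the reverse complement of a made-up sequence holding two forward ORFs, ATGAAATAG and ATGTTTTGA, and find_it is a hypothetical standalone version of the slicing loop.

```python
import re

# Reverse-strand analogue of ATG(...)*?(TGA|TAA|TAG), read on the top strand
backward_pattern = re.compile(r'(TTA|CTA|TCA)(...)*?CAT')


def find_it(sequence, pattern):
    """Slicing loop: resume each search one character past the last hit's start."""
    found_things = []
    match = pattern.search(sequence)
    while match:
        found_things.append(match.group())
        sequence = sequence[match.start() + 1:]
        match = pattern.search(sequence)
    return found_things


# Reverse complement of the hypothetical GGATGAAATAGCCCATGTTTTGA
bottom = "TCAAAACATGGGCTATTTCATCC"
print(find_it(bottom, backward_pattern))  # ['TCAAAACAT', 'CTATTTCAT']
```

Each hit is the reverse complement of one forward ORF: TCAAAACAT pairs with ATGTTTTGA, and CTATTTCAT with ATGAAATAG.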
3L) If you've done everything right, we should now have reasonably working code. However, thinking in terms of Python, cast an eyeball once again at the algorithm we have employed, and describe for me any shortcomings that occur to you, especially with respect to efficiency. Hint: strings are an immutable type.
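Because strings are immutable, every sequence[next_position:] builds a brand-new string, so a long sequence with many hits gets copied over and over. One alternative, sketched below, is the pos argument of a compiled pattern's search method, which starts scanning at an index of the original string without slicing it. (re.finditer would also avoid the copies, but it only yields non-overlapping matches, so it would not reproduce the start + 1 behaviour.) find_it is again a hypothetical standalone name, and the sequence is made up.

```python
import re

forward_pattern = re.compile(r'ATG(...)*?(TGA|TAA|TAG)')


def find_it(sequence, pattern):
    """Same hits as the slicing loop, but without building new strings.

    Pattern.search(string, pos) begins scanning at index pos of the original
    string, so match.start() stays an absolute position that can be reported.
    """
    found_things = []
    match = pattern.search(sequence, 0)
    while match:
        found_things.append((match.start(), match.group()))
        match = pattern.search(sequence, match.start() + 1)
    return found_things


print(find_it("GGATGAAATAGCCCATGTTTTGA", forward_pattern))
# [(2, 'ATGAAATAG'), (14, 'ATGTTTTGA')]
```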