
Protein sequence pattern-matching python


I'm working on a matching algorithm for protein sequences. I'm starting with an aligned protein sequence, and I am attempting to convert a mis-aligned sequence into the correctly aligned one. Here is an example:

original aligned sequence: ----AB--C-D-----

mis-aligned sequence: --A--BC---D-

The expected output should look like this:

original aligned sequence: ----AB--C-D-----

mis-aligned sequence: ----AB--C-D----- (both are now the same)
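One way to read the example above: the master sequence acts as a gap template, and the residues of the mis-aligned sequence are dropped into its non-gap slots in order. The following is only a minimal sketch under that assumption; the function name `realign_to_master` and the `gap` parameter are hypothetical, and the sketch assumes both sequences hold the same number of residues in the same order.

```python
def realign_to_master(master: str, misaligned: str, gap: str = "-") -> str:
    """Place the residues of `misaligned` into the non-gap slots of `master`."""
    # Residues of the mis-aligned sequence, in order, with all gaps removed.
    residues = [ch for ch in misaligned if ch != gap]

    # Assumption: both sequences carry the same number of residues.
    slots = sum(1 for ch in master if ch != gap)
    if len(residues) != slots:
        raise ValueError(
            f"master has {slots} residues, mis-aligned sequence has {len(residues)}"
        )

    remaining = iter(residues)
    # Copy gaps from the master; fill every other position with the next residue.
    return "".join(gap if ch == gap else next(remaining) for ch in master)


print(realign_to_master("----AB--C-D-----", "--A--BC---D-"))
```

Note that this sketch only counts residues; it does not check that the letters themselves agree with the master.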

I'm told to be very specific about my problem, but the sequences I'm trying to match are >4000 characters long, and look ridiculous when pasted here. I'll post two sequences representative of my problem, though, and that should do.

    seq = "---A-A--AA---A--"
    newseq = "AA---A--A-----A-----"
    seq = list(seq)  # changing master sequence from string to list
    newseq = list(newseq)  # changing new sequence from string to list
    n = len(seq)  # obtaining length of master sequence
    newseq.extend('.')  # adding a tag to end of new sequence to account for terminal gaps
    print(seq, newseq, n)  # verification of sequences in list form and length
    for i in range(n):
        if seq[i] != newseq[i]:
            if seq[i] != '-':  # gap deletion
                del newseq[i]
            elif newseq[i] != '-':
                newseq.insert(i, '-')  # gap insertion
            elif newseq[i] == '-':
                del newseq[i]
    old = ''.join(seq)  # changing list to string
    new = ''.join(newseq)  # changing list to string
    new = new.strip('.')  # removing tag
    print(old)  # verification of master-sequence fidelity
    print(new)  # verification of matching sequence

The output I get from this particular code and set of sequences is:

---A-A--AA---A--

---A-A--A----A-----A-----

I can't get the loop to delete more than one unwanted dash at a given position between the letters, because the remaining loop iterations are used up by paired dash insertions and deletions. That is the core of the problem.

How can I write this loop so that it produces my desired output (two identical sequences)?
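A possible reason the `for` loop stalls is that `i` advances after every deletion, so at most one dash is removed per position. The sketch below swaps in a `while` loop that advances the index only when the current position matches or a gap has been inserted, and re-checks the same index after each deletion. It is an illustration under assumptions, not a definitive fix: the name `realign_by_editing` is hypothetical, and both sequences are assumed to contain the same residues in the same order.

```python
def realign_by_editing(master: str, misaligned: str, gap: str = "-") -> str:
    """Edit a copy of `misaligned` position by position until it matches `master`."""
    new = list(misaligned)
    i = 0
    while i < len(master):
        # None marks "past the end" of the sequence being edited.
        current = new[i] if i < len(new) else None
        if current == master[i]:
            i += 1                 # already aligned at this position
        elif master[i] == gap:
            new.insert(i, gap)     # master has a gap here: insert one and move on
            i += 1
        elif current == gap:
            del new[i]             # extra gap: delete it and re-check the same index
        else:
            raise ValueError(f"residue mismatch at master position {i}")

    # Anything left beyond the master's length should be trailing gaps only.
    if any(ch != gap for ch in new[len(master):]):
        raise ValueError("mis-aligned sequence has leftover residues")
    return "".join(new[:len(master)])


print(realign_by_editing("---A-A--AA---A--", "AA---A--A-----A-----"))
```

Because the index is not advanced after a deletion, a run of several unwanted dashes is removed one by one before the loop moves on, and no end-of-sequence tag such as `'.'` is needed.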

