[Biopython] getting alignment out of Align.PairwiseAligner

Discussion:

John Berrisford

2018-10-04 19:25:56 UTC

Hi

How do I get the alignment out of Align.PairwiseAligner?

I have the following code:

import logging

from Bio import Align

# Runs inside a method: self.sequence1 and self.sequence2 are the two sequences.
aligner = Align.PairwiseAligner()

alignments = aligner.align(self.sequence1, self.sequence2)
for alignment in sorted(alignments):
    logging.debug(alignment)
    logging.debug(alignment.score)
    logging.debug(alignment.target)
    logging.debug(alignment.query)
    logging.debug(alignment.path)
    logging.debug(dir(alignment))

My example:

Query: 193 residues long.

Target: 6 residues long.

Out of this I can get the following:

alignment - which appears to be a newline-separated string of query, alignment,
target.

In my example:

MEKLEVGIYTRAREGEIACGDACLVKRVEGVIFLAVGDGIGHGPEAARAAEIAIASMESSMNTGLVNIFQLCHRELRGTRGAVAALCRVDRRQGLWQAAIVGNIHVKILSAKGIITPLATPGILGYNYPHQLLIAKGSYQEGDLFLIHSDGIQEGAVPLALLANYRLTAEELVRLIGEKYGRRDDDVAVIVAR
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|XX|XX-----
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------RANDOM-----

score - the alignment score (I can also get this with aligner.score)

target - self.sequence2

query - self.sequence1

path - I think this is what I want, but I don't know how to interpret it. In
the above example it is: ((0, 0), (182, 0), (188, 6), (193, 6))

Is this documented somewhere?

It looks like 0-181 has no alignment, 182 to 187 adds a score of 6, and 188 to
193 keeps the score at 6.

When I call dir(alignment), I only see the above attributes:

['__class__', '__cmp__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', '_format_psl', 'path',
'query', 'score', 'target']

What I'm after is the middle row of the alignment (above). Is the only
option to split the alignment string on newlines?
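A minimal sketch of the splitting approach, assuming the exact three-line layout printed in this thread (Biopython 1.72). logging.debug(alignment) prints str(alignment), so the middle row is simply the second line of that string. The helper name middle_row is hypothetical, and because later Biopython releases may print alignments differently, the sketch refuses anything that is not exactly three lines.

```python
def middle_row(formatted: str) -> str:
    """Return the match line from a three-line alignment printout.

    Hypothetical helper. Assumes the layout shown in this thread: first
    sequence, then the match line ('|', 'X', '-'), then the second sequence.
    """
    lines = formatted.splitlines()
    if len(lines) != 3:
        # Fail loudly rather than guess if the printout layout is different.
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    return lines[1]


# In John's loop this would be called as middle_row(str(alignment)).
# Here a short made-up printout stands in for a real alignment.
printout = "AAAARDDDVAQQ\n----|XX|XX--\n----RANDOM--\n"
print(middle_row(printout))  # prints ----|XX|XX--
```

Splitting works, but it ties the code to the print layout; building the row from alignment.path avoids that dependency.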

Thanks

John

--
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529
http://www.pdbe.org
http://www.facebook.com/proteindatabank
http://twitter.com/PDBeurope

Peter Cock

2018-10-05 15:08:39 UTC


Yes. If you look at the code that makes that string,
it builds it from the path structure:

https://github.com/biopython/biopython/blob/biopython-172/Bio/Align/__init__.py#L991

What do you want out of the alignment object? Two strings
with gap characters inserted? Something else?

Peter
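If the answer to Peter's question is "two strings with gap characters inserted", they can be built straight from the path. This is a minimal sketch under one assumption read off John's example (the first coordinate runs to 193 and the second to 6): each vertex is (i, j), with i indexing the sequence on the top row of the printout and j the one on the bottom row. gapped_strings is a hypothetical helper written for this note, not a Biopython function, and the toy sequences are made up.

```python
from typing import Sequence, Tuple

Path = Sequence[Tuple[int, int]]


def gapped_strings(seq_i: str, seq_j: str, path: Path, gap: str = "-") -> Tuple[str, str]:
    """Expand a path of (i, j) vertices into two equal-length gapped strings.

    Hypothetical helper: i indexes seq_i (top row), j indexes seq_j (bottom row).
    """
    top, bottom = [], []
    for (i1, j1), (i2, j2) in zip(path, path[1:]):
        di, dj = i2 - i1, j2 - j1
        if dj == 0:
            # Only i advances: residues of seq_i face gaps in seq_j.
            top.append(seq_i[i1:i2])
            bottom.append(gap * di)
        elif di == 0:
            # Only j advances: residues of seq_j face gaps in seq_i.
            top.append(gap * dj)
            bottom.append(seq_j[j1:j2])
        elif di == dj:
            # Diagonal: residues are paired one-to-one.
            top.append(seq_i[i1:i2])
            bottom.append(seq_j[j1:j2])
        else:
            raise ValueError(f"segment {(i1, j1)} -> {(i2, j2)} is not a gap or a diagonal")
    return "".join(top), "".join(bottom)


top, bottom = gapped_strings("AAAARDDDVAQQ", "RANDOM", ((0, 0), (4, 0), (10, 6), (12, 6)))
print(top)     # AAAARDDDVAQQ
print(bottom)  # ----RANDOM--
```

Called with the 193-residue sequence first, "RANDOM" second and John's path, it should reproduce the first and third rows of his printout.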


_______________________________________________
Biopython mailing list - ***@mailman.open-bio.org
http://mailman.

Michiel de Hoon

2018-10-07 06:53:05 UTC


The pairwise aligner makes use of a trace matrix, as described in detail in e.g. Biological Sequence Analysis by Richard Durbin et al. Each alignment corresponds to a path through this trace matrix, consisting of horizontal, vertical, and diagonal segments. Horizontal and vertical segments are gaps; diagonal segments are aligned stretches of the two sequences. The path you got consists of the vertices in the trace matrix that connect the segments. In your example:
((0, 0), (182, 0), (188, 6), (193, 6)) means:

(0, 0) - (182, 0): a gap of 182 amino acids

(182, 0) - (188, 6): an alignment of 6 amino acids (amino acids 182-188 in one sequence against amino acids 0-6 in the other sequence, counted like Python slices, so 182-188 covers positions 182 to 187)

(188, 6) - (193, 6): a gap of 5 amino acids.
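Michiel's reading of the path can be checked mechanically. This sketch (describe_path is a hypothetical name) walks consecutive vertices and labels each segment by which coordinate advances: "i-only" and "j-only" correspond to his horizontal and vertical gap segments (which one is drawn horizontally depends on how the trace matrix is laid out), and "diagonal" to aligned residues. For John's path the segment lengths come out as 182, 6 and 5.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, int, Tuple[int, int], Tuple[int, int]]


def describe_path(path: Sequence[Tuple[int, int]]) -> List[Segment]:
    """Label each segment between consecutive path vertices.

    Hypothetical helper. Returns (kind, length, (i_start, i_end), (j_start, j_end))
    tuples, where kind is 'diagonal' (residues aligned), 'i-only' (residues of
    the first sequence opposite a gap) or 'j-only' (residues of the second
    sequence opposite a gap). Ranges are half-open, like Python slices.
    """
    segments = []
    for (i1, j1), (i2, j2) in zip(path, path[1:]):
        di, dj = i2 - i1, j2 - j1
        if dj == 0:
            kind, length = "i-only", di
        elif di == 0:
            kind, length = "j-only", dj
        elif di == dj:
            kind, length = "diagonal", di
        else:
            raise ValueError(f"unexpected segment {(i1, j1)} -> {(i2, j2)}")
        segments.append((kind, length, (i1, i2), (j1, j2)))
    return segments


for seg in describe_path(((0, 0), (182, 0), (188, 6), (193, 6))):
    print(seg)
# ('i-only', 182, (0, 182), (0, 0))
# ('diagonal', 6, (182, 188), (0, 6))
# ('i-only', 5, (188, 193), (6, 6))
```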
Try a few simple examples and compare to Richard Durbin's book; that should make things clear.
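As one such worked example, here is a sketch that rebuilds John's middle row from the path alone. It uses the symbols visible in the printout in John's first message ('|' for identical residues, 'X' for a mismatch, '-' opposite a gap). It is a reimplementation for illustration, not a copy of the Biopython formatting code Peter links to, and match_line is a hypothetical name. The 193-residue sequence and the path are copied from the thread, and as above it assumes i indexes the top-row sequence.

```python
from typing import Sequence, Tuple


def match_line(seq_i: str, seq_j: str, path: Sequence[Tuple[int, int]]) -> str:
    """Build the '|' / 'X' / '-' row from a path of (i, j) vertices.

    Hypothetical helper; i indexes seq_i (top row), j indexes seq_j.
    """
    symbols = []
    for (i1, j1), (i2, j2) in zip(path, path[1:]):
        di, dj = i2 - i1, j2 - j1
        if di == 0 or dj == 0:
            # Gap segment: one '-' per residue that faces a gap.
            symbols.append("-" * (di + dj))
        elif di == dj:
            # Diagonal segment: compare the paired residues one by one.
            for a, b in zip(seq_i[i1:i2], seq_j[j1:j2]):
                symbols.append("|" if a == b else "X")
        else:
            raise ValueError(f"unexpected segment {(i1, j1)} -> {(i2, j2)}")
    return "".join(symbols)


# The 193-residue sequence and the path from John's message.
seq_i = (
    "MEKLEVGIYTRAREGEIACGDACLVKRVEGVIFLAVGDGIGHGPEAARAAEIAIASMESSMNTGLVNIFQLCHREL"
    "RGTRGAVAALCRVDRRQGLWQAAIVGNIHVKILSAKGIITPLATPGILGYNYPHQLLIAKGSYQEGDLFLIHSDGI"
    "QEGAVPLALLANYRLTAEELVRLIGEKYGRRDDDVAVIVAR"
)
path = ((0, 0), (182, 0), (188, 6), (193, 6))
row = match_line(seq_i, "RANDOM", path)
print(row == "-" * 182 + "|XX|XX" + "-" * 5)  # prints True
```

Residues 182 to 187 of the long sequence are RDDDVA, so against RANDOM the diagonal segment gives |XX|XX, matching the printed row.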

Best,
-Michiel
