NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences

Authors
  • Akand, Elma H.1
  • Murray, John M.1
  • 1 School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW, Australia
Type
Published Article
Journal
BMC Bioinformatics
Publisher
Springer (Biomed Central Ltd.)
Publication Date
Feb 08, 2021
Volume
22
Issue
1
Identifiers
DOI: 10.1186/s12859-020-03901-y
Source
Springer Nature
License
Green

Abstract

Background
The high variability in the envelope regions of some viruses, such as HIV, allows the virus to establish infection and to escape subsequent immune surveillance. This variability, together with the increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for multiple sequence alignment (MSA) methods, which provide the first step in analyzing these sequences. Existing MSA tools often fail to align highly variable HIV envelope sequences properly, requiring extensive manual editing that is impractical for even a moderate number of such sequences.

Results
We developed NGlyAlign, an automated library-building tool that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single-site constraints, in order to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested it on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences, and a set of gp120 reference sequences from the highly variable region 1. On entropy scores, sum-of-pairs scores, column scores and similarity heat maps, NGlyAlign+Dialign outperformed methods such as T-Coffee, Clustal Omega, ClustalW, Praline, HIValign and MUSCLE. The method scales to large sequence sets, producing accurate alignments without manual editing. Beyond this application to HIV, the method can be used for other highly variable glycoproteins, such as the hepatitis C virus envelope.

Conclusions
NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies that rely on a single robust alignment. NGlyAlign is freely available as an open-source tool at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0.
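
Potential N-linked glycosylation sites, the raw material for the block constraints described above, are conventionally identified by the sequon N-X-S/T, where X is any amino acid except proline. The following is a minimal illustrative sketch, not NGlyAlign's own code: the helper name `find_sequons` is hypothetical, it assumes an ungapped amino-acid sequence, and it does not attempt the grouping of similar sites into blocks that NGlyAlign performs.

```python
import re

# Standard N-linked glycosylation sequon: N, then any residue except
# proline (P), then S or T. The zero-width lookahead lets overlapping
# sequons such as "NNST" both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of the N in every N-X-S/T sequon (X != P).

    Hypothetical helper for illustration only; expects an ungapped
    amino-acid sequence, since alignment gaps would break the motif.
    """
    return [m.start() for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_sequons("ANPSNATNNST"))  # [4, 7, 8]; "NPS" at 1 is skipped
```

Using a lookahead rather than a plain match keeps overlapping sequons, which occur in HIV envelope sequences with runs of asparagines.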
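
The aligners were compared partly on entropy and sum-of-pairs scores. Below is a sketch of common textbook versions of these two measures, assuming per-column Shannon entropy over non-gap residues and an identity-only sum-of-pairs count. The paper's exact definitions (for example, substitution-matrix weighting or normalization) may differ, and the function names `column_entropy` and `sum_of_pairs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # Sum of -p * log2(p) over the residue frequencies in this column.
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def sum_of_pairs(alignment: list[str]) -> int:
    """Identity-only sum-of-pairs: count identical non-gap residue pairs per column."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    score = 0
    for i in range(length):
        column = [seq[i] for seq in alignment]
        # a == b != "-" means: residues identical and not a gap.
        score += sum(1 for a, b in combinations(column, 2) if a == b != "-")
    return score


if __name__ == "__main__":
    aln = ["NAT", "NAT", "N-S"]
    columns = ["".join(seq[i] for seq in aln) for i in range(len(aln[0]))]
    mean_entropy = sum(column_entropy(c) for c in columns) / len(columns)
    print(f"mean column entropy = {mean_entropy:.3f}")  # 0.306
    print(f"sum-of-pairs = {sum_of_pairs(aln)}")        # 5
```

Under these assumptions, a lower mean column entropy and a higher sum-of-pairs score both indicate that residues are more consistently stacked in the same columns.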
