Simplifying gene trees for easier comprehension

  • Paul-Ludwig Lott
  • Marvin Mundry
  • Christoph Sassenberg
  • Stefan Lorkowski
  • Georg Fuellen
Implementation of a tree simplification system

Supplementary Data: Simplifying gene trees for easier comprehension

Paul-Ludwig Lott 1,2,§, Marvin Mundry 1,2,3,§, Christoph Sassenberg 1,2,§, Stefan Lorkowski 4,5, Georg Fuellen 1,3,6,*

1 Division of Bioinformatics, Biology Department, University Münster, Schlossplatz 4, 48149 Münster, Germany
2 Institut für Informatik, Fachbereich Mathematik und Informatik, Einsteinstr. 62, 48149 Münster, Germany
3 Department of Medicine, AG Bioinformatics, University Münster, Domagkstrasse 3, 48149 Münster, Germany
4 Leibniz-Institute of Arteriosclerosis Research, University Münster, Domagkstrasse 3, 48149 Münster, Germany
5 Institute of Biochemistry, University Münster, Wilhelm-Klemm-Str. 2, 48149 Münster, Germany
6 Institute of Mathematics and Computer Science, University Greifswald, Jahnstrasse 15a, 17489 Greifswald, Germany
§ These authors contributed equally to this work.

POU transcription factor tree. Using the TreeSimplifier tool described in the main paper, we simplified a gene tree (Fig. S1) of POU transcription factors (see, e.g., [17]), resulting in the gene tree shown in Fig. S2. The simplified tree has 96 leaves, whereas the original tree has 185 leaves. The original tree was generated using the RiPE pipeline [1] by searching the entire NCBI NR (non-redundant) database with a profile of POU5F1 sequences from several organisms. In addition, HUGO gene names were added to the deflines of the human POU proteins. (POU5F1 is also known as the Oct3/Oct4 transcription factor.) To guide monophyletic compression, we used the entire NCBI taxonomy as the species tree, converted to Newick format, with special handling of nodes that have a single leaf. (For example, the node “Homo sapiens” with the single leaf “Homo sapiens neanderthalensis” is converted to the bifurcation (“Homo sapiens”, “Homo sapiens neanderthalensis”).) The putative phylogeny of POU factors is much easier to recognize in the simplified tree than in the original tree, and species

