Pythia and Adaptive RAxML-NG (Methods in Phylogenomics) - Julia Haag & Anastasis Togkousidis
This month, the ERGA BioGenome Analysis and Applications Seminars will focus on phylogenomic methods, with two interesting talks:
Talk 1: Pythia – predicting the difficulty of phylogenetic analyses
Speaker: Julia Haag
Abstract: "Phylogenetic analyses under the Maximum-Likelihood model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. In my talk I will present Pythia, a machine-learning-based predictor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets."
Talk 2: Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty
Speaker: Anastasis Togkousidis (Heidelberg Institute for Theoretical Studies, Germany)
Abstract: "Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10x. Further, approximately of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG)."
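For readers new to the idea, the three observations in the abstract can be pictured as a simple mapping from a predicted difficulty score to a search effort level. The Python sketch below is only an illustration and is not taken from RAxML-NG or Pythia: the score range of 0 to 1, the cut-off values EASY_MAX and HARD_MIN, and the names SearchMode and choose_search_mode are all assumptions made for this example.

```python
from enum import Enum


class SearchMode(Enum):
    """Hypothetical effort levels for a tree search (illustrative only)."""
    FAST = "fast"          # stop early, or infer a single tree quickly
    THOROUGH = "thorough"  # spend more effort exploring tree space


# Assumed cut-offs on an assumed 0..1 difficulty scale; not values from the talk.
EASY_MAX = 0.3
HARD_MIN = 0.7


def choose_search_mode(difficulty: float) -> SearchMode:
    """Map a predicted difficulty score to a search effort level.

    Easy datasets converge quickly and hard datasets do not reward extra
    effort, so both get the fast mode; intermediate datasets get the
    thorough mode.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty is assumed to lie between 0 and 1")
    if difficulty < EASY_MAX or difficulty > HARD_MIN:
        return SearchMode.FAST
    return SearchMode.THOROUGH


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, choose_search_mode(score).value)
```

The point of the sketch is the shape of the rule: effort is highest in the middle of the difficulty range rather than increasing steadily with difficulty.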
---
The ERGA BioGenome Analysis and Applications Seminar Series is a joint initiative of the ERGA Data Analysis Committee (DAC) and the BGE-ERGA WP11-Genome Applications. The purpose of this seminar series is to promote knowledge exchange on state-of-the-art genomic analyses and applications and to create a space for connection and analysis-oriented discussion for ERGA members and the broader genomics research community.
These seminars will provide opportunities for interdisciplinary interactions that explore emerging scientific trends, providing a platform for cutting-edge research, novel ideas, and insightful discussions.
Learn more about ERGA & BGE at: https://www.erga-biodiversity.eu/ and https://biodiversitygenomics.eu/