Dr. Maria C. Dunford

London, England, United Kingdom
14K followers 500+ connections

About

My two favorite words: Self-Development & Evolution

My mantra: "The biggest…

Experience

  • Lifebit Biotech Ltd

    Cambridge, United Kingdom

  • -

  • -

    Barcelona Area, Spain

  • -

    Barcelona Area, Spain

  • -

    Barcelona Area, Spain

  • -

    Barcelona Area, Spain

  • -

    Barcelona Area, Spain

  • -

    Stockholm, Sweden

Education

  • Universitat Pompeu Fabra

    -

    One of the main and most recent challenges of modern biology is to keep up with the growing amount of biological data coming from next-generation sequencing technologies and to extract actionable biomedical insights from it. Large-scale comparative bioinformatics analyses are an integral part of this procedure. In comparative bioinformatics, multiple sequence alignments (MSAs) are by far the most widely used models. In this PhD thesis I set out the current relevance of multiple sequence aligners, and I show how scaling them up leads to serious numerical stability issues and how these issues affect phylogenetic tree reconstruction. For this purpose, I developed two new methods: MEGA-Coffee, a large-scale aligner, and Shootstrap, a novel bootstrapping measure. To improve the computational efficiency and reproducibility of large-scale analyses like those carried out in these studies, I co-developed a new computational framework, Nextflow.

  • -

  • -

    Thesis: “Isoelectric point estimation of peptides with post-translational modifications”
    Supervisor: Assistant Professor Lukas Käll

  • -

    Thesis: “Markov Models in protein sequence analysis”
    Supervisor: Professor Pantelis G. Bagos

  • -

Publications

  • Nextflow enables reproducible computational workflows

    Nature Biotechnology

    The increasing complexity of readouts for omics analyses goes hand-in-hand with concerns about the reproducibility of experiments that analyze 'big data'. When analyzing very large data sets, the main source of computational irreproducibility arises from a lack of good practice pertaining to software and database usage. Small variations across computational platforms also contribute to computational irreproducibility by producing numerical instability, which is especially relevant to high-performance computational (HPC) environments that are routinely used for omics analyses. We present a solution to this instability named Nextflow, a workflow management system that uses Docker technology for the multi-scale handling of containerized computation.

  • PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Nucleic Acids Research

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee.

  • Multiple sequence alignment modeling: methods and applications.

    Briefings in Bioinformatics

    This review provides an overview on the development of Multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The three first sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview on available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.

  • The impact of Docker containers on the performance of genomic pipelines.

    PeerJ

    Genomic pipelines consist of several pieces of third party software and, because of their experimental nature, frequent changes and updates are commonly necessary thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.

    Keywords: Bioinformatics; Docker; Pipelines; Virtualisation; Workflow

  • SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments.

    Nucleic Acids Research

    Abstract

    This article introduces the SARA-Coffee web server; a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee.

    doi: 10.1093/nar/gku459.

  • Chromatographic retention time prediction for posttranslationally modified peptides

    PROTEOMICS


    Keywords: Bioinformatics; Machine learning; Posttranslational modification; Retention time prediction; Reversed-phase liquid chromatography

    Retention time prediction of peptides in liquid chromatography has proven to be a valuable tool for mass spectrometry-based proteomics, especially in designing more efficient procedures for state-of-the-art targeted workflows. Additionally, accurate retention time predictions can also be used to increase confidence in identifications in shotgun experiments. Despite these obvious benefits, the use of such methods has so far not been extended to (posttranslationally) modified peptides due to the absence of efficient predictors for such peptides. We here therefore describe a new retention time predictor for modified peptides, built on the foundations of our existing Elude algorithm. We evaluated our software by applying it on five types of commonly encountered modifications. Our results show that Elude now yields equally good prediction performances for modified and unmodified peptides, with correlation coefficients between predicted and observed retention times ranging from 0.93 to 0.98 for all the investigated datasets. Furthermore, we show that our predictor handles peptides carrying multiple modifications as well. This latest version of Elude is fully portable to new chromatographic conditions and can readily be applied to other types of posttranslational modifications. Elude is available under the permissive Apache2 open source License at http://per-colator.com or can be run via a web-interface at http://elude.sbc.su.se.


Projects

  • Nextflow

    - Present

    Nextflow is a fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel and scalable pipelines in a portable manner. You can use your favourite programming language and tools, exploiting your current skills.
    Nextflow's mission is to facilitate the computation and analysis of Big Data, with special emphasis on "Big BioMedical" Data.

    Used:
    * Groovy
    * Java

  • T-Coffee

    - Present

    T-Coffee is one of the earliest bioinformatics tools for computing Multiple Sequence Alignments (MSAs). My job is to redesign T-Coffee so that it scales up to handle and deliver Multiple Sequence Alignments of hundreds of thousands of biological sequences (up to 1 million).

    Used:
    * C
    * C++

  • Elude

    -

    Implementation of a web server called Elude, a bioinformatics tool used for peptide retention time prediction.


Honors & Awards

  • BIGDATA TALENT AWARDS

    ORACLE

    My PhD thesis won the 2016 Big Data Talent Awards.

  • “La Caixa” International PhD Programme Fellowships

    "La Caixa" Foundation

    I was selected and awarded this very competitive fellowship to pursue my PhD in the field of life sciences.

  • Best poster award

    Hellenic Society for Computational Biology and Bioinformatics

    Best poster award at the Hellenic Society for Computational Biology and Bioinformatics 2010 Conference for the poster entitled “Mixture Transition Distribution (MTD) Markov models: Statistical modeling and prediction of protein families”.

  • Scholarship Award

    The State Scholarships Foundation-ΙKY, Greece

    Received a scholarship from the State Scholarships Foundation of Greece for the academic year 2005-2006 for being the best student of my year.

Languages

  • Greek

    Native or bilingual proficiency

  • English

    Professional working proficiency

  • Spanish

    Limited working proficiency
