Tuesday, 1 October 2013

Inferring the evolutionary history of photosynthesis: C4 yourself

Biological evolution is a complex, stochastic process which dictates fundamental properties of life. Our understanding of evolutionary history is severely limited by the sparsity of the fossil record: we only have a handful of fossilised snapshots to infer how evolution may have progressed throughout the history of life. Many physicists and mathematicians have attempted theoretical treatments of the process of evolution, using varying degrees of abstraction, in order to provide a more solid quantitative foundation with which to study this complex and important phenomenon, but the predictive power of these theoretical models, and their ability to answer specific biological questions, is often questioned.

Figure. (left) Examples of steps in an evolutionary space that involve individual changes from the absence of a C4 feature (0) to the presence of that feature (1). The bitstrings represent possible sets of plant features. (right) Steps from C3 to C4 embedded in the high-dimensional evolutionary space involved in our model. Coloured points mark sets of plant features that are compatible with one or more plants that currently exist: pathways involving these compatible sets are more likely to represent evolutionary history. 
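The bitstring picture in the figure can be sketched in a few lines of Python. This is only an illustrative toy, not the inference technique from the paper: it enumerates every ordered pathway from the all-0 state (C3) to the all-1 state (C4) and ranks pathways by how many of their intermediate states match an observed plant. The three-feature space, the `observed` set, and the function names `pathways` and `score` are all assumptions made for the example.

```python
from itertools import permutations

def pathways(n_features):
    """Yield every ordered pathway from all-0 (C3) to all-1 (C4) as a list of bitstrings."""
    for order in permutations(range(n_features)):
        state = [0] * n_features
        path = ["".join(map(str, state))]
        for feature in order:
            state[feature] = 1  # acquire one feature per step
            path.append("".join(map(str, state)))
        yield path

def score(path, observed):
    """Count the intermediate states on a pathway that match an observed plant."""
    return sum(1 for s in path[1:-1] if s in observed)

# Hypothetical observations: feature sets of plants with some, but not all, features.
observed = {"100", "110"}

# Rank pathways so that those passing through observed states come first.
ranked = sorted(pathways(3), key=lambda p: score(p, observed), reverse=True)
best = ranked[0]
print(" -> ".join(best))
```

With many features the number of pathways grows factorially, so exhaustive enumeration like this quickly becomes impractical; that is one reason a dedicated inference procedure is needed for the real problem.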




We recently focussed on one remarkable product of evolution in plants: so-called "C4 photosynthesis". C4 consists of a complex set of changes to genetic and physiological features that have evolved in some plants and act to increase the efficiency of photosynthesis. This complex set of changes has evolved convergently over 60 times: that is, plants from many different lineages have independently "discovered" C4 photosynthesis through evolution. We were interested in the evolutionary history of how these discoveries occurred -- motivated both by fundamental biology and by the possibility of "learning from evolution" and using information about the evolution of C4 to design more efficient crop plants.

To this end, we modelled the evolution of C4 as a pathway through a space containing many different possible plant features. The pathway starts at C3 -- the precursor to C4 -- and progressively takes steps in different directions, acquiring one by one the features that sum up to C4 photosynthesis. Using a survey of plant properties from across the scientific literature, we identified which intermediate states these pathways were likely to pass through, given observed properties of plants that currently possess some, but not all, C4 features. We were then able to use a new inference technique to predict the order in which these likely pathways traverse the evolutionary space. We showed that this approach worked both by successfully inferring the known evolutionary steps in synthetic datasets and by correctly predicting previously unknown properties of several plants, which we verified experimentally. Our (open access) paper is here and there's a less technical summary and commentary here.

Our approach showed that C4 photosynthesis can evolve through a range of distinct evolutionary pathways, providing a potential explanation for its striking convergence. Several of these different pathways were made explicitly visible when we examined the inferred evolutionary histories of different plant lineages -- different families are likely to have converged on C4 through different evolutionary routes. Furthermore, the most likely initial steps towards C4 photosynthesis are, surprisingly, not directly related to photosynthesis: they are solutions to different biological challenges that also provide evolutionary "foundations" upon which the machinery of C4 can evolve further.

We hope that the recipes for C4 photosynthesis that we have inferred find use in efficient crop design, and anticipate our inference procedure being of use in the study of other specific biological questions regarding evolutionary histories. Iain

Wednesday, 3 April 2013

A compound methodological eye on nature’s signals


A compound methodological eye on nature’s signals: Background signals are both empirical (e.g. ECGs and human speech) and simulated (e.g. correlated noise and maps); the arctic krill eye shows output from thousands of time-series analysis methods wrapped around it [Fig. 1 of our paper showing the results of applying 8651 methods to a set of time series]. Image created by B. D. Fulcher. Accreditation details for the krill eye can be found here.
"… as an uneven mirror distorts the rays of objects according to its own figure and section, so the mind, when it receives impressions of objects through the sense, cannot be trusted to report them truly, but in forming its notions mixes up its own nature with the nature of things…" Francis Bacon

We are constantly interacting with signals in the world around us: noticing the fluctuating breeze against our faces, observing the intermittent flickering of a candle, or becoming absorbed in the regularity of our own pulse. Researchers across science have developed highly sophisticated methods for understanding the structure in these types of time-varying processes, and for identifying the types of mechanisms that produce them. However, scientists collaborate between disciplines surprisingly rarely, and therefore tend to use a small number of familiar methods from their own discipline. But how do the standard methods used in economics relate to those used in biomedicine or statistical physics?

In a recent article, "Highly comparative time-series analysis: the empirical structure of time series and their methods", which appeared (freely accessible) in Journal of the Royal Society Interface, we investigated what can be learned by comparing such methods from across science simultaneously. We collected over 9000 scientific methods for analysing signals, and compared their behaviour on a collection of over 35 000 diverse real-world and model-generated time series. The result provides a more unified and highly comparative scientific perspective on how scientists measure and understand structure in their data. For example, we showed how methods from across science that display similar behaviour to a given target method can be retrieved automatically, and how real-world or model-generated data with similar properties to a target time series can be retrieved in the same way. Further examples of the kinds of questions we ask are in the boxes in the figure below. Together, these comparisons provide an interdisciplinary scientific context for both data and their methods. We also introduced a range of techniques for exploiting our library of methods to treat specific challenges in classification and medical diagnosis. For example, we showed how useful methods for diagnosing pathological heartbeat series or Parkinsonian speech segments can be selected automatically, often yielding unexpected methods developed in disparate disciplines or in the distant past.

Representing a time series by the outputs of a set of automatically selected statistical methods and, unusually, representing statistical methods by their behaviour on a set of time series provides a form of empirical fingerprint for both our time series and our methods. Given these fingerprints, we can automatically answer questions like those posed in the boxes above. This gives us a powerful complement to the more conventional process of studying our methods and our data. [Based on Fig. 2 of our paper]
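The fingerprint idea can be illustrated with a minimal Python sketch. It is not the code behind the paper: the three "methods" (mean, standard deviation, lag-1 autocorrelation) are a tiny hypothetical stand-in for the full library, and the names `fingerprint_matrix` and `nearest` are invented for this example. Each row of the matrix fingerprints a time series; each column, read down the rows, fingerprints a method.

```python
import numpy as np

def lag1_autocorr(x):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# A tiny, hypothetical stand-in for the library of analysis methods.
METHODS = [np.mean, np.std, lag1_autocorr]

def fingerprint_matrix(series_list, methods=METHODS):
    """Rows are time series, columns are methods; each column is z-scored."""
    raw = np.array([[m(np.asarray(s, dtype=float)) for m in methods]
                    for s in series_list])
    spread = raw.std(axis=0)
    spread[spread == 0] = 1.0  # avoid dividing by zero for uninformative methods
    return (raw - raw.mean(axis=0)) / spread

def nearest(fingerprints, target):
    """Index of the row closest (Euclidean distance) to row `target`, excluding itself."""
    dist = np.linalg.norm(fingerprints - fingerprints[target], axis=1)
    dist[target] = np.inf
    return int(np.argmin(dist))

# Illustrative data: a sine wave, a noisy copy of it, and two white-noise series.
t = np.linspace(0, 8 * np.pi, 200)
rng = np.random.default_rng(0)
series = [np.sin(t),
          np.sin(t) + 0.1 * rng.normal(size=t.size),
          rng.normal(size=t.size),
          rng.normal(size=t.size)]

F = fingerprint_matrix(series)
print(nearest(F, 0))    # series most similar to the clean sine wave
print(nearest(F.T, 1))  # method whose behaviour is most similar to np.std
```

Transposing the matrix is all it takes to switch from comparing time series to comparing methods, which is the symmetry the fingerprint representation relies on.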

We are developing a web platform to support this kind of comparative interdisciplinary scientific analysis, which can be found at http://www.comp-engine.org/timeseries/. The plan is to use this to allow people to exchange data and code for methods, and to put each object in its context. Ben, Max and Nick

Tuesday, 5 February 2013

Evolutionary inference for functions

How might we reason about the forms of our unseen ancestors? I discuss a possible application to speech sounds in an earlier blog article (necrophonetics). A paper with John Moriarty which provides relevant theory recently came out in Royal Society Interface as "Evolutionary inference for function-valued traits: Gaussian process regression on phylogenies" (free version from this page). The gist of the idea is that some things in nature, like sounds or patterns, evolve in time and are best described as mathematical functions. Gaussian processes are a class of process that is well suited to describing the evolution of functions. An example of an evolving function would be a drawing of a line which is copied repeatedly (see here for a movie of us making school students do this). With the theory in place, Pantelis Hadjipantelis from Warwick (a student of John Aston), Chris Knight and David Springate helped take this further. They investigated whether our theory could be made to work in practice and considered carefully simulated examples. In these we could see how our best estimates of the characteristics of the evolutionary process and the form of the ancestors compared against (simulated) reality. We did reasonably well. On the way we used Independent Component Analysis, a very handy method. This work will be appearing shortly in Royal Society Interface as "Function-Valued Traits in Evolution" (free version here). Having convinced ourselves of the relevance of the method for simulated data, the next step was to consider real data that Chris Knight has; that paper is under way. If this interests you, Mhairi Kerr produced a master's thesis on the topic, working with Vincent Macaulay, which has some further introductory content. Nick

Functions can evolve along evolutionary trees, just like genetic sequences. On the left we show a simulation of function evolution. On the right we use the data from the leaves of the evolutionary tree to reconstruct the common ancestral function. The red line is the value of the function we predict and the black line is the actual value (the grey region is a measure of our uncertainty).
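The reconstruction in the figure can be illustrated with a deliberately simplified Python sketch. It assumes things the papers may not: Brownian motion along the branches (so the covariance between two nodes is the length of the path they share from the root), a root fixed at the zero function, and independent evolution at each sample point of the function. The tree, the leaf data, and the function name `reconstruct` are all hypothetical.

```python
import numpy as np

# Hypothetical tree: root -> A (length 1); A -> leaf1 and leaf2 (length 1 each);
# root -> leaf3 (length 2). Under Brownian motion, the covariance between two
# nodes is the length of the path they share from the root.
K_leaves = np.array([[2.0, 1.0, 0.0],
                     [1.0, 2.0, 0.0],
                     [0.0, 0.0, 2.0]])
k_anc = np.array([1.0, 1.0, 0.0])  # covariance of ancestor A with each leaf
var_anc = 1.0                      # prior variance of ancestor A

def reconstruct(Y, K=K_leaves, k=k_anc, prior_var=var_anc):
    """Gaussian process posterior mean and variance for the ancestor.

    Y has one row per leaf and one column per sample point of the function.
    """
    weights = np.linalg.solve(K, k)   # K^{-1} k
    mean = weights @ Y                # predicted ancestral function
    variance = prior_var - k @ weights
    return mean, variance

# Illustrative leaf functions sampled at five points.
x = np.linspace(0, 1, 5)
Y = np.vstack([np.sin(2 * np.pi * x),
               np.sin(2 * np.pi * x) + 0.3,
               np.cos(2 * np.pi * x)])

mean, variance = reconstruct(Y)
print(mean)      # predicted function at ancestor A
print(variance)  # remaining uncertainty at each sample point
```

In this toy tree, leaf3 shares no history with A, so it receives zero weight, while the two descendants of A contribute equally; the posterior variance is smaller than the prior variance because the leaves carry information about their ancestor.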

Tuesday, 29 January 2013

Statistics vs Physics

While there's a whole branch of physics called statistical physics (probably a misleading title), physicists often get only a few hours of statistical training in their undergraduate degrees. This is surprising to some who think of physicists as the most mathematical of scientists. In fact you can find a diversity of statistical crimes/accidents in physics papers (and I'm sure you can find them in my own). In partial acknowledgement of this, I organised this Royal Society Discussion Meeting and edited this volume of the Philosophical Transactions of the Royal Society, “Signal Processing and Inference for the Physical Sciences”, with the excellent Prof Tom Maccarone (now at Texas Tech Astrophysics and Astronomy). Our goal was to expose physical scientists to some new topics in statistical inference and some data analysts to physical challenges. Much of the volume is free, and there are also talks from the authors and slides on this page. We provide an introduction, "Inference for the Physical Sciences", which we hope can serve as a jumping-off point for physical scientists wanting to use statistical tools. Max Little also wrote an article highlighting some challenges in signal processing in biophysics, "Signal processing for molecular and cellular biological physics", putting some of our other work in context (see previous blog articles on finding steps beneath the noise and on molecular dance steps). For those with an interest in Machine Learning, I think the talks by Bishop, Ghahramani, Roberts and Hyvärinen are worth a look. Nick

Dr Ben Fulcher made the image above - similar signals are linked up (see a pending blog article) and we have to guess whether the green event that mysteriously occurred in Russia was a blue test explosion or a red earthquake...