Tuesday, 8 August 2017

What we learn from the learning rate

Cells need to sense their environment in order to survive. For example, some cells measure the concentration of food or the presence of signalling molecules. We are interested in studying the physical limits to sensing with limited resources, to understand the challenges faced by cells and to design synthetic sensors.

We have recently published a paper 'What we learn from the learning rate' (free version) where we explore the interpretation of a metric called 'the learning rate' that has been used to measure the quality of a sensor (e.g. here). Our motivation is that in this field a number of metrics (a metric is a number you can calculate from the properties of the sensor that, ideally, tells you how good the sensor is) have been applied to make statements about the quality of sensing, or limits to sensory performance. For example, a limit of particular interest is the energy required for sensing. However, it is not always clear how to interpret these metrics. We want to find out what the learning rate means. If one sensor has a higher learning rate than another, what does that tell you?

The learning rate is defined as the rate at which changes in the sensor increase the information the sensor has about the signal. The information the sensor has about the signal is how much your uncertainty about the state of the signal is reduced by knowing the state of the sensor (this is known as the mutual information). From this definition, it seems plausible that the learning rate could be a measure of sensing quality, but it is not clear. Our approach is a test to destruction – challenge the learning rate in a variety of circumstances, and try to understand how it behaves and why.

To do this we need a framework to model a general signal and sensor system. The signal hops between discrete states and the sensor also hops between discrete states in a way that follows the signal. A simple example is a cell using a surface receptor to detect the concentration of a molecule in its environment.

The figure shows such a system. The circles represent the states and the arrows represent transitions between the states. The signal is the concentration of a molecule in the cell’s environment. It can be in two states: high or low, where high is double the concentration of low. The sensor is a single cell surface receptor, which can be either unbound or bound to a molecule. Therefore, the joint system can be in four different states. The concentration jumps between its states with rates that don’t depend on the state of the sensor. The receptor becomes unbound at a constant rate and is bound at a rate proportional to the molecule concentration.
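For concreteness, the joint dynamics can be written as a four-state continuous-time Markov chain. The sketch below (with made-up rate constants, not values from the paper) finds the stationary distribution and the signal-sensor mutual information:

```python
import numpy as np

# Illustrative rates (assumed, not taken from the paper)
k_sig = 1.0                       # signal switching rate (low <-> high)
k_off = 2.0                       # constant unbinding rate
k_on = 1.0                        # binding rate per unit concentration
conc = {"low": 1.0, "high": 2.0}  # high is double the concentration of low

# Joint states: (signal, receptor)
states = [("low", "U"), ("low", "B"), ("high", "U"), ("high", "B")]
Q = np.zeros((4, 4))              # generator matrix of the joint chain
for i, (sig, rec) in enumerate(states):
    for j, (sig2, rec2) in enumerate(states):
        if i == j:
            continue
        if rec == rec2 and sig != sig2:
            Q[i, j] = k_sig                  # signal flips, receptor unchanged
        elif sig == sig2 and (rec, rec2) == ("U", "B"):
            Q[i, j] = k_on * conc[sig]       # binding tracks concentration
        elif sig == sig2 and (rec, rec2) == ("B", "U"):
            Q[i, j] = k_off                  # unbinding at a constant rate
    Q[i, i] = -Q[i].sum()

# Stationary distribution: the null vector of Q transposed, normalised
w, v = np.linalg.eig(Q.T)
p = np.real(v[:, np.argmin(np.abs(w))])
p /= p.sum()

# Mutual information I(signal; receptor) in bits
p = p.reshape(2, 2)               # rows: signal state, cols: receptor state
ps, pr = p.sum(axis=1), p.sum(axis=0)
I = sum(p[i, j] * np.log2(p[i, j] / (ps[i] * pr[j]))
        for i in range(2) for j in range(2))
print(f"I(signal; receptor) = {I:.4f} bits")
```

Note this only computes the static correlation that the post contrasts the learning rate with; the learning rate itself additionally involves how quickly the sensor updates.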

We calculated the learning rate for several systems, including the one above, and compared it to the mutual information between the signal and the sensor (the mutual information is a refined measure of correlation). We found that in the simplest case, shown in the figure, the learning rate essentially reports the correlation between the sensor and the signal, and so shows you the same thing as the mutual information. In more complicated systems the learning rate and mutual information show qualitatively different behaviour. This is because the learning rate actually reflects the rate at which the sensor must change in response to the signal, which is not, in general, equivalent to the strength of correlations between the signal and sensor. Therefore, we do not think that the learning rate is useful as a general metric for the quality of a sensor. Rory, Nick and Tom

Tuesday, 18 July 2017


Complex (adj.): 1. Consisting of many different and connected parts. ‘A complex network of water channels’.

Oxford English Dictionary

‘Complex systems’ – like cells, the brain or human society – are often defined as those whose interesting behaviour emerges from the interaction of many connected elements. A simple but particularly useful representation of almost any complex system is therefore as a network (aka a graph). When the connections (edges) between elements (nodes) have a direction, this takes the form of a directed network. For example, to describe interactions in an ecosystem, ecologists use directed networks called food webs, in which each species is a node and directed edges (usually drawn as arrows) go from prey to their predators. The last two decades have witnessed a lot of research into the properties of networks, and how their structure is related to aspects of complex systems, such as their dynamics or robustness. In the case of ecosystems, it has long been thought that their remarkable stability – in the sense that they don’t tend to succumb easily to destructive avalanches of extinctions – must have something to do with their underlying architecture, especially given May’s paradox: mathematical models predict that ecosystems should become more unstable with increasing size and complexity, but this doesn’t seem to happen to, say, rainforests or coral reefs.

Trophic coherence

In 2014 we proposed a solution to May’s paradox: the key structural property of ecosystems was a food-web feature called “trophic coherence”. Ecologists classify species by trophic level in the following way. Plants (nodes with no in-coming edges) have level one, herbivores (species which only have in-coming edges from plants) are at level two, and, in general, the level of any species is defined as the average level of its prey, plus one. Thus, if the network in the top left-hand corner of the figure below represented a food web, the nodes at the bottom would be plants (level 1), the next ones up would be herbivores (level 2), the next, primary carnivores (level 3) and so on. In reality, though, food webs are never quite so neatly organised, and many species prey on various levels, making food webs a bit more like the network in the top right-hand corner. Here, most species have a fractional trophic level. In order to measure this degree of order, which we called trophic coherence, we attributed to each directed edge a “trophic difference”, the difference between the levels of the predator and the prey, and looked at the statistical distribution of differences over all the edges in the whole network. We called the standard deviation of this distribution an “incoherence parameter”, q, because a perfectly coherent network like the one on the left has q=0, while a more incoherent one like that on the right has q>0 – in this case, q=0.7.
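To make the definitions concrete, here is a small sketch on a hypothetical four-species web (not a real food web) that computes trophic levels and the incoherence parameter q:

```python
import numpy as np

# Toy food web: edges run from prey to predator (assumed example).
# 0, 1 = plants; 2 = herbivore; 3 = omnivore eating a plant and the herbivore.
edges = [(0, 2), (1, 2), (0, 3), (2, 3)]
n = 4

# Trophic levels: s_i = 1 + mean level of i's prey; basal nodes have level 1.
A = np.zeros((n, n))                  # A[i, j] = 1 if j is prey of i
for prey, pred in edges:
    A[pred, prey] = 1
k_in = A.sum(axis=1)
M = np.eye(n) - A / np.where(k_in > 0, k_in, 1)[:, None]
s = np.linalg.solve(M, np.ones(n))    # basal rows reduce to s_i = 1

# Incoherence parameter q: std of trophic differences over all edges
diffs = [s[pred] - s[prey] for prey, pred in edges]
q = float(np.std(diffs))
print("trophic levels:", s, " q =", round(q, 3))
```

Here the omnivore gets the fractional level 2.5 (it eats at levels 1 and 2), so the web is not perfectly coherent and q comes out greater than zero.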

It turns out that the trophic coherence of food webs is key to their stability, and when we simulated (i.e. generated in the computer) networks with varying levels of coherence, we found that, for sufficiently coherent ones, the relationship between size and stability is inverted. Although there are plenty of caveats to this result – not least the question of how one should measure stability – this suggests a solution to May’s paradox. Since then, further research has shown that trophic coherence affects other structural and dynamical properties of networks – for instance, whether a cascade of activity will propagate through a neural network (example papers here, here and here!). But all these results were somewhat anecdotal, since we didn’t have a mathematical theory relating trophic coherence to other network features. This is what we set out to do in our most recent paper.

Figure. Four directed networks, plotted so that the height of each node on the vertical axis is proportional in each case to its trophic level. The top two are synthetic networks, generated in a computer with the ‘preferential preying model’, which allows the user to tune trophic coherence [1,3]. Thus, they both have the same numbers of nodes and edges, but the one on the left is perfectly coherent (q=0) while the one on the right is more incoherent (q=0.7). The bottom two are empirically derived: the one on the left is the Ythan Estuary food web, which is significantly coherent (it has q=0.42, which is about 15% of its expected q) and belongs to the loopless regime; the one on the right is a representation of the Chlamydia pneumoniae metabolic network, which is significantly incoherent (q=8.98, or about 162% of the random expectation) and sits in the loopful regime. The top two networks are reproduced from the SI of Johnson et al., PNAS, 2014 [1], while the bottom two are from the SI of Johnson & Jones, PNAS, 2017 [5].


In statistical physics one thinks about systems in terms of ensembles – the sets of all possible systems which satisfy certain constraints – and this method has also been used in graph theory. For example, the Erdős-Rényi ensemble comprises all possible networks with given numbers of nodes N and edges L, while the configuration ensemble also specifies the degree sequence (the degree of a node being its number of neighbours). We defined the “coherence ensemble” as the set of all possible directed networks which not only have given N, L and degree sequences (each node has two degrees in directed networks, one in and one out) but also specified trophic coherence. This allows us to derive equations for the expected values of various network properties as a function of trophic coherence; in other words, these are the values we should expect to measure in a network given its trophic coherence (and other specified constraints) if we had no other knowledge about its structure.

Many network properties are heavily influenced by cycles – that is, paths through a network which begin and end at the same node. For example, in a food web you might find that eagles eat snakes, which eat squirrels, which eat eagles (probably in egg form), thus forming a cycle of length three. These cycles (properly called ‘directed cycles’ in directed networks), or loops, are related to various structural and dynamical features of complex systems. For example, feedback loops can destabilise ecosystems, mediate self-regulation of genes, or maintain neural activity in the brain. Furthermore, it had been reported that certain kinds of network – in particular, food webs and gene regulatory networks – often had either no cycles at all, or only a small number of quite short cycles. This was surprising, because in (arbitrarily large) random networks the number of cycles of length l grows exponentially with l, so it was assumed that there must be some evolutionary reason for this “looplessness”. We were able to use our coherence ensemble approach to derive the probability with which a randomly chosen path would be a cycle, as a function of q. From there we could obtain expected values for the number of cycles of length l, and for other quantities related to stability (in particular, for the adjacency matrix eigenspectrum, which captures the total extent of feedback in a system). It turns out that the number of cycles does indeed depend on length exponentially, but via a factor τ which is a function of trophic coherence. For sufficiently coherent networks, τ is negative, and hence the expected number of cycles of length l falls rapidly to zero. In fact, such networks have a high chance of being completely acyclic. Thus, our theory predicts that networks can belong to either of two regimes, depending on the “loop exponent” τ: a loopful one with lots of feedback, or a loopless one in which networks are either acyclic or have just a few short cycles. A comparison with a large set of networks from the real world – including networks of species, genes, metabolites, neurons, trading nations and English words – shows that this is indeed so, and almost all of them are very close to our expectations given their trophic coherence.
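The exponential dependence on cycle length is easy to see numerically. In the sketch below (a dense random directed graph, standing in for the random-network case), closed walks of length l, counted by the trace of the l-th power of the adjacency matrix, grow like the leading eigenvalue raised to the power l:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 0.1
A = (rng.random((n, n)) < p).astype(float)  # random directed adjacency matrix
np.fill_diagonal(A, 0)                      # no self-loops

# Closed walks of length l (a crude proxy for cycle counts) are tr(A^l);
# their growth rate is set by the leading eigenvalue of A.
lam = np.max(np.abs(np.linalg.eigvals(A)))
counts = {l: np.trace(np.linalg.matrix_power(A, l)) for l in (2, 3, 4, 5)}
for l, c in counts.items():
    print(f"l={l}: closed walks = {c:.0f}, lambda_max^l = {lam ** l:.0f}")
```

In a coherent network the leading eigenvalue drops below one, which is the matrix-level picture of τ turning negative and cycle numbers collapsing to zero.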

Our theory can also be used to see how close quantities such as trophic coherence, or mean trophic level, are to what would be our random expectations, given just N, L and the degree sequences, for any real directed network. We found, for example, that in our dataset the food webs tended to be very coherent, while networks derived from metabolic reactions were significantly incoherent (see the bottom two networks in the figure: the one on the left is a food web and the one on the right is a metabolic network). Our gene regulatory networks are interesting in that, while often quite coherent in absolute terms, they are in fact very close to their random expectation.

Open questions

This work leaves open many new questions. Why are some networks significantly coherent, and others incoherent? We can guess at the mechanism behind food-web coherence: the adaptations which allow a given predator, say a wolf, to hunt deer are also useful for catching prey like goats or elk, which have similar characteristics because they, in turn, have similar diets – i.e. trophic levels. This correlation between trophic levels and node function might be more general. For example, we have shown that in a network of words which are concatenated in a text, trophic level serves to identify syntactic function, and something similar may occur in networks of genes or metabolites. If edges tend to form primarily between nodes with certain functions, this might induce coherence or incoherence. Some networks, like the artificial neural networks used for “deep learning”, are deliberately coherent, which suggests another question: how does coherence affect the performance of different kinds of system? Might there be an optimal level of trophic coherence for neural networks? And how might it affect financial, trade, or social networks, which can, in some sense, be considered human ecosystems? We hope topics such as these will attract the curiosity of other researchers who can make further inroads. You can read our paper “Looplessness in networks is linked to trophic coherence” for free here and also in the journal PNAS. Sam and Nick.

Thursday, 13 July 2017

Mitochondrial heterogeneity, metabolic scaling and cell death

Juvid Aryaman, Hanne Hoitzing, Joerg P. Burgstaller, Iain G. Johnston and Nick S. Jones

Cells need energy to produce functional machinery, deal with challenges, and continue to grow and divide -- these activities and others are collectively referred to as "cell physiology". Mitochondria are the dominant energy sources in most of our cells, so we'd expect a strong link between how well mitochondria perform and cell physiology. Indeed, when mitochondrial energy production is compromised, deadly diseases can result -- as we've written about before.

The details of this link -- how cells with different mitochondrial populations may differ physiologically -- are not well understood. A recent article shed new light on this link by looking at a measure of mitochondrial functionality in cells of different sizes. They found what we'll call the "mitopeak" -- mitochondrial functionality peaks at intermediate cell sizes, with larger and smaller cells having less functional mitochondria. The subsequent interpretation was that there is an “optimal”, intermediate, size for cells. Above this size, it was suggested that a proposed universal relationship between the energy demands of organisms (from microorganisms to elephants) and their size predicts the reduction in the function of mitochondria. Smaller cells, which result from a large cell having divided, were suggested to have inherited their parent's low mitochondrial functionality. Cells were predicted to “reset” their mitochondrial activity as they initially grow and reach an “optimal” size.

We were interested in the mitopeak, and wondered if scientifically simpler hypotheses could account for it. Using mathematical modelling, we built on the observation that as a cell becomes larger in volume, the size of its mitochondrial population (and hence its power supply) increases in concert. We considered that a cell has power demands which also track its volume, as well as demands which are proportional to surface area and power demands which do not depend on cell size at all (such as the energetic cost of replicating the genome at cell division, since the size of a cell's genome does not depend on how big the cell is). Assuming that power supply = demand in a cell, bigger cells may more easily satisfy e.g. the constant power demands. This is because the number of mitochondria increases with cell volume yet the constant demands remain the same regardless of cell size. In other words, if a cell has more mitochondria as it gets larger, then each mitochondrion has to work less hard to satisfy power demand.
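A back-of-the-envelope version of this supply/demand argument, with illustrative constants that are not fitted to any data:

```python
import numpy as np

# Demand has volume-, surface-area- and size-independent components;
# the mitochondrial population (the power supply) scales with volume.
a, b, c = 1.0, 0.5, 2.0   # assumed demand coefficients
rho = 1.0                 # mitochondria per unit volume (assumed)

workloads = []
for V in (1.0, 2.0, 4.0, 8.0):
    S = V ** (2 / 3)                  # surface area for a fixed cell shape
    demand = a * V + b * S + c
    workload = demand / (rho * V)     # demand met per mitochondrion
    workloads.append(workload)
    print(f"V={V:.0f}: per-mitochondrion workload = {workload:.2f}")
```

The size-independent term c is what makes the per-mitochondrion workload fall as the cell grows: it is spread over an ever larger mitochondrial population.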

To explain why the smallest cells also have mitochondria which do not appear to work hard, we suggested that some smaller cells could be in the process of dying. If smaller cells are more likely to die, and if dying cells have low mitochondrial functionality (both of these ideas are biologically supported), then, by combining this with the power supply/demand picture above, the observed mitopeak naturally emerges from our mathematical model.

As an alternative model, we also suggested that the mitopeak could come entirely from a nonlinear relationship between cell size and cell death, with mitochondrial functionality as a passive indicator of how healthy a cell is. This indicates the existence of multiple hypotheses which could explain this new dataset.

Interestingly, we also found that the mitopeak could be an alternative to one aspect of a model we used some time ago to explain a different dataset, looking at the physiological influence of mitochondrial variability. Then, we modelled the activity of mitochondria as a quantity that is inherited identically by each daughter cell from its parent, plus some noise -- noting that this was a guess at the true behaviour because we didn't have the data to make a firm statement. We needed this relationship because observed functionality varied comparatively little between sister cells but substantially across a population. The mitopeak induces this variability without needing random inheritance of functionality, and may thus be the refined picture we've been looking for. These ideas, and suggestions for future strategies to explore the link between mitochondria and cell physiology in more detail, are free in our new BioEssays article "Mitochondrial heterogeneity, metabolic scaling and cell death" here. Juvid, Nick, and Iain.

Thursday, 19 January 2017

Using (mutual) information as a chemical battery

Biological systems at many scales exploit information to extract energy from their environment. In chemotaxis, single-celled organisms use the location of food molecules to navigate their way to more food; humans use the fact that food is typically found in the cafeteria. Although the general idea is clear, the fundamental physical connection between information and energy is not yet well-understood. In particular, whilst energy is inherently physical, information appears to be an abstract concept, and relating the two consistently is challenging. To overcome this problem, we have designed two microscopic machines that can be assembled out of naturally-occurring biological molecules and exploit information in the environment to perform chemical work.
Using chemical correlations as a battery.

The basic idea behind the machines is simple, and makes use of pre-existing biology. We use an enzyme that can take a small phosphate group from one molecule and attach it to another – a process known as phosphorylation. Phosphorylation is the principal signaling mechanism within a cell, as enzymes called kinases attach phosphate groups to other proteins to activate them. In addition to signalling, phosphates are one of the cell’s main stores of energy; chains of phosphate bonds in ATP (the cell’s fuel molecule) act as batteries. By ‘recharging’ ATP through phosphorylation, we store energy in a useful format; this is effectively what mitochondria do via a long series of biochemical reactions. In reality, energy is stored both in the newly-formed phosphate bond and in the fact that the concentration of ATP has changed. We are only interested in the effects due to concentration so we set up the model to ignore the contribution from bond formation. This can trivially be put back in, as we explain in the Supplementary Material.

The machines we consider have three main components: the enzyme, the ‘fuel’ molecule that acts as a source of phosphates to charge ATP, and an activator for the enzyme, all of which are sitting in a solution of ATP and its dephosphorylated form ADP. Fuel molecules can either be charged (i.e. have a phosphate attached) or uncharged (without phosphate). When the enzyme is bound to an activator, it allows transfer of a phosphate from a charged fuel molecule to an ADP, resulting in an uncharged fuel molecule and ATP. The reverse reaction is also possible.

In order to systematically store energy in ATP, we want to activate the enzyme when a charged fuel molecule is nearby. This is possible if we have an excess of charged fuel molecules, or if charged fuel molecules are usually located near activators. In the second case, we're making use of information: the presence of an activator is informative about the possible presence of a charged fuel molecule. This is a very simple analogue of the way that cells and humans use information as outlined above. Indeed, mathematically, the 'mutual information' between the fuel and activator molecules is simply how well the presence of an activator indicates the presence of a charged fuel molecule. This mutual information acts as an additional power supply that we can use to charge our ATP-batteries. We analyse the behaviour of our machines in environments containing information, and find that they can indeed exploit this information, or expend chemical energy in order to generate more information.
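As a small numerical illustration (with made-up probabilities), the mutual information between activator and fuel states is zero when they are independent and positive when charged fuel tends to sit near activators:

```python
import numpy as np

def mutual_info_bits(p):
    """Mutual information of a 2x2 joint distribution, in bits."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    return sum(p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
               for i in range(2) for j in range(2) if p[i, j] > 0)

# Rows: activator absent/present; columns: fuel uncharged/charged.
correlated = np.array([[0.4, 0.1],
                       [0.1, 0.4]])   # charged fuel usually near an activator
independent = np.full((2, 2), 0.25)   # no correlations to exploit
print(mutual_info_bits(correlated), mutual_info_bits(independent))
```

In the correlated environment there is a non-zero information budget per molecule pair that the machine can, in principle, convert into chemical work.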

A nice feature of our designs is that they are completely free-running, or autonomous. Like living systems, they can operate without any external manipulation, happily converting between chemical energy and information on their own. There’s still a lot to do on this subject; we have only analysed the simplest kind of information structure possible and have yet to look at more complex spatial or temporal correlations. In addition, our system doesn’t learn, but relies on ‘hard-coded’ knowledge about the relation between fuel and activators. It would be very interesting to see how machines that can learn and harness more complex correlation structures would behave. You can read about this paper here for free or in Physical Review Letters under the title 'Biochemical machines for the interconversion of mutual information and work'. Tom, Nick, Pieter Rein, Tom

Thursday, 24 November 2016

Evolution, Energetics & Noise

Mitochondrial DNA (mtDNA) contains instructions for building important cellular machines. We have populations of mtDNA inside each of our cells -- almost like a population of animals in an ecosystem. Indeed, mitochondria were originally independent organisms, that billions of years ago were engulfed by our ancestors' cells and survived -- so the picture of mtDNA as a population of critters living inside our cells has evolutionary precedent! MtDNA molecules replicate and degrade in our cells in response to signals passed back and forth between mitochondria and the nucleus (the cell's "control tower"). Describing the behaviour of these populations given the random, noisy environment of the cell, the fact that cells divide, and the complicated nuclear signals governing mtDNA populations, is challenging. At the same time, experiments looking in detail at mtDNA inside cells are difficult -- so predictive theoretical descriptions of these populations are highly valuable.

Why should we care about these cellular populations? MtDNA can become mutated, wrecking the instructions for building machines. If a high enough proportion of mtDNAs in a cell are mutated, our cells struggle and we get diseases. It only takes a few cells exceeding this "threshold" to cause problems -- so understanding the cell-to-cell distribution of mtDNA is medically important (as well as biologically fascinating). Simple mathematical approaches typically describe only average behaviours -- we need to describe the variability in mtDNA populations too. And for that, we need to account for the random effects that influence them. 
In our cells, signals from the "control tower" nucleus lead to the replication (orange) and degradation (purple) of mtDNA. These processes affect mtDNA populations that may contain normal (blue) and mutant (red) molecules. Our mathematical approach -- extending work addressing a similar but simpler system -- describes how the total number of machines, and the proportion of mutants, is likely to behave and change with time and as cells divide.

In the past, we have used a branch of maths called stochastic processes to answer questions about the random behaviour of mtDNA populations. But these previous approaches cannot account for the "control tower" -- the nucleus' control of mtDNA. To address this, we've developed a mathematical tradeoff -- we make a particular assumption (which we show not to be unreasonable) and in exchange are able to derive a wealth of results about mtDNA behaviour under all sorts of different nuclear control signals. Technically, we use a rather magical-sounding tool called "Van Kampen's system size expansion" to approximate mtDNA behaviour, then explore how the resulting equations behave as time progresses and cells divide.
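For a feel of why heteroplasmy variance grows, here is a bare-bones Gillespie simulation of a neutral mtDNA population (equal per-molecule replication and degradation rates; a far simpler setting than the nuclear control signals treated in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def final_heteroplasmy(w, m, rate, t_max):
    """Simulate replication/degradation of wildtype (w) and mutant (m)
    mtDNA with equal per-molecule rates; return the final mutant fraction."""
    t = 0.0
    while t < t_max and w + m > 0:
        props = np.array([rate * w, rate * w, rate * m, rate * m])
        total = props.sum()
        t += rng.exponential(1 / total)
        r = rng.random() * total
        if r < props[0]:
            w += 1                                  # wildtype replication
        elif r < props[0] + props[1]:
            w -= 1                                  # wildtype degradation
        elif r < props[0] + props[1] + props[2]:
            m += 1                                  # mutant replication
        else:
            m -= 1                                  # mutant degradation
    return m / (w + m) if w + m else float("nan")

variances = {}
for t_max in (1.0, 3.0):
    hs = [final_heteroplasmy(50, 50, 1.0, t_max) for _ in range(200)]
    variances[t_max] = float(np.var(hs))
    print(f"t={t_max}: var(heteroplasmy) = {variances[t_max]:.4f}")
```

Even with no selective difference between the two types, random birth and death alone makes the cell-to-cell spread of the mutant fraction widen over time, which is the basic effect the system size expansion quantifies.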

Our approach shows that the cell-to-cell variability in heteroplasmy (the potentially damaging proportion of mutants in a cell) generally increases with time, and surprisingly does so in the same way regardless of how the control tower signals the population. We're able to update a decades-old and commonly-used expression (often called the Wright formula) for describing heteroplasmy variance, so that the formula, instead of being rather abstract and hard to interpret, is directly linked to real biological quantities. We also show that control tower attempts to decrease mutant mtDNA can induce more variability in the remaining "normal" mtDNA population. We link these and other results to biological applications, and show that our approach unifies and generalises many previous models and treatments of mtDNA -- providing a consistent and powerful theoretical platform with which to understand cellular mtDNA populations. The article is in the American Journal of Human Genetics here and a preprint version can be viewed here. Cross-posted from here.

The largest survey of opinions on vaccine confidence

Monitoring trust in immunisation programmes is essential if we are to identify areas and socioeconomic groups that are prone to vaccine-scepticism, and also if we are to forecast these levels of mistrust. Identification of vaccine-sceptic groups is especially important as clustering of non-vaccinators in social networks can serve to disproportionately lower the required vaccination levels for collective (or herd) immunity. To investigate these regions and socioeconomic groups, we performed a large-scale, data-driven study on attitudes towards vaccination. The survey — which we believe to be the largest on attitudes to vaccinations to date with responses from 67,000 people from 67 countries — was conducted by WIN Gallup International Association and probed respondents’ vaccine views by asking them to rate their agreement with the following statements: “vaccines are important for children to have”; “overall I think vaccines are safe”; “overall I think vaccines are effective”; and “vaccines are compatible with my religious beliefs”.

Our results show that attitudes vary by country, socioeconomic group, and between survey questions (where respondents are more likely to agree that vaccines are important than safe). Vaccine-safety related sentiment is particularly low in the European region, which has seven of the ten least confident countries, including France, where 41% of respondents disagree that vaccines are safe. Interestingly, the oldest age group — who may have been more exposed to the havoc that vaccine-preventable diseases can cause — hold more positive views on vaccines than the young, highlighting the association between perceived danger and pro-vaccine views. Education also plays a role. Individuals with higher levels of education are more likely to view vaccines as important and effective, but higher levels of education appear not to influence views on vaccine safety.

World map of the percentage of negative ("tend to disagree" or "strongly disagree") survey responses to the statement "overall I think vaccines are safe"

Our study, "The State of Vaccine Confidence 2016: Global Insights Through a 67-Country Survey" can be read for free in the journal EBioMedicine here with a commentary here. You can find other treatments in Science magazine, New Scientist, Financial Times, Le Monde and Scientific American. Alex, Iain, and Nick.

Wednesday, 31 August 2016

Understanding the strength and correlates of immunisation programmes

Childhood vaccinations are vital for the protection of children against dreadful diseases such as measles, polio, and diphtheria. In addition to providing personal protection, vaccines can also suppress epidemic outbreaks if a sufficiently large proportion of the population has immunity – this “herd immunity” is important for society as many individuals are unable to vaccinate for medical reasons. Over the past half-century, public health organisations have made concerted efforts to vaccinate every child worldwide. However, notwithstanding the substantial improvements to vaccine coverage rates across the globe over the past few decades, there are still millions of unvaccinated children worldwide. The majority of these children live in countries where large numbers of the populations live in deprived, rural regions with poor access to healthcare. However, a number of children are denied vaccines because of parental attitudes and beliefs (which are often influenced by the media, religious groups, or anti-vaccination groups) – such hesitancy has been responsible for recent outbreaks in developing (e.g. Nigeria, Pakistan, Afghanistan) and developed (e.g. USA, UK) countries alike. Monitoring vaccine coverage rates, summarising recent vaccination behaviours, and understanding the factors which drive vaccination behaviour are thus key to our understanding of vaccine acceptance, and can allow immunisation programmes to be more effectively tailored.

To understand these pertinent issues, we used machine learning tools on publicly-available vaccination and socioeconomic data (which can be found here and on the World Health Organization’s websites). We used Gaussian process regression to forecast vaccine coverage rates and used the predictive distributions over forecasted coverage rates to introduce a quantitative marker summarising a country’s recent vaccination trends and variability: this summary is termed the Vaccine Performance Index. Parameterisations of this index can then be used to identify countries which are likely (over the next few years) to have vaccine coverage rates far from those required for herd immunity, or that are displaying worrying declines in rates, and to assess which countries will miss immunisation goals set by global public health bodies. We find that these poorly-performing countries are mostly located in South-East Asia and sub-Saharan Africa though, surprisingly, a handful of European countries also perform poorly.
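The forecasting step can be sketched with a minimal Gaussian process regression (squared-exponential kernel on a synthetic coverage series; the paper's actual kernel choices and the index construction are more involved):

```python
import numpy as np

def rbf(x1, x2, ell=3.0, sf=0.1):
    """Squared-exponential covariance between two sets of inputs."""
    return sf ** 2 * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell ** 2)

# Synthetic annual coverage fractions (assumed data, not WHO numbers)
years = np.arange(2000, 2016, dtype=float)
cov = 0.80 + 0.01 * (years - 2000) + 0.01 * np.sin(years)

noise = 0.02
K = rbf(years, years) + noise ** 2 * np.eye(len(years))
future = np.arange(2016, 2020, dtype=float)
Ks = rbf(future, years)
Kss = rbf(future, future)

# Standard GP predictive equations, centred on the sample mean
alpha = np.linalg.solve(K, cov - cov.mean())
mean = cov.mean() + Ks @ alpha                       # predictive mean
var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T))   # predictive variance
for y, mu, v in zip(future, mean, var):
    print(f"{int(y)}: {mu:.3f} +/- {1.96 * np.sqrt(v):.3f}")
```

The predictive interval widens the further the forecast moves from the data, which is what lets a performance index penalise both low forecast coverage and high uncertainty about it.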

To investigate the factors associated with vaccination coverage, we sought links between socioeconomic factors and vaccine coverage, and found that countries with higher levels of births attended by skilled health staff, gross domestic product, government health spending, and higher education levels have higher vaccination coverage levels (though these results are region-dependent).

Our vaccine performance index could aid policy makers’ assessments of the strength and resilience of immunisation programmes. Further, identification of socioeconomic correlates of vaccine coverage points to factors that could be addressed to improve vaccination coverage. You can read further in our freely available paper – which is in collaboration with the London School of Hygiene and Tropical Medicine (Heidi Larson and David Smith) and IIT Delhi (Sumeet Agarwal) – in the open-access journal Lancet Global Health under the title “Forecasted trends in vaccination coverage and correlations with socioeconomic factors: a global time-series analysis over 30 years” and there is another free article unpacking it under the title "Global Trends in Vaccination Coverage". Alex, Iain, Nick.

Sunday, 10 January 2016

Energetic arguments constraining complex fungal systems

Fungi are ubiquitous and ecologically important organisms that grow over the resources they consume. Fungi decompose everything from dead trees to dung, but whatever substrate they consume, fungi are obliged to spend energy on growth, reproduction, and substrate digestion. Many fungi also recycle their own biomass to fuel further growth. Within this overall framework, each fungal species adopts a different strategy, depending on the relative investment in growth, recycling, digestion and reproduction. Collectively, these strategies determine ecologically critical rates of carbon and nutrient cycling, including rates of decomposition and CO2 release. Crucially, a given fungus will encounter more of a resource if it increases its growth rate, and it will obtain energy from that resource more rapidly if it increases its investment in transporters and digestive enzymes. However, any energy that is expended on growth or resource acquisition cannot be spent on spore production, so fungi necessarily confront trade-offs between these three essential processes.
An example of a foraging fungal network
To understand these trade-offs we developed an energy budget model which uses a common energy currency to systematically explore how different rates of growth, recycling, and investments in resource acquisition affect the amount of energy available for reproduction, and how those trade-offs are affected by characteristics of the resource environment. Our model helps to explain the complex range of strategies adopted by various fungi. In particular, it shows that recycling is only beneficial for fungi growing on recalcitrant, nutrient-poor substrates, and that when the timescale of reproduction is large compared to the time required for the fungus to double in size, the total energy available for reproduction is maximal when a very small fraction of the energy budget is spent on reproduction. You can read about this for free under the title "Energetic Constraints on Fungal Growth" in the glamorously titled American Naturalist. Luke, Mark and Nick.

Thursday, 16 July 2015

Generations of generating functions in dividing cells

Cell biology is an unpredictable world, as we've written about before. The important machines in our cells replicate and degrade in processes that can be described as random; and when cells divide, the partitioning of these machines between the resulting cells also looks random. The number of machines in our cells is important, but how can we work with numbers in this unpredictable environment?
In our cells, machines are produced (red), replicate (orange), and degrade (purple) randomly with time, as well as being randomly partitioned when cells split and divide (blue). Our mathematical approach describes how the total number of machines is likely to behave and change with time and as cells divide.

Tools called "generating functions" are useful in this situation. A generating function is a mathematical function (like G(z) = z², but generally more complicated) that encodes all the information about a random system. To find the generating function for a particular system, one needs to consider all the random things that can happen to change the state of that system, write them down in an equation (the "master equation") describing them all together, then use a mathematical trick to push that equation into a different mathematical space, where it is easier to solve. If that "transformed" equation can be solved, the result is the generating function, from which we can then get all the information we could want about a random system: the behaviour of its mean and variance, the probability of making any observation at any time, and so on.
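As a warm-up illustrating this machinery on the simplest possible system (machines produced at rate k and degraded at rate γ per machine; this is a textbook example, far simpler than the cell-division models in our paper):

```latex
% Master equation for the probability P_n(t) of having n machines:
\frac{\mathrm{d}P_n}{\mathrm{d}t}
  = k P_{n-1} - k P_n + \gamma (n+1) P_{n+1} - \gamma n P_n .
% Multiplying by z^n and summing over n turns this into a PDE for
% G(z,t) = \sum_n P_n(t) z^n:
\frac{\partial G}{\partial t}
  = k (z - 1)\, G + \gamma (1 - z)\, \frac{\partial G}{\partial z} .
% In steady state the left-hand side vanishes, leaving
% G'(z) = (k/\gamma)\, G(z), so with the normalisation G(1) = 1:
G(z) = e^{(k/\gamma)(z - 1)} .
```

This is the generating function of a Poisson distribution: the steady-state number of machines is Poisson with mean k/γ, and the mean, variance, and all probabilities follow by differentiating G.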

We've gone through this mathematical process for a set of systems where individual cellular machines can be produced, replicated, and degraded randomly, and split at cell divisions in a variety of different ways. The generating functions we obtain allow us to follow this random cellular behaviour in new detail. We can make probabilistic statements about any aspect of the system at any time and after any number of cell divisions, instead of relying on assumptions that the system has somehow reached an equilibrium, or restricting ourselves to a single or small number of divisions. We've applied this tool to questions about the random dynamics of mitochondrial DNA (which we're very interested in! And this work connects explicitly with our recent eLife paper - blog article here) in cells that divide (like our cells) or "bud" (like yeast cells), but the approach is very general and we hope it will allow progress in many more biological situations. You can read about this, free, here under the title "Closed-form stochastic solutions for non-equilibrium dynamics and inheritance of cellular components over many cell divisions" in the Proceedings of the Royal Society A. Iain and Nick

Monday, 15 June 2015

How evolution deals with mitochondrial mutants (and how we can take advantage)

Our mitochondrial DNA (mtDNA) provides instructions for building vital machinery in our cells. MtDNA is inherited from our mothers, but the process of inheritance -- which is important in predicting and dealing with genetic disease -- is poorly understood. This is because mitochondrial behaviour during development (the process through which a fertilised egg becomes an independent organism) is rather complex. If a mother's egg cell begins with a mixed population of mtDNA -- say with some type A and some type B -- we usually observe hard-to-predict mtDNA differences between cells in the daughter. So if the mother's egg cell starts off with 20% type A, egg cells in the daughter could range (for example) from 10% to 30% type A, with each cell having a different proportion of A. This increase in variability, referred to as the mtDNA bottleneck, is important for the inheritance of disease. It allows cells with higher proportions of mutant mtDNA to be removed; but it also means that some cells in the next generation may contain a dangerous amount of mutant mtDNA. Crucially, how this increase in variability comes about during development is debated. Does variability increase because of random partitioning of mtDNAs at cell divisions? Is it due to the decreased number of mtDNAs per cell, increasing the magnitude of genetic drift? Or does something occur during later development to induce the variability? Without knowing this in detail, it is hard to propose therapies or make predictions addressing the inheritance of disease.

We set out to answer this question with maths! Several studies have provided data on this process by measuring the statistics of mixed mtDNA populations during development in mice. The different studies provided different interpretations of these results, proposing several different mechanisms for the bottleneck. We built a mathematical framework that was capable of modelling all the different mechanisms that had been proposed. We then used a statistical approach called approximate Bayesian computation to see which mechanism was most supported by the existing data. We identified a model where a combination of copy number reduction and random mtDNA duplications and deletions is responsible for the bottleneck. Exactly how much variability is due to each of these effects is flexible -- going some way towards explaining the existing debate in the literature.  We were also able to solve the equations describing the most likely model analytically. These solutions allow us to explore the behaviour of the bottleneck in detail, and we use this ability to propose several therapeutic approaches to increase the "power" of the bottleneck, and to increase the accuracy of sampling in IVF approaches.
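To give a flavour of the statistical approach, here is a minimal rejection-ABC sketch for a deliberately simplified toy version of the problem: inferring the size of a single binomial bottleneck from the cell-to-cell variance it generates. The model, prior, and tolerance are invented for illustration and are much simpler than the mechanisms compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_variance(n_copies, h0=0.2, n_cells=200):
    """Variance in mutant fraction across cells after a single binomial
    bottleneck of n_copies mtDNA molecules (a toy one-step model)."""
    mutants = rng.binomial(n_copies, h0, size=n_cells)
    return np.var(mutants / n_copies)

# 'Observed' summary statistic, generated with a true bottleneck of 30 copies
observed = bottleneck_variance(30)

# Rejection ABC: draw bottleneck sizes from a uniform prior and keep those
# whose simulated summary statistic lands close to the observed one
accepted = []
for _ in range(3000):
    n = int(rng.integers(5, 200))
    if abs(bottleneck_variance(n) - observed) < 0.1 * observed:
        accepted.append(n)

posterior_mean = float(np.mean(accepted))
```

Real applications compare several competing mechanistic models against richer summary statistics, but the logic is the same: parameters whose simulations resemble the data survive the rejection step and form an approximate posterior.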

A "bottleneck" acts to increase mtDNA variability between generations. But how is this bottleneck manifest? Our approach suggests that a combination of copy number reduction (pictured as a "true" copy number bottleneck), and later random turnover of mtDNA (pictured as replication and degradation), is responsible.

Our excellent experimental collaborators, led by Joerg Burgstaller, then tested our theory by taking mtDNA measurements from a model mouse that differed from those used previously and which could, in principle, have shown different behaviour. The behaviour they observed agreed very well with the predictions of our theory, providing encouraging validation that we have identified a likely mechanism for the bottleneck. New measurements also showed, interestingly, that the behaviour of the bottleneck looks similar in genetically diverse systems, providing evidence for its generality. You can read about this in the free (open-access) journal eLife under the title "Stochastic modelling, Bayesian inference, and new in vivo measurements elucidate the debated mtDNA bottleneck mechanism". Iain and Nick

Monday, 27 April 2015

The function of mitochondrial networks

Mitochondria are dynamic energy-producing organelles, and there can be hundreds or even thousands of them in one cell. Mitochondria (as we've blogged about before - e.g. here) do not exist independently of each other: sometimes they form giant fused networks across the cell, sometimes they are fragmented, and sometimes they take on intermediate shapes. Which state is preferred (fragmented, fused or in between) seems to depend on, for example, cell-division stage, age, nutrient availability and stress levels. But what exactly makes the cell prefer one morphology over another?
Nonlinear phenomena -- like some percolation effects -- could help account for the functional advantage of mitochondrial networks
We recently wrote an open-access paper (free here in the journal BioEssays) in which we try to answer the question: what is it about fused mitochondrial networks that could make them preferable to fragmented mitochondria? Our paper differs from previous work in that we attempt to use a range of mathematical tools to gain insight into this complex biological system, and we try to get at the root physiological and physical roles. We use physical models, simulations, and numerical estimations to compare ideas, to reason about existing hypotheses, and to propose some new ones. Among the possibilities we consider are the effects of fusion on mitochondrial quality control, on the spread of important protein machinery throughout the cell, on the chemistry of important ions, and on the production and distribution of energy through the cell. The models we use are quite simple, but we propose ideas for improving them, and experiments that will lead to further progress.

Taking a mathematical perspective leads to a central idea: for fused mitochondria to be 'preferred' by the cell, there must be some nonlinear advantage to fusion. That's what the fuzzy line is representing in the figure above. A big mitochondrion formed by fusing two smaller ones must in some sense be 'better' than the sum of the two smaller ones, or there would be no reason why a fused state is preferred.

Mitochondria can fuse to form large continuous networks across the cell. From a mathematical and physical viewpoint, we evaluate existing and novel possible functions of mitochondrial fusion, and we suggest both experiments and modelling approaches to test hypotheses
What is the source of this nonlinearity? We find several physical and chemical possibilities. Large pieces of fused mitochondria are better at sharing their contents (e.g. proteins, enzymes, and possibly even DNA) than smaller pieces. If the 'fusedness' of the mitochondrial population increases by a factor of two, the efficiency with which they share their contents increases by more than a factor of two! Also, fusion can reduce damage: if a mitochondrion gets physically or chemically damaged, having some fused non-damaged neighbours can help to reduce the overall harm to the cell. Finally, fusion may increase energy production because of a nonlinear chemical dependence of energy production on mitochondrial membrane potential. Fusing more mitochondria may, under certain circumstances, have the effect of increasing energy production. Hanne, Iain and Nick

Thursday, 11 December 2014

Turbocharging the back of the envelope

The numbers that we use to describe the world are rarely exact. How long will it take you to drive to work? Perhaps "between 20 and 30 minutes". It would be unwise (and unnecessary) to say "exactly 23.4 minutes".

This uncertainty means that "back-of-the-envelope" calculations are very valuable in estimating and reasoning about numerical problems, particularly in the sciences. The idea is to perform a calculation using rough guesses of the quantities involved, to get an "order of magnitude" estimate of the answer you're after. Made famous in physics as "Fermi problems", after Enrico Fermi (who used rough reasoning to deduce quantities ranging from the power of an atomic bomb to the number of piano tuners in Chicago), this approach is integral to many current applications of maths and science. Cool books like "Street-fighting Mathematics", "Guesstimation", and "Back of the envelope physics", the excellent "What If?" section of xkcd, and the lateral interview questions facing some job candidates ("how much of the world's water is contained in a cow?") are all examples.

Calculations in biology, such as the time it takes for a protein (foreground) to diffuse through an E. coli cell (background), are often subject to large uncertainties. Our approach and web tool allows us to track this uncertainty and obtain a probability distribution over possible answers (plotted).
We've built a free online calculator (Caladis -- calculate a distribution) that complements this approach by allowing one to take the uncertainty in one's estimates into account throughout a calculation. For example, what volume of CO2 is produced by our yearly driving? We could say that we cover 8000 miles per year "give or take" 1000 miles, and find that our car's CO2 emissions are between 100 and 150 grams per kilometre. Our calculator allows us to do the necessary conversions and sums while taking this possible variability into account -- doing maths with "probability distributions" describing our uncertainty. We no longer obtain a single (possibly inaccurate) answer, but a distribution telling us how likely any particular answer is -- in this case a rather concerning bell-shaped distribution between 1 and 2 tonnes which can be viewed here
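The idea is easy to sketch: represent each input as a distribution, draw samples, and push the samples through the arithmetic. This toy Monte Carlo version of the driving example (with made-up distribution choices, not Caladis's actual engine) looks like:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Each uncertain input is a distribution rather than a single number
miles_per_year = rng.normal(8000, 1000, n)   # 8000 miles, "give or take" 1000
g_co2_per_km = rng.uniform(100, 150, n)      # emissions between 100 and 150 g/km

tonnes_co2 = miles_per_year * 1.609 * g_co2_per_km / 1e6   # grams -> tonnes

mean = tonnes_co2.mean()
low, high = np.percentile(tonnes_co2, [2.5, 97.5])
```

Instead of one number we get a distribution of answers: here a central estimate of about 1.6 tonnes, with a spread that directly reflects the uncertainty in the inputs.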

In the sciences, particularly in biology, measurements often have substantial uncertainties -- due to experimental error, natural variability in the system of interest, or both -- and so using distributions rather than single numbers in calculations allows us to understand and process more about the question of interest. "Back-of-the-envelope" calculations are certainly useful in biology but, owing to the uncertainties involved, one can trust one's estimates better if one has a smart envelope that takes that uncertainty into account.  We've written an accompanying paper "Explicit tracking of uncertainty increases the power of quantitative rule-of-thumb reasoning in cell biology" (free to all in Biophysical Journal) showing how to use our calculator -- in conjunction with the excellent Bionumbers online database, a collection of (often uncertain) experimental measurements in biology -- to make real biological calculations more powerful. Do have a go at using our calculator at www.caladis.org : it's user-friendly and there are lots of examples showing how it works! Iain and Nick

Thursday, 4 December 2014

Therapies for mtDNA disease: models and implications

Mitochondrial DNA (mtDNA) is a molecule in our cells that contains information about how to build important cellular machines that provide us with the energy required for life. Mutations in mtDNA can prevent our cells from producing these machines correctly, causing serious diseases. Mutant mtDNA can be passed from a carrier mother to her children, and as the amount of mutated mtDNA inherited can vary, children's symptoms can be much more severe (often deadly) than those in the mother.

Several therapies exist to prevent or minimise the inheritance of mutant mtDNA from mother to daughter. These range from simply using a donor mother's eggs (in which case the child inherits no genes from the "mother") to amazing new techniques where a mother's nucleus is transferred into a donor's egg cell which has had its nucleus removed (so that the child inherits nuclear DNA from the mother and father, and healthy mtDNA from the donor). The UK is currently debating whether to allow these new therapies: several potential scientific issues have been identified in their application.

If a mother carries an mtDNA mutation, then (A) without clinical intervention her child may inherit that mutation and develop an mtDNA disease. Several "classical" (B-C) and modern (D-E) strategies exist to attempt to prevent the inheritance of mutant mtDNA, which we review (see paper link below)

As experiments with human embryos are heavily restricted, experiments in animals provide the bulk of our knowledge about how these therapies may work. We have previously written about our research in mice, highlighting a possible issue arising from mtDNA "segregation", where one type of mtDNA (possibly carrying a harmful mutation) may proliferate over another: this phenomenon could, in some circumstances, nullify the beneficial effects of mtDNA therapies. Another possible issue involves the effects of "mismatching" between the mother and father's nuclear DNA and the donor's mtDNA: current experimental evidence is conflicted regarding the strength of this effect. Finally, mismatch between donor mtDNA and any leftover mother mtDNA may also lead to biological complications.

We have recently written a paper explaining and reviewing the current state of knowledge of these effects, summarising the evidence from existing animal experiments. We are positive about implementing these therapies, which have the potential to prevent the inheritance of devastating diseases. However, we note cautions about this implementation, noting that several scientific questions remain debated or unanswered. We particularly highlight that "haplotype matching", a strategy to ensure that donor and mother mtDNA are as similar as possible, will largely remove these concerns. Iain

Wednesday, 12 November 2014

Mitochondrial motion in plants

Mitochondria are often likened to the power stations of the cell, producing energy that fuels life's processes. However, compared to traditional power stations, they're very dynamic: mitochondria move through the cell, and fuse together and break apart (among other things). Interestingly, their ability to move and undergo fusion and fission affects their functionality, and so has powerful implications for understanding disease and cellular energy supplies.

Because of this central role, it is important to understand the fundamental biological mechanisms that govern mitochondrial dynamics. Several important genes controlling mitochondrial dynamics are known in humans (and other organisms), but plant mitochondria (despite the fundamental importance of plant bioenergetics for our society) are less well understood.
Our collaborators, David Logan and his team, working with a plant called Arabidopsis, observed that a particular gene, entertainingly called "FRIENDLY", affected mitochondrial dynamics when it was artificially perturbed. (This approach, artificially interfering with a gene to explore the effects that it has on the cell and the overall organism, is a common one in cell biology.) We've just written a paper with them, "FRIENDLY regulates mitochondrial distribution, fusion, and quality control in Arabidopsis" (free here), exploring these effects. Plants with disrupted FRIENDLY had unusual clusters of mitochondria in their cells, their mitochondria were stressed, and cell death and poor plant growth resulted.

Simulation of mitochondrial dynamics

We used a 3D computational and mathematical model of randomly-moving mitochondria within the cell to show that an increased "association time" (the friendly mitochondria stick around each other for longer) was sufficient to explain the experimental observations of clustered mitochondria. Our paper thus identifies an important genetic player in determining mitochondrial dynamics in plants; and explores in substantial detail the intra-cellular, bioenergetic, and physiological implications of perturbation to this important gene. Iain and Nick
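Our actual model is three-dimensional and considerably more detailed, but the core idea -- that longer association times alone promote clustering -- can be caricatured in one dimension. Everything below (lattice, step rules, parameters) is invented for illustration:

```python
import numpy as np

def simulate(n_mito=50, size=100, steps=2000, tau=0, seed=0):
    """Random walk of n_mito mitochondria on a 1D ring; a mitochondrion that
    arrives at an occupied site 'associates' (pauses) for tau steps."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, size, n_mito)
    wait = np.zeros(n_mito, dtype=int)
    for _ in range(steps):
        moving = wait == 0
        step = rng.choice([-1, 1], n_mito)
        pos[moving] = (pos[moving] + step[moving]) % size
        wait[~moving] -= 1
        # mitochondria that just moved onto a shared site start associating
        sites, counts = np.unique(pos, return_counts=True)
        crowded = sites[counts > 1]
        wait[np.isin(pos, crowded) & moving] = tau
    _, counts = np.unique(pos, return_counts=True)
    return int(counts.max())      # size of the largest cluster at the end

largest_free = simulate(tau=0)     # no extra association time
largest_sticky = simulate(tau=50)  # long pauses on encounter
```

Comparing the two cases over many runs illustrates the mechanism: long pauses on encounter tend to build up clusters, without any explicit attraction between mitochondria.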

Thursday, 23 October 2014

'Mitoflashes' indicate acidity changes rather than free radical bursts

As we've written about before, mitochondria generate the energy required by our cells through respiration, which uses an "electrochemical gradient" as an energy store (a bit like pumping water up into a reservoir for energy storage, then harnessing its flow down the hill to turn a turbine), and produces superoxide (free oxygen radicals) as a by-product (a bit like sparks when the pumps are running hot). The fundamental importance of this machinery, which not only delivers energy but is also involved in disease and aging, has led to its investigation in great molecular detail (comparable to taking the turbines and generators apart to learn about their function). Much less is known about how mitochondria actually behave when they are fully functional in their natural environment inside our cells (comparable to looking at the fully intact and running turbine), and progress has been difficult since suitable "tools" are scarce.

A debate exists in the scientific literature about one of the key "tools" used in the investigation of living cells. A particular fluorescent sensor protein called cpYFP (circularly permuted yellow fluorescent protein) is used in biological experiments, ostensibly as a way of measuring the levels of superoxide/free oxygen radicals  in a mitochondrion. Our colleagues, however, have cast doubt on the ability of cpYFP to measure superoxide, providing evidence that it instead responds to pH, part of the above electrochemical gradient. This debate was complicated by the fact that in biology, pH and superoxide can vary together, as the amount of "driving" and amount of "sparks" might be expected to.

As another analogy: If we found an unknown measuring device and we did not know how it works, but we saw that it responds during sunny weather, we may conclude that it measures warm temperature. However, it may in fact measure high atmospheric pressure which is, like warm temperatures, often correlated with good weather.  
The protein cpYFP changes its fluorescence in response to pH changes, but is unaffected by superoxide changes.

A recent and fascinating paper in Nature observed that "flashes" of the cpYFP sensor during early development of worms (as a model for other animals and humans) were correlated with their eventual lifespan. However, despite the debate about what exactly the cpYFP sensor measures, the paper interpreted it as responding to superoxide, viewing the correlation in the light of the so-called "free radical theory of aging". This long-standing and much-debated theory hypothesizes that we age and eventually die because the constant production of free oxygen radicals in our mitochondria steadily damages our cells, weakening their energetic machinery and making them prone to illness.

In response to this, our colleagues decided to settle the question of what the sensor actually measures chemically, removing biological complications from the system. In the analogy of the unknown measurement device, the device was now tested under controlled temperature and controlled pressure to clearly distinguish between the two. They produced an experimental setup where a mix of chemicals was used to generate superoxide in the absence of any pH change. cpYFP in this mix gave no signal, demonstrating that it is unresponsive to superoxide. Conversely, they showed that even small changes in pH produced a dramatic response in the cpYFP signal. Finally, they investigated the physical structure of cpYFP, showing that a large opening in the barrel-like structure of the protein exposes a pH-sensitive chemical group to its environment (comparable to showing exactly how the inner mechanics of the unknown measurement device can pick up pressure changes). We thus concluded, in a recent publication "The ‘mitoflash’ probe cpYFP does not respond to superoxide" (in the journal Nature here), that the cpYFP sensor reports pH rather than superoxide, and that results using cpYFP (including the above Nature paper, which remains fascinating) should be interpreted as such. Iain, Markus and Nick

Friday, 6 June 2014

Evolutionary competition within our cells: the maths of mitochondrial DNA

Women may carry mutated copies of mitochondrial DNA (mtDNA) -- a molecule that describes how to build important cellular machinery relating to cellular energy supply. If this mutant mtDNA is passed on to that woman's child, the child may develop a mitochondrial disease; such diseases are often degenerative, fatal, and incurable.

Joerg created mice that contained two types of mtDNA -- here illustrated as blue (lab mouse mtDNA) and yellow (mtDNA from a mouse from a wild population). We used several different wild mice from across Europe to represent the mtDNA diversity one may find in a human population. We found that throughout a mouse's lifetime, one mtDNA type often outcompetes another (here, yellow beats blue), with different patterns across different tissues.
Amazing new therapies potentially allow a carrier mother A and a father B to use another woman C's egg cells to conceive a baby without much of mother A's mtDNA being present. The approach involves taking nuclear DNA content from A and B (so that most of the child's features are inherited from the true mother and father), and placing it into C's egg cells, which contain a background of healthy mtDNA. You can read about what are misleadingly called "three-parent babies" here.

Something that is less discussed is that, in this process, a small amount of A's mutant mtDNA can be "carried over" into C's cell. If this small amount remains small through the child's life, there is no danger of disease, as the larger amount of healthy C mtDNA will allow the child's cell to function normally. We can think of the resulting situation as a competition between A and C -- if A and C are evenly matched, the small amount of A will remain small; if C beats A, the small amount of A will disappear with time; and if A beats C, the small amount of A will increase and may eventually come to dominate over C.

Until recently it has been fair to assume that A and C are always about evenly matched (unless something is drastically different between A and C). However, evidence for this idea was based on model organisms in laboratories, which do not have the same amount of genetic diversity as found in human populations. Our collaborator Joerg addressed this by capturing wild mice from across central Europe, selecting a set that showed a degree of genetic diversity comparable to that expected in a human population. He used these, with our modelling and mathematical analysis, to show that pronounced differences between A and C often exist, and are more likely in more diverse populations. The possibility that A beats C, and mutant mtDNA comes to dominate the child's cells, therefore cannot be immediately discounted in a diverse population. We propose "haplotype matching" -- ensuring that A and C are as similar as possible -- to ameliorate this potential risk. It remains open whether one can generalize from observations in mice to people, and whether our conclusions, which used lab mice (not entirely typical creatures) as parent A, necessarily generalize to other, non-laboratory mouse types.

Our mathematical approach also allowed us to explore, in detail, the dynamics by which this competition within cells occurs. We were able to use our data rather effectively by having a statistical model that allowed us to reason jointly about a range of datasets. We found that the degree to which one population of mtDNA beat the other depended on how genetically different they were. We found that different tissues were like different environments: some favouring C over A and some vice versa. This may be surprising: evolution in the proportions of different genetic species is not something we imagine occurring inside us, during our lives, or differing between our organs. We found several different regimes, where the strength of competition changes with time and as the organism develops: when our cells are multiplying faster they show a more marked preference for one of the species. We've shown our results to the UK HFEA in its ongoing assessment of these therapies, and you can read, for free, about our work "mtDNA Segregation in Heteroplasmic Tissues Is Common In Vivo and Modulated by Haplotype Differences and Developmental Stage" in the journal Cell Reports here. Iain, Joerg, Nick.
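One standard way to caricature this within-cell competition (a replicator-style model, not necessarily the exact model fitted in the paper) is to let the fraction h of one mtDNA type evolve as dh/dt = s h(1-h), where the selection coefficient s plays the role of the tissue- and haplotype-dependent advantage:

```python
import numpy as np

def heteroplasmy(h0, s, t):
    """Closed-form solution of dh/dt = s*h*(1-h): a logistic curve in time.
    h is the fraction of one mtDNA type; s is its selective advantage."""
    return h0 * np.exp(s * t) / (1 - h0 + h0 * np.exp(s * t))

t = np.linspace(0, 100, 11)
neutral = heteroplasmy(0.2, 0.0, t)     # evenly matched haplotypes
favoured = heteroplasmy(0.2, 0.05, t)   # one haplotype has a small advantage
```

With s = 0 the fraction stays put; even a small positive s drives the favoured type towards dominance, which is why large haplotype differences (an effectively larger s) matter for the carried-over mtDNA in the therapies discussed above.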

We found that one mtDNA type beat another in different ways across many different tissue types. Here, the height (or depth) of a column represents how much the mtDNA from a wild mouse wins (or loses) against that from a lab mouse in different tissues. The bottom row corresponds to the smallest difference between wild and lab mtDNA; the top row corresponds to the greatest difference.

Thursday, 10 April 2014

What's the difference? Telling apart two sets of signals

We are constantly observing ordered patterns all around us, from the shapes of different types of objects (think of different leaf shapes, or yoga poses) to the structured patterns of sound waves entering our ears and the fluctuations of wind on our faces. Understanding the structure in observations like these has much practical utility: for example, how do we make sense of the ordered patterns of heart-beat intervals for medical diagnosis, or the measurements of some industrial process for quality checking? We have recently published an article describing a method that automatically learns the discriminating structure in labelled datasets of ordered measurements (or time series, or signals) -- that is, what is it about production-line sensor measurements that predicts a faulty process, or what is it about the shape of Eucalyptus leaves that distinguishes them from other types of leaves?

Conventional methods for comparing time series (within the area of time-series data mining) involve comparing their measurements through time, often using sophisticated methods (with science-fiction names like "dynamic time warping") that squeeze together pairs of time-series patterns to find the best match. This approach can be extremely powerful, allowing new time series to be classified (e.g., in the case of a heart-beat measurement, labelling it as a "healthy" heart beat or "congestive heart failure"; or in the case of leaf shapes, labelling it as "Eucalyptus", "Oak", etc.) by matching them to a database of known time series and their classifications. While this approach can be good at telling you whether your leaf is a "Eucalyptus", it does not provide much insight into what it is about Eucalyptus leaves that is so distinctive. It also requires one to compare a new leaf to every leaf in the database, which can be an intensive process.

A) Comparing time series by alignment B) Comparing time series by their structural features: in this we probe many structural features of the time series simultaneously (ii) and then distil out the relevant ones (iii).
Our method learns the properties of a given class of time series (e.g., the distinguishing characteristics of Eucalyptus leaves) and classifies new time series according to these learned properties. It does so by simultaneously comparing thousands of different time-series properties, drawn from a library we developed in previous work and blogged about here. Although there is a one-time cost to learn the distinguishing properties, this investment provides interpretable insights into the properties of a given dataset (a task that is very useful for scientists wanting to understand the difference between their control data and the data from their experimental interventions) and allows new time series to be classified rapidly. The result is a general framework for understanding the differences in structure between sets of time series. It can be used to understand differences between various types of leaves, heart-beat intervals, industrial sensors, yoga poses, rainfall patterns, etc., and is a contribution to helping the data science/big-data/time-series data mining literature deal with...bigger data.
Each of the dots corresponds to a time series. The colours correspond to (computer generated) time series of six different types. We identify features that allow us to do a good job of distinguishing these six types.
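A toy version of the pipeline, with just two hand-picked features standing in for the thousands in our library (synthetic data and invented feature choices, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)

def make_series(structured, n=200):
    """White noise, or noise smoothed to introduce temporal structure."""
    noise = rng.normal(size=n)
    return np.convolve(noise, np.ones(10) / 10, mode="same") if structured else noise

def features(x):
    """Two simple properties: overall spread and lag-1 autocorrelation."""
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    return np.array([np.std(x), lag1])

# Learn class centroids in feature space from labelled training series
train_X = np.array([features(make_series(s)) for s in [True, False] for _ in range(20)])
train_y = np.array([s for s in [True, False] for _ in range(20)])
centroids = {c: train_X[train_y == c].mean(axis=0) for c in (True, False)}

def classify(x):
    """Assign a new series to the class with the nearest feature centroid."""
    f = features(x)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
```

Unlike distance-based matching, the learned features are themselves interpretable: the classifier has effectively discovered that the structured class is smoother (high lag-1 autocorrelation, low spread), mirroring the kind of insight the full method extracts automatically.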

Our work will appear under the title "Highly comparative feature-based time-series classification" in the acronymically titled IEEE TKDE, and you can find a free version of it here. Ben and Nick.