## Wednesday, 27 February 2019

### Guessing the spreading time of rumours

Social scientists are fascinated by social influence: how people's beliefs, opinions and actions are influenced by others. This is relevant for understanding voting, health behaviour and opinions on issues like vaccination and climate change (topics our group is interested in). Mathematically inclined social scientists often interpret social influence using network theory. Networks, or graphs, are used to represent systems consisting of many individual units, known as nodes, and the interactions between them, which are referred to as edges or links. In social networks the nodes represent people and the links represent social ties such as friendships.

Given a particular graph, there are tools for modelling how opinions and beliefs can spread through it. However, in practice we often don’t know the structure of the social network itself. This could be because: i) the data we would like is unavailable; ii) privacy concerns about social network data mean we can't share it even if we have it; or iii) the data exists but is full of errors or omissions. Fortunately, we know a lot about the structure of social networks from decades of past research by social scientists and statisticians. For example, many social networks are known to be homophilous: people who are similar to each other are more likely to share a social connection (e.g. many of your friends are probably a similar age to you).

Inspired by this, we consider a simple mathematical model for homophilous networks known as a Random Geometric Graph (RGG). In an RGG the nodes are assigned random positions in a (unit) box. Each node is connected to every other node within a set distance, which we refer to as the connection radius (see figure). Positions of nodes may represent the positions of individuals in geographic space or in some “social space” where the coordinate axes might represent attributes such as age, income and education level. Since social networks are homophilous, we expect those who are closer together in “social space” to share a social tie.
 Example of a Random Geometric Graph with 100 nodes and a connection radius of 0.2.
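
To make the construction concrete, here is a minimal sketch of how an RGG like the one in the figure could be generated. The function name, the choice of a two-dimensional unit square with hard (non-periodic) boundaries and the seed are our illustrative assumptions, not details of the paper.

```python
import math
import random


def random_geometric_graph(n_nodes, radius, dim=2, seed=None):
    """Sample an RGG in the unit box (assumed here to have hard boundaries)."""
    rng = random.Random(seed)
    # Each node gets an independent, uniformly random position in [0, 1)^dim.
    positions = [tuple(rng.random() for _ in range(dim)) for _ in range(n_nodes)]
    # Two nodes share an edge when they lie within the connection radius.
    edges = [
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if math.dist(positions[i], positions[j]) <= radius
    ]
    return positions, edges


# Same parameters as the figure: 100 nodes and a connection radius of 0.2.
positions, edges = random_geometric_graph(100, 0.2, seed=0)
mean_degree = 2 * len(edges) / len(positions)
print(f"{len(edges)} edges, mean degree {mean_degree:.1f}")
```

Changing the seed gives a different graph with the same number of nodes and the same connection radius, which is exactly the kind of ensemble discussed in this post.
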
One basic question we can ask about a network is: “how long does it take something to spread across it?” We refer to this as the diffusion timescale. The diffusion timescale in a graph is indicative of how well connected the graph is and governs how quickly we might expect a disease, rumour or the adoption of a new behaviour to spread through it (or even how long it will take a zombie apocalypse to take hold). In our recent research we focus on the question:

“If we do not know the network (but perhaps know some of its properties) how precisely can we know the diffusion timescale?”

We show that different RGGs drawn at random with the same number of nodes and connection radius can have very different diffusion timescales. This implies that if we don’t have a good grasp of the graph structure, it could be difficult to predict the outcomes of processes such as the spread of an opinion through a social network. Alternatively, we can gain a lot of extra information about diffusion timescales if we happen to know the social coordinates of individuals. On the other hand, we do find some classes of RGGs where the diffusion timescale is very predictable given only knowledge of the number of nodes and the connection radius.
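
The spread between graphs can be explored with a short numerical experiment. The sketch below uses the algebraic connectivity (the second-smallest eigenvalue of the graph Laplacian) and takes its inverse as a proxy for the diffusion timescale. That proxy, the ensemble size of 200, the two-dimensional unit square and the seed are our assumptions for illustration; this is not the code used in the paper.

```python
import numpy as np


def algebraic_connectivity(n_nodes, radius, rng):
    """Second-smallest Laplacian eigenvalue of one RGG sampled in the unit square."""
    positions = rng.random((n_nodes, 2))
    diffs = positions[:, None, :] - positions[None, :, :]
    distances = np.hypot(diffs[..., 0], diffs[..., 1])
    adjacency = (distances <= radius).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(laplacian)[1]


rng = np.random.default_rng(0)
lambda2 = np.array([algebraic_connectivity(100, 0.2, rng) for _ in range(200)])

# A disconnected graph has zero algebraic connectivity (nothing can spread
# across it), so those samples are left out of the timescale estimate.
connected = lambda2 > 1e-9
timescales = 1.0 / lambda2[connected]
print(f"connected graphs: {connected.sum()} of {lambda2.size}")
print(f"timescale proxy: mean {timescales.mean():.2f}, std {timescales.std():.2f}")
```

Comparing the standard deviation with the mean gives a rough feel for how much the timescale proxy varies between graphs that share the same number of nodes and connection radius.
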

Our work helps put limitations on how accurately we can forecast the outcome of processes on networks given the available data (which is always imperfect). Future work may involve asking the same questions for real world datasets. In addition, most of our new results were obtained through computer simulations, meaning that there is also scope for more theory.

You can read about our research in the paper “Large algebraic connectivity fluctuations in spatial network ensembles imply a predictive advantage from node location information” for free here or for not-free here in Physical Review E. Matt and Nick.

### How mitochondria can vary, and consequences for human health

Mitochondria are components of the cell which are involved in generating “energy currency” molecules called ATP across much of complex life. Since many mitochondria exist within single cells (often hundreds or thousands), it is possible for the characteristics of individual mitochondria to vary within cells, and within tissues. This variation of mitochondrial characteristics can affect biological function and human health.

Since mitochondria possess their own, small, circular, DNA molecules (mtDNA), we can split mitochondrial characteristics into two categories: genetic and non-genetic. In our review, we discuss a number of aspects in which mitochondria vary, from both genetic and non-genetic perspectives.

In terms of mitochondrial genetics, the amount of mtDNA per cell is variable. When a cell divides, its daughters each receive a share of the parent's mtDNA, but the split isn’t precisely 50/50, so cell division can cause variability in the number of mtDNAs per cell. As mtDNAs are replicated and degraded over time, errors in the copying process may give rise to mtDNA mutations, which may spread throughout a cell. Factors such as the total amount of mtDNA, the rate of degradation/replication, the mean fraction of mutants, and the extent of fragmentation in the mitochondrial network can all influence how variable the fraction of mutated mtDNAs becomes through time (see here for a preview of some upcoming work on this topic). Both the total amount and the mutated fraction of mtDNA are implicated in neurodegenerative diseases, as well as in the ageing process.
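
As a rough illustration of how division alone creates variability, the sketch below assumes binomial partitioning: each mtDNA copy independently ends up in either daughter with probability 1/2. This is a common modelling assumption rather than something established above, and the copy number of 1000 is purely illustrative.

```python
import random
import statistics


def divide(n_mtdna, rng):
    """Split n_mtdna copies between two daughters, one coin flip per copy."""
    first = sum(rng.random() < 0.5 for _ in range(n_mtdna))
    return first, n_mtdna - first


rng = random.Random(0)
parent_copies = 1000  # illustrative: cells often hold hundreds or thousands
daughters = [divide(parent_copies, rng)[0] for _ in range(2000)]

print(f"mean copies per daughter: {statistics.mean(daughters):.1f}")
print(f"standard deviation: {statistics.stdev(daughters):.1f}")
```

Under this assumption the spread in copy number after one division is about half the square root of the parent's copy number, so roughly 16 copies for a parent with 1000.
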

Apart from genetic variations, there are many non-genetic features of mitochondria which also vary within and between cells. Changes in mtDNA sequence can change the amino-acid sequence of the proteins encoded by mtDNA, causing structural changes in the molecular machines which generate ATP. The shape of the mitochondrial membranes is also highly variable, and responds to mitochondrial activity through quantities such as pH, where mitochondrial activity itself may depend on mtDNA sequence. These two examples (mitochondrial protein and membrane structure) demonstrate how the genetic state of mitochondria may influence their non-genetic characteristics. Mitochondrial non-genetic characteristics may also influence the genetic state: for instance, mitochondrial membrane potential can influence the probability of a mitochondrion being degraded, along with its mtDNA.

The inter-dependence of genetic and non-genetic characteristics demonstrates the complex feedback loops linking these two aspects of mitochondrial physiology. We suggest that, since changes in mitochondrial genetics occur more slowly than most physical aspects of mitochondrial physiology, understanding mitochondrial genetics may be especially important in explaining phenomena such as ageing, which appears to be closely related to mitochondrial heterogeneity. You can freely access our work, which has recently been published in Frontiers in Genetics, as “Mitochondrial Heterogeneity” https://www.frontiersin.org/articles/10.3389/fgene.2018.00718/full Juvid, Iain and Nick.

## Saturday, 20 October 2018

### Mutated islands of brain matter from development might be common in the human population

You are made up of a lot of cells, and so is your brain. You were also derived from a single cell: the union of a sperm and an egg. In order for your body to grow from a single cell into an adult human, a massive amount of cell division must occur, which means that the DNA inside your cells must also be replicated intensively. In copying all of this DNA, “spelling mistakes” can sometimes be made. If that mistake occurs early enough in development, all of the subsequent cells which are copied from the mutant parent also receive this mistake, which potentially gives rise to large islands of mutated cells (called “somatic mosaicism”). If a copying error occurs at a particularly important base of DNA, this could potentially cause disease in the tissue once you have fully developed into an adult.

Inherited mutations in certain genes are known, in rare cases, to cause neurodegenerative disease (such as Alzheimer's and Parkinson's disease). We wondered whether non-heritable “spelling mistakes” in these disease-causing genes are common enough in the human population to potentially explain the more common forms of neurodegenerative disease.

Our experimental collaborators at the University of Cambridge went searching for mutated chunks of brain matter in post-mortem samples of brains from 54 human individuals. Using genetic sequencing technology, they found evidence for these mutated islands of grey matter. However, none of these samples were pathological themselves, since only a small fraction of the brain per individual was sampled. This provided an opportunity for mathematical modelling of how the brain develops, so that we may predict the prevalence of pathological mutations in human brains, given the experimental data.

Our mathematical model is incredibly simple (and crude -- others have developed much more sophisticated approaches): we assume that, in order to grow a brain, you take the initial cell from which you were derived, and double it repeatedly until the mass of cells corresponds to the number of cells in the brain (this is called a binary tree). Each copying event is called a “generation”, and corresponds to a row of the tree below. If a mutation occurs whilst copying the DNA of a particular cell at a particular generation, then a fixed fraction of all the subsequent daughters will also be mutated, generating a mutant region. Repeatedly simulating brain development using a computer allows us to gather statistics about the probability of an individual harbouring pathological islands of brain matter.

 Mathematical modelling of brain development reveals that islands of pathologically mutated cells are potentially common in the population. Left: We modelled neurodevelopment as a simple binary tree, where an initial cell doubles repeatedly until the final adult brain is created. DNA copying errors are carried forward into daughter cells. Right: A typical simulated individual. Coloured circles represent islands of pathologically mutated cells in the adult human brain. Whole brain area (black circle) is not to scale with the mutated regions (coloured circles). The mutated regions are really tiny proportionately.
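
The doubling model described above can be sketched as a small Monte Carlo simulation. This is a simplified illustration, not the simulation code from the paper. We assume that 37 doublings (about 1.4e11 cells) stand in for the adult brain, that each new cell has a hypothetical chance of 1e-6 of picking up a pathological mutation, that every descendant inherits it, and that the number of mutations per generation can be drawn from a Poisson distribution.

```python
import numpy as np

N_GENERATIONS = 37    # assumption: 2**37 is about 1.4e11 cells
MUTATION_PROB = 1e-6  # assumption: pathological mutations per new cell


def simulate_individual(rng, n_generations=N_GENERATIONS, mutation_prob=MUTATION_PROB):
    """Return (island size, number of islands) for each generation of one brain."""
    generations = np.arange(1, n_generations + 1, dtype=np.int64)
    # Generation g contains 2**g cells; new mutations among them are drawn
    # from a Poisson distribution (an approximation to a binomial draw).
    n_events = rng.poisson(mutation_prob * 2.0 ** generations)
    # A mutation at generation g is carried by all 2**(G - g) descendants.
    island_sizes = 2 ** (n_generations - generations)
    return island_sizes, n_events


rng = np.random.default_rng(0)
threshold = 100_000  # illustrative island size, in adult cells
large = np.array([
    n_events[island_sizes >= threshold].sum()
    for island_sizes, n_events in (simulate_individual(rng) for _ in range(1000))
])
print(f"mean number of large islands per individual: {large.mean():.2f}")
print(f"fraction of individuals with at least one: {(large > 0).mean():.2f}")
```

Repeating the simulation for many individuals is what turns a single random tree into statistics about how common large mutated islands are.
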

Neurodevelopment is, of course, much more complicated than a series of doubling events. Amongst other effects, regions spatially re-arrange themselves, cells die, and cell division isn't always symmetric (i.e. daughter cells may not always be capable of dividing themselves). We explored several modifications to the simple model above, and found that our extrapolations were surprisingly robust. We argue that, once the developing human brain consists of about 1 million cells, as long as each daughter cell gives rise to roughly the same number of daughter cells in subsequent divisions, and spatial mixing of the brain isn't too strong, every individual is expected to harbour about 1 pathologically mutated island of about 10,000 to 100,000 cells. The basic idea is that if those 1 million cells each replicate once, a pathological mutation is really likely to crop up in one of those 1 million divisions. Larger regions may also occur, but are rarer, and conversely, smaller regions are more common (see the right panel of the figure above). This kind of argument suggests that, for a whole range of possible ways in which our brain develops, we’re likely to have islands of mutation.
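
The back-of-the-envelope estimate above can be reproduced with a few lines of arithmetic. As before, the 37 doublings and the per-cell mutation probability of 1e-6 are our illustrative assumptions, chosen to match the orders of magnitude quoted in this post rather than taken from the paper.

```python
N_GENERATIONS = 37    # assumption: 2**37 is about 1.4e11 cells in the adult brain
MUTATION_PROB = 1e-6  # assumption: pathological mutations per new cell


def generation_summary(generation, n_generations=N_GENERATIONS, mutation_prob=MUTATION_PROB):
    """Cells alive, expected new mutations and final island size for one generation."""
    cells = 2 ** generation
    expected_events = mutation_prob * cells
    island_size = 2 ** (n_generations - generation)
    return cells, expected_events, island_size


for generation in (10, 20, 30):
    cells, events, size = generation_summary(generation)
    print(f"generation {generation}: {cells:.1e} cells, "
          f"{events:.2g} expected new mutations, islands of {size:.1e} cells")
```

At generation 20 the brain has about 1 million cells, roughly one new mutation is expected, and its island grows to 2**17 (about 130,000) adult cells. Earlier generations give larger but rarer islands, later ones give smaller but more common islands.
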

We also discuss an observation which emerges from the tree-structure of neurodevelopment which may allow us to directly estimate the mutation rate from a simple back-of-the-envelope calculation. Any particular experiment will have a certain detection sensitivity, in that it will be able to detect mutations common to a minimum number of cells in a sample, and no fewer (in our case this was ~0.5% of cells in a sample). Because of the tree structure of neurodevelopment, the most common mutations observed will occur at exactly the detection sensitivity: larger mutated islands become exponentially rarer, whereas smaller mutations are too small to be measurable.

Now consider cutting a whole brain up into a number of equally-sized chunks. As the size of the chunks increases (where we are able to detect mutations affecting 0.5% of each chunk), each chunk is tuning into a mutation event which affects more cells, and which therefore occurred higher up in the neurodevelopmental tree. But the number of mutated cells arising from any particular generation in the tree is a constant: mutations high up in the tree are larger, but also rarer, and these two effects precisely balance each other. Therefore, the total number of mutated cells which you expect to be able to detect is independent of chunk size. It does, of course, depend on the mutation rate and the total number of bases that are sequenced.

Furthermore, we may say that the total number of detectably mutated cells equals (number of mutated chunks) x (number of mutated cells per mutated chunk). This argument is also independent of the size of each chunk, depending only on the detection sensitivity and the fraction of detectably mutated chunks across the whole experiment. Therefore, we may equate the total number of mutated cells from any given generation with the total number of detectably mutated cells, and write down the mutation rate entirely independently of the size of each brain chunk.

Another way of putting the above argument is that (for the very simple models of the brain we describe) we expect the quantity (fraction of chunks containing a mutation) x (sensitivity of the detection technique) to be an invariant directly linked to the mutation rate (specifically, to the number of mutations expected in a single replication of the sequences studied). This doesn't depend on the size of the chunks or the size of the brain. As such, if our experiment had half the sensitivity to mutated cells per brain chunk (so 1% instead of 0.5%), we’d have had to measure twice as many bits of brain to obtain a similar number of detectably mutated brain chunks. It's obviously crude but helpful for insight -- and we're order-of-magnitude enthusiasts (see this and this and this).
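
For readers who prefer symbols, the argument above can be written out as a short order-of-magnitude derivation. The symbols are our own shorthand and do not appear in the original text. We also assume that each island at the detection threshold sits inside a single chunk.

```latex
% Assumed symbols (introduced here for illustration):
%   N   = number of cells in the adult brain
%   n   = number of cells per chunk
%   s   = detection sensitivity (smallest detectable mutated fraction of a chunk)
%   \mu = expected number of mutations per replication of the sequenced region
%   f   = fraction of chunks containing a detectable mutation
\begin{align*}
\text{number of islands of size } k
  &\approx \mu\,\frac{N}{k}
  && \text{(the generation with } N/k \text{ cells)} \\
\text{mutated cells in those islands}
  &\approx \mu\,\frac{N}{k}\cdot k = \mu N
  && \text{(independent of } k\text{)} \\
\text{smallest detectable island}
  &:\quad k^{*} = s\,n \\
\text{detectably mutated chunks}
  &\approx \mu\,\frac{N}{s\,n} \\
f
  &\approx \frac{\mu N/(s\,n)}{N/n} = \frac{\mu}{s}
  \quad\Longrightarrow\quad \mu \approx f\,s
\end{align*}
```

Neither the chunk size n nor the brain size N survives in the last line. Doubling s from 0.5% to 1% halves f, which is why twice as many chunks would be needed to find a similar number of mutated ones.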

Overall, our results suggest that pathologically mutated islands of brain matter are potentially possessed by all of us. These islands may potentially be sources of protein aggregates, which could spread in the brain and cause neurodegeneration; perhaps they’re regions which could be thought of as randomly triggering pathology sometime over our lives with the rate of triggering proportional to the size of the region. Future work is required to verify this, by direct observation of pathologically mutated islands, and mechanistic studies to quantify how large an island is “large enough” to have a high chance of inducing disease within a human lifespan.

You can freely access our work, which has recently been published in Nature Communications, as "High prevalence of focal and multi-focal somatic genetic variants in the human brain" https://www.nature.com/articles/s41467-018-06331-w Juvid, Nick and our friends in the Department of Clinical Neurosciences at the University of Cambridge especially Mike, Wei and Patrick.

## Tuesday, 9 October 2018

### Towards fully automated remote ecosystem monitoring

Natural ecosystems around the world are being impacted by human activity at an ever-increasing rate. However, we still don’t fully understand the true extent of the impact of our actions on these complex systems, limiting our ability to develop sustainable, well-informed best practices.
Much of the problem is in collecting sufficient amounts of data from environments which are often difficult for scientists to access and survey thoroughly (e.g. polar regions, tropical rainforests, savannas). Therefore, we have been working on methods of fully automating ecosystem monitoring in a cost-effective way that will provide huge amounts of data on the health of a remote ecosystem over long time periods, with minimal effort required by field scientists to maintain the system.
Our first step towards this goal has been to develop a device that continuously records data from a variety of sensors (microphones, cameras, humidity sensors etc.) and uploads the data to the internet directly from the field using a standard mobile phone internet connection. The device is also powered by a solar panel setup, meaning that battery replacements are unnecessary. In theory, once initially set up, this device can sit out in a remote field site indefinitely, with the data sent straight to scientists almost instantaneously. Mobile phone connections are patchy at best in remote locations so a key challenge was to have a low-power system that could opportunistically exploit the available mobile signal.
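
One way to picture the "opportunistic" part is a store-and-forward queue: recordings are buffered locally and sent only while a signal is available. The sketch below is our own illustration of that idea, not the software running on the device. The class name, the `upload` and `is_connected` callables and the file names are all hypothetical.

```python
from collections import deque


class UploadQueue:
    """Buffer recordings locally and upload them whenever a connection is available."""

    def __init__(self, upload, is_connected):
        self._upload = upload              # hypothetical: upload(item), raises OSError on failure
        self._is_connected = is_connected  # hypothetical: returns True when there is signal
        self._pending = deque()

    def add(self, item):
        self._pending.append(item)

    def flush(self):
        """Send pending items while the connection lasts; return how many were sent."""
        sent = 0
        while self._pending and self._is_connected():
            try:
                self._upload(self._pending[0])
            except OSError:
                break  # signal dropped mid-transfer: keep the item and retry later
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


# Simulated signal checks: none at first, then a brief window of coverage.
signal = iter([False, True, True, False])
uploaded = []
queue = UploadQueue(upload=uploaded.append, is_connected=lambda: next(signal, False))
for name in ("rec_001.wav", "rec_002.wav", "rec_003.wav"):
    queue.add(name)

print(queue.flush(), "sent,", len(queue), "still waiting")
print(queue.flush(), "sent,", len(queue), "still waiting")
```

An item is only removed from the queue after its upload succeeds, so a connection that drops part-way through costs a retry rather than a lost recording.
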
The kit is cheap and open source, so you can make your own, and you're welcome to explore: http://www.rpi-eco-monitoring.com and, if you subscribe, you can read more in a recent New Scientist article. You can read our full paper for free in Methods in Ecology and Evolution for more details: https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13089
Of particular interest to our group has been using audio to identify calling animals in the tropical forests of Borneo. We work at the SAFE project site in Sabah, where ecologists from around the world investigate the effect of logging and the oil palm industry on the biodiversity of these ancient rainforests. A growing network of 12 monitoring devices is currently scattered around the SAFE landscape (see Rob Ewers's work) in areas varying from old-growth forest (with almost no impact from humans) to oil palm plantations.

A real-time acoustic monitoring unit, deployed in the tropical forests of Sabah, Borneo at the SAFE project site. Data from these devices help us investigate the effect of the oil palm industry on the biodiversity found in the region.
We are developing algorithms, using a wide array of machine learning techniques, that will automatically listen to the masses of audio from these monitoring devices in Borneo and give us a real-time measure of the biodiversity in the different forest locations. With a finer-scale understanding of the full human impact on these fragile ecosystems, we can help inform more sustainable practices for the oil palm industry and minimise its damage to the threatened species of this region. Sarab, Lorenzo, Rob and Nick.