Sunday, 23 September 2012

Organizing networks using their dense regions

Many systems in fields ranging from biology and sociology to politics and finance can be represented as networks. For example, in protein interaction networks each node represents a protein and each link, connecting a pair of nodes, quantifies the strength of the interaction between those proteins. Similarly, in political voting networks nodes represent politicians and the edges connecting pairs of politicians represent the similarity of their legislative voting records. Despite the significant differences in the underlying systems, the common network representation enables researchers in different fields to ask questions that can be surprisingly similar. Given this, it would be useful to have a systematic method for highlighting similarities between networks from different fields, so as to identify problems that might be tackled using the same techniques. For example, if a biological network representing covariation in neural activity in different regions of the brain could be shown to be structurally similar to a financial network representing correlation of stock returns, certain analytical tools and models might be applicable to both problems.
A taxonomy of networks

In our paper, we tackle this problem by first developing a method to quantify the similarity of different networks based on their community structure. A community in a network, loosely put, is a set of nodes which are more connected to each other than they are to the rest of the network (like a group of friends who have the majority of their social interactions with each other). We introduce the idea of “mesoscopic response functions”, which are curves that summarize the community structure of each network at different scales and enable us to define a single number that quantifies the similarity of a pair of networks. Importantly, this approach allows us to compare networks with different numbers of nodes and different link densities. We then use this similarity measure to construct taxonomies of networks. From a historical perspective, classification of objects in this way has been central to the progress of science, as demonstrated by the periodic table of elements in chemistry and the phylogenetic tree of organisms in biology.
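To give a feel for how a pair of curves can be reduced to a single number, here is a minimal sketch. It assumes each network has already been summarized as a curve sampled on a shared grid, and it takes the area between two curves as the distance. The function name `curve_distance`, the network labels and all the numbers are made up for illustration; this is not the code or the data from the paper.

```python
def curve_distance(xs, curve_a, curve_b):
    """Area between two curves sampled on the same grid xs (trapezoidal rule)."""
    gaps = [abs(a - b) for a, b in zip(curve_a, curve_b)]
    area = 0.0
    for i in range(len(xs) - 1):
        area += 0.5 * (gaps[i] + gaps[i + 1]) * (xs[i + 1] - xs[i])
    return area


# Hypothetical response curves for three networks (illustrative values only).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
curves = {
    "voting_A": [0.0, 0.1, 0.4, 0.8, 1.0],
    "voting_B": [0.0, 0.1, 0.5, 0.8, 1.0],
    "social_C": [0.0, 0.6, 0.8, 0.9, 1.0],
}

# Distance for every unordered pair of networks.
names = sorted(curves)
distances = {
    (a, b): curve_distance(xs, curves[a], curves[b])
    for a in names
    for b in names
    if a < b
}

# The pair with the smallest distance is the most similar pair.
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

Because the curves live on a common grid rather than on the nodes themselves, nothing in this comparison depends on how many nodes or links each network has. A table of pairwise distances like `distances` is the kind of input a standard clustering method needs to build a taxonomy.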

The taxonomies constructed using our approach are successful at grouping networks that are known to be similar. For example, political voting networks for the US Congress, UK House of Commons and United Nations are clustered together in the same group. Perhaps more importantly, the method also identifies networks that are not grouped with members of the same class and are therefore unusual in some way. For example, a Facebook network for Caltech is not grouped with the Facebook networks of other universities. We also used the technique to detect historically significant financial and political changes in temporal sequences of networks; we found that the stock market network corresponding to the 1987 crash and the voting network corresponding to the American Civil War stand out from their respective sequences of networks.

You can read the full story in our paper “Taxonomies of networks from community structure” in Physical Review E 86, 036104 (2012). In the paper, we demonstrate the range of fields in which this approach can be usefully applied using a set of 746 networks and case studies that include US Congressional voting, Facebook friendship, fungal growth, United Nations voting, and stock market return correlation networks. Dan, JP and Nick

Friday, 21 September 2012

How conserved are protein-protein interactions? And why would you want to know?

A comparison of biological sequences from multiple species shows a great deal of evolutionary conservation. An overall question of interest is the following: what is the connection between similarities in biological sequence between species and similarities in the function of their components and cells?

We started to think about this question in the specific context of proteins: the building blocks of cells, each of which is specified by a sequence (of amino acids). If two human proteins are known to physically interact (stick to each other), will their equivalent (homologous) proteins in mouse physically interact? (To say that two proteins are homologous means that they are similar through common evolutionary descent: in some sense, they are the 'same' protein.)

The answer, it is often assumed, is 'yes': a fairly similar sequence that specifies the protein makes for a fairly similar function of that protein. Indeed, partly because new sequence data is being generated at a much faster rate than any other type of data, it is common practice to 'transfer' functional knowledge (such as interaction partners) from a functionally characterised protein to its unstudied matches in other species. But is this legitimate?

If we know the interactions between proteins in the network of the green organism, what can we say about the interactions of similar proteins in the blue organism's network?

An answer could also shed light on more theoretical questions.
- If only small changes in sequence can lead to new interactions between proteins, then this could be a fast evolutionary mechanism to generate new functions.
- Homologs are also found within a species - do these maintain the same interactions (a form of robustness), or rapidly lose them (release from evolutionary selection)?

Our paper just appeared in PLoS Computational Biology with the title "What Evidence is There for the Homology of Protein-Protein Interactions?". Our results returned some expected conclusions: more closely related species have more conserved interactions, and the more stringent the criterion for considering two proteins homologs, the more conservation is observed. An overall conclusion was that, at definitions of homology/similarity frequently used in the community, conservation of interactions is low, and hence 'transferred' functional annotations should be used with care. We also compared the transfer of interactions between and within species, and found within-species transfers were less reliable than between-species transfers. Using our method we also made some guesses as to the rate at which protein-protein interactions are lost through evolutionary time and about the total number of interactions that are present between all the proteins in an organism.
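To make the idea of 'transferring' interactions concrete, here is a naive sketch of how one might count conserved interactions. It assumes a simple one-to-one homolog mapping and error-free, complete interaction lists, neither of which holds for real data, so it is not the estimation procedure from the paper. The function name `conserved_fraction` and the protein labels are hypothetical.

```python
def conserved_fraction(source_interactions, homolog_map, target_interactions):
    """Fraction of transferable source interactions also seen in the target.

    An interaction is 'transferable' if both partners have a homolog in the
    target species. Interactions are treated as unordered pairs.
    """
    target = {frozenset(pair) for pair in target_interactions}
    transferable = 0
    conserved = 0
    for a, b in source_interactions:
        if a not in homolog_map or b not in homolog_map:
            continue  # cannot transfer: at least one partner has no homolog
        transferable += 1
        predicted = frozenset((homolog_map[a], homolog_map[b]))
        if predicted in target:
            conserved += 1
    return conserved / transferable if transferable else 0.0


# Hypothetical toy data: human proteins H1..H4 and mouse proteins M1..M5.
human_interactions = [("H1", "H2"), ("H2", "H3"), ("H3", "H4")]
human_to_mouse = {"H1": "M1", "H2": "M2", "H3": "M3"}  # H4 has no homolog
mouse_interactions = [("M2", "M1"), ("M3", "M5")]

fraction = conserved_fraction(human_interactions, human_to_mouse, mouse_interactions)
print(fraction)
```

In this toy example two human interactions can be transferred and only one of them is observed in mouse. With real data, a missing mouse interaction may simply never have been tested, which is one reason a raw count like this needs careful correction.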

Our work is preliminary in many ways: better attempts at dealing with interaction data errors could be made; we treated all the interactions as independent of each other, which of course they are not; we didn't compare our derived rate of loss of interactions with other evolutionary rates; and much else besides! Hopefully someone else will pick up where we have left off... Anna and Nick

Predicting network flows


Many biological, geophysical and technological systems involve the transport of material over a network by bulk fluid flow (advection) and diffusion within that fluid. The analogy is that ink spilled in the middle of a river both spreads out symmetrically by diffusion (as it would even if the river were stationary) and also gets transported bodily with the flow of the river (advection). Bulk fluid transport systems are found in the vast majority of multi-cellular organisms, as the component cells of such organisms require resources for metabolism and growth, and diffusion alone is often only an effective means of exchange at microscopic length scales. Molecules of interest are carried by advection and diffusion through the networks that make up fungi, the blood vessel networks of animals, the xylem and phloem elements of plants, and various body cavities of many different animals. Advection and diffusion are also fundamental to transport in geological and technological systems, such as rivers and drainage networks, gas pipelines, sewer systems and ventilation systems.

In all of these cases the particles of interest diffuse within a moving fluid, which is constrained to flow within a given network. Furthermore, the molecules that are carried through the network may be consumed or delivered out of the network at a particular rate. For example, glucose molecules are carried through the blood, and at each point in the network there is some probability that a given glucose molecule will be transported out of the vascular system and into the surrounding tissue. We have recently developed an algorithm for predicting how the spatial distribution of nutrients in a network will vary over space and time, when the resource in question is subject to given rates of advection, diffusion and delivery. We explain the algorithm in our paper "Advection, diffusion, and delivery over a network" that recently appeared in Physical Review E. 
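The three ingredients (advection, diffusion and delivery) can be illustrated on a single edge with a few lines of code. The sketch below is not the algorithm from the paper: it is a plain explicit finite-volume update on one edge with closed ends, and the function name `step` and every parameter value are assumptions chosen for illustration.

```python
def step(conc, u, D, R, dx, dt):
    """Advance the concentration along one edge by a single time step.

    conc : concentration in each cell along the edge
    u    : fluid velocity (assumed >= 0, flowing towards higher indices)
    D    : diffusion coefficient
    R    : delivery rate (fraction of resource leaving the network per unit time)
    """
    n = len(conc)
    # flux[i] is the flux across the face between cell i-1 and cell i.
    # Both ends of the edge are closed, so flux[0] and flux[n] stay zero.
    flux = [0.0] * (n + 1)
    for i in range(1, n):
        advective = u * conc[i - 1]                     # upwind: carried from upstream cell
        diffusive = -D * (conc[i] - conc[i - 1]) / dx   # down the concentration gradient
        flux[i] = advective + diffusive
    return [
        conc[i] + dt * (-(flux[i + 1] - flux[i]) / dx - R * conc[i])
        for i in range(n)
    ]


# Illustrative parameters (arbitrary units), chosen so the explicit scheme is stable.
u, D, R, dx, dt = 0.5, 0.2, 0.1, 1.0, 0.5

# All of the resource starts in the first cell of a 20-cell edge.
conc = [1.0] + [0.0] * 19
for _ in range(20):
    conc = step(conc, u, D, R, dx, dt)

print(sum(conc))               # resource still in the edge
print(conc.index(max(conc)))   # cell where the concentration peaks
```

Because the ends are closed, transport only moves resource between cells, so any drop in the total comes from the delivery term alone. A full network model would additionally have to match concentrations and fluxes at every junction where edges meet.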

(a) Phanerochaete velutina in a 24cm x 24cm microcosm, photographed just before radio-tracer was dripped onto the inoculum.
(b) Data from the photon counting camera. The brightness of the image reflects the concentration of the tracer in each part of the network.
(c) Digitized network, coloured to indicate the tracer concentration. Concentration is measured in arbitrary units, and edges that could not be measured are coloured black.
(d) Predicted concentration measured in arbitrary units, under the assumption that the tracer enters the network at the inoculum at a constant rate, each edge in the final network continues to grow (or shrink) at the same rate that was observed over the final time step, and 10% of each edge is occupied by transport vessels.
(e) Predicted intensity in arbitrary units under the same assumptions as panel (d), except that in this case we assume that 20% of each edge is occupied by transport vessels.

We are particularly interested in modeling the movement of radio-labelled tracers in growing fungal networks. As mentioned in a previous post, we hypothesize that within fungal networks, there is a bulk movement of fluid from the sites of water uptake to the sites of growth. To test this hypothesis, we allowed the fungus Phanerochaete velutina to grow on a dish for a four-week period, taking photographs every three days. An image analysis program was then used to convert the sequence of photographs into a sequence of networks, composed of edges of measured length and volume.

After taking a final photograph of our fungus, we added a radio-labelled tracer, placed a scintillation screen over the network, and used a photon counting camera to see where the tracer moved. This experiment gave us empirical data which we could use to evaluate our model of transport in fungal networks. Our model has one free parameter, corresponding to the fraction of each edge that is occupied by transport vessels. We found that our model of growth-induced mass flows (see panel (d)) was remarkably good at predicting where the tracers would spread (compare to panel (c)), if we make the biologically plausible assumption that the fluid flows occur within transport vessels that occupy 10% of each edge in the network. Luke and Nick