Wednesday, 3 April 2013

A compound methodological eye on nature’s signals


Background signals are both empirical (e.g. ECGs and human speech) and simulated (e.g. correlated noise and maps); the arctic krill eye has the output of thousands of time-series analysis methods wrapped around it [Fig. 1 of our paper, showing the results of applying 8651 methods to a set of time series]. Image created by B. D. Fulcher. Accreditation details for the krill eye can be found here.

We are constantly interacting with signals in the world around us: noticing the fluctuating breeze against our faces, observing the intermittent flickering of a candle, or becoming absorbed in the regularity of our own pulse. Researchers across science have developed highly sophisticated methods for understanding the structure in these types of time-varying processes, and for identifying the types of mechanisms that produce them. However, scientists collaborate across disciplines surprisingly rarely, and therefore tend to use a small number of familiar methods from their own discipline. But how do the standard methods used in economics relate to those used in biomedicine or statistical physics?

In a recent article, "Highly comparative time-series analysis: the empirical structure of time series and their methods", published in Journal of the Royal Society Interface and free to access, we investigated what can be learned by comparing such methods from across science simultaneously. We collected over 9000 scientific methods for analysing signals and compared their behaviour on a collection of over 35 000 diverse real-world and model-generated time series. The result is a more unified, highly comparative perspective on how scientists measure and understand structure in their data, and it gives both data and methods an interdisciplinary scientific context. For example, we showed how methods from across science that behave similarly to a given target method can be retrieved automatically, and how real-world or model-generated data with similar properties to a target time series can be retrieved in the same way. Further examples of the kinds of questions we ask are in the boxes in the figure below. We also introduced a range of techniques that exploit our library of methods to tackle specific challenges in classification and medical diagnosis. For example, we showed how useful methods for diagnosing pathological heartbeat series or Parkinsonian speech segments can be selected automatically, often yielding unexpected methods developed in disparate disciplines or in the distant past.

Representing a time series by the outputs of a set of automatically selected statistical methods and, unusually, representing statistical methods by their behaviour on a set of time series provides a form of empirical fingerprint for both our time series and our methods. Given these fingerprints, we can automatically answer questions like those posed in the boxes above. This gives us a powerful complement to the more conventional process of studying our methods and our data. [Based on Fig. 2 of our paper]
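
To make the fingerprint idea concrete, here is a minimal sketch in Python. It is not the code or the method library from our paper: the three toy methods, the six toy signals and all function names are assumptions chosen purely for illustration. Each row of the matrix fingerprints a time series by the outputs of the methods; each column fingerprints a method by its behaviour across the time series. The same nearest-neighbour search then retrieves similar time series (rows) or similar methods (columns).

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Toy stand-ins for a method library (illustrative assumptions only).
METHODS = {
    "lag1_autocorr": lambda x: autocorr(x, 1),
    "lag2_autocorr": lambda x: autocorr(x, 2),
    "mean_abs_diff": lambda x: float(np.mean(np.abs(np.diff(x)))),
}


def fingerprint_matrix(series_list):
    """Rows are time series, columns are methods, columns are z-scored."""
    rows = []
    for x in series_list:
        z = (x - x.mean()) / x.std()  # remove offset and scale first
        rows.append([method(z) for method in METHODS.values()])
    raw = np.array(rows)
    spread = raw.std(axis=0)
    spread[spread == 0] = 1.0  # guard against constant columns
    return (raw - raw.mean(axis=0)) / spread


def nearest(matrix, index):
    """Index of the row closest (Euclidean) to row `index`, excluding itself."""
    dist = np.linalg.norm(matrix - matrix[index], axis=1)
    dist[index] = np.inf
    return int(np.argmin(dist))


def demo_series(n=500, seed=0):
    """Six toy signals: two white-noise, two sine, two random-walk series."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    names = ["noise_a", "noise_b", "sine_a", "sine_b", "walk_a", "walk_b"]
    series = [
        rng.standard_normal(n),
        rng.standard_normal(n),
        np.sin(2 * np.pi * t / 50),
        np.sin(2 * np.pi * t / 80 + 1.0),
        np.cumsum(rng.standard_normal(n)),
        np.cumsum(rng.standard_normal(n)),
    ]
    return names, series


if __name__ == "__main__":
    names, series = demo_series()
    F = fingerprint_matrix(series)
    method_names = list(METHODS)
    # Which time series behaves most like the first noise series?
    print(names[0], "->", names[nearest(F, 0)])
    # Which method behaves most like the lag-1 autocorrelation?
    print(method_names[0], "->", method_names[nearest(F.T, 0)])
```

Because every column is z-scored, the Euclidean distance between two columns depends only on the correlation between the two methods' outputs, so methods that rank the time series in the same way end up close together. With thousands of methods and time series the principle is the same, although the choice of normalisation and distance would need more care than in this toy version.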

We are developing a web platform, available at http://www.comp-engine.org/timeseries/, to support this kind of comparative interdisciplinary scientific analysis. The plan is to use it to let people exchange data and code for methods, and to put each object in its context.

Ben, Max and Nick