Information and Entropy in Biological Systems

April 8-10, 2015

This workshop was held at NIMBioS, the National Institute for Mathematical and Biological Synthesis. Here you can see slides for all 12 talks, and videos of some. Elsewhere you can find a complete description of the workshop and its participants.
  1. John Baez, Information and entropy in biological systems.
    Abstract. Information and entropy are being used in biology in many different ways: for example, to study biological communication systems, the 'action-perception loop', the thermodynamic foundations of biology, the structure of ecosystems, measures of biodiversity, and evolution. Can we unify these? To do this, we must learn to talk to each other. This will be easier if we share some basic concepts which I'll sketch here.

    The talk is full of links, in blue. If you click on these you can get more details. You can also watch a video:

  2. For more, read:

  3. Chris Lee, Empirical information, potential information and disinformation as signatures of distinct classes of information evolving machines.
    Abstract. Information theory is an intuitively attractive way of thinking about biological evolution, because it seems to capture a core aspect of biology—life as a solution to "information problems"—in a fundamental way. However, there are non-trivial questions about how to apply that idea, and whether it has actual predictive value. For example, should we think of biological systems as being actually driven by an information metric? One idea that can draw useful links between information theory, evolution and statistical inference is the definition of an information evolving machine (IEM) as a system whose elements represent distinct predictions, and whose weights represent an information (prediction power) metric, typically as a function of sampling some iterative observation process. I first show how this idea provides useful results for describing a statistical inference process, including its maximum entropy bound for optimal inference, and how its sampling-based metrics ("empirical information", Ie, for prediction power; and "potential information", Ip, for latent prediction power) relate to classical definitions such as mutual information and relative entropy. These results suggest classification of IEMs into several distinct types:

    1. Ie machine: e.g. a population of competing genotypes evolving under selection and mutation is an IEM that computes an Ie equivalent to fitness, and whose gradient (Ip) acts strictly locally, on mutations that it actually samples. Its transition rates between steady states will decrease exponentially as a function of evolutionary distance.
    2. "Ip tunneling" machine: a statistical inference process that sums over a population of models to compute both Ie and Ip can directly detect "latent" information in the observations (not captured by its model), which it can follow to "tunnel" rapidly to a new steady state.
    3. disinformation machine (multiscale IEM): an ecosystem of species is an IEM whose elements (species) are themselves IEMs that can interact. When an attacker IEM can reduce a target IEM's prediction power (Ie) by sending it a misleading signal, this "disinformation dynamic" can alter the evolutionary landscape in interesting ways, by opening up paths for rapid co-evolution to distant steady-states. This is especially true when the disinformation attack targets a feature of high fitness value, yielding a combination of strong negative selection for retention of the target feature, plus strong positive selection for escaping the disinformation attack. I will illustrate with examples from statistical inference and evolutionary game theory. These concepts, though basic, may provide useful connections between diverse themes in the workshop.

    You can also watch a video:

  4. John Harte, Maximum entropy as a foundation for theory building in ecology.

    Abstract. Constrained maximization of information entropy (MaxEnt) yields least-biased probability distributions. In statistical physics, this powerful inference method yields classical statistical mechanics/thermodynamics under the constraints imposed by conservation laws. I apply MaxEnt to macroecology, the study of the distribution, abundance, and energetics of species in ecosystems. With constraints derived from ratios of ecological state variables, I show that MaxEnt yields realistic abundance distributions, species-area relationships, spatial aggregation patterns, and body-size distributions over a wide range of taxonomic groups, habitats and spatial scales. I conclude with a brief summary of some of the major opportunities at the frontier of MaxEnt-based macroecological theory.
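    As a toy illustration of constrained entropy maximization, consider abundances n = 1, ..., N with only the mean fixed. This is not Harte's ecological model, whose state variables and constraints are not spelled out in the abstract; it is a minimal sketch of the inference method itself. Under a mean constraint the entropy maximizer has the Boltzmann form p_n ∝ exp(-λn), and the multiplier λ can be found by bisection. The function names, the range N = 20 and the target mean 4.0 are all hypothetical.

```python
import math

def maxent_given_mean(n_max, target_mean, tol=1e-12):
    """Least-biased distribution on {1, ..., n_max} with a fixed mean.

    Maximizing Shannon entropy under a mean constraint gives
    p_n proportional to exp(-lam * n); we solve for lam by bisection.
    """
    values = range(1, n_max + 1)

    def dist(lam):
        # Subtract the largest exponent for numerical stability.
        exps = [-lam * n for n in values]
        m = max(exps)
        weights = [math.exp(e - m) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(n * p for n, p in zip(values, dist(lam)))

    lo, hi = -50.0, 50.0  # mean(lam) decreases as lam increases
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) > target_mean:
            lo = mid  # mean too large: need a larger multiplier
        else:
            hi = mid
    return dist((lo + hi) / 2)

def shannon_entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical numbers chosen only for illustration.
p = maxent_given_mean(n_max=20, target_mean=4.0)
print(shannon_entropy(p))
```

    Any other distribution on the same range with the same mean, such as one concentrated entirely on n = 4, has lower entropy than the one returned here.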

  5. Annette Ostling, The neutral theory of biodiversity and other competitors to the principle of maximum entropy.
    Abstract. I am a bit of the odd man out in that I will not talk that much about information and entropy, but instead about neutral theory and niche theory in ecology. My interest in coming to this workshop is in part out of an interest in what greater insights we can get into neutral models and stochastic population dynamics in general using entropy and information theory.

    I will present the niche and neutral theories of the maintenance of diversity of competing species in ecology, and explain the dynamics included in neutral models in ecology. I will also briefly explain how one can derive a species abundance distribution from neutral models. I will present the view that neutral models have the potential to serve as more process-based null models than those previously used in ecology for detecting the signature of niches and habitat filtering. However, tests of neutral theory in ecology have not yet been as useful as tests of neutral theory in evolutionary biology, because they leave open the possibility that pattern is influenced by "demographic complexity" rather than niches. I will mention briefly some of the work I've been doing to try to construct better tests of neutral theory. Finally, I'll mention some connections that have been made so far between predictions of entropy theory and predictions of neutral theory in ecology and evolution.
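    To make "the dynamics included in neutral models" concrete, here is a minimal sketch of a zero-sum neutral community, assuming a fixed community size and a small probability that each replacement is a new species. It is a generic textbook-style simulation rather than the specific model from the talk, and the function name and all parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_neutral(community_size=200, immigration=0.05, steps=20000, seed=0):
    """Zero-sum neutral drift: each step one individual dies and is replaced.

    With probability `immigration` the replacement belongs to a brand-new
    species (a crude stand-in for immigration or speciation). Otherwise it is
    the offspring of a randomly chosen resident, which for simplicity may be
    the individual being replaced. All individuals are treated identically,
    which is what makes the model neutral.
    """
    rng = random.Random(seed)
    community = [0] * community_size  # start with a single species, label 0
    next_label = 1
    for _ in range(steps):
        dead = rng.randrange(community_size)
        if rng.random() < immigration:
            community[dead] = next_label
            next_label += 1
        else:
            community[dead] = community[rng.randrange(community_size)]
    return community

community = simulate_neutral()
abundances = Counter(community)      # species label -> abundance
sad = Counter(abundances.values())   # abundance -> number of species with it
for abundance in sorted(sad):
    print(abundance, sad[abundance])
```

    The final table is one sample of a species abundance distribution: how many species have 1 individual, 2 individuals, and so on.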

    You can also watch a video:

    For more, read:

  6. David Wolpert, The Landauer limit and thermodynamics of biological organisms.

    You can also watch a video:

  7. Susanne Still, Efficient computation and data modeling.

    You can also watch a video:

  8. For more, read:

  9. Matina Donaldson-Matasci, The fitness value of information in an uncertain environment.
    Abstract. Communication and information are central concepts in evolutionary biology. In fact, it is hard to find an area of biology where these concepts are not used. However, quantifying the information transferred in biological interactions has been difficult. How much information is transferred when the first spring rainfall hits a dormant seed, or when a chick begs for food from its parent? One measure that is commonly used in such cases is fitness value: by how much, on average, an individual's fitness would increase if it behaved optimally with the new information, compared to its average fitness without the information. Another measure, often used to describe neural responses to sensory stimuli, is the mutual information, a measure of reduction in uncertainty, as introduced by Shannon in communication theory. However, mutual information has generally not been considered to be an appropriate measure for describing developmental or behavioral responses at the organismal level, because it is blind to function; it does not distinguish between relevant and irrelevant information. There is in fact a surprisingly tight connection between these two measures in the important context of evolution in an uncertain environment. In this case, a useful measure of fitness benefit is the increase in the long-term growth rate, or the fold increase in number of surviving lineages. In many cases the fitness value of a developmental cue, when measured this way, is exactly equal to the reduction in uncertainty about the environment, as described by the mutual information.
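    The equality between fitness value and mutual information can be checked numerically in one simple special case. The sketch below assumes that in each environment only the matching phenotype leaves offspring, so that the optimal bet-hedging strategy is to produce phenotypes in proportion to the probabilities of the environments. That assumption, the joint distribution and the fitness values are hypothetical, chosen only for illustration. Logarithms are natural, so both quantities are in nats.

```python
import math

# Hypothetical joint distribution p(environment, cue).
joint = {
    ("wet", "rain"): 0.4, ("wet", "no rain"): 0.1,
    ("dry", "rain"): 0.1, ("dry", "no rain"): 0.4,
}
# Hypothetical fitness of the phenotype matched to each environment;
# mismatched phenotypes are assumed to leave no offspring.
fitness = {"wet": 3.0, "dry": 2.0}

def marginal(joint, index):
    """Sum the joint distribution over the other variable."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

p_env = marginal(joint, 0)
p_cue = marginal(joint, 1)

def growth_rate_without_cue():
    # Optimal bet-hedging: produce phenotype e with probability p(e).
    return sum(p * math.log(p * fitness[e]) for e, p in p_env.items())

def growth_rate_with_cue():
    # Optimal response to a cue: produce phenotype e with probability p(e | cue).
    return sum(
        p * math.log((p / p_cue[c]) * fitness[e])
        for (e, c), p in joint.items()
    )

def mutual_information():
    return sum(
        p * math.log(p / (p_env[e] * p_cue[c]))
        for (e, c), p in joint.items()
    )

value_of_cue = growth_rate_with_cue() - growth_rate_without_cue()
print(value_of_cue, mutual_information())
```

    The fitness values cancel in the difference of growth rates, which is why the gain from the cue depends only on the joint distribution of environment and cue in this special case.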

    For more, read:

  10. Roderick Dewar, Maximum entropy and maximum entropy production in biological systems: survival of the likeliest?

    For more, read:

  11. Marc Harper, Information transport and evolutionary dynamics.

    You can also watch a video:

    For more, read:

  12. Tobias Fritz, Characterizations of Shannon and Rényi entropy.
    Abstract. Measures of information like Shannon entropy and Rényi entropy have found widespread use in biology. But they are defined in terms of complicated-looking formulas, and the question arises: why do these quantities take these particular forms? Or should we really be using other measures of information defined differently? In this talk, I will approach these questions by explaining some of the special properties enjoyed by Shannon and Rényi entropy and discussing how these properties single out the Shannon and Rényi entropies among other measures of information.
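    For readers who have not seen the formulas, here is a minimal sketch of both entropies, together with a numerical check of one special property they share: additivity over independent distributions. Whether additivity is among the properties used in the talk is an assumption; it is offered only as an example of the kind of property a characterization can rest on. The distributions `p` and `r` are hypothetical.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats: -sum p_i log p_i."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi_entropy(p, q):
    """Rényi entropy of order q >= 0: log(sum p_i^q) / (1 - q)."""
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)  # the q -> 1 limit is Shannon entropy
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def product(p, r):
    """Joint distribution of two independent random variables."""
    return [x * y for x in p for y in r]

p = [0.5, 0.3, 0.2]
r = [0.7, 0.3]

for q in (0, 0.5, 1, 2, 5):
    joint = renyi_entropy(product(p, r), q)
    parts = renyi_entropy(p, q) + renyi_entropy(r, q)
    print(q, joint, parts)  # the two columns agree up to rounding
```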
  13. Christina Cobbold, Biodiversity measures and the role of species similarity.
    Abstract. Biodiversity is associated with properties of ecosystem function and productivity; at the molecular level it can describe genetic diversity, with connections to quantities such as host-pathogen fitness. Diversity is clearly an important concept, and this raises the question of how one should measure it. There are literally dozens of measures of diversity in the literature, perhaps the most common example being species richness, which is simply the number of species in the community concerned. Even such a simple definition raises questions: what defines a species? Sometimes this question is simple to answer, but for microbial communities, for example, there is no simple definition of a species and yet we still wish to quantify diversity.

    To address this issue we present a natural family of diversity measures which not only takes into account the relative abundance of species, but also accounts for differences or similarities between species. This latter point allows us to deal with communities where the notion of species is unclear. This family of measures is closely related to Rényi's generalised entropies, and I will illustrate how the formulas introduced by Tobias Fritz in his talk relate to our family of diversity measures.

    We demonstrate that our new measure of diversity is not simply an addition to the already long list of indices: instead, a single formula subsumes many of the most popular indices, including Shannon's, Simpson's, species richness, and Rao's quadratic entropy. These popular indices can then be used and understood in a unified way, and the relationships between them are made plain. The new measures are, moreover, effective numbers, so that percentage changes and ratio comparisons of diversity value are meaningful.
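    The abstract does not state the formula, so the sketch below assumes the form published by Leinster and Cobbold: for relative abundances p and a similarity matrix Z with ones on the diagonal, the diversity of order q is (Σ_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the usual limits at q = 1 and q = ∞. The example abundances and similarities are hypothetical. The checks show how richness, Shannon, Simpson and Rao's quadratic entropy drop out as special cases.

```python
import math

def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (assumed Leinster-Cobbold form).

    p: relative abundances; Z: similarity matrix with Z[i][i] == 1.
    (Zp)_i is the expected similarity between species i and a randomly
    chosen individual of the community.
    """
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    if q == math.inf:
        return 1.0 / max(zp[i] for i in support)
    total = sum(p[i] * zp[i] ** (q - 1) for i in support)
    return total ** (1.0 / (1.0 - q))

def identity(n):
    """Similarity matrix for species with nothing in common."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical community and similarities.
p = [0.5, 0.3, 0.2]
identity_z = identity(3)
similar_z = [[1.0, 0.8, 0.1],
             [0.8, 1.0, 0.1],
             [0.1, 0.1, 1.0]]

richness = diversity(p, identity_z, 0)            # number of species
exp_shannon = diversity(p, identity_z, 1)         # exp(Shannon entropy)
inverse_simpson = diversity(p, identity_z, 2)     # 1 / sum p_i^2
# Rao's quadratic entropy with distances d_ij = 1 - Z_ij.
rao = sum(p[i] * p[j] * (1 - similar_z[i][j]) for i in range(3) for j in range(3))
order_two_with_similarity = diversity(p, similar_z, 2)  # equals 1 / (1 - rao)
```

    Because species 0 and 1 are assumed to be very similar, the diversity computed with `similar_z` is lower than the corresponding value with `identity_z`.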

    You can also watch a video:

    For more, read:

  14. Tom Leinster, Maximizing biological diversity.
    Abstract. Different ecologists, shown two communities, will make different judgements on which is the more diverse. One axis of difference is the relative importance attached to rare and common species: one person might prioritize conservation of rare species, while another prioritizes overall community balance. This spectrum of viewpoints is captured by the family of measures known to ecologists as the Hill numbers and to information theorists as the exponentials of the Rényi entropies. It is a one-parameter family, the parameter q indicating one's position on the spectrum of viewpoints.
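    A small numerical example may help show how the parameter q changes the verdict. The sketch below computes Hill numbers for two hypothetical communities: one with a single dominant species and nine rare ones, and one with four equally common species. The community compositions are invented for illustration.

```python
import math

def hill_number(p, q):
    """Hill number of order q: the exponential of the Rényi entropy of order q."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical communities: A has one dominant species and nine rare ones,
# B has four equally common species.
community_a = [0.91] + [0.01] * 9
community_b = [0.25] * 4

for q in (0, 1, 2):
    print(q, hill_number(community_a, q), hill_number(community_b, q))
```

    At q = 0 every species counts equally and A comes out as more diverse. At q = 2 the rare species barely register and B comes out ahead, so the ranking of the two communities depends on the viewpoint.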

    However, all these measures use a crude model in which the varying similarities between species are ignored. They behave as if distinct species have nothing whatsoever in common. Christina Cobbold's talk showed how to repair this defect, factoring in inter-species similarity.

    A natural question then arises. Given a list of species with known similarities, and a choice q of viewpoint, which frequency distribution maximizes the diversity?

    The big surprise is this: there is a single frequency distribution that maximizes diversity from all viewpoints simultaneously. No matter whether one's priority is rare species (low q) or common species (high q), this distribution is optimal. Moreover, the value of the maximum diversity is the same for all q. Thus, any list of species of known similarities has an unambiguous maximum diversity.
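    The result can be illustrated on a tiny example. The sketch below is not the algorithm or proof from the talk. It assumes the Leinster-Cobbold diversity formula (Σ_i p_i (Zp)_i^(q-1))^(1/(1-q)) and a hypothetical three-species similarity matrix. For this matrix, solving Zw = (1, 1, 1) by hand gives w = (2/3, 2/3, 1), and normalizing gives the candidate distribution p* = (2/7, 2/7, 3/7). The code checks that p* has the same diversity, 7/3, for several values of q, and that randomly sampled distributions do not beat it.

```python
import math
import random

def diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (assumed form, see above)."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in support))
    return sum(p[i] * zp[i] ** (q - 1) for i in support) ** (1.0 / (1.0 - q))

# Hypothetical similarities: species 0 and 1 are half-similar, species 2 is distinct.
Z = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

p_star = [2 / 7, 2 / 7, 3 / 7]  # candidate maximizer, derived by hand for this Z
orders = (0, 0.5, 1, 2, 5)

def best_random_diversity(q, trials=2000, seed=0):
    """Largest diversity of order q found among randomly sampled distributions."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        # Normalized exponential draws are uniform on the simplex.
        raw = [rng.expovariate(1.0) for _ in range(len(Z))]
        total = sum(raw)
        best = max(best, diversity([x / total for x in raw], Z, q))
    return best

for q in orders:
    print(q, diversity(p_star, Z, q), best_random_diversity(q))
```

    Note that p* is not the uniform distribution: the two similar species each receive less weight than the distinct one.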

    For an early account of the main result (with a proof that has since been simplified), read:

© 2015 John Baez