Some research areas and recent results are outlined below.
A periodic table of protein complexes
(Science 2015; Nature Communications 2015)
Biological evolution has produced an enormous variety of protein
complexes, which arise when several proteins bind together to form
larger structures. We show that the vast majority of protein complexes
can be broken down in terms of three different fundamental steps of
protein evolution. These steps can combine in many different ways,
giving rise to the observed variety of protein complexes. What this
reveals is that heteromeric protein complexes, which are complexes
that consist of more than one type of protein, can be represented as
homomeric complexes of repeated multi-protein units. This approach
also allows us to classify protein complexes in a periodic table, and
to predict topologies of complexes that have not been observed yet.
The periodic table of protein complexes can be browsed here.
Related to this we have also explored protein complexes of uneven stoichiometry, which represent exceptions to the periodic table (Nature Communications 2015).
Universal properties of genotype-phenotype maps
(Biophysical Journal 2023; Royal Society Interface 2023 (2); Royal Society Interface 2023; Nature Ecology & Evolution 2022; PNAS 2022; Europhysics Letters 2022; Royal Society Interface 2022; Royal Society Interface 2021; Royal Society Interface 2020; Royal Society Interface 2020 (2); Europhysics Letters 2018; Royal Society Interface 2018; Royal Society Interface 2017; PLOS Computational Biology 2016; Royal Society Interface 2015; Royal Society Interface 2014)
We introduce a simple genotype-phenotype (GP) map for biological
self-assembly on a lattice, and show that it shares many properties
with the well-established GP maps of both RNA secondary structure and
the HP model. These properties include a heavily skewed distribution
of the number of genotypes per phenotype, shape space covering, and
positively correlated phenotypic evolvability and robustness. The fact
that these important properties emerge in three very different GP maps
underlines their fundamental importance for biological evolution. It
also means that the lattice model, which is highly simplified and
therefore tractable, can be used to study a wide variety of
evolutionary phenomena.
In further work (2015) we show that all of the properties described above also arise in a much simpler GP map. The defining characteristic of this simple map is the presence of 'coding' and 'non-coding' sequence regions. The boundary between these two regions is itself defined in the sequence, much like start and stop codons in DNA. The fact that the properties of biologically realistic GP maps emerge in this extremely simple model suggests that the fundamental organisation of biological sequences into constrained and unconstrained regions has a profound impact on the structure of GP maps, and therefore on biological evolution.
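The effect of a sequence-defined boundary between coding and non-coding regions can be illustrated with a minimal Python sketch. This is a toy construction for illustration only and not the model from the paper: the three-letter alphabet, the stop symbol 'S', and all function names are assumptions. The phenotype is taken to be everything before the first stop symbol, so any mutation after the stop is neutral.

```python
from collections import Counter
from itertools import product

ALPHABET = "ABS"  # hypothetical alphabet; 'S' plays the role of a stop symbol
STOP = "S"


def phenotype(genotype: str) -> str:
    """Coding region: everything before the first stop symbol.

    A genotype without a stop symbol is treated as entirely coding.
    """
    return genotype.split(STOP, 1)[0]


def neutral_set_sizes(length: int) -> Counter:
    """Count how many genotypes of the given length map to each phenotype."""
    return Counter(
        phenotype("".join(g)) for g in product(ALPHABET, repeat=length)
    )


def robustness(genotype: str) -> float:
    """Fraction of single-point mutations that leave the phenotype unchanged."""
    target = phenotype(genotype)
    neutral = total = 0
    for i, old in enumerate(genotype):
        for new in ALPHABET:
            if new == old:
                continue
            mutant = genotype[:i] + new + genotype[i + 1:]
            total += 1
            neutral += phenotype(mutant) == target
    return neutral / total


sizes = neutral_set_sizes(5)
# The empty phenotype (stop in first position) has 3**4 = 81 genotypes,
# whereas a phenotype such as "ABAB" has only one.
largest = sizes.most_common(1)[0]
```

In this toy map, phenotypes with short coding regions have far more genotypes than those with long ones, which gives a heavily skewed distribution of neutral set sizes, and genotypes with longer non-coding regions are more robust to point mutations.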
We have demonstrated that genotypes of the same phenotype are highly correlated (2016), and re-examined the importance of the mutable boundary between constrained and unconstrained sequences (2018), showing that it leads to a positive correlation of phenotypic evolvability and robustness.
We have also explored ways to measure structural properties of the RNA secondary structure GP map using small samples of genotypes, and investigated the topology of neutral components in genotype space (2020). In addition we have examined the genotype-phenotype map of RNA secondary structure with regard to insertions and deletions (2021), and developed a fast approach for neutral set size estimation based on free energy (2022).
In 2022 we published results in PNAS in collaboration with the research group of Ard Louis, demonstrating a close relationship between neutral spaces and phenotypic complexity, which offers an explanation for the ubiquity of symmetry and modularity in biological evolution.
Also in 2022 (published in Nature Ecology & Evolution) we showed that fitness landscapes are in general likely to be highly navigable due to structural properties of genotype-phenotype maps, meaning that these landscapes contain very few valleys.
In the context of biological complexity and GP maps I co-organised the ESFCB 2012 conference on the evolution of structural and functional complexity in biology, and the IWGPM 2016 workshop on genotype-phenotype maps.
Self-assembly, modularity and physical complexity
(PLOS Computational Biology 2019; Physical Review E 2016; Physical Review E 2016 (2); Physical Review E 2011; Physical Review E 2010; see also Royal Society Interface 2014)
Self-assembly is not just a ubiquitous phenomenon in biology and
physics, it is also a language that can be used to describe a physical
structure, and measure its complexity and modularity. To illustrate
this, we introduce a versatile lattice model of self-assembly, before
applying our approach to more general structures such as molecules and
protein complexes. In further work we show that genetic algorithms can
be used in conjunction with our lattice model to answer questions
about the emergence of symmetry and modularity in biological
evolution.
In our most recent contributions on this topic we also study non-deterministic self-assembly in this lattice model. We show that even very simple non-deterministic two-tile sets can exhibit a wide variety of concentration-dependent growth behaviours. Furthermore we also demonstrate, both computationally and experimentally, that asymmetric interactions can limit the growth of such non-deterministic tile sets.
Network analysis of historical correspondence
(Huntington Library Quarterly 2023; Huntington Library Quarterly 2023 (2); Oxford University Press 2023; Oxford University Press 2023 (2); Cultural Analytics 2021; Cambridge University Press 2020; History Workshop Journal 2019; English Literary History 2015; Leonardo 2014)
I collaborate with Ruth Ahnert on the AHRC-funded Tudor
Networks of Power project, which examines the correspondence
network of the Tudor State Papers (1509-1603) with around 132,000
letters between around 20,000 individuals across Britain and
Europe. In recent work we show that network analysis of this
correspondence can highlight conspirators and intelligencers, as they
show unique network connectivity profiles due to their particular
historical roles. Together with designer Kim Albrecht
we developed an interactive visualisation of this data, which can be found at tudornetworks.net.
This work is described in our book Tudor Networks of Power (Oxford University Press, 2023), which won the 2024 Richard Deswarte Prize in Digital History, awarded by the Institute of Historical Research, University of London. It was also shortlisted for the 2024 SHARP Book History Prize.
In addition I was a Co-Investigator on the AHRC-funded Networking Archives project (2018-2021), which aimed to (a) combine three archives of Early Modern political and intellectual correspondence into one large network of around 432,000 letters, spanning the sixteenth, seventeenth, and eighteenth centuries across Europe, and (b) use computational analysis to inform, shape, and answer historical research questions, with a particular focus on the history of intelligence networks. This project was the subject of a special issue of the Huntington Library Quarterly.
In previous work we applied network analysis to a curated social network of the Protestant underground community during the reign of Mary I of England (1553-1558), derived from the contents of several hundred letters sent by members of this community. This quantitative approach identifies individuals in the network who did not necessarily have many connections to others, but who nevertheless occupied strategically important positions in the network. The importance of these individuals is confirmed by historical evidence of their role as sustainers who passed messages, provided shelter and financial support, and who continued to hold the network together after most of the leading figures had been executed by Mary I. This work was published in English Literary History (2015).
This work was also covered in New Scientist.
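The observation that individuals with few connections can nevertheless occupy strategically important positions can be illustrated by comparing degree with betweenness centrality. The following Python sketch is an assumption-laden toy example and not the analysis from the paper: the network, the node names, and the choice of betweenness as the measure are all hypothetical. It uses a plain implementation of Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: two tightly knit groups joined by one "courier".
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
edges += [("courier", "a1"), ("courier", "b1")]

adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
```

In this toy network the courier has the lowest degree of any node (2) but the highest betweenness, because every shortest path between the two groups passes through it.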
With Ruth Ahnert, Nicole Coleman, and Scott Weingart I have written a related book, The Network Turn (Cambridge University Press, published Open Access), which examines the use of network analysis in the Arts and Humanities.
Network analysis of chemical flavour compounds
(Flavour 2013; Scientific Reports 2011)
Using network analysis we investigate the widespread hypothesis that
foods with compatible flavours share chemical flavour compounds. Until
now this hypothesis has relied on anecdotal rather than quantitative
evidence. We construct a bipartite network of flavour compounds and
ingredients, and compare it to large recipe data sets. This reveals
that the shared compound hypothesis holds in some regional cuisines
but not in others. More generally our analysis demonstrates how the
type of large-scale data analysis that has transformed biology in
recent years can lead to new results in other fields, such as food
science.
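As a rough illustration of how a bipartite network of ingredients and flavour compounds can be tested against recipes, the following Python sketch projects the network onto ingredients and computes the mean number of shared compounds per ingredient pair in a recipe. The ingredient names, compound labels, and function names are hypothetical placeholders, and the exhaustive baseline over all same-size ingredient sets is a simplifying assumption made for this toy example.

```python
from itertools import combinations

# Hypothetical toy data: ingredient -> set of flavour compounds.
COMPOUNDS = {
    "ingredient_a": {"c1", "c2", "c3"},
    "ingredient_b": {"c2", "c3", "c4"},
    "ingredient_c": {"c5"},
    "ingredient_d": {"c1", "c5", "c6"},
}


def project(compounds):
    """One-mode projection: weight each ingredient pair by shared compounds."""
    return {
        (a, b): len(compounds[a] & compounds[b])
        for a, b in combinations(sorted(compounds), 2)
    }


def mean_shared(recipe, weights):
    """Mean number of shared compounds over all ingredient pairs in a recipe."""
    pairs = list(combinations(sorted(recipe), 2))
    if not pairs:
        raise ValueError("a recipe needs at least two ingredients")
    return sum(weights[p] for p in pairs) / len(pairs)


def baseline(size, weights, ingredients):
    """Average of mean_shared over every possible recipe of the given size."""
    recipes = list(combinations(sorted(ingredients), size))
    return sum(mean_shared(r, weights) for r in recipes) / len(recipes)


weights = project(COMPOUNDS)
recipe = {"ingredient_a", "ingredient_b", "ingredient_d"}
observed = mean_shared(recipe, weights)   # 1.0 for this toy recipe
expected = baseline(3, weights, COMPOUNDS)  # 2/3 for this toy data
```

A recipe whose observed value lies above the baseline shares more compounds than an arbitrary combination of ingredients would, which is the direction predicted by the shared compound hypothesis. With real data the baseline would be estimated from randomised recipes rather than enumerated.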
Our article in Scientific Reports was the most downloaded article across all Nature Publishing Group journals in December 2011, exceeding 100,000 PDF downloads and HTML page views in the first four weeks following publication. It also received attention from Scientific American, Nature News, New Scientist, The Huffington Post, The Technology Review, BioTechniques, and Ingeniøren, among others. A poster of the network between food ingredients can be downloaded here.
In the context of this work I also organised a Royal Society International Scientific Seminar in 2014, bringing together a wide range of experts including computational scientists, food scientists, neuroscientists, and chefs to discuss the impact of data science on food consumption and culinary culture.
Power graph compression of networks reveals dominant relationships
(Scientific Reports 2014; Molecular BioSystems 2013; see also Nature 2015)
We show that compression of complex networks into power graphs with
freely overlapping power nodes allows us to detect dominant
connectivity patterns in a wide range of different networks. This
approach can be applied to undirected, directed and bipartite networks
such as social networks, food webs and recipe-ingredient
networks. When applied to genetic transcription networks we can assign
meaning to power nodes by using GO term enrichment, which reveals that
functional modules in genetic transcription networks are highly
overlapping.
This method has also been used to map the functional organisation of the gene regulatory network in Arabidopsis responsible for xylem specification and secondary wall biosynthesis (Nature 2015).
Pattern detection in microarray data
(Science 2010; PLOS One 2008; Bioinformatics 2006)
Over the last decade, microarrays have generated an unprecedented
amount of genetic expression data. Here we introduce an approach for
detecting statistically significant patterns in these datasets without
making prior assumptions about the nature of the pattern. This method
is based on concepts from Algorithmic Information Theory.
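The general idea of detecting a pattern without specifying its form in advance can be sketched with compressibility as a stand-in for algorithmic complexity. The Python example below is not the published method: using zlib as the compressor, encoding a data series as a string of symbols, and the permutation test with its parameters are all assumptions made for illustration. A series counts as patterned if it compresses better than shuffled versions of itself.

```python
import random
import zlib


def compressed_length(symbols: str) -> int:
    """Length in bytes of the zlib-compressed series (a crude complexity proxy)."""
    return len(zlib.compress(symbols.encode("ascii"), 9))


def pattern_p_value(symbols: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Permutation test: how often does a shuffled series compress as well?"""
    rng = random.Random(seed)
    observed = compressed_length(symbols)
    chars = list(symbols)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        if compressed_length("".join(chars)) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical series: 'U' = expression up, 'D' = expression down.
alternating = "UD" * 100
p = pattern_p_value(alternating)
```

The shuffles preserve the symbol frequencies, so a small p-value reflects the ordering of the series rather than its composition.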
I am also interested in genome statistics, Boolean
networks, natural language processing, and Gaussian
processes, among other things, and am co-organiser of
the Cambridge Networks
Network meetings. Past research interests of mine include quantum
measurement and molecular dynamics.
Complex Networks - Slides can be found here.
Quantum Information Theory - Lecture notes can be found here.
Current graduate students:
Nabiha Khawer (MPhil)
Paula Garcia Galindo (PhD)
Runfeng Lin (PhD)
Sung Soo Moon (PhD)
William Lowe (PhD)
Nicholas Katritsis (PhD)
Former postdoctoral researchers:
Yavor Novev
Former graduate students:
Joshua Yarrow (DTP rotation project)
Chun Wan (Part III)
Runfeng Lin (MPhil)
Nora Martin (PhD)
Marcel Weiß (PhD)
Alexander Leonard (PhD)
Will Grant (PhD)
Yuanyie Chen (PhD, co-supervised)
Alexander Johnston (MPhil)
Salvatore Tesoro (PhD)
Emma Towlson (PhD)
Sam Greenbury (PhD)
Pascal Grobecker (Part III)
Former undergraduate students:
Elliot Vaughan (CET IIB project)
James Simkins (CET IIB project)
Former summer students:
Isabelle Kekwick
Reina Zheng
Fátima González
Toby Baker
Marijana Vujadinović
Pranav Reddy
Eniak Alarcón
Giles Barton-Owen
Laura Imperatori
Robert Baldock
Some of my collaborators, past and present:
Yong-Yeol Ahn
Ruth Ahnert
Albert-László Barabási
Siobhan Brady
Ed Bullmore
Guido Caldarelli
Gábor Csányi
Thomas Fink
Howard Hotson
Iain Johnston
Ard Louis
Mike Payne
Chris Pickard
Sarah Teichmann
Andrei Zinovyev
Links to pages on various scientific and non-scientific topics.
BiProjector - an online tool for projecting bipartite networks
Imbrella - A free and invisible umbrella
How to play Go on a Hypercube
John Baez's Homepage
The Chocolate Revolution
The biggest number
The Clay Millennium Prize
The Klein Bottle Shop
The Complexity Zoo
Non-Transitive Dice
Non-Transitive Lizards
'Math In LaTeX'
The CSS Zen Garden
The Simulation Argument
Minds, Machines and Gödel by John Lucas
Robert J. Lang's Origami Designs
The elgooG Google mirror
Iocaine Powder
57 Optical Illusions
Puzzles
Recent publications
Systematic annotation of a complete adult male Drosophila nerve cord connectome reveals principles of functional organisation
eLife 13:RP97766 (2024)
Searching for Missing Links in the Republic of Letters: Vossius and the Dutch Dimension of Hartlib's Circle
Huntington Library Quarterly 86 (2) 283-313 (2023)
Shadow Networks: Identifying Intercepted Letters in the Elizabethan State Papers Foreign
Huntington Library Quarterly 86 (2) 345-375 (2023)
Chapter 10: Networks in Archives: Power, Truth, and Fiction
Oxford University Press (2023)
The Boltzmann distributions of molecular structures predict likely changes through random mutations
Biophysical Journal 122 (22), 4467 (2023)
Tudor Networks of Power
Oxford University Press (2023)
The non-deterministic genotype-phenotype map of RNA secondary structure
Journal of the Royal Society Interface 20, 20230132 (2023)
Maximum mutational robustness in genotype-phenotype maps follows a self-similar blancmange-like curve
Journal of the Royal Society Interface 20, 20230169 (2023)
Current data and modeling bottlenecks for predicting crop yields in the United Kingdom
Frontiers in Sustainable Food Systems 7, 1023169 (2023)
Compression ensembles quantify aesthetic complexity and the evolution of visual art
EPJ Data Science 12, 21 (2023)
Automated extraction of pod phenotype data from micro-computed tomography
Frontiers in Plant Science 14:1120182 (2023)
Predicting phenotype transition probabilities via conditional algorithmic probability approximations
Journal of the Royal Society Interface 19, 20220694 (2022)
The structure of genotype-phenotype maps makes fitness landscapes navigable
Nature Ecology & Evolution 6, 1742 (2022)
Thermodynamics and neutral sets in the RNA sequence-structure map
Europhysics Letters 139, 3 (2022)
Fast free-energy-based neutral set size estimates for the RNA genotype-phenotype map
Journal of the Royal Society Interface 19, 20220072 (2022)
Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution
Proceedings of the National Academy of Sciences 119, e2113883119 (2022)