Some recent results are outlined below.
A periodic table of protein complexes
Biological evolution has produced an enormous variety of protein complexes, which arise when several proteins bind together to form larger structures. We show that the vast majority of protein complexes can be broken down in terms of three fundamental steps of protein evolution. These steps can combine in many different ways, giving rise to the observed variety of protein complexes. This reveals that heteromeric protein complexes, which consist of more than one type of protein, can be represented as homomeric complexes of repeated multi-protein units. This approach also allows us to classify protein complexes in a periodic table, and to predict topologies of complexes that have not yet been observed.
The periodic table of protein complexes can be browsed here.
Universal properties of genotype-phenotype maps
(Royal Society Interface 2017; PLOS Computational Biology 2016; Royal Society Interface 2015; Royal Society Interface 2014)
We introduce a simple genotype-phenotype (GP) map for biological self-assembly on a lattice, and show that it shares many properties with the well-established GP maps of both RNA secondary structure and the HP model. These properties include a heavily skewed distribution of the number of genotypes per phenotype, shape space covering, and positively correlated phenotypic evolvability and robustness. The fact that these important properties emerge in three very different GP maps underlines their fundamental importance for biological evolution. It also means that the lattice model, which is highly simplified and therefore tractable, can be used to study a wide variety of evolutionary phenomena.
In further work (2015) we show that all of the properties described above also arise in a much simpler GP map. The defining characteristic of this simple map is the presence of 'coding' and 'non-coding' sequence regions. The boundary between these two regions is itself defined in the sequence, much like start and stop codons in DNA. The fact that the properties of biologically realistic GP maps emerge in this extremely simple model suggests that the fundamental organisation of biological sequences into constrained and unconstrained regions has a profound impact on the structure of GP maps, and therefore on biological evolution.
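The idea of a sequence-defined boundary between coding and non-coding regions can be illustrated with a toy map. The sketch below is an assumption made for illustration and is not necessarily the model used in the papers: genotypes are binary strings, a hypothetical stop marker `11` ends the coding region, and the phenotype is simply the coding prefix. Enumerating all genotypes of a fixed length then shows a skewed distribution of genotypes per phenotype, and that mutations in the non-coding region are neutral.

```python
from collections import Counter
from itertools import product

STOP = "11"  # hypothetical stop marker, chosen for illustration only


def phenotype(genotype: str) -> str:
    """Return the 'coding' prefix before the first stop marker.

    Everything from the marker onwards is treated as 'non-coding'
    and has no effect on the phenotype.
    """
    end = genotype.find(STOP)
    return genotype if end == -1 else genotype[:end]


def genotype_counts(length: int) -> Counter:
    """Count how many genotypes of the given length map to each phenotype."""
    return Counter(
        phenotype("".join(bits)) for bits in product("01", repeat=length)
    )


def robustness(genotype: str) -> float:
    """Fraction of single-site mutations that leave the phenotype unchanged."""
    target = phenotype(genotype)
    neutral = 0
    for i, bit in enumerate(genotype):
        flipped = "1" if bit == "0" else "0"
        mutant = genotype[:i] + flipped + genotype[i + 1:]
        neutral += phenotype(mutant) == target
    return neutral / len(genotype)


counts = genotype_counts(8)
sizes = sorted(counts.values(), reverse=True)
print("largest neutral sets:", sizes[:5])
print("phenotypes with a single genotype:", sizes.count(1))
print("robustness of 11000000:", robustness("11000000"))
print("robustness of 01010101:", robustness("01010101"))
```

In this toy map, genotypes with an early stop marker have long non-coding tails, so their phenotypes are both frequent and robust, while genotypes with no stop marker are unique and fragile.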
Network analysis identifies the sustainers of historical underground communities
(English Literary History 2015; Leonardo 2014)
We apply network analysis to a curated social network of the Protestant underground community during the reign of Mary I of England (1553-1558), derived from the contents of several hundred letters sent by members of this community. This quantitative approach identifies individuals in the network who did not necessarily have many connections to others, but who nevertheless occupied strategically important positions in the network. The importance of these individuals is confirmed by historical evidence of their role as sustainers who passed messages, provided shelter and financial support, and who continued to hold the network together after most of the leading figures had been executed by Mary I.
This work was also covered in the New Scientist.
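The contrast between having few connections and occupying a strategically important position can be sketched with betweenness centrality, which counts how many shortest paths between other nodes pass through a given node. This is an assumption for illustration: the summary above does not state which network measures were used, and the graph below is a hypothetical toy network rather than data from the letters.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected graph as {node: set_of_neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from source.
        order = []
        preds = {v: [] for v in graph}
        n_paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse breadth-first order.
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: two tightly knit groups joined only through "s".
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))
edges += [("a1", "s"), ("s", "b1")]

graph = build_graph(edges)
degree = {v: len(neighbours) for v, neighbours in graph.items()}
centrality = betweenness(graph)

for node in sorted(graph, key=centrality.get, reverse=True):
    print(node, "degree:", degree[node], "betweenness:", centrality[node])
```

Here the node `s` has the fewest connections of any node, yet every path between the two groups runs through it, so removing it would split the network in two.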
Self-assembly, modularity and physical complexity
(Physical Review E 2016; Physical Review E 2016 (2); Physical Review E 2011; Physical Review E 2010; see also Royal Society Interface 2014)
Self-assembly is not just a ubiquitous phenomenon in biology and physics. It is also a language that can be used to describe a physical structure and to measure its complexity and modularity. To illustrate this, we introduce a versatile lattice model of self-assembly, before applying our approach to more general structures such as molecules and protein complexes. In further work we show that genetic algorithms can be used in conjunction with our lattice model to answer questions about the emergence of symmetry and modularity in biological evolution.
In our most recent contributions on this topic we study non-deterministic self-assembly in this lattice model. We show that even very simple non-deterministic two-tile sets can exhibit a wide variety of concentration-dependent growth behaviours. We also demonstrate, both computationally and experimentally, that asymmetric interactions can limit the growth of such non-deterministic tile sets.
Network analysis of chemical flavour compounds
(Flavour 2013; Scientific Reports 2011)
Using network analysis we investigate the widespread hypothesis that foods with compatible flavours share chemical flavour compounds. Until now this hypothesis has relied on anecdotal rather than quantitative evidence. We construct a bipartite network of flavour compounds and ingredients, and compare it to large recipe data sets. This reveals that the shared compound hypothesis holds in some regional cuisines but not in others. More generally our analysis demonstrates how the type of large-scale data analysis that has transformed biology in recent years can lead to new results in other fields, such as food science.
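The comparison described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the ingredient names, compound labels and recipes are hypothetical placeholders, and the baseline used here (the average over all ingredient pairs) is a simplification that may differ from the null models used in the published analysis.

```python
from itertools import combinations

# Hypothetical bipartite network: ingredient -> set of flavour compounds.
compounds = {
    "A": {"c1", "c2", "c3"},
    "B": {"c2", "c3", "c4"},
    "C": {"c4", "c5"},
    "D": {"c6"},
}

# Hypothetical recipe data set: each recipe is a set of ingredients.
recipes = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}]


def shared(x: str, y: str) -> int:
    """Number of flavour compounds shared by two ingredients."""
    return len(compounds[x] & compounds[y])


def mean_shared(ingredients) -> float:
    """Mean number of shared compounds over all pairs of the given
    ingredients (requires at least two ingredients)."""
    pairs = list(combinations(sorted(ingredients), 2))
    return sum(shared(x, y) for x, y in pairs) / len(pairs)


# Average pairing score of the recipes versus all possible ingredient pairs.
recipe_mean = sum(mean_shared(r) for r in recipes) / len(recipes)
baseline = mean_shared(compounds)

print("mean shared compounds in recipes:", recipe_mean)
print("mean shared compounds over all pairs:", baseline)
```

A recipe mean above the baseline would be consistent with the shared compound hypothesis for that set of recipes, and a mean at or below it would not.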
Our article in Scientific Reports was the most downloaded article across all Nature Publishing Group journals in December 2011, exceeding 100,000 PDF downloads and HTML page views in the first four weeks following publication. It also received attention from the Scientific American, Nature News, New Scientist, The Huffington Post, The Technology Review, BioTechniques, and Ingeniøren, among others. A poster of the network between food ingredients can be downloaded here.
In the context of this work I also organised a Royal Society International Scientific Seminar in 2014, bringing together a wide range of experts including computational scientists, food scientists, neuroscientists, and chefs to discuss the impact of data science on food consumption and culinary culture.
Power graph compression of networks reveals dominant relationships
(Scientific Reports 2014; Molecular BioSystems 2013; see also Nature 2015)
We show that compression of complex networks into power graphs with freely overlapping power nodes allows us to detect dominant connectivity patterns in a wide range of different networks. This approach can be applied to undirected, directed and bipartite networks such as social networks, food webs and recipe-ingredient networks. When applied to genetic transcription networks we can assign meaning to power nodes by using GO term enrichment, which reveals that functional modules in genetic transcription networks are highly overlapping.
This method has also been used to map the functional organisation of the gene regulatory network in Arabidopsis responsible for xylem specification and secondary wall biosynthesis (Nature 2015).
Pattern detection in microarray data
(Science 2010; PLOS One 2008; Bioinformatics 2006)
Over the last decade, microarrays have generated an unprecedented amount of genetic expression data. Here we introduce an approach for detecting statistically significant patterns in these datasets without making prior assumptions about the nature of the pattern. This method is based on concepts from Algorithmic Information Theory.
I am also interested in genome statistics, Boolean networks, natural language processing, and Gaussian processes, among other things, and am co-organiser of the Cambridge Networks Network meetings. Past research interests of mine include quantum measurement and molecular dynamics.
Salvatore Tesoro (PhD)
Emma Towlson (PhD)
Sam Greenbury (PhD)
Pascal Grobecker (Part III)
Eniak Alarcon (summer student)
Giles Barton-Owen (summer student)
Laura Imperatori (summer student)
Robert Baldock (summer student)
Some of my collaborators, past and present:
Links to pages on various scientific and non-scientific topics.
BiProjector - an online tool for projecting bipartite networks
Imbrella - A free and invisible umbrella
How to play Go on a Hypercube
John Baez's Homepage
The Chocolate Revolution
The biggest number
The Clay Millennium Prize
The Klein Bottle Shop
The Complexity Zoo
'Math In LaTeX'
The CSS Zen Garden
The Simulation Argument
Minds, Machines and Gödel by John Lucas
Robert J. Lang's Origami Designs
The elgooG Google mirror
57 Optical Illusions
Social network fragmentation and community health
Proceedings of the National Academy of Sciences 10.1073/pnas.1700166114 (2017)
Structural properties of genotype-phenotype maps
Journal of the Royal Society Interface 14, 20170275 (2017)
Data-driven Methods for the Study of Food Perception, Preparation, Consumption, and Culture
Frontiers in ICT 4, 15 (2017)
Residue Geometry Networks: A Rigidity-Based Approach to the Amino Acid Network and Evolutionary Rate Analysis
Scientific Reports 6, 33213 (2016)
Nondeterministic self-assembly with asymmetric interactions
Physical Review E 94, 022404 (2016)
Form and function in gene regulatory networks: the structure of network motifs determines fundamental properties of their dynamical state space
Journal of the Royal Society Interface 13, 20160179 (2016)
Nondeterministic self-assembly of two tile types on a lattice
Physical Review E 93, 042412 (2016)
Genetic correlations greatly increase mutational robustness and can both reduce and enhance evolvability
PLOS Computational Biology 12(3): e1004773 (2016)
Principles of assembly reveal a periodic table of protein complexes
Science 350, 1331 (2015)
The organization of biological sequences into constrained and unconstrained parts determines fundamental properties of genotype-phenotype maps
Journal of The Royal Society Interface 12, 20150724 (2015)
Protestant letter networks in the reign of Mary I: A quantitative approach
English Literary History 82, 1 (2015)
Structural and evolutionary versatility in protein complexes with uneven stoichiometry
Nature Communications 6, 6394 (2015)
An Arabidopsis Gene Regulatory Network for Xylem Specification and Secondary Wall Biosynthesis
Nature 517, 571 (2015)