The breakthrough in protein structure prediction

Proteins are the essential agents of all living systems. Even though they are synthesized as linear chains of amino acids, they must assume specific three-dimensional structures in order to manifest their biological activity. These structures are fully specified in their amino acid sequences, and therefore in the nucleotide sequences of their genes. However, the relationship between sequence and structure, known as the protein folding problem, has remained elusive for half a century, despite sustained efforts. To measure progress on this problem, a series of double-blind, biennial experiments called CASP (Critical Assessment of Structure Prediction) was established in 1994. We were part of the assessment team for the most recent CASP experiment, CASP14, where we witnessed an astonishing breakthrough by DeepMind, the leading artificial intelligence laboratory of Alphabet Inc. The models filed by DeepMind's structure prediction team using the program AlphaFold2 were often essentially indistinguishable from experimental structures, leading to a consensus in the community that the structure prediction problem for single protein chains has been solved. Here, we will review the path to CASP14, outline the method employed by AlphaFold2 to the extent revealed, and discuss the implications of this breakthrough for the life sciences.

One of the key pieces of information required to understand a biological process is the structure of its constitutive proteins, but experimental approaches to structure determination are often laborious and time-consuming, have uncertain outcomes, and require large investments of resources. In contrast, protein sequences are readily obtained by translating genomic sequence and are available in great abundance. Since the structure of a protein is fully specified by its sequence, attempts to deduce one from the other, known as the protein folding problem, have been ongoing for half a century, rising in importance with the exponential growth of sequence databases and in frustration with the succession of methods that failed to bring a decisive advance. Indeed, starting with the first decade of this century, there was a growing realization in the protein science community that this problem was one of the grand challenges of computational biology [1].
Things did not start this way. The secondary structures modeled by Linus Pauling from stereochemical considerations of the polypeptide chain [2,3], and soon afterwards the demonstration that such secondary structures could be assembled into three-dimensional models for α-keratin [4,5] and collagen [6,7], led to the expectation that a combination of geometric considerations, model-building, and parametric equations could solve the principles of protein structure, as they had already done for nucleic acids. However, the astonishing irregularity of the first protein crystal structures forced the realization that these principles might be considerably more complex than expected [8].
Despite this, excitement at the beginning of the 1990s about progress achievable through simplified biophysical representations of the polypeptide chain [9,10] and threading [11,12] led to the perception of rapid, often decisive, advance in deducing structure from amino acid sequence. This was, however, not matched by real-life applications of these methods, and it became apparent that some of the reported successes might have been due to 'postdiction', that is, to the prediction of targets whose structure was already known to the predictors. To obtain an objective assessment of the state of the art in protein structure prediction, a group of scientists led by John Moult established the CASP experiments in 1994. Decisive progress on the hardest targets came from the analysis of correlated mutations in deep multiple sequence alignments, culminating in 2017 with the demonstration that deep learning methods could not only extract high-quality contact maps from multiple alignments in this way, even in cases with few homologs, but also interpret the predicted contacts into a set of distances that allowed for a finer-grained geometric fingerprint of the underlying fold [28]. Convolutional neural networks for distance map prediction were used by the leading structure prediction groups in CASP13 (2018) and had a powerful effect on the hard targets, for which the GDT-TS of the best models went from around 40 to over 60 (see slide 19 in [23]).
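The co-evolution signal that contact prediction from multiple alignments exploits can be illustrated with the classical statistic that preceded deep learning: mutual information between alignment columns. The sketch below uses an entirely hypothetical toy alignment in which two columns co-vary, as contacting residues often do; it is meant only to convey the idea, not any method actually used in CASP.

```python
import numpy as np
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information between two alignment columns (in nats)."""
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_joint = count / n
        mi += p_joint * np.log(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Hypothetical mini-alignment: columns 0 and 2 co-vary perfectly
# (E pairs with K, D with R), hinting at a contact; column 1 varies
# independently of column 0.
msa = ["EAK", "EGK", "DAR", "DGR", "EAK", "DGR"]
cols = list(zip(*msa))
print(mutual_information(cols[0], cols[2]))  # high: perfectly correlated
print(mutual_information(cols[0], cols[1]))  # low: nearly independent
```

Real co-evolution methods go well beyond this pairwise statistic (e.g. by disentangling direct from indirect couplings), which is precisely where deep learning on raw alignments proved so much more powerful.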
Among the groups scoring highly in CASP13 was an unexpected newcomer, AlphaFold, fielded by DeepMind, the leading artificial intelligence laboratory of Alphabet Inc. To everybody's surprise, this group bested all participants with the key insight that the probability distribution of the distance map could be converted to a protein-specific statistical potential, which could generate the protein fold by minimization [29,30]. While AlphaFold's lead in CASP13 was larger than the typical distance between the first- and second-ranked groups at previous CASP experiments, its overall performance was more incremental than transformational, providing the best model in only about a third of the cases, albeit with a larger lead for harder targets than for easier ones (Figure 1).
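The idea of turning a predicted distance distribution into a protein-specific potential can be illustrated on a toy scale. The sketch below is purely illustrative and not DeepMind's implementation: it takes an assumed probability distribution over binned distances for a single residue pair, converts it to a negative-log-likelihood potential, and minimizes that potential by gradient descent.

```python
import numpy as np

# Hypothetical predicted distribution over distance bins for one residue
# pair, peaked near 6 A (as a distance-map network might output).
bin_centers = np.arange(2.0, 20.0, 0.5)
probs = np.exp(-0.5 * ((bin_centers - 6.0) / 1.0) ** 2)
probs /= probs.sum()

# Convert to a statistical potential: negative log-likelihood per bin.
potential = -np.log(probs + 1e-9)

def energy(d):
    # Linear interpolation of the binned potential at distance d.
    return np.interp(d, bin_centers, potential)

# Minimize by simple gradient descent with a numerical gradient.
d = 9.0                       # start away from the predicted optimum
for _ in range(500):
    grad = (energy(d + 1e-4) - energy(d - 1e-4)) / 2e-4
    d -= 0.05 * grad

print(round(d, 1))  # converges toward the 6 A peak: 6.0
```

In the real setting, thousands of such pairwise terms are summed into one potential over all torsion angles of the chain, and the minimization is carried out over the full three-dimensional model rather than a single distance.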
No one was, therefore, prepared for the transformational performance of AlphaFold's second incarnation, AlphaFold2, at CASP14, where it placed far ahead of all other participants, achieving a median GDT-TS of 92.4 for its predictions! To recall, this is in the range of experimental structures, leading many to conclude that the structure prediction problem for single protein chains was now solved, as stated by John Moult in his concluding remarks to the CASP14 conference. A comparison of AlphaFold2 predictions with the best models submitted by other groups (Figure 1) makes the extent of the advance clear, as AlphaFold2 predictions usually had GDT-TS scores >80 even for the hardest targets (structure correct in most details), while the second-best models for these targets were below 60 (overall topology correct).
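For readers unfamiliar with the metric, GDT-TS (Global Distance Test, Total Score) averages the fraction of a model's Cα atoms that fall within 1, 2, 4, and 8 Å of their experimental positions after optimal superposition, so a score of 100 means every residue is placed within 1 Å. A minimal sketch, assuming the two coordinate sets are already superposed (the real GDT search over many superpositions is omitted):

```python
import numpy as np

def gdt_ts(model_ca, native_ca):
    """Toy GDT-TS: mean fraction of C-alpha atoms within 1, 2, 4, 8 A.

    model_ca and native_ca are (N, 3) coordinate arrays assumed to be
    already optimally superposed; the search over superpositions used
    by the real GDT program is omitted for brevity.
    """
    dists = np.linalg.norm(model_ca - native_ca, axis=1)
    fractions = [(dists <= cutoff).mean() for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * np.mean(fractions)

# A perfect model scores 100; displacements lower the score.
native = np.random.rand(100, 3) * 30.0
print(gdt_ts(native, native))  # 100.0
```

The multi-cutoff average is what makes the scale in the text meaningful: a model can reach ~60 with a correct overall topology but imprecise details, while scores above 80 require most residues to be placed within a couple of ångströms.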
As an illustration of this, we would like to briefly recount the case of target T1100, an archaeal transmembrane receptor, for which AlphaFold2 submitted a model with a GDT-TS of around 80 and the next best groups submitted models with GDT-TS around 55 (Figure 2). Our group entered this target as a result of an online meeting of organizers and assessors in August 2020, at which the astonishing predictions of group 427 (revealed later to be AlphaFold2) were brought to a point succinctly by Nick Grishin, one of the assessors: 'So, either this group is close to solving the folding problem or they cheated somehow'. In response, we mentioned that we had diffraction data for a transmembrane receptor, which we had failed to solve for almost a decade because of phasing problems. Would group 427 file models sufficiently good to solve the dataset by molecular replacement? Surely there was no way to cheat on this. The short of it is that the structure could be solved readily with the AlphaFold2 models. Other submitted models had good overall topology, but many local departures from the structure, making them poor templates for molecular replacement (Figure 2). As an interesting side aspect of this, 12 of the 20 highest-ranking groups for this target submitted the co-ordinates of a public prediction server as their best answer, occasionally with minor attempts at refinement. The server, tFold, is run by the AI laboratory of the Chinese technology company Tencent, showing that DeepMind is not the only company laboratory interested in entering the fray.

Figure 1. Graph illustrating the predictive success of AlphaFold in CASP13 (orange, darker dots; 114 models) and AlphaFold2 in CASP14 (blue, darker dots; 93 models) relative to the best models entered by any other group (lighter dots). The latter are ordered along the x-axis by ascending GDT-TS. The graph was inspired by the blog of Mohammed AlQuraishi [29].
What allowed AlphaFold2 to build this commanding lead? A more detailed evaluation will have to await the publication of the method in the CASP14 proceedings, but from the presentation of John Jumper for the AlphaFold2 team at the CASP14 conference [31] and the opinion of experts in the field [29,32], the architecture of the prediction network has changed in fundamental ways. Whereas AlphaFold used convolutional neural networks for distance map prediction and applied gradient descent optimization to construct models from these restraints, AlphaFold2 built an end-to-end network for which the model parameters could be tuned jointly, from the sequence input to the structure output, in order to optimize the final model instead of proxy measures along the way. Such end-to-end training for network optimization was proposed by Mohammed AlQuraishi after CASP13 [33] and was shown here to be an important component in predictive success. Furthermore, AlphaFold2 used attention modules to derive distance constraints and built structural models from them with 3D equivariant transformer neural networks, which operate directly on atoms in three-dimensional space. Attention modules, which originated in natural language processing (for an excellent presentation on them see [34]), do not derive summary statistics from the input multiple sequence alignment, but choose a subset of sequences to focus on and derive a first distance map, on the basis of which they decide which sequences to focus on in the next iteration. In this way, through iterative optimization (which may have required more than a hundred rounds in some cases), the network can extract a richer set of constraints even from sequence alignments that contain few full-length homologs, accounting for its particularly impressive performance on hard targets relative to all other methods. 
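The attention mechanism at the heart of this architecture can be illustrated independently of proteins. The sketch below is a generic scaled dot-product attention in NumPy, not AlphaFold2's actual module: each input element computes similarity-based weights over all elements and aggregates their features accordingly, which is the basic operation the text describes being iterated over sequences and residue pairs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query attends to all keys; the resulting weights decide which
    rows of `values` to blend into its output.
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)   # (n_q, n_k) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ values, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))        # 5 input elements, 8 features each
out, w = attention(x, x, x)        # self-attention: x attends to itself
print(out.shape)                   # (5, 8)
```

Because the weights are recomputed at every pass, stacking and iterating such modules lets the network dynamically refocus on the most informative sequences or residue pairs, rather than committing to fixed summary statistics of the alignment up front.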
The overall strategy of this network architecture seems to aim for the best local solutions in order to assemble the global model from them, and this has clearly been highly successful.

Figure 2. (A) AlphaFold2 (red), whose model was used to solve the experimental dataset by molecular replacement, is clearly leading the field by a wide margin, ahead of the best human group (Baker; green) and the best server (tFold; cyan). The co-ordinates predicted by the tFold server, which is public, were filed by 11 other groups as the best prediction (highlighted in ochre), occasionally with minor tweaks (light ochre). (B) Superposition of the AlphaFold2 model.

So, has DeepMind solved the protein folding problem? In its basic form (deducing the native structure of a protein from its amino acid sequence) the answer from CASP14 appears to be yes for most proteins, provided the program has access to the protein sequence and structure databases, and the target protein is folded. Objections that a solution implies understanding or that the prediction is not made from the single amino acid sequence boil down to semantics, in our opinion. However, the protein folding problem is more complex than just deducing a static three-dimensional structure from a sequence. A protein sequence not only contains the information for the structure, but also for the path by which this structure will be reached, for the dynamic adjustments it will undergo in response to changing conditions and binding partners, and for the components of the cellular machinery it will need to engage to reach its native location. From the information in its sequence, a protein can recognize its binding partners (copies of itself, other proteins, cellular structures such as the membrane, small molecules) and know whether it will alter these by catalysis or through conformational changes, and whether it will fold or unfold conditionally upon encountering them.
All these aspects, which are currently outside the scope of AlphaFold2, are essential for the biological function of proteins and scientists are understandably most excited about them. We would therefore conclude that no, AlphaFold2 was not the last step towards solving the protein folding problem, but rather the first step on a very exciting new path towards goals in protein structure prediction that may now have come within reach.
Does this mean that the advance obtained by AlphaFold2 has been hyped and is in fact not all that impressive? Definitely no to this as well. We find that the advance is absolutely astounding, something we have stressed repeatedly in our contributions to the CASP14 media coverage (see for example [35,36]). We think that the long, arduous journey to this breakthrough, involving some of the brightest minds in biophysics and computational biology, is ample evidence for the magnitude of this achievement. Indeed, the need to introduce deep learning methods for this advance prompts us to ask whether the structure prediction problem may have been too hard for the human mind to solve. Paraphrasing J.B.S. Haldane, who suspected that the universe is not only stranger than we suppose, but stranger than we can suppose, might the problem have been harder than we could have solved?
We fear that this is the case and that one of the reasons for the success of end-to-end training is the elimination of human bias. Decades of effort by highly trained scientists and many billions of dollars in public investment clearly produced the data needed to crack the problem, but the breakthrough required computational networks which, unlike the human brain, were optimized for the analysis of non-linear correlations. Like so many other groups (athletes and chess players, to name two) we will have to become used to the fact that machines have capabilities beyond our biological range. We look forward to what we think will be a wave of advanced prediction servers, both from the leading academic groups and from companies with advanced machine learning capabilities, which will make the structure space of proteins as broadly and rapidly accessible as BLAST did for the sequence space 25 years ago, marking a similar revolution in the life sciences.