The race to find a coronavirus vaccine is on, with about 35 companies and academic institutions across the world working feverishly on the case. But Sars-CoV-2, the virus that causes Covid-19, is novel, and its structure is large and complex. Building a solid foundation of knowledge about the virus complements and accelerates the search for a vaccine. One of the projects helping to plug the gaps in our understanding is Folding@home, based at Stanford University. It is a distributed computing project that links up the machines of ‘citizen scientists’ across the world who are willing to donate spare computing resources from their devices to run simulations of disease proteins at scale.
For the past 20 years, the project has been mapping disease proteins involved in Alzheimer’s and cancer, but in late February it began modelling the proteins of the Covid-19 virus too. This decision prompted the project’s biggest ever spike in new volunteers signing up via its downloadable software – around 600,000 so far, putting it on track to reach one million total users. The network is now operating at an ‘exaflop’ of computing power: 1,000,000,000,000,000,000 (a billion billion) floating-point operations per second.
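For scale, the figure above can be checked with the standard SI prefixes (peta = 10^15, exa = 10^18). This worked conversion is added for illustration and also shows how an exaflop relates to the petaflop figures usually quoted for supercomputers:

```latex
1\ \text{exaflop}
  = 10^{18}\ \text{FLOP/s}
  = \underbrace{10^{9}}_{\text{billion}} \times \underbrace{10^{9}}_{\text{billion}}\ \text{FLOP/s}
  = 10^{3} \times 10^{15}\ \text{FLOP/s}
  = 1{,}000\ \text{petaflops}
```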
Historically, vaccines have contained weakened or inactivated versions of a virus that trigger the production of specific antibodies – priming the body’s immune system to react effectively to the real thing. But in the case of Covid-19, most research groups around the world are developing newer ‘recombinant’ nucleic acid vaccines that contain fragments of the virus’ genetic code (DNA or RNA).
The ball was set rolling in mid-January when Chinese scientists published the full genome of the Covid-19 virus (all 29,903 nucleotide bases). Scientists can use this information to single out the genes that encode the specific proteins making up the building blocks of the virus – essential information for formulating a vaccine. But this is only the beginning.
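To make the step from genome to protein concrete, here is a minimal sketch that finds the first open reading frame (a run from an ATG start codon to the next in-frame stop codon) in a DNA string and translates it using the standard genetic code. The input sequence is a hypothetical toy, not part of the real genome, and the code assumes the input contains only A, C, G and T. Real gene annotation of a viral genome is considerably more involved than this.

```python
# Standard genetic code: codons are enumerated with bases in T, C, A, G order
# (first, second, third position); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG up to (not including) the next in-frame stop codon.

    Returns an empty string if there is no ATG. If no stop codon follows,
    the translation simply runs to the last complete codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    # Step through complete codons only, three bases at a time.
    for pos in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence: ATG GCT TTC AAA TAG -> M A F K, then stop.
print(translate_first_orf("CCATGGCTTTCAAATAGGG"))  # prints MAFK
```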
The proteins of the Covid-19 virus are constantly shuffling and rearranging in response to their environment, and it’s these “dynamical motions” that Folding@home’s molecular simulations are attempting to map. “In a nutshell, this means simulating in a computer how each atom in a very large biomolecule ‘wiggles and jiggles’ over time,” says Vincent Voelz, associate professor in theory and computation at Temple University and a member of Folding@home. These movements reveal how the virus functions. As Voelz puts it, Covid-19 proteins are “the nanoscale machines that the virus tricks an infected cell into making so it can propagate”.
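The ‘wiggles and jiggles’ Voelz describes come from repeatedly computing the forces on each atom and nudging positions and velocities forward by a tiny time step. The sketch below shows that loop for the simplest possible case, assumed purely for illustration: two atoms joined by a spring, moving in one dimension, in arbitrary units, integrated with the velocity Verlet scheme commonly used in molecular dynamics. Real protein simulations track many thousands of atoms with far richer force fields, and this is not Folding@home’s own code.

```python
def harmonic_bond_forces(x1, x2, k, r_eq):
    """Forces on atom 1 and atom 2 from a spring of stiffness k and rest length r_eq."""
    stretch = (x2 - x1) - r_eq
    f2 = -k * stretch  # a stretched bond pulls atom 2 back towards atom 1
    return -f2, f2     # equal and opposite force on atom 1


def simulate_diatomic(x1=0.0, x2=1.2, mass=1.0, k=100.0, r_eq=1.0,
                      dt=0.001, steps=5000):
    """Velocity Verlet integration of a two-atom 'molecule' starting at rest.

    Returns the bond length and total energy recorded after every step.
    """
    v1 = v2 = 0.0
    f1, f2 = harmonic_bond_forces(x1, x2, k, r_eq)
    bond_lengths, energies = [], []
    for _ in range(steps):
        # Half-step velocity update, then a full-step position update.
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
        x1 += dt * v1
        x2 += dt * v2
        # Recompute forces at the new positions and finish the velocity update.
        f1, f2 = harmonic_bond_forces(x1, x2, k, r_eq)
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
        kinetic = 0.5 * mass * (v1 ** 2 + v2 ** 2)
        potential = 0.5 * k * ((x2 - x1) - r_eq) ** 2
        bond_lengths.append(x2 - x1)
        energies.append(kinetic + potential)
    return bond_lengths, energies


lengths, energies = simulate_diatomic()
print(min(lengths), max(lengths))  # the bond oscillates between roughly 0.8 and 1.2
```

Velocity Verlet is a popular choice because it keeps the total energy close to constant over long runs, which is a quick sanity check on any such simulation.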
Of particular interest to Folding@home, and to research groups investigating Covid-19 more generally, is the S-protein that forms the spikes on the virus’ outer shell, which the virus uses to gain access to human cells. Folding@home has created a simulation of the spike protein, which is composed of three interlocking proteins, and of a pocket within it that helps the virus bind to human cells and infect them.
“The point of mapping proteins is to find out which parts of proteins the immune system might target,” says Jim Naismith, professor of structural biology at the University of Oxford. In Covid-19, the spike protein is a particularly popular binding spot for human antibodies, meaning it could be key to developing an effective vaccine. Scientists are “mapping all those epitopes [protein segments] where people are mounting good responses to them, and then they’ll test those antibodies in trials,” says Naismith.
Running computations to simulate this type of biological puzzle is time- and energy-intensive. Folding@home’s distributed network can run these calculations faster than any single computer could. In effect, large calculations are broken down into smaller ones that run concurrently on thousands of geographically dispersed machines. The power of Folding@home’s network is not directly comparable to that of a supercomputer, because the system does not operate as a single unit on a single problem. But if it did, it would be faster. The fastest supercomputers available today operate at a scale of hundreds of petaflops – between a third and a half of the speed of an exaflop.
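The idea of breaking a large calculation into smaller, independent pieces can be sketched in a few lines. In this illustration, a pool of threads on one machine stands in for volunteer computers, and the hypothetical ‘work unit’ is a slice of a sum of squares, chosen only because its pieces are fully independent and the combined answer is easy to check. Folding@home distributes its jobs as work units too, but its actual scheduling and server software are far more elaborate than this.

```python
from concurrent.futures import ThreadPoolExecutor


def make_work_units(n_items, unit_size):
    """Split indices 0..n_items-1 into independent (start, stop) ranges."""
    return [(start, min(start + unit_size, n_items))
            for start in range(0, n_items, unit_size)]


def run_work_unit(unit):
    """Stand-in for the calculation one volunteer machine performs on one unit."""
    start, stop = unit
    return sum(i * i for i in range(start, stop))


def distributed_sum_of_squares(n_items, unit_size=1000, workers=8):
    """Farm the work units out concurrently, then combine the partial results."""
    units = make_work_units(n_items, unit_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_work_unit, units))
    return sum(partial_results)


print(make_work_units(2500, 1000))  # [(0, 1000), (1000, 2000), (2000, 2500)]
print(distributed_sum_of_squares(10_000) == sum(i * i for i in range(10_000)))  # True
```

One caveat: each step of a single molecular simulation depends on the step before it, so a trajectory cannot simply be chopped into time slices like this sum. Distributed projects broadly get their parallelism by running many independent simulations at once.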
Folding@home isn’t the only project directing vast quantities of computing power towards understanding Covid-19. In the US, a partnership including the US government, IBM and others has begun to grant promising Covid-19 projects access to 16 supercomputers. Summit, the world’s most powerful non-distributed computer system, was tasked with identifying compounds that would be effective in binding to the spike proteins of the Covid-19 virus, thereby preventing the virus from attaching to host cells. It identified 77 candidate compounds.
Beyond brute computing force, artificial intelligence is also playing an increasingly important role in virus modelling. Traditionally, experiments to determine a protein’s structure have taken months or longer. But computational methods can provide a much speedier way to predict protein structures from amino acid sequences. In cases where the structure of a similar protein has already been determined experimentally, algorithms based on ‘template modelling’ can provide accurate predictions of the protein structure. Google’s DeepMind recently announced AlphaFold, a deep learning system that focuses on predicting protein structure accurately when no structures of similar proteins are available, an approach called ‘free modelling’.
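Template modelling starts by finding a protein of known structure whose sequence resembles the one being modelled. A minimal sketch of that first step, under simplifying assumptions, scores each candidate template with a Needleman-Wunsch global alignment and picks the best. The scoring (match +1, mismatch −1, gap −2) and the template names and sequences are hypothetical; real tools use substitution matrices, more sophisticated gap penalties and profile searches, and then go on to build 3D coordinates, which this sketch does not attempt.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic-programming table at a time.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,               # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[-1]


def best_template(query, templates):
    """Return the name of the template whose sequence aligns best with the query."""
    return max(templates, key=lambda name: global_alignment_score(query, templates[name]))


# Hypothetical amino acid sequences, for illustration only.
templates = {"template_A": "MKTAYIAKQR", "template_B": "GGGGGGGGGG"}
print(best_template("MKTAYIAKQL", templates))  # prints template_A
```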
While Folding@home’s work is not aimed directly at creating a vaccine, it is useful for modern computational drug discovery, which relies on sampling the many possible conformations of proteins and modelling how drug molecules might bind to them. At present, “there are not good experimental techniques that can probe these motions at the atomic scale that can be achieved with computational modelling”, says Voelz.
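‘Sampling conformations’ means visiting a molecule’s possible shapes in proportion to how likely each one is. A minimal illustration, swapping in Metropolis Monte Carlo rather than the molecular dynamics Folding@home itself runs, samples a hypothetical one-dimensional energy landscape with two wells standing in for two conformations. Energies and temperature are in arbitrary units.

```python
import math
import random


def double_well_energy(x):
    """Toy energy landscape: minima at x = -1 and x = +1, barrier of height 1 at x = 0."""
    return (x * x - 1.0) ** 2


def metropolis_sample(n_steps=200_000, step_size=0.5, temperature=0.3, seed=0):
    """Metropolis Monte Carlo: propose random moves and accept them with the Boltzmann rule."""
    rng = random.Random(seed)
    x = -1.0  # start in the left-hand 'conformation'
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)
        delta = double_well_energy(trial) - double_well_energy(x)
        # Always accept downhill moves; accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = trial
        samples.append(x)
    return samples


samples = metropolis_sample()
fraction_right = sum(1 for x in samples if x > 0) / len(samples)
print(fraction_right)  # tends towards 0.5 for this symmetric landscape
```

Because the landscape is symmetric, a long enough run spends about half its time in each well; on a real protein, the same idea reveals which shapes are most populated, including pockets that a drug might exploit.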
Computational mapping complements structural mapping of the virus using laboratory techniques such as cryogenic electron microscopy. “What you can do with computing is, if possible, use evolutionarily related proteins that we already know something about – the architecture and the active site – and then build a computing model using those,” says Tom Blundell, biochemist and structural biologist at Cambridge University.
Folding@home has been able to go one step further. Voelz’s group at Temple University are partnering with researchers at the Diamond Light Source in the UK, who have done groundbreaking work solving over a thousand different crystal structures of the coronavirus main protease and have discovered several drug fragments that bind to sites on the protein. Based on these initial fragment-screening results, the computing power of Folding@home is being used to virtually screen a huge number of potential drug compounds – including those from the COVID Moonshot project – to prioritise which to synthesise and test experimentally.
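The final step described above, deciding which compounds to make first, amounts to ranking candidates by a predicted score. A minimal sketch, assuming each compound has already been given a predicted binding free energy (more negative meaning tighter binding), picks the best few for synthesis. The compound IDs and values are invented for illustration and do not come from COVID Moonshot or Folding@home, and real prioritisation also weighs factors such as how easily a compound can be made.

```python
import heapq


def prioritise(predicted_affinities, n_to_make):
    """Return the n compounds with the most favourable (most negative) predicted binding free energy."""
    best = heapq.nsmallest(n_to_make, predicted_affinities.items(), key=lambda item: item[1])
    return [name for name, _ in best]


# Hypothetical predicted binding free energies (kcal/mol).
candidates = {"cmpd-001": -7.2, "cmpd-002": -9.1, "cmpd-003": -5.4, "cmpd-004": -8.3}
print(prioritise(candidates, 2))  # prints ['cmpd-002', 'cmpd-004']
```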