How is AlphaFold, the AI that predicts the three-dimensional structure of proteins, changing the world of biology? – GIGAZINE

In 2018, DeepMind, an artificial intelligence company under the Alphabet umbrella, developed AlphaFold, an AI that predicts the three-dimensional structure of proteins from amino acid sequence information. AlphaFold was later improved further and released as open source in July 2021. Ewen Callaway, a journalist at the scientific journal Nature, explains the impact it has had on the world of biology.

What’s next for AlphaFold and the AI protein-folding revolution

Proteins are involved in almost all biological processes, such as muscle contraction, transport in the blood, light detection, and the conversion of food into energy. A protein is a polymer with a three-dimensional shape, formed from a large number of L-amino acids of 20 types linked in a chain. How these amino acid units, called amino acid residues, are connected is known only as one-dimensional sequence information.

More than 200 million proteins have been discovered so far, but for most of them only the amino acid sequence is known, and the three-dimensional structure has been determined for very few. Because a protein's three-dimensional structure is closely tied to its behavior and function, the difficulty of inferring that structure from the amino acid sequence, known as the protein folding problem, has been a major challenge in biology for many years.

Until now, the three-dimensional structure of proteins has been determined with experimental methods such as cryo-electron microscopy, nuclear magnetic resonance, and X-ray crystallography. These methods are time-consuming and expensive, so in recent years there have been hopes that AI could solve the folding problem. AlphaFold, developed by DeepMind in 2018, won that year's international protein structure prediction competition, CASP, and the latest version drew even more attention at CASP in 2020 by achieving accuracy comparable to that of experimental methods.

Before AlphaFold became open source, researchers drew on talks by John Jumper, who leads the AlphaFold team at DeepMind, and others to develop their own AI tools, such as RoseTTAFold. Then, in July 2021, AlphaFold was released as open source, making it widely available to researchers.

Artificial intelligence company DeepMind releases “AlphaFold” protein structure analysis algorithm as open source, making it available to everyone –GIGAZINE


“AlphaFold is a game changer. It’s like an earthquake. You can see its effects everywhere,” said Ora Schueler-Furman, a protein researcher at the Hebrew University of Jerusalem. “Every conference I attend, people say, ‘Why don’t you try AlphaFold?’” said Christine Orengo, a computational biologist at University College London.

Efforts to apply AlphaFold to protein research are already under way. A research team led by Martin Beck, a molecular biologist at the Max Planck Institute of Biophysics in Germany, studies the nuclear pore complex, which controls what enters and leaves the cell nucleus, and the family of proteins that make it up, called nucleoporins. In 2016 the team published a model covering about 30% of the nuclear pore complex. After AlphaFold was released as open source in 2021, the team used it to refine the model, and in October 2021 it was able to publish a model covering about 60% of the nuclear pore complex.

DeepMind also plans to publish more than 100 million predicted protein structures in total in 2022. That figure corresponds to about half of all known proteins and is said to be hundreds of times the number of experimentally determined structures held in the Protein Data Bank (PDB) structure repository.

The graph below shows the number of papers using AlphaFold published by researchers, with light orange for papers published in scientific journals and dark orange for papers uploaded to preprint servers. The number of papers has increased sharply since AlphaFold became open source in July 2021.

AlphaFold is trained on experimentally determined protein data from the PDB and other databases. Given a new amino acid sequence, AlphaFold first searches databases for related sequences to identify amino acids that tend to adopt similar conformations. The structures of existing related proteins also help it estimate the distances between amino acids in the new sequence. AlphaFold combines these different signals to predict the protein's three-dimensional structure.
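The first step described above, finding related sequences for a new query, can be illustrated with a toy ranking. This is only a sketch under stated assumptions: the function names, the k-mer overlap measure, and the sequences are all made up for illustration, and AlphaFold's actual search relies on dedicated alignment tools and large sequence databases rather than anything this simple.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_related(query, database, k=3):
    """Rank database entries by Jaccard similarity of k-mer sets to the query.

    Hypothetical helper: a stand-in for a real homology search.
    """
    q = kmers(query, k)
    scored = []
    for name, seq in database.items():
        s = kmers(seq, k)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        scored.append((name, score))
    # Highest similarity first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up sequences for illustration, not real proteins.
toy_db = {
    "close_relative": "MKTAYIAKQRQISFVKSHFSRQ",
    "distant_relative": "MKTAYLGKQRNISWVKAHFARQ",
    "unrelated": "GSSPLWEDCNNTRHGVVPLDEA",
}
query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

ranking = rank_related(query, toy_db)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setup the entry that shares the most short fragments with the query ranks first, which is the intuition behind collecting related sequences before predicting a structure.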

According to DeepMind, more than 400,000 people have so far accessed the AlphaFold database, which is managed by the European Molecular Biology Laboratory. Some users also set up AlphaFold on their own servers to predict the structures of proteins that are not in the database, and some customize AlphaFold in their own ways.

Many biologists are impressed with AlphaFold's accuracy. Thomas Boesen, a structural biologist at Aarhus University in Denmark, says he tested AlphaFold on proteins whose three-dimensional structures his research team had determined experimentally but had not yet made public. AlphaFold predicted the structures accurately. “That’s a big check on my part,” Boesen said. “I have great confidence in AlphaFold based on what I’ve seen.”

AlphaFold's ability to predict three-dimensional structure from sequence should also be useful for research on protein evolution and the origin of life. Researchers typically compare gene sequences to determine how genes are related across species, but for genes that diverged long ago, the sequences may have changed so much that the relationship is hard to see. Because protein structures change more slowly than genetic sequences, comparing structures may uncover ancient relationships that have been overlooked so far. “This opens up a great opportunity to study protein evolution and the origin of life,” said Pedro Beltrao, a computational biologist at the Swiss Federal Institute of Technology.

On the other hand, AlphaFold is not an immediate solution for researchers who want to understand the detailed three-dimensional structure of a specific protein; the structure must ultimately be determined experimentally. Even so, AlphaFold's predictions are approximations that help in interpreting data obtained by experimental methods, and they are said to have accelerated research. “(AlphaFold) completely changed the direction of our research,” said Randy Read, a structural biologist at the University of Cambridge, explaining that combining X-ray crystallography data with AlphaFold has changed his group's approach.

AlphaFold was designed to predict the shape of a single peptide chain, but just days after it was open sourced, Yoshitaka Moriwaki, a protein researcher at the University of Tokyo, tweeted that AlphaFold can also predict the interaction between two protein sequences. DeepMind later released AlphaFold-Multimer, which predicts the structure of protein complexes.

AlphaFold2 can also predict heterocomplexes. All you have to do is grab the two sequences you want to predict and connect them with a long linker.

— Yoshitaka Moriwaki (@Ag_smith)
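The linker trick in the tweet amounts to simple string handling on the input sequences. The sketch below is a hypothetical illustration: the tweet only says "a long linker", so the glycine-serine repeat (a flexible linker commonly used in protein engineering), the repeat count, and the function name are assumptions chosen here, not details from Moriwaki or DeepMind.

```python
def join_with_linker(seq_a, seq_b, linker_unit="GGGGS", repeats=6):
    """Concatenate two chains with a flexible linker between them.

    Returns the joined sequence plus 1-based inclusive residue ranges for
    chain A and chain B, so the linker residues can be ignored when the
    predicted structure is inspected. Linker composition and length are
    assumptions made for illustration.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    linker = linker_unit * repeats
    joined = seq_a + linker + seq_b
    range_a = (1, len(seq_a))
    start_b = len(seq_a) + len(linker) + 1
    range_b = (start_b, len(joined))
    return joined, range_a, range_b


# Made-up fragments for illustration, not real proteins.
chain_a = "MKTAYIAK"
chain_b = "GSHMLEDP"
joined, range_a, range_b = join_with_linker(chain_a, chain_b)
print(joined)
print("chain A residues:", range_a)
print("chain B residues:", range_b)
```

Keeping track of the residue ranges matters because the linker is an artificial addition: only the two original chains and their contact surface are of interest in the result.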

Of course, AlphaFold cannot always predict the exact three-dimensional structure, so it also labels the reliability of each prediction. The following three structure prediction diagrams show a “Good” case where the prediction succeeded, a “Bad” case where it was not very successful, and an “Ugly” case where the structure was almost unpredictable. In the color coding, purple marks the existing structure in the PDB, blue marks regions predicted with very high confidence, light blue marks regions predicted with high confidence, yellow marks regions with low confidence, and orange marks regions with very low confidence. The less reliable the prediction, the more the structure looks like tangled spaghetti and the more yellow and orange regions it contains.
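The confidence labels described above can be expressed as a small lookup. AlphaFold reports a per-residue confidence score called pLDDT on a 0 to 100 scale; neither that name nor any thresholds appear in this article, so the cut-offs below (90, 70, and 50, following the legend used by the AlphaFold database as best understood here) and the handling of scores that fall exactly on a boundary are assumptions. The sample scores are made up.

```python
from collections import Counter

# Band order matches the color coding: blue, light blue, yellow, orange.
BANDS = ("very high", "high", "low", "very low")


def confidence_band(plddt):
    """Map a per-residue confidence score (0-100) to a confidence band.

    Thresholds are assumptions based on the AlphaFold database legend.
    """
    if not 0 <= plddt <= 100:
        raise ValueError("score must be between 0 and 100")
    if plddt > 90:
        return "very high"   # shown in blue
    if plddt > 70:
        return "high"        # shown in light blue
    if plddt > 50:
        return "low"         # shown in yellow
    return "very low"        # shown in orange


def band_counts(scores):
    """Count how many residues fall into each confidence band."""
    counts = Counter(confidence_band(s) for s in scores)
    return {band: counts.get(band, 0) for band in BANDS}


# Made-up per-residue scores for illustration.
example_scores = [95.2, 88.0, 72.5, 61.0, 43.3, 30.1]
summary = band_counts(example_scores)
print(summary)
```

A prediction dominated by the last two bands corresponds to the “Ugly” case: mostly yellow and orange, and not something to rely on without experimental data.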

One of the limitations of AlphaFold is that it is difficult to predict the effect of mutations on conformation because it relies on existing protein information in the database. It is also difficult for AlphaFold to predict how proteins will change shape due to the presence of other interacting proteins or molecules such as drugs.

Bryan Roth, a structural biologist at the University of North Carolina at Chapel Hill, said AlphaFold did save research time by making accurate predictions for about half of the G protein-coupled receptor proteins, but he pointed out that its predictions for the other half were useless. There also seem to have been cases where the prediction failed even though its confidence label was quite high. Roth, who conducts drug discovery research, questions whether AlphaFold is useful in that field, because it may not be able to predict the three-dimensional structure a protein takes when it binds to a ligand such as a drug.

Although AlphaFold still has problems, research using it is expected to keep accelerating and to lead to a variety of discoveries. “Things are changing rapidly and we will see a very big breakthrough with AlphaFold next year,” said David Baker, a biochemist at the University of Washington. Janet Thornton, a computational biologist at the European Molecular Biology Laboratory, said one of AlphaFold’s biggest impacts is that it has prompted biologists to rethink their view of computational and theoretical approaches.

Jan Kosinski, a structural biologist at the European Molecular Biology Laboratory, said AlphaFold-inspired tools will make it possible to model not only individual proteins and complexes but also whole organelles, down to individual protein molecules. “It’s a dream we will pursue for decades to come,” Kosinski said.
