An artificial intelligence owned by Google has predicted the structure of almost all known proteins: some 200 million molecules that are essential to understanding the biology of all living beings on the planet and the mechanisms of some of the most prevalent diseases, from malaria to Alzheimer’s and cancer.
“This work ushers in a new era of digital biology,” said Demis Hassabis, the 45-year-old programming and neuroscience expert who is the main creator of AlphaFold, the neural network system that has almost completely solved one of the biggest problems in biology.
Hassabis, who is British, was a young chess and video game prodigy who founded DeepMind in 2010, a company focused on creating an artificial intelligence capable of learning like humans. In 2013, its system proved better than any human at playing Atari video games. The following year, Google bought the company for around 500 million euros. In 2017, AlphaGo swept aside the top champions of Go, the highly complex, chess-like Asian board game. Since then, Hassabis has focused his efforts on a much bigger challenge: predicting the three-dimensional shape that a protein will take by reading only its genetic sequence, written in two dimensions with DNA letters.
Knowing the three-dimensional structure of these molecules from their genetic sequence is essential to understanding their function, but it is a problem of immense difficulty. It is like finishing a puzzle with tens of thousands of pieces without knowing what image it represents.
Until this system appeared, working out the shape of a single protein made up of 100 basic units, called amino acids, by testing every possible configuration could take 13.7 billion years, the age of the universe. In practice, it took scientists years of work at best, using electron microscopy or huge particle accelerators like the European synchrotron in Grenoble, France. By contrast, Google’s algorithm predicts the structure of any protein in a few seconds.
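The age-of-the-universe comparison echoes a classic back-of-the-envelope argument about brute-force search, often called Levinthal's paradox. A minimal sketch of that arithmetic follows. The three possible conformations per amino acid and the sampling rate of ten trillion configurations per second are illustrative assumptions, not figures from DeepMind or the EMBL, and the function name is hypothetical.

```python
# Rough estimate of how long an exhaustive search over protein shapes would take.
# All default values are illustrative assumptions, not measured quantities.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.7e9  # figure quoted in the article


def exhaustive_search_years(n_residues, conformations_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to try every configuration once, one after another."""
    total_configurations = conformations_per_residue ** n_residues
    seconds = total_configurations / samples_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years(100)
    print(f"Exhaustive search: about {years:.1e} years")
    print(f"Ratio to the age of the universe: {years / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the search would take far longer than the age of the universe, which is why real proteins cannot be folding by trial and error and why a shortcut such as pattern learning is attractive.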
“This universe of proteins” is “a gift to humanity,” Hassabis stressed at a press conference held last Tuesday to present the new database, together with scientists from the European Molecular Biology Laboratory (EMBL), a public institution that has collaborated in the development of AlphaFold.
High reliability
Before the arrival of this technology, the structure of some 200,000 proteins had been determined, a task that took 60 years and the participation of thousands of scientists. That database served as the learning material for Google’s artificial intelligence, which searched it for patterns that predict the shape of proteins for which only the two-dimensional sequence is known. By 2021, the system had already solved the structure of a million proteins, including all human ones. This year’s new release extends the record to 200 million: practically all the known proteins of all living beings on the planet.
Access to this new database is open and free, and the computer code of its artificial intelligence is also open and downloadable. This “Google of life” shows the two-dimensional sequence of any protein alongside a three-dimensional model that indicates the level of reliability of the prediction, which has a margin of error similar to or even lower than that of conventional methods.
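For readers who download a model, the reliability level can be read programmatically. What follows is a minimal sketch, assuming the model is a PDB-format text file and that the per-residue confidence score (known as pLDDT, on a scale of 0 to 100) is stored in the B-factor column, which is how AlphaFold models are generally distributed. The function names are hypothetical, and the sample lines are synthetic values made up for illustration, not real predictions.

```python
# Average the per-residue confidence stored in a PDB-format model.
# Assumption: the B-factor column (characters 61-66 of each ATOM line)
# holds the confidence score, one value per residue repeated on its atoms.


def atom_line(serial, name, residue, seq, confidence):
    """Build a synthetic fixed-width PDB ATOM line (hypothetical helper)."""
    return (f"ATOM  {serial:5d} {name:<4} {residue:>3} A{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{confidence:6.2f}")


def mean_confidence(pdb_text):
    """Mean confidence over alpha-carbon atoms, i.e. one value per residue."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name sits in characters 13-16; "CA" is the alpha carbon.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no alpha-carbon ATOM records found")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = "\n".join([
        atom_line(1, " N", "MET", 1, 90.0),
        atom_line(2, " CA", "MET", 1, 90.0),
        atom_line(3, " CA", "GLY", 2, 50.0),
    ])
    print(mean_confidence(sample))
```

Averaging over alpha carbons counts each amino acid once. A single average hides local variation, so a per-residue view is usually more informative when deciding which parts of a model to trust.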
It is important to note that AlphaFold does not determine reality, but rather predicts it. It reads the genetic sequence and estimates the most likely way the amino acids will be arranged. The prediction is highly reliable, which saves scientists a great deal of time and money: they can do theoretical work first and reserve the expensive equipment needed to determine the actual structure of a protein until it is absolutely necessary.
The applications of this new tool are almost endless, since microscopic proteins are involved in every imaginable biological process, from the mass death of bees to the resistance of crops to heat, as well as countless diseases.
The team led by Matt Higgins, from the University of Oxford, UK, used AlphaFold as part of their project to develop an antibody, a type of protein, capable of neutralizing one of the proteins the malaria pathogen needs in order to reproduce. Within years, this type of research could yield the first highly protective vaccine against this disease, since it would prevent the transmission of the parasite from one person to another through mosquito bites.
Achievements
Another milestone already achieved is the most detailed structure to date of the nuclear pore, a donut-shaped complex of proteins that serves as the gateway into and out of the nucleus of human cells and is linked to numerous diseases, including cancer and cardiovascular disease. This new tool offers unprecedented insight into “how the recipe for life [written in the genome] comes into operation when it is translated into proteins,” Jan Kosinski, an EMBL researcher who co-authored this finding, explained to this newspaper.
Hassabis and the other leaders of DeepMind and the EMBL said that the possible risks of publishing this database and making it accessible to everyone had been analyzed. “The benefits are clearly greater than the threats,” stressed the creator of the system, who added that in the future, as this technology develops, it will be up to the international community to decide whether its use should be limited.
One of the most tangible applications is the design of tailor-made molecules that can block harmful proteins or, better yet, modulate their activity, a much more desirable effect in the design of new drugs, explains Carlos Fernández, CSIC scientist and group leader of structural biology of the Spanish Society of Molecular Biology. His team has used AlphaFold to elucidate part of the structure of a complex made up of several proteins essential for the propagation of the trypanosome that causes sleeping sickness, a disease found in sub-Saharan African countries.
Years of work now lie ahead to confirm whether the predictions are correct, explains biologist José Márquez, an expert in protein structure at the Grenoble synchrotron. “The next frontier will be for AlphaFold to contribute to the design of protein-blocking or protein-activating drugs, a problem they are already addressing,” he says. Another stumbling block: the system does not say why a protein takes its final shape, something that can be essential in the investigation of diseases such as Alzheimer’s or Parkinson’s, which are related to incorrect protein folding.
Alfonso Valencia, director of life sciences at the National Supercomputing Center, points to the shortcomings of the system. “Not everything is solved, because AlphaFold can only predict things that are in the domain of known things. For example, it cannot reliably predict the structure of a type of protein that protects against freezing, because these proteins are rare and there are not many examples in the databases. It also cannot predict the consequences of mutations, which is a serious drawback in medicine,” he notes.
He also acknowledges one of its strengths: the code for the entire system is open, meaning that other scientists can improve or modify it as they please, even if Google decides to take the system offline. “It is evident that the people at DeepMind are seeking to win the Nobel Prize by acting in this transparent way,” says Valencia. “On the one hand, they get a big image boost and an advantage over their competitors, like Facebook. On the other hand, they have already suggested that they will reserve for private use specific data on health and drug design,” he adds.