An end-to-end, integrated chemical research system unveiled by IBM last week gives us a glimpse of how artificial intelligence, robotics and the cloud might change the future of drug discovery.
And it’s as good a time as any to see a breakthrough in the field.
The world is still struggling with the covid-19 pandemic, and the race to find a vaccine for the dangerous novel coronavirus has not yet yielded reliable results. Researchers are bound by the travel and social distancing limitations imposed in response to the virus, and for the most part, they still rely on manual methods that can take many years. In other cases, such delays might be a mere inconvenience; in the case of covid-19, they mean more lives lost.
Called RoboRXN, IBM’s new system leverages deep learning algorithms, IBM’s cloud, and robotic labs to automate the entire process and assist chemists in their work without requiring physical presence in a research lab. After seeing the presentation by IBM Research, I would describe RoboRXN as an example of bringing together the right pieces to solve a pressing problem.
It’s not yet clear whether this or any of the efforts led by other large tech companies will help in developing a coronavirus vaccine. But they will surely help lay the groundwork for the next generation of drug and chemical research tools and make sure we are better prepared in the future.
Using AI for chemical synthesis and retrosynthesis
IBM’s RoboRXN is the culmination of three years of research and development in applying AI to chemical research. In 2017, the company developed an AI system for predicting chemical reactions in forward synthesis.
Hypothesizing about chemical reactions and experimenting with different chemical components is one of the most time-consuming parts of chemical research. It requires a lot of experience, and chemists usually specialize in specific fields, which makes it challenging for them to tackle new tasks.
IBM’s AI is a neural machine translation system tailored to chemical synthesis. Artificial neural networks have made great inroads in natural language processing in recent years. While neural networks do not understand the context of human language, their broader capabilities in processing sequential data can serve many fields, including chemical research.
For instance, recurrent neural networks (RNN) and transformers can perform sequence-to-sequence mapping. Train an RNN on a set of input strings and their corresponding output strings, and it will find statistical correlations that map the inputs to outputs (you still need quality data, though). These strings can contain any kind of symbols, including letters, musical notes, or character representations of atoms and molecules. As long as there is consistency in the data and there are patterns to be learned, the neural network can find a way to map the inputs to the outputs.
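To make the idea of "character representations of atoms and molecules" concrete, here is a minimal sketch of how a reaction written in SMILES notation could be split into an input sequence and an output sequence for a sequence-to-sequence model. The token pattern is a simplified assumption made for illustration and is not taken from IBM's code; the example reaction (ethanol plus acetic acid giving ethyl acetate) is likewise chosen only for illustration.

```python
import re

# Simplified SMILES token pattern (an assumption for this sketch):
# bracketed atoms, two-letter halogens, single-letter atoms,
# ring-closure digits, bonds, branches and separators.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()\.\-+/\\@%]")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


def make_pair(reaction):
    """Turn 'reactants>>product' into (source tokens, target tokens)."""
    reactants, product = reaction.split(">>")
    return tokenize(reactants), tokenize(product)


# Ethanol + acetic acid -> ethyl acetate, written as a reaction SMILES.
source, target = make_pair("CCO.CC(=O)O>>CC(=O)OCC")
print(source)
print(target)
```

A model trained on many such pairs never sees chemistry as such, only token sequences, which is why the same architectures used for translating text can be applied here.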
Trained on a dataset of more than 2 million chemical reactions, the neural network was first introduced in a paper presented by the IBM Research team at the NIPS 2017 AI conference. The next year, IBM developed the AI into RXN for Chemistry, a cloud-based platform for chemical research, and presented it at the American Chemical Society annual exposition. RXN for Chemistry aids chemists in predicting the likely outcome of chemical reactions, saving research time, and reducing the years it takes to acquire experience.
In 2019, the IBM Research team improved the AI behind RXN for Chemistry to also support retrosynthesis. This is the inverse process of chemical synthesis. In this case, you already know the molecular structure you want to achieve. The AI must predict the series of steps and chemical components needed to reach the desired result.
“The retrosynthesis planning models were developed in collaboration with retrosynthesis experts from the University of Pisa, who constantly gave us feedback how to improve our models,” Teodoro Laino, the manager of IBM Research Zurich, told TechTalks.
IBM RXN for Chemistry can also design retrosynthetic routes in an interactive mode.
In the interactive mode, the human chemist goes through the route step by step, getting suggestions from the AI at each stage. “Chemical synthesis becomes a human-AI interaction game,” Laino says.
Bringing the AI pieces together
Philippe Schwaller, predoctoral researcher at IBM Research Zurich, told TechTalks that the final AI system used in RoboRXN is composed of several sequence-to-sequence transformer models, each performing one part of the task.
“Given a target molecule, RoboRXN breaks it down in multiple recipe steps using predictions by a retro reaction prediction and a pathway scoring model until the system finds commercially available molecules,” Schwaller said. “Then, for each step in the recipe, the reaction equations are converted using another seq-2-seq transformer model to all necessary actions, which the robot has to perform, to successfully run the chemical reaction. This model predicts reaction conditions (e.g. temperature, duration) for the different actions (e.g. add, stir, filter).”
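The breakdown Schwaller describes can be pictured as a recursive search that stops when every precursor is commercially available. The sketch below is only an illustration under loud assumptions: `COMMERCIAL` and `TOY_RETRO` are hypothetical lookup tables standing in for a supplier catalog and for the trained retro-prediction and pathway-scoring models, the scores are made up, and the greedy depth-first strategy is a simplification rather than IBM's actual search.

```python
# Hypothetical catalog of purchasable molecules (SMILES strings).
COMMERCIAL = {"CCO", "CC(=O)O", "O=S(Cl)Cl"}

# Hypothetical stand-in for a retro model: product -> [(precursors, score)].
# The scores are invented for illustration.
TOY_RETRO = {
    "CC(=O)OCC": [(("CCO", "CC(=O)Cl"), 0.9)],
    "CC(=O)Cl": [(("CC(=O)O", "O=S(Cl)Cl"), 0.8)],
}


def plan(target, depth=0, max_depth=3):
    """Return a list of (precursors, product) steps, or None if no route is found."""
    if target in COMMERCIAL:
        return []  # nothing to synthesize, the molecule can be bought
    if depth >= max_depth:
        return None
    # Try the highest-scoring disconnection first (greedy, depth-first).
    candidates = sorted(TOY_RETRO.get(target, []), key=lambda c: c[1], reverse=True)
    for precursors, _score in candidates:
        steps = []
        for precursor in precursors:
            sub_route = plan(precursor, depth + 1, max_depth)
            if sub_route is None:
                break  # this disconnection leads to a dead end
            steps.extend(sub_route)
        else:
            # Every precursor is reachable: earlier steps first, this step last.
            return steps + [(precursors, target)]
    return None


route = plan("CC(=O)OCC")
for precursors, product in route:
    print(" + ".join(precursors), "->", product)
```

In this toy setup the route has two steps: the intermediate is made first from purchasable molecules, and the final step combines it with ethanol to reach the target.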
In the process of creating the AI, the team published their findings in several peer-reviewed journals and made their AI models available in a GitHub repository. Their latest paper, published in Nature Communications in July, explores the use of transformers to translate chemical procedures written in open prose into distinct steps. This is a key component in integrating the AI system with robo-labs, which expect distinct commands.
“For a given target molecule, RoboRXN provides not only a recipe made of multiple chemical reactions that would lead from commercially available molecules to the target molecule, but is also able to generate for each step in the recipe, the specific actions that a robot or human has to perform to successfully run the reaction step,” Laino says.
To draw an analogy with cooking, if you ask the system how to cook pizza, one AI layer will predict the ingredients, and a second will predict the sequence of operations to go from the ingredients to the final dish.
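The second layer, the sequence of operations, can be thought of as a list of structured actions with optional conditions attached. The following sketch shows one way such a list might be represented and rendered as distinct commands. The `Action` class, its field names, and the example values are all hypothetical and are not RoboRXN's real action schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One robot- or human-executable step (hypothetical schema)."""
    name: str                              # e.g. "add", "stir", "filter"
    material: Optional[str] = None         # what to add, if anything
    temperature_c: Optional[float] = None  # reaction condition
    duration_min: Optional[float] = None   # reaction condition


def render(actions):
    """Format actions as numbered, unambiguous command lines."""
    lines = []
    for index, action in enumerate(actions, start=1):
        parts = [action.name.upper()]
        if action.material is not None:
            parts.append(action.material)
        if action.temperature_c is not None:
            parts.append(f"at {action.temperature_c:g} C")
        if action.duration_min is not None:
            parts.append(f"for {action.duration_min:g} min")
        lines.append(f"{index}. " + " ".join(parts))
    return lines


# Illustrative values only, not a validated lab procedure.
recipe = [
    Action("add", material="ethanol"),
    Action("add", material="acetyl chloride"),
    Action("stir", temperature_c=25, duration_min=60),
    Action("filter"),
]
commands = render(recipe)
print("\n".join(commands))
```

Keeping the conditions as separate fields rather than free text is what lets a robot, or a reviewing chemist, check and change each value individually.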
“In all cases, the AI can choose between several predictions. We provide the ones with the highest confidence score, but a user can always override the recommendations and give human feedback,” Laino says. In Wednesday’s presentation, the team showed how a user could jump into the process by adding, removing, or modifying the steps predicted by the neural networks.
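The pattern Laino describes, defaulting to the most confident prediction while letting the chemist overrule it, can be sketched in a few lines. The function names, candidate labels, and confidence values below are hypothetical and serve only to illustrate the selection logic.

```python
def rank(predictions):
    """Sort (label, confidence) pairs from most to least confident."""
    return sorted(predictions, key=lambda p: p[1], reverse=True)


def select(predictions, user_choice=None):
    """Return the user's choice if given, else the top-ranked prediction."""
    if user_choice is not None:
        return user_choice  # human feedback always wins
    return rank(predictions)[0][0]


# Made-up candidates for a single step in a route.
candidates = [("route A", 0.62), ("route B", 0.91), ("route C", 0.40)]

default_pick = select(candidates)
overridden_pick = select(candidates, user_choice="route C")
print(default_pick, overridden_pick)
```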
Tackling explainability issues
A pure neural network–based approach comes with some benefits. The AI models scale well with the availability of data. And the system will benefit from all the research going into deep learning in general and transformers in particular.
But deep learning comes with interpretability challenges. Neural networks are very good at finding and exploiting correlations between different data points in their training corpus, but those correlations do not necessarily have causal value and can yield erroneous results. The scientists employing the system should be able to explore and correct the reasoning used by the AI system.
The fact that the system provides a step-by-step procedure of creating the target molecule provides a level of explainability, making it easier for scientists to review the entire process. But the IBM researchers acknowledged that providing more granular explanations of the individual steps is still a work in progress.
Schwaller told TechTalks that the team has investigated BERT and ALBERT, two other transformer-based neural network architectures, to improve the interpretability of the predictions, classify them into named reactions, and link the predicted reactions back to similar reactions in the patents. The researchers have published their findings in two separate papers on the ChemRxiv preprint server.
“Recently, we have also investigated why language models learn organic chemistry and chemical reactions so well and discovered that, without human labelling or supervision Transformer models capture how atoms rearrange during a chemical reaction,” Laino adds. “From this so-called atom-mapping signal we can extract the rules and grammar of chemical reactions and make our prediction models more interpretable.”
The team has developed a visualization tool for the RXN AI models and made it available online.
Integration with the robotics lab
The original idea for the fully automated chemistry lab came when IBM presented RXN for Chemistry at the American Chemical Society annual exposition in 2018. “It was surprising to see that irrespective of the flaws that every data-driven model has (including RXN) the reaction of the chemical community was overwhelming—we actually had a line of people at our booth to try out demo,” Laino said. “We saw the real potential in front of us. I asked myself: Can an AI model drive an autonomous chemical lab?”
After Laino discussed the idea with the rest of the team, RoboRXN was conceived. “The rest was only an intense but gratifying run to build everything: the remaining AI models, the integration of commercially existing hardware and the deployment of all services in the cloud,” Laino says.
During the online presentation, Laino and his team ran a hypothetical experiment with RoboRXN. A user connected to the IBM Cloud application and provided a target molecule to RoboRXN. The AI system processed the request and provided a suggested instruction set for the experiment. After the user tweaked and confirmed the instructions, RoboRXN fed the commands to the robotic research lab and the experiment was kicked off. A live camera view allowed us to follow the steps as the robotic lab conducted the experiment.
The hardware used in the project is already commercially available, making it possible to integrate it with robotic labs organizations already have in place.
“Rather than developing our own hardware we decided to use industry standard hardware and use AI and Cloud to solve the issue of programming and accessing the robot remotely,” Laino said. “The project is hardware agnostic. Different types of hardware can be easily interfaced.”
The team also envisions RoboRXN scaling up to run parallel experiments. Research labs can use the platform to coordinate operations across multiple labs and speed up the process of testing hypotheses and gathering the results.
Research during the pandemic and beyond
Automated tools such as RoboRXN could give a boost to research labs and scientists who have been constrained by the covid-19 lockdown.
“The pandemic rang a bell to each of us on how to integrate all existing digital solutions to avoid similar disruption in the future. Lab chemists, even today, are facing severe limitations to come back to work,” said Matteo Manica, machine learning researcher at IBM. “Computational scientists can work remotely, accessing supercomputing resources available online. We decided to provide the same at the level of a chemical lab. A chemical laboratory accessible remotely, that is supervised by AI and executed by robotic chemical hardware.”
But the benefits can go beyond remote access: automation can also help direct the cognitive capacity of human scientists to where it is needed most.
“RoboRXN can be considered for chemists what robotic vacuum cleaners are for humans. They do not necessarily make things faster, but they make things in a very reproducible way and during their work, you can focus on doing something else,” Laino said.
The increased adoption of automated labs will also generate more digital data, which can help improve the performance of the AI models in the future. Organizations can use the IBM Cloud to run RoboRXN and store the results obtained from the robotic labs. Alternatively, they can have the entire system installed on-premises or in a private cloud. IBM does not currently have plans to use data obtained from RoboRXN to fine-tune its AI models. Researchers using the platform can, however, integrate their own results with other open datasets and use them to train the deep learning models IBM has made publicly available.