Drug discovery is a painstaking process that often takes biomedical scientists more than a decade, is extremely expensive, and succeeds only about one time in fifty (1). However, recent research suggests that automation could greatly speed up and improve drug discovery in the near future. For instance, a new automated drug discovery unit at the Rosalind Franklin Institute in the UK promises to produce new drugs up to 10 times faster (1). By using the ability of artificial intelligence to make informed predictions from data, drug design and testing can become more efficient and less expensive.

The advantages of automated drug discovery become clear after examining the basic manual process. First, the drug must be carefully designed. Some candidate drug molecules, or “lead molecules,” are chosen to be tested. Chemists use expert knowledge to gradually tweak and optimize the lead molecules by adding and removing functional groups, which are groups of atoms and bonds with specific properties (2). Second, the drug’s effects must be characterized as thoroughly as possible through extensive testing. It is infeasible to perform an experiment for every combination of cell type or genetic mutation, so researchers must either exhaustively test a few conditions or perform a selected subset of experiments and use the results to predict the outcomes of the experiments that were not performed (3). Humans are not particularly well suited to either of these tasks because they require reasoning about many different possible outcomes (3). Thus, computer-powered automation could greatly improve the speed and accuracy of drug discovery.

To improve the drug design process, researchers from MIT developed a computer model that can select better lead molecule candidates based on desired properties (2). Other existing systems have attempted to automate molecule design but sometimes generate molecules that are invalid under chemical rules. These systems use “simplified molecular-input line-entry systems,” or SMILES, where “long strings of letters, numbers, and symbols represent individual atoms or bonds that can be interpreted by computer software” (2).
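To see why a string-based generator can emit invalid molecules, consider the following minimal Python sketch. It is only a toy check written for this article, not a real SMILES parser and not the method used by any of the systems in (2). The function name `smiles_syntax_ok` is hypothetical, and the check covers just two rules: branch parentheses must balance, and every single-digit ring label must appear as a pair. It ignores bracket atoms, two-digit ring labels, charges, and valence.

```python
def smiles_syntax_ok(smiles: str) -> bool:
    """Toy syntax check for a SMILES-like string (hypothetical helper).

    Only checks balanced branch parentheses and paired ring-closure digits.
    """
    depth = 0           # current nesting depth of branches "(...)"
    open_rings = set()  # ring labels that were opened but not yet closed
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a branch was closed before it was opened
                return False
        elif ch.isdigit():
            # The first occurrence of a digit opens a ring, the second closes it.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


print(smiles_syntax_ok("CCO"))       # ethanol -> True
print(smiles_syntax_ok("c1ccccc1"))  # benzene -> True
print(smiles_syntax_ok("c1ccccc"))   # ring never closed -> False
print(smiles_syntax_ok("CC(O"))      # branch never closed -> False
```

Dropping or changing a single character turns a valid string into an invalid one, which is one reason a model that writes molecules character by character can produce output that breaks chemical rules.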

MIT’s model avoids this problem with SMILES by representing lead molecules as graphs, in which nodes represent atoms and edges represent bonds. The model uses a neural network that breaks an input molecular graph down into clusters that represent functional groups, arranged in a scaffold tree structure. Both the scaffold tree and the molecular graph are then encoded as vectors, and molecules are grouped by similarity in that vector space. This makes it easier to find and modify molecules for a desired property. Finally, the model uses a prediction algorithm to choose which functional groups to edit in order to achieve a higher potency score. This new method of molecular representation suggests that automated drug design can be both fast and accurate. In their first test, the MIT researchers’ model generated 100 percent chemically valid molecules with improved solubility from a sample distribution, compared to SMILES models that generated only 43 percent valid molecules from the same distribution (2). The next step towards automated drug design is developing a model that can find and enhance molecular properties using a more limited amount of molecular graph data (2).
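The graph idea can be illustrated with a short Python sketch. This is not the MIT model: there is no neural network, scaffold tree, or vector encoding here. It only shows, under simplified assumptions, how storing atoms as nodes and bonds as edges makes it straightforward to check chemical plausibility after each edit. The valence table and the function names `is_chemically_plausible` and `attach_group` are assumptions made for this example.

```python
from collections import Counter

# Simplified maximum valences (an assumption for this sketch; real chemistry
# also involves charges, aromaticity, and elements with several valence states).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def is_chemically_plausible(atoms, bonds):
    """atoms: list of element symbols (nodes).
    bonds: list of (i, j, order) tuples (edges between atom indices).

    Hydrogens are implicit, so an atom may use fewer bonds than its maximum.
    """
    used = Counter()
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] <= MAX_VALENCE[el] for k, el in enumerate(atoms))


def attach_group(atoms, bonds, site, group_atom):
    """Return a new graph with a single-atom group bonded to atom `site`."""
    new_atoms = atoms + [group_atom]
    new_bonds = bonds + [(site, len(atoms), 1)]
    return new_atoms, new_bonds


# Ethane: two carbons joined by a single bond.
atoms, bonds = ["C", "C"], [(0, 1, 1)]

# Edit the graph by adding an oxygen (a hydroxyl group once hydrogens are implied).
ethanol = attach_group(atoms, bonds, site=1, group_atom="O")
print(is_chemically_plausible(*ethanol))  # True

# A carbon with two double bonds and one single bond would need five bonds.
bad = (["C", "O", "O", "O"], [(0, 1, 2), (0, 2, 2), (0, 3, 1)])
print(is_chemically_plausible(*bad))  # False
```

Because every edit adds or removes whole nodes and edges, a validity check like this can run after each step, so an invalid structure can be rejected before it is ever produced.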

Scientists are also researching how to automate the testing of drugs after they have been designed. Biomedical scientists have already invested much effort in making it faster and cheaper to perform large numbers of drug-testing experiments. Unfortunately, this process usually involves trial and error with numerous combinations of chemicals and targets, and many of these experimental outcomes are predictable and therefore a waste of resources. CMU researchers recently addressed this problem by developing a computer model that can intelligently pick the most productive chemical and target combinations to test. The model uses active machine learning to learn the effects of chemicals on protein targets. It had no prior knowledge about the chemicals and targets, yet correctly predicted “the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments” (3). Though studies testing larger numbers of combinations should be conducted to confirm this method’s effectiveness, such a reduction in the number of experiments needed shows that automated drug testing could greatly reduce the cost of discovering new drug and target interactions. This finding also suggests that similar machine learning models could be applied to optimize other research processes, such as analyzing older scientific literature for missed knowledge, and greatly speed up scientific research (4).
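To put the quoted figure in perspective: if every compound is paired with every protein, 48 × 48 gives 2,304 possible experiments, and 29% of that is roughly 668. The Python sketch below shows the general shape of an active learning loop (predict, pick the least certain experiment, measure, repeat) on a small simulated problem. It is not the CMU model from (3). The hidden "types", the agreement-based voting rule, and the function name `simulate` are all assumptions invented for this illustration, and the data are randomly generated rather than real measurements.

```python
import random


def simulate(n_compounds=12, n_proteins=12, n_types=3, budget_frac=0.29, seed=0):
    rng = random.Random(seed)
    # Hidden ground truth, unknown to the learner: compounds of the same type
    # affect proteins of the same type in the same way (effect is 0 or 1).
    ctype = [rng.randrange(n_types) for _ in range(n_compounds)]
    ptype = [rng.randrange(n_types) for _ in range(n_proteins)]
    table = [[rng.randrange(2) for _ in range(n_types)] for _ in range(n_types)]

    def run_experiment(c, p):
        return table[ctype[c]][ptype[p]]

    observed = {}  # (compound, protein) -> measured effect

    def agreement(c, c2):
        # +1 for each protein where both compounds were measured and agree,
        # -1 for each protein where they disagree.
        score = 0
        for p in range(n_proteins):
            if (c, p) in observed and (c2, p) in observed:
                score += 1 if observed[(c, p)] == observed[(c2, p)] else -1
        return score

    def predict(c, p):
        # Compounds that have behaved like c so far vote on c's effect on p.
        votes = [0, 0]
        for c2 in range(n_compounds):
            if c2 != c and (c2, p) in observed:
                weight = agreement(c, c2)
                if weight > 0:
                    votes[observed[(c2, p)]] += weight
        label = 1 if votes[1] > votes[0] else 0
        confidence = abs(votes[1] - votes[0])
        return label, confidence

    all_pairs = [(c, p) for c in range(n_compounds) for p in range(n_proteins)]
    budget = round(budget_frac * len(all_pairs))
    for _ in range(budget):
        candidates = [pair for pair in all_pairs if pair not in observed]
        # Spend the next experiment where the current prediction is least certain.
        pair = min(candidates, key=lambda cp: predict(*cp)[1])
        observed[pair] = run_experiment(*pair)

    # Score the predictions for every experiment that was never performed.
    unobserved = [pair for pair in all_pairs if pair not in observed]
    correct = sum(predict(c, p)[0] == run_experiment(c, p) for c, p in unobserved)
    accuracy = correct / len(unobserved) if unobserved else 1.0
    return len(observed), len(all_pairs), accuracy


done, total, accuracy = simulate()
print(f"performed {done} of {total} experiments; "
      f"accuracy on the rest: {accuracy:.2f}")
```

The accuracy printed here depends entirely on the simulated data and the toy voting rule, so it says nothing about real drug screens. The point is only the structure of the loop: each measurement is chosen because the model is unsure about it, rather than because it is next on a fixed list.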

By taking advantage of the most recent computing technologies, automated drug discovery could soon become a reality. Manual drug discovery is slow and error-prone, but scientists have begun to computerize drug design and testing with promising results. Additional research on modeling drug molecules and on models that predict experimental outcomes can help complete the development of automated drug discovery. In the future, automated drug discovery may revolutionize the pace of innovation in the pharmaceutical industry and make life-saving drugs available faster and more cheaply for all people.



  1. The Pharma Letter. (2018, July 08). First Fully-Automated Drug Discovery Unit. Retrieved from
  2. Matheson, R. (2018, July 06). Automating Molecule Design to Speed Up Drug Development. Retrieved from
  3. Naik, A. W., Kangas, J. D., Sullivan, D. P., & Murphy, R. F. (2016, February 03). Active Machine Learning-Driven Experimentation to Determine Compound Effects on Protein Patterns. eLife. doi:10.7554/eLife.10047
  4. Mok, K. (2019, July 12). AI Makes New Scientific Discoveries by Analyzing Old Research Papers. Retrieved from