
Deepchem

Introduction

During the early stages of drug discovery, virtual screening based on artificial intelligence/machine learning/deep learning has become an essential tool.

In this approach, a trained model is used to screen (test) either:

  • a catalogue of small molecules against a target protein, to identify potential drug candidates (drug discovery), or
  • the proteins of an organism against a known drug (drug repurposing).
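Conceptually, screening a catalogue amounts to scoring every candidate with the trained model and keeping the best-ranked ones. The sketch below assumes a generic scoring callable: screen and the toy length-based scorer are hypothetical placeholders, not DeepChem functions, and a real pipeline would call a trained model's prediction method instead.

```python
import heapq
from typing import Callable, Iterable


def screen(
    catalogue: Iterable[str],
    score: Callable[[str], float],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Score every candidate and return the top_k (score, candidate) pairs, best first.

    `score` is a hypothetical stand-in for a trained model. Candidates can be
    SMILES strings (drug discovery) or protein sequences (drug repurposing).
    """
    return heapq.nlargest(top_k, ((score(item), item) for item in catalogue))


if __name__ == "__main__":
    # Hypothetical scorer for illustration only: longer SMILES get higher scores.
    toy_score = lambda smiles: float(len(smiles))
    hits = screen(["C", "CCO", "CCCC", "c1ccccc1"], toy_score, top_k=2)
    print(hits)  # [(8.0, 'c1ccccc1'), (4.0, 'CCCC')]
```

The same function covers both directions of screening; only the catalogue and the model behind score change.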

The goal is to predict either

  • whether the small molecule interacts with the protein or not (called drug-target interaction prediction, which requires a classifier model with a binary output) or

  • an affinity value between the small molecule and the protein (called drug-target affinity prediction, which requires a regression model with a continuous-value output).
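Affinity values are often reported as dissociation constants (Kd) and converted to the logarithmic pKd = -log10(Kd in mol/L) before regression, while interaction datasets are sometimes built by thresholding such values. The sketch shows both conversions; the cutoff of 7.0 (Kd = 100 nM) is an illustrative assumption, since different benchmarks use different thresholds.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant in nanomolar to pKd = -log10(Kd [mol/L]).

    Since 1 nM = 1e-9 M, pKd = 9 - log10(Kd in nM).
    """
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)


def interaction_label(pkd: float, threshold: float = 7.0) -> int:
    """Binary interaction label: 1 if pKd reaches the (assumed) threshold, else 0."""
    return 1 if pkd >= threshold else 0


if __name__ == "__main__":
    for kd in (1.0, 100.0, 10_000.0):
        pkd = kd_nm_to_pkd(kd)
        print(f"Kd = {kd:g} nM -> pKd = {pkd:.1f}, label = {interaction_label(pkd)}")
```

A regression model would be trained on the pKd values directly; a classifier would be trained on the derived 0/1 labels.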

Machine learning-based virtual screening methods can be categorized into two types according to their input:

  • ligand-based (only the compound/ligand is given as input) and
  • pairwise (both the compound/ligand and the protein are given as input).
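The difference between the two input types is easiest to see in how a feature vector is assembled. Both featurizers in this sketch are assumptions chosen to keep it dependency-free: a toy hashed-substring fingerprint of the SMILES string (a stand-in for real fingerprints such as ECFP) and the amino-acid composition of the protein sequence.

```python
import hashlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def smiles_fingerprint(smiles: str, n_bits: int = 64, k: int = 3) -> list[int]:
    """Toy ligand featurizer: hash every length-k substring of the SMILES into a bit vector."""
    bits = [0] * n_bits
    for i in range(len(smiles) - k + 1):
        digest = hashlib.md5(smiles[i:i + k].encode()).digest()
        bits[int.from_bytes(digest[:4], "little") % n_bits] = 1
    return bits


def aa_composition(sequence: str) -> list[float]:
    """Protein featurizer: fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    length = len(seq) or 1  # avoid division by zero for an empty sequence
    return [seq.count(aa) / length for aa in AMINO_ACIDS]


def ligand_based_input(smiles: str) -> list[float]:
    """Ligand-based: only the compound is encoded."""
    return [float(b) for b in smiles_fingerprint(smiles)]


def pairwise_input(smiles: str, sequence: str) -> list[float]:
    """Pairwise: compound and protein features are concatenated into one vector."""
    return ligand_based_input(smiles) + aa_composition(sequence)


if __name__ == "__main__":
    x_ligand = ligand_based_input("CCO")
    x_pair = pairwise_input("CCO", "MKTAYIAKQR")
    print(len(x_ligand), len(x_pair))  # 64 84
```

Concatenation is the simplest way to combine the two inputs; many pairwise models instead encode compound and protein with separate sub-networks before merging them.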

DeepChem

https://www.openchemistry.org/gsoc/

Install DeepChem

```shell
uv add deepchem
```

Linux

Install uv:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
# Put uv on the PATH of the current shell session
source $HOME/.local/bin/env
```

We are going to use a TensorFlow-based model, so we also add tensorflow[and-cuda] to the uv add command. This installs TensorFlow together with the NVIDIA CUDA libraries it needs to run on a GPU.

```shell
uv init chem
cd chem
# Quote the extra so shells such as zsh do not treat the brackets as a glob pattern
uv add deepchem "tensorflow[and-cuda]"
```
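Before opening an IDE, it can help to confirm that both packages resolved into the project environment. This is a minimal stdlib-only sketch; the file name check_env.py is hypothetical, and you would run it from the chem directory with uv run python check_env.py so that it uses the project's virtual environment.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level packages from `names` that cannot be found in this environment.

    importlib.util.find_spec only locates a top-level package without importing it,
    so the check stays fast even for heavy libraries such as TensorFlow.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["deepchem", "tensorflow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("deepchem and tensorflow are installed")
```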

Install PyCharm

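```shell
# Extract the downloaded archive into /opt and start PyCharm from its bin directory
sudo tar xzf pycharm-*.tar.gz -C /opt/
cd /opt/pycharm-*/bin
sh pycharm.sh
```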
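```python
import deepchem as dc

# Load the Delaney (ESOL) aqueous-solubility dataset, featurized as molecular graphs
# for a graph convolutional network. The default transformers normalize the targets (y).
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = datasets

# Graph convolutional regression model with a single output (solubility)
model = dc.models.GraphConvModel(n_tasks=1, mode='regression', dropout=0.2, batch_normalize=False)
model.fit(train_dataset, nb_epoch=100)

# Evaluate with the squared Pearson correlation coefficient
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print("Training set score:", model.evaluate(train_dataset, [metric], transformers))
print("Test set score:", model.evaluate(test_dataset, [metric], transformers))

# Predict the first 10 test molecules and print them next to the measured values.
# Both columns are in the normalized space, not in the original log-solubility units.
solubilities = model.predict_on_batch(test_dataset.X[:10])
for molecule, solubility, test_solubility in zip(test_dataset.ids, solubilities, test_dataset.y):
    print(solubility, test_solubility, molecule)
```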
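Two details of the script are worth unpacking. pearson_r2_score reports the square of Pearson's correlation coefficient between measured and predicted values, and the values printed by the final loop are in the normalized space produced by the dataset's transformer. The stdlib sketch below reproduces both ideas on plain lists; it assumes z-score normalization with the population standard deviation and is an illustration, not DeepChem's implementation (within DeepChem, dc.trans.undo_transforms maps predictions back to the original units).

```python
import statistics


def pearson_r2(y_true: list[float], y_pred: list[float]) -> float:
    """Square of Pearson's correlation coefficient (undefined if either list is constant)."""
    mean_t = statistics.fmean(y_true)
    mean_p = statistics.fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def normalize(values: list[float]) -> tuple[list[float], float, float]:
    """Z-score normalization; returns the normalized values plus the mean and std used."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values], mean, std


def denormalize(values: list[float], mean: float, std: float) -> list[float]:
    """Invert normalize(): map values back to the original units."""
    return [v * std + mean for v in values]


if __name__ == "__main__":
    measured = [1.0, 2.0, 4.0, 5.0]  # made-up values for illustration
    scaled, mean, std = normalize(measured)
    # The round trip recovers the measured values (up to floating-point error)
    print([round(v, 6) for v in denormalize(scaled, mean, std)])
    # Pearson R^2 is unaffected by this linear rescaling of one variable
    print(pearson_r2(measured, scaled))
```

Because R² is invariant to linear rescaling, the score is the same whether or not the transforms are undone; the raw predictions, however, only become comparable to published solubility values after denormalization.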

Medium- Google

Tutorial

TODO