Convolutional Neural Network Architectures for Predicting DNA-Protein Binding

Zeng H., Edwards M.D., Gifford D.K. (2016) "Convolutional Neural Network Architectures for Predicting DNA-Protein Binding".
Proceedings of Intelligent Systems for Molecular Biology (ISMB) 2016
Bioinformatics, 32(12):i121-i127. doi: 10.1093/bioinformatics/btw255.

Abstract: We present a systematic exploration of convolutional neural network architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying convolutional neural network width, depth, and pooling designs. We find that adding convolutional kernels to a network is important for motif discovery and the use of local max-pooling is important for differentiating bound versus unbound sequences when both sequences contain a factor’s cognate motif. We explore the sufficiency of training data in the performance of these learning approaches, and have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology.
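
The pooling distinction in the abstract can be made concrete with a small sketch. The code below is not taken from the paper's Caffe or Keras code; it is a minimal pure-Python illustration that assumes a single hand-built kernel matching the hypothetical motif "TGA", two made-up sequences, and a pooling window of 5. It shows that global max-pooling reports only the best motif match anywhere in the sequence, while local max-pooling keeps one value per window and so retains coarse information about where, and how often, the motif occurs.

```python
# Illustrative sketch only: the motif, sequences, and window size are
# assumptions chosen for this example, not values from the paper.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Slide one kernel along the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(max(total, 0.0))  # ReLU activation
    return scores


def global_max_pool(scores):
    """Keep only the single strongest activation in the whole sequence."""
    return max(scores)


def local_max_pool(scores, window):
    """Keep the strongest activation in each non-overlapping window."""
    return [max(scores[i:i + window]) for i in range(0, len(scores), window)]


# Hypothetical kernel whose weights are the one-hot encoding of "TGA",
# so each score counts the positions that agree with the motif.
kernel = one_hot("TGA")

two_sites = "ACTGACCCTGAA"  # contains "TGA" twice
one_site = "ACTGACCCCCAA"   # contains "TGA" once

scores_two = conv1d(one_hot(two_sites), kernel)
scores_one = conv1d(one_hot(one_site), kernel)

# Global pooling gives the same value for both sequences.
print(global_max_pool(scores_two), global_max_pool(scores_one))

# Local pooling separates them because the second window differs.
print(local_max_pool(scores_two, 5), local_max_pool(scores_one, 5))
```

In a trained network the kernel weights are learned rather than hand-set, and many kernels run in parallel; the sketch only isolates the effect of the pooling choice on what the downstream layers can see.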

Source code and documentation

Genomics-tailored deep learning platform to efficiently perform hyper-parameter tuning, training and testing: Caffe-based, Keras-based

Amazon Elastic Compute Cloud (EC2) launcher that efficiently deploys deep learning models (and any software encapsulated in Docker) on the cloud: GitHub

Docker version of DeepBind that is runnable on any GPU machine: GitHub

Docker version of DeepSEA that is runnable on any GPU machine: Training new model, Making predictions


Other supplementary data for the paper

Caffe model specifications for the models compared: files.

Training and testing data: Motif Discovery, Motif Occupancy.

DeepBind's prediction on motif discovery task: files


Contact

For questions or to request additional data, please contact Haoyang Zeng (haoyangz@mit.edu) or David Gifford (gifford@mit.edu).


Last updated Nov. 2, 2016.