Google open sourced its machine intelligence library TensorFlow on the 1st of November 2015. The primary use case for the library is deep learning, but it can also be used more generally as a distributed mathematical library, e.g. for solving partial differential equations. I recently worked on a research project where I implemented a sequence model for a traditional NLP task with very good results. This post is not about the details of that model, but rather an evaluation of what it is like to work with a high-level tool like TensorFlow for deep learning.
TensorFlow’s core is implemented in C++ and CUDA C, but you write your application in Python (wrappers for other languages are on their way). Programming-wise, using TensorFlow is a bit different from normal (imperative) programming, since you design a computational graph of nodes that is only run later, i.e. a mix of imperative and symbolic programming. Below is an image of a feed-forward neural network with one hidden layer on the MNIST digit recognition dataset. I will not go into the details of how to use TensorFlow, but rather share my experience using it. For those interested in learning TensorFlow, have a look at the official tutorials.
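To make the build-then-run style concrete, here is a toy sketch in plain Python. This is not TensorFlow’s actual API, just an illustration of the idea: first you wire up symbolic nodes, and only afterwards do you feed in values and evaluate the graph.

```python
# Toy illustration (NOT TensorFlow's real API) of the build-then-run style:
# first construct a graph of symbolic nodes, then evaluate it in a later step.

class Node:
    """A node in a tiny computational graph."""
    def __init__(self, op, inputs=()):
        self.op = op          # function computing this node's value
        self.inputs = inputs  # upstream nodes

    def run(self, feed):
        """Evaluate the node, pulling placeholder values from `feed`."""
        if self in feed:      # placeholders get their value from the feed
            return feed[self]
        vals = [n.run(feed) for n in self.inputs]
        return self.op(*vals)

def placeholder():
    return Node(op=None)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

# Build phase: nothing is computed yet.
x = placeholder()
y = placeholder()
z = add(mul(x, x), y)      # z = x*x + y, still purely symbolic

# Run phase: feed concrete values, analogous to session.run(z, feed_dict=...).
result = z.run({x: 3, y: 4})
print(result)  # 13
```

The separation between the two phases is exactly what makes it possible for a framework to optimize and distribute the graph before anything executes.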
To me there are three major advantages.
Productive and flexible

I had three weeks from start to article deadline, and no prior experience with TensorFlow. It took me two weeks to implement a new state-of-the-art model, which is not very much time, especially considering that the first week was basically spent learning the library. I would estimate that it would take me 2-3x longer to implement the final model in C++. However, the biggest gain in productivity was the flexibility. For example, to make the initial state of the LSTM cell trainable via a configuration flag, all I needed was:
init_state = tf.Variable([...]) if train_init_state else tf.zeros([...])
But what about the gradients? You need not worry about those, since the computational graph supports automatic differentiation (basically the chain rule applied to the composed function that is your graph). This feature is great, but nothing new; see Theano.
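A minimal sketch of what reverse-mode automatic differentiation does under the hood (again in plain Python, not TensorFlow’s implementation): each value records its parents along with the local derivative, and gradients are pushed back through the graph with the chain rule.

```python
# Minimal reverse-mode automatic differentiation sketch (pure Python,
# not TensorFlow's implementation): each value records how to push
# gradients back to its parents via the chain rule.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        # d(a+b)/da = d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)  # chain rule

x = Var(3.0)
y = Var(4.0)
z = x * x + y      # z = x^2 + y
z.backward()
print(x.grad, y.grad)  # 6.0 1.0  (dz/dx = 2x, dz/dy = 1)
```

Frameworks like TensorFlow and Theano build exactly this kind of gradient computation for you, for every operation in the graph.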
Distributes in a cluster with(out) GPUs
The benefit of not coding towards a specific hardware configuration is tremendous. TensorFlow will use the resources you have: if it is not fast enough, get another GPU or add another node and run again, without changing your code.
Easy to use in production
TensorFlow is excellent for research thanks to its flexibility, but it is also designed for production. When you are happy with your model, you can export its graph and run it via the C++ API on any device with a recent C++ compiler, and call it from any language that can call C/C++ functions.
… and two disadvantages, of which one is hopefully temporary.
The open source release is quite immature. A lot of operations lack a CUDA implementation, the distributed version is not yet available (the official reason is that it is entangled with internal APIs, but it will be released soon), and performance is not yet on par with its competitors.
Once the graph is created there is not much flexibility left
When the graph has been constructed you are supposed to execute it, not change it. This means that there is no way of deferring decisions with conditionals and loops; for example, I had a lot of issues with variable-length sequences. You can partly mitigate the problem by bucketing sentences of similar lengths together, but padding is still required.
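The bucketing-plus-padding workaround can be sketched as follows. The bucket boundaries and padding value here are illustrative choices, not taken from the original model: sequences are grouped into the smallest bucket that fits them, and each is padded only up to its bucket’s maximum length.

```python
# Hedged sketch of the bucketing + padding workaround described above:
# group variable-length sequences into buckets of similar length, then
# pad only up to each bucket's maximum length. Bucket boundaries are
# illustrative, not from the original model.

from collections import defaultdict

def bucket_and_pad(sequences, boundaries=(5, 10, 20), pad=0):
    """Return {bucket_max_len: list of padded sequences}."""
    buckets = defaultdict(list)
    for seq in sequences:
        # Place the sequence in the smallest bucket that fits it.
        for b in boundaries:
            if len(seq) <= b:
                buckets[b].append(seq)
                break
        # Sequences longer than the last boundary are dropped here
        # (they could also be truncated, depending on the task).
    return {b: [seq + [pad] * (b - len(seq)) for seq in seqs]
            for b, seqs in buckets.items()}

padded = bucket_and_pad([[1, 2], [3, 4, 5, 6, 7, 8], [9]])
print(padded)
# {5: [[1, 2, 0, 0, 0], [9, 0, 0, 0, 0]], 10: [[3, 4, 5, 6, 7, 8, 0, 0, 0, 0]]}
```

This keeps the amount of wasted padding small while still giving the graph the fixed shapes it needs per bucket.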
I very much believe that the future of tools for computationally expensive programs is to analyze high-level programs and distribute them for you across the available accelerators and nodes. The question is whether it will be TensorFlow or a similar tool (e.g. mxnet).
At the moment TensorFlow has great traction: the GitHub repo is quite active, and Google has put a lot of effort into the library and the open source launch. Personally, I would gladly use TensorFlow for my next deep learning project.