A Study in Red Herrings

I was recently assigned a programming assignment as part of the application process for a job. While I’ll respect the confidentiality of the actual coding assignment (it was weird), I can talk about the study tips they gave us in the homework invitation email, as these essentially had nothing to do with the actual assignment.

Applicants were encouraged to bone up on multi-layer dense neural networks, aka multi-layer perceptrons, using TensorFlow and TensorBoard. To get ready for the assignment, I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. I used the iris, wine, and digits datasets from scikit-learn as these are small enough to iterate over a lot of variations without taking too much time. Although the exercise didn’t end up being specifically useful to the coding assignment, I did get more familiar with using TensorBoard and tf.summary commands.

Although my intention was to design identical models using different tools, and despite using the same Adam optimizer for training, the higher-level abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. Being a curious sort I set out to find out what was leading to the performance difference and built two more models mixing tf.layers, tf.contrib.learn, and tf.matmul.

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, than trying to restore the trait by externally adding specific genes back to compensate for the broken one. This would go fall under the terms “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to effect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learnand running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fitinstead of train_op.run.
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.

Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository