ICLR 2016

Basic Information


May 2 - 4, 2016


Caribe Hilton, San Juan, Puerto Rico

We recommend landing at the Luis Munoz Marin International Airport. The best way to get from this airport to the hotel is by taxi, which is about a 15-minute drive and costs around $20.

Important: Local transmission of the Zika virus has been reported in Puerto Rico. While in most cases the symptoms of Zika are mild, women who are pregnant and women and men who may conceive a child in the near future have more reason to be concerned about the virus.

The US Centers for Disease Control have reliable and current information on the Zika virus here:


We recommend that anyone with concerns about Zika virus review this information.

Call for Papers (Main Track)

For instructions on the submission process, go here.

Call for Papers (Workshop Track)

For instructions on the submission process, go here.

Registration and Hotel Reservations

To register and make hotel reservations, go here.

On the day of the meeting, come pick up your badge at the Grand Salon Los Rosales, which is right next to the hotel (ask the staff of the hotel for more directions).

Conference Wireless Access

network: hmeeting
password: iclr16

Video recordings of talks

Talks are now available on videolectures.net:


Discussion, Forum, Pictures on the ICLR Facebook Page

Feedback Poll

We've created a poll to gather feedback and suggestions on ICLR:


Please participate by upvoting the suggestions you like or adding your own suggestions.


Senior Program Chair

Hugo Larochelle, Twitter and Université de Sherbrooke

Program Chairs

Samy Bengio, Google
Brian Kingsbury, IBM Watson Group

General Chairs

Yoshua Bengio, Université de Montréal
Yann LeCun, New York University and Facebook

Area Chairs

Ryan Adams, Twitter and Harvard
Antoine Bordes, Facebook
KyungHyun Cho, New York University
Adam Coates, Baidu
Aaron Courville, Université de Montréal
Trevor Darrell, University of California, Berkeley
Ian Goodfellow, Google
Roger Grosse, University of Toronto
Nicolas Le Roux, Criteo
Honglak Lee, University of Michigan
Julien Mairal, INRIA
Chris Manning, Stanford University
Roland Memisevic, Université de Montréal
Joelle Pineau, McGill University
John Platt, Google
Marc'Aurelio Ranzato, Facebook
Tara Sainath, Google
Ruslan Salakhutdinov, University of Toronto
Raquel Urtasun, University of Toronto



We are currently taking sponsorship applications for ICLR 2016. Companies interested in sponsoring should contact us at iclr2016.programchairs@gmail.com.








Conference Schedule

Date   Start  End    Event         Details
May 2  7:30   8:50   breakfast     San Cristobal Ballroom [Sponsored by Baidu Research]
       8:50   12:30  Oral Session - Los Rosales Grand Salon A&B
       8:50   9:00   opening       Opening remarks
       9:00   9:40   keynote       Sergey Levine (University of Washington): Deep Robotic Learning
       9:40   10:00  oral          Neural Programmer-Interpreters by Scott Reed, Nando de Freitas (Best Paper Award Recipient)
       10:00  10:20  oral          Regularizing RNNs by Stabilizing Activations by David Krueger, Roland Memisevic
       10:20  10:50  coffee break
       10:50  11:30  keynote       Chris Dyer (CMU): Should Model Architecture Reflect Linguistic Structure?
       11:30  11:50  oral          BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies by Shihao Ji, Swaminathan Vishwanathan, Nadathur Satish, Michael Anderson, Pradeep Dubey
       11:50  12:10  oral          The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations by Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston
       12:10  12:30  oral          Towards Universal Paraphrastic Sentence Embeddings by John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
       12:30  14:00  lunch         On your own
       14:00  17:00  posters       Workshop Track Posters (May 2nd) - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Facebook]
       17:30  19:00  dinner        San Cristobal Ballroom
May 3  7:30   9:00   breakfast     Las Olas Terrace [Sponsored by NVIDIA]
       9:00   12:30  Oral Session - Los Rosales Grand Salon A&B
       9:00   9:40   keynote       Anima Anandkumar (UC Irvine): Guaranteed Non-convex Learning Algorithms through Tensor Factorization
       9:40   10:00  oral          Convergent Learning: Do different neural networks learn the same representations? by Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft
       10:00  10:20  oral          Net2Net: Accelerating Learning via Knowledge Transfer by Tianqi Chen, Ian Goodfellow, Jon Shlens
       10:20  10:50  coffee break
       10:50  11:30  keynote       Neil Lawrence (University of Sheffield): Beyond Backpropagation: Uncertainty Propagation
       11:30  11:50  oral          Variational Gaussian Process by Dustin Tran, Rajesh Ranganath, David Blei
       11:50  12:10  oral          The Variational Fair Autoencoder by Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
       12:10  12:30  oral          A note on the evaluation of generative models by Lucas Theis, Aäron van den Oord, Matthias Bethge
       12:30  14:00  lunch         On your own
       14:00  17:00  posters       Workshop Track Posters (May 3rd) - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Google]
       17:30  19:00  dinner        San Cristobal Ballroom [Sponsored by Intel]
May 4  7:30   9:00   breakfast     Las Olas Terrace
       9:00   12:30  Oral Session - Los Rosales Grand Salon A&B
       9:00   9:30   town hall     ICLR town hall meeting (open discussion)
       9:30   9:40   break
       9:40   10:00  oral          Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding by Song Han, Huizi Mao, Bill Dally (Best Paper Award Recipient)
       10:00  10:20  oral          Neural Networks with Few Multiplications by Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
       10:20  10:50  coffee break
       10:50  11:30  keynote       Raquel Urtasun (University of Toronto): Incorporating Structure in Deep Learning
       11:30  11:50  oral          Order-Embeddings of Images and Language by Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun
       11:50  12:10  oral          Generating Images from Captions with Attention by Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
       12:10  12:30  oral          Density Modeling of Images using a Generalized Normalization Transformation by Johannes Ballé, Valero Laparra, Eero Simoncelli
       12:30  14:00  lunch         On your own
       14:00  17:00  posters       Conference Track Posters - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Twitter]
       17:30  19:00  dinner        San Cristobal Ballroom

Keynote Talks

Sergey Levine

Deep Robotic Learning

The problem of building an autonomous robot has traditionally been viewed as one of integration: connecting together modular components, each one designed to handle some portion of the perception and decision making process. For example, a vision system might be connected to a planner that might in turn provide commands to a low-level controller that drives the robot's motors. In this talk, I will discuss how ideas from deep learning can allow us to build robotic control mechanisms that combine both perception and control into a single system. This system can then be trained end-to-end on the task at hand. I will show how this end-to-end approach actually simplifies the perception and control problems, by allowing the perception and control mechanisms to adapt to one another and to the task. I will also present some recent work on scaling up deep robotic learning on a cluster consisting of multiple robotic arms, and demonstrate results for learning grasping strategies that involve continuous feedback and hand-eye coordination using deep convolutional neural networks.

BIO: Sergey Levine is an assistant professor at the University of Washington. His research focuses on robotics and machine learning. In his PhD thesis, he developed a novel guided policy search algorithm for learning complex neural network control policies, which was later applied to enable a range of robotic tasks, including end-to-end training of policies for perception and control. He has also developed algorithms for learning from demonstration, inverse reinforcement learning, efficient training of stochastic neural networks, computer vision, and data-driven character animation.

Chris Dyer

Should Model Architecture Reflect Linguistic Structure?

Sequential recurrent neural networks (RNNs) over finite alphabets are remarkably effective models of natural language. RNNs now obtain language modeling results that substantially improve over long-standing state-of-the-art baselines, as well as in various conditional language modeling tasks such as machine translation, image caption generation, and dialogue generation. Despite these impressive results, such models are a priori inappropriate models of language. One point of criticism is that language users create and understand new words all the time, challenging the finite vocabulary assumption. A second is that relationships among words are computed in terms of latent nested structures rather than sequential surface order (Chomsky, 1957; Everaert, Huybregts, Chomsky, Berwick, and Bolhuis, 2015).

In this talk I discuss two models that explore the hypothesis that more (a priori) appropriate models of language will lead to better performance on real-world language processing tasks. The first composes subword units (bytes, characters, or morphemes) into lexical representations, enabling more naturalistic interpretation and generation of novel word forms. The second, which we call recurrent neural network grammars (RNNGs), is a new generative model of sentences that explicitly models nested, hierarchical relationships among words and phrases. RNNGs operate via a recursive syntactic process reminiscent of probabilistic context-free grammar generation, but decisions are parameterized using RNNs that condition on the entire (top-down, left-to-right) syntactic derivation history, greatly relaxing context-free independence assumptions. Experimental results show that RNNGs obtain better results in generating language than models that don’t exploit linguistic structures.

BIO: Chris Dyer is an assistant professor in the Language Technologies Institute and Machine Learning Department at Carnegie Mellon University. He obtained his PhD in Linguistics at the University of Maryland under Philip Resnik in 2010. His work has been nominated for—and occasionally received—best paper awards at EMNLP, NAACL, and ACL.

Anima Anandkumar

Guaranteed Non-convex Learning Algorithms through Tensor Factorization

Modern machine learning involves massive datasets of text, images, videos, biological data, and so on. Most learning tasks can be framed as optimization problems which turn out to be non-convex and NP-hard to solve. This hardness barrier can be overcome by: (i) focusing on conditions which make learning tractable, (ii) replacing the given optimization objective with better behaved ones, and (iii) exploiting non-obvious connections that abound in learning problems.

I will discuss the above in the context of: (i) unsupervised learning of latent variable models and (ii) training multi-layer neural networks, through a novel framework involving spectral decomposition of moment matrices and tensors. Tensors are rich structures that can encode higher order relationships in data. Despite being non-convex, tensor decomposition can be solved optimally using simple iterative algorithms under mild conditions. In practice, tensor methods yield enormous gains both in running times and learning accuracy over traditional methods for training probabilistic models such as variational inference. These positive results demonstrate that many challenging learning tasks can be solved efficiently, both in theory and in practice.

BIO: Anima Anandkumar has been a faculty member in the EECS Dept. at U.C. Irvine since August 2010. Her research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms for a variety of learning problems. She is the recipient of the Alfred P. Sloan Fellowship, Microsoft Faculty Fellowship, Google research award, ARO and AFOSR Young Investigator Awards, NSF CAREER Award, Early Career Excellence in Research Award at UCI, Best Thesis Award from the ACM SIGMETRICS society, IBM Fran Allen PhD fellowship, and best paper awards from the ACM SIGMETRICS and IEEE Signal Processing societies. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, and a visiting faculty member at Microsoft Research New England in 2012 and 2014.

Neil Lawrence

Beyond Backpropagation: Uncertainty Propagation

Deep learning is founded on composable functions that are structured to capture regularities in data and can have their parameters optimized by backpropagation (differentiation via the chain rule). Their recent success is founded on the increased availability of data and computational power. However, they are not very data efficient. In low data regimes parameters are not well determined and severe overfitting can occur. The solution is to explicitly handle the indeterminacy by converting it to parameter uncertainty and propagating it through the model. Uncertainty propagation is more involved than backpropagation because it involves convolving the composite functions with probability distributions and integration is more challenging than differentiation.

We will present one approach to fitting such models using Gaussian processes. The resulting models perform very well in both supervised and unsupervised learning on small data sets. The remaining challenge is to scale the algorithms to much larger data.

BIO: Neil Lawrence is Professor of Machine Learning at the University of Sheffield. His expertise is in probabilistic modelling with a particular focus on Gaussian processes and a strong interest in bridging the worlds of mechanistic and empirical models.

Raquel Urtasun

Incorporating Structure in Deep Learning

Deep learning algorithms attempt to model high-level abstractions of the data using architectures composed of multiple non-linear transformations. A multiplicity of variants have been proposed and shown to be extremely successful in a wide variety of applications including computer vision, speech recognition as well as natural language processing. In this talk I’ll show how to make these representations more powerful by exploiting structure in the outputs, the loss function as well as in the learned embeddings.

Many problems in real-world applications involve predicting several random variables that are statistically related. Graphical models have been typically employed to represent and exploit the output dependencies. However, most current learning algorithms assume that the models are log linear in the parameters. In the first part of the talk I’ll show a variety of algorithms that can learn arbitrary functions while exploiting the output dependencies, unifying deep learning and graphical models.

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application domain. In the second part of the talk I’ll show a direct loss minimization approach to train deep neural networks, which provably minimizes the task loss. This is often non-trivial, since these loss functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. I’ll demonstrate the applicability of this general framework in the context of maximizing average precision, a structured loss commonly used to evaluate ranking problems.

Deep learning has become a very popular approach to learn word, sentence and/or image embeddings. Neural embeddings have shown great performance in tasks such as image captioning, machine translation and paraphrasing. In the last part of my talk I’ll show how to exploit the partial order structure of the visual semantic hierarchy over words, sentences and images to learn order embeddings. I’ll demonstrate the utility of these new representations for hypernym prediction and image-caption retrieval.

BIO: Raquel Urtasun is an Assistant Professor in the Department of Computer Science at the University of Toronto and a Canada Research Chair in Machine Learning and Computer Vision. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She received her Ph.D. degree from the Computer Science department at Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. Her research interests include machine learning, computer vision and robotics. Her recent work involves perception algorithms for self-driving cars, deep structured models and exploring problems at the intersection of vision and language. She is a recipient of a Ministry of Education and Innovation Early Researcher Award, two Google Faculty Research Awards, a Connaught New Researcher Award and a Best Paper Runner-up Prize awarded at the Conference on Computer Vision and Pattern Recognition (CVPR). She is also Program Chair of CVPR 2018, an Editor of the International Journal in Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (i.e., NIPS, UAI, ICML, ICLR, CVPR, ECCV, ICCV).

Best Paper Awards

This year, the program committee has decided to grant two Best Paper Awards to papers that were singled out for their impressive and original scientific contributions.

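The recipients of a Best Paper Award for ICLR 2016 are:

  1. Neural Programmer-Interpreters
    Scott Reed, Nando de Freitas
  2. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
    Song Han, Huizi Mao, Bill Dally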

The selection was made by the program chairs and was informed by the feedback from the reviewers and area chairs.

Workshop Track Posters (May 2nd)

  1. Deep Motif: Visualizing Genomic Sequence Classifications
    Jack Lanchantin, Ritambhara Singh, Zeming Lin, Yanjun Qi
  2. Lookahead Convolution Layer for Unidirectional Recurrent Neural Networks
    Chong Wang, Dani Yogatama, Adam Coates, Tony Han, Awni Hannun, Bo Xiao
  3. Hardware-Oriented Approximation of Convolutional Neural Networks
    Philipp Gysel, Mohammad Motamedi, Soheil Ghiasi
  4. Neurogenic Deep Learning
    Timothy J. Draelos, Nadine E. Miner, Jonathan A. Cox, Christopher C. Lamb, Conrad D. James, James B. Aimone
  5. Neural Network Training Variations in Speech and Subsequent Performance Evaluation
    Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny
  6. Neural Variational Random Field Learning
    Volodymyr Kuleshov, Stefano Ermon
  7. Improving Variational Inference with Inverse Autoregressive Flow
    Diederik P. Kingma, Tim Salimans, Max Welling
  8. Learning Genomic Representations to Predict Clinical Outcomes in Cancer
    Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
  9. Fixed Point Quantization of Deep Convolutional Networks
    Darryl D. Lin, Sachin S. Talathi, V. Sreekanth Annapureddy
  10. Understanding Visual Concepts with Continuation Learning
    William F. Whitney, Michael Chang, Tejas Kulkarni, Joshua B. Tenenbaum
  11. Input-Convex Deep Networks
    Brandon Amos, J. Zico Kolter
    (moved to May 3rd)
  12. Learning to SMILE(S)
    Stanisław Jastrzębski, Damian Leśniak, Wojciech Marian Czarnecki
  13. Learning Retinal Tiling in a Model of Visual Attention
    Brian Cheung, Eric Weiss, Bruno Olshausen
  14. Hardware-Friendly Convolutional Neural Network with Even-Number Filter Size
    Song Yao, Song Han, Kaiyuan Guo, Jianqiao Wangni, Yu Wang
  15. Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)?
    Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson
  16. Generative Adversarial Metric
    Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, Roland Memisevic
  17. Revise Saturated Activation Functions
    Bing Xu, Ruitong Huang, Mu Li
  18. Multi-layer Representation Learning for Medical Concepts
    Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Elizabeth Searles, Catherine Coffey
  19. Alternative structures for character-level RNNs
    Piotr Bojanowski, Armand Joulin, Tomas Mikolov
  20. Revisiting Distributed Synchronous SGD
    Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
  21. A Differentiable Transition Between Additive and Multiplicative Neurons
    Wiebke Koepp, Patrick van der Smagt, Sebastian Urban
  22. Deep Autoresolution Networks
    Gabriel Pereyra, Christian Szegedy
  23. Robust Convolutional Neural Networks under Adversarial Noise
    Jonghoon Jin, Aysegul Dundar, Eugenio Culurciello
  24. Resnet in Resnet: Generalizing Residual Architectures
    Sasha Targ, Diogo Almeida, Kevin Lyman
  25. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
    Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, Joshua C. Denny, Bradley A. Malin, Jimeng Sun
  26. On-the-fly Network Pruning for Object Detection
    Marc Masana, Joost van de Weijer, Andrew D. Bagdanov
  27. Rectified Factor Networks for Biclustering
    Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
  28. Persistent RNNs: Stashing Weights on Chip
    Greg Diamos, Shubho Sengupta, Bryan Catanzaro, Mike Chrzanowski, Adam Coates, Erich Elsen, Jesse Engel, Awni Hannun, Sanjeev Satheesh
  29. Scale Normalization
    Henry Z Lo, Kevin Amaral, Wei Ding
  30. Guided Sequence-to-Sequence Learning with External Rule Memory
    Jiatao Gu, Baotian Hu, Zhengdong Lu, Hang Li, Victor O.K. Li
  31. Neural Text Understanding with Attention Sum Reader
    Rudolf Kadlec, Martin Schmid, Ondřej Bajgar, Jan Kleindienst
  32. Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
    Maximilian Sölch, Justin Bayer, Marvin Ludersdorfer, Patrick van der Smagt
  33. Sequence-to-Sequence RNNs for Text Summarization
    Ramesh Nallapati, Bing Xiang, Bowen Zhou
  34. Neural Generative Question Answering
    Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li
  35. Neural Enquirer: Learning to Query Tables in Natural Language
    Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao
  36. End to end speech recognition in English and Mandarin
    Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
  37. Lessons from the Rademacher Complexity for Deep Learning
    Jure Sokolic, Raja Giryes, Guillermo Sapiro, Miguel R. D. Rodrigues
  38. Coverage-based Neural Machine Translation
    Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li
  39. Learning stable representations in a changing world with on-line t-SNE: proof of concept in the songbird
    Stéphane Deny, Emily Mackevicius, Tatsuo Okubo, Gordon Berman, Joshua Shaevitz, Michale Fee

Workshop Track Posters (May 3rd)

  1. Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning
    Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah
  2. Action Recognition using Visual Attention
    Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov
  3. Visualizing and Understanding Recurrent Networks
    Andrej Karpathy, Justin Johnson, Li Fei-Fei
  4. Learning visual groups from co-occurrences in space and time
    Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson
  5. Spatio-Temporal Video Autoencoder with Differentiable Memory
    Viorica Patraucean, Ankur Handa, Roberto Cipolla
  6. Task Loss Estimation for Structured Prediction
    Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio
  7. Conditional computation in neural networks for faster models
    Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup
  8. Bidirectional Helmholtz Machines
    Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio
  9. A Controller-Recognizer Framework: How Necessary is Recognition for Control?
    Marcin Moczulski, Kelvin Xu, Aaron Courville, Kyunghyun Cho
  10. Nonparametric Canonical Correlation Analysis
    Tomer Michaeli, Weiran Wang, Karen Livescu
  11. Document Context Language Models
    Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein
  12. Convolutional Clustering for Unsupervised Learning
    Aysegul Dundar, Jonghoon Jin, Eugenio Culurciello
  13. ParseNet: Looking Wider to See Better
    Wei Liu, Andrew Rabinovich, Alexander C. Berg
  14. Binding via Reconstruction Clustering
    Klaus Greff, Rupesh Srivastava, Jürgen Schmidhuber
  15. Dynamic Capacity Networks
    Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville
  16. Learning Representations of Affect from Speech
    Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer
  17. Neural Variational Inference for Text Processing
    Yishu Miao, Lei Yu, Phil Blunsom
  18. A Deep Memory-based Architecture for Sequence-to-Sequence Learning
    Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu
  19. Deconstructing the Ladder Network Architecture
    Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio
  20. LSTM-based Deep Learning Models for non-factoid answer selection
    Ming Tan, Cicero dos Santos, Bing Xiang, Bowen Zhou
  21. Using Deep Learning to Predict Demographics from Mobile Phone Metadata
    Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, Yves-Alexandre de Montjoye
  22. Convolutional Models for Joint Object Categorization and Pose Estimation
    Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal
  23. Fine-grained pose prediction, normalization, and recognition
    Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell
  24. Unitary Evolution Recurrent Neural Networks
    Martin Arjovsky, Amar Shah, Yoshua Bengio
  25. Learning Dense Convolutional Embeddings for Semantic Segmentation
    Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
  26. Generating Sentences from a Continuous Space
    Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
  27. Stacked What-Where Auto-encoders
    Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun
  28. Blending LSTMs into CNNs
    Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton
  29. Empirical performance upper bounds for image and video captioning
    Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio
  30. Adversarial Autoencoders
    Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow
  31. Deep Reinforcement Learning with an Action Space Defined by Natural Language
    Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf
  32. Variance Reduction in SGD by Distributed Importance Sampling
    Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio
  33. Adding Gradient Noise Improves Learning for Very Deep Networks
    Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens
  34. Black Box Variational Inference for State Space Models
    Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski
  35. Input-Convex Deep Networks
    Brandon Amos, J. Zico Kolter

Accepted Papers (Conference Track)

  1. The Variational Fair Autoencoder
    Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
  2. A note on the evaluation of generative models
    Lucas Theis, Aäron van den Oord, Matthias Bethge
  3. Learning to Diagnose with LSTM Recurrent Neural Networks
    Zachary Lipton, David Kale, Charles Elkan, Randall Wetzel
  4. Prioritized Experience Replay
    Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
  5. Importance Weighted Autoencoders
    Yuri Burda, Ruslan Salakhutdinov, Roger Grosse
  6. Variationally Auto-Encoded Deep Gaussian Processes
    Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence
  7. Training Convolutional Neural Networks with Low-rank Filters for Efficient Image Classification
    Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi
  8. Neural Networks with Few Multiplications
    Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
  9. Reducing Overfitting in Deep Networks by Decorrelating Representations
    Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
  10. Generating Images from Captions with Attention
    Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
  11. Reasoning about Entailment with Neural Attention
    Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom
  12. Convolutional Neural Networks With Low-rank Regularization
    Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E
  13. Unifying distillation and privileged information
    David Lopez-Paz, Leon Bottou, Bernhard Schölkopf, Vladimir Vapnik
  14. All you need is a good init [code]
    Dmytro Mishkin, Jiri Matas
  15. Bayesian Representation Learning with Oracle Constraints
    Theofanis Karaletsos, Serge Belongie, Gunnar Rätsch
  16. Neural Programmer: Inducing Latent Programs with Gradient Descent
    Arvind Neelakantan, Quoc Le, Ilya Sutskever
  17. Towards Universal Paraphrastic Sentence Embeddings [code]
    John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
  18. Regularizing RNNs by Stabilizing Activations
    David Krueger, Roland Memisevic
  19. SparkNet: Training Deep Networks in Spark
    Philipp Moritz, Robert Nishihara, Ion Stoica, Michael Jordan
  20. MuProp: Unbiased Backpropagation For Stochastic Neural Networks
    Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih
  21. Diversity Networks
    Zelda Mariet, Suvrit Sra
  22. Learning Visual Predictive Models of Physics for Playing Billiards
    Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
  23. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks [code] [data]
    Jason Weston, Antoine Bordes, Sumit Chopra, Sasha Rush, Bart van Merrienboer, Armand Joulin, Tomas Mikolov
  24. Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems [data]
    Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston
  25. Distributional Smoothing with Virtual Adversarial Training [code]
    Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii
  26. Multi-task Sequence to Sequence Learning
    Minh-Thang Luong, Quoc Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
  27. A Test of Relative Similarity for Model Selection in Generative Models
    Eugene Belilovsky, Wacha Bounliphone, Matthew Blaschko, Ioannis Antonoglou, Arthur Gretton
  28. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
    Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, Dongjun Shin
  29. Neural Programmer-Interpreters
    Scott Reed, Nando de Freitas
  30. Session-based recommendations with recurrent neural networks [code]
    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk
  31. Continuous control with deep reinforcement learning
    Timothy Lillicrap, Jonathan Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
  32. Recurrent Gaussian Processes
    César Lincoln Mattos, Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme Barreto, Neil Lawrence
  33. Auxiliary Image Regularization for Deep CNNs with Noisy Labels
    Samaneh Azadi, Jiashi Feng, Stefanie Jegelka, Trevor Darrell
  34. Convergent Learning: Do different neural networks learn the same representations?
    Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft
  35. Policy Distillation
    Andrei Rusu, Sergio Gomez, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell
  36. Neural Random-Access Machines
    Karol Kurach, Marcin Andrychowicz, Ilya Sutskever
  37. Gated Graph Sequence Neural Networks
    Yujia Li, Daniel Tarlow, Marc Brockschmidt, Richard Zemel
  38. Metric Learning with Adaptive Density Discrimination
    Oren Rippel, Manohar Paluri, Piotr Dollar, Lubomir Bourdev
  39. Censoring Representations with an Adversary
    Harrison Edwards, Amos Storkey
  40. Order-Embeddings of Images and Language [code]
    Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun
  41. Variable Rate Image Compression with Recurrent Neural Networks
    George Toderici, Sean O'Malley, Damien Vincent, Sung Jin Hwang, Michele Covell, Shumeet Baluja, Rahul Sukthankar, David Minnen
  42. Delving Deeper into Convolutional Networks for Learning Video Representations
    Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville
  43. Data-dependent initializations of Convolutional Neural Networks [code]
    Philipp Kraehenbuehl, Carl Doersch, Jeff Donahue, Trevor Darrell
  44. Order Matters: Sequence to sequence for sets
    Oriol Vinyals, Samy Bengio, Manjunath Kudlur
  45. High-Dimensional Continuous Control Using Generalized Advantage Estimation
    John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel
  46. BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies [code]
    Shihao Ji, Swaminathan Vishwanathan, Nadathur Satish, Michael Anderson, Pradeep Dubey
  47. Deep Multi Scale Video Prediction Beyond Mean Square Error
    Michael Mathieu, Camille Couprie, Yann LeCun
  48. Grid Long Short-Term Memory
    Nal Kalchbrenner, Alex Graves, Ivo Danihelka
  49. Net2Net: Accelerating Learning via Knowledge Transfer
    Tianqi Chen, Ian Goodfellow, Jon Shlens
  50. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
    Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
  51. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
    Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
  52. Segmental Recurrent Neural Networks
    Lingpeng Kong, Chris Dyer, Noah Smith
  53. Deep Linear Discriminant Analysis [code]
    Matthias Dorfer, Rainer Kelz, Gerhard Widmer
  54. Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks [code]
    Pouya Bashivan, Irina Rish, Mohammed Yeasin, Noel Codella
  55. Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance
    Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal
  56. Data-Dependent Path Normalization in Neural Networks
    Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
  57. Reasoning in Vector Space: An Exploratory Study of Question Answering
    Moontae Lee, Xiaodong He, Wen-tau Yih, Jianfeng Gao, Li Deng, Paul Smolensky
  58. Neural GPUs Learn Algorithms [code] [video]
    Lukasz Kaiser, Ilya Sutskever
  59. ACDC: A Structured Efficient Linear Layer
    Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas
  60. Density Modeling of Images using a Generalized Normalization Transformation
    Johannes Ballé, Valero Laparra, Eero Simoncelli
  61. Adversarial Manipulation of Deep Representations [code]
    Sara Sabour, Yanshuai Cao, Fartash Faghri, David Fleet
  62. Geodesics of learned representations
    Olivier Hénaff, Eero Simoncelli
  63. Sequence Level Training with Recurrent Neural Networks
    Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
  64. Super-resolution with deep convolutional sufficient statistics
    Joan Bruna, Pablo Sprechmann, Yann LeCun
  65. Variational Gaussian Process
    Dustin Tran, Rajesh Ranganath, David Blei

Presentation Guidelines

Conference Orals

Talks should be no longer than 17 minutes, leaving 2-3 minutes for questions from the audience. The author giving the talk must find the oral session chair in advance to test their personal laptop for presenting the slides.

Presenters whose talks are scheduled before the morning coffee break should test their laptops before the morning session starts; all other presenters can test theirs during the coffee break.

Poster Presentations

The poster boards are 4 ft. high by 8 ft. wide. Poster presenters are encouraged to put up their posters as early as the day's morning coffee break (10:20 to 10:50).

Each poster is assigned a number, shown above. Presenters should use the poster board that carries the number assigned to their poster.

Once the poster session is over, presenters have until the end of the day to take down their posters from their assigned poster boards.