ICLR 2016

Basic Information

When

May 2 - 4, 2016

Where

We recommend landing at the Luis Munoz Marin International Airport. The best way to get from this airport to the hotel is by taxi, which is about a 15-minute drive and costs around $20.

Important: Local transmission of the Zika virus has been reported in Puerto Rico. While in most cases the symptoms of Zika are mild, women who are pregnant and women and men who may conceive a child in the near future have more reason to be concerned about the virus.

The US Centers for Disease Control have reliable and current information on the Zika virus here:

http://www.cdc.gov/zika/

We recommend that anyone with concerns about Zika virus review this information.

Call for Papers (Main Track)

For instructions on the submission process, go here.

Call for Papers (Workshop Track)

For instructions on the submission process, go here.

Registration and Hotel Reservations

To register and make hotel reservations, go here.

On the day of the meeting, come pick up your badge at the Grand Salon Los Rosales, which is right next to the hotel (ask the staff of the hotel for more directions).

Conference Wireless Access

network: hmeeting
password: iclr16

Video recordings of talks

Talks are now available on videolectures.net:

http://videolectures.net/iclr2016_san_juan/

Discussion, Forum, Pictures on the ICLR Facebook Page

https://www.facebook.com/iclr.cc

Feedback Poll

We've created a poll to gather feedback and suggestions on ICLR:

https://www.facebook.com/events/1737067246550684/permalink/1737070989883643/

Please participate by upvoting the suggestions you like or adding your own suggestions.

Committee

Senior Program Chair

Hugo Larochelle, Twitter and Université de Sherbrooke

Program Chairs

Samy Bengio, Google
Brian Kingsbury, IBM Watson Group

General Chairs

Yoshua Bengio, Université de Montréal
Yann LeCun, New York University and Facebook

Area Chairs

Ryan Adams, Twitter and Harvard
Antoine Bordes, Facebook
KyungHyun Cho, New York University
Adam Coates, Baidu
Aaron Courville, Université de Montréal
Trevor Darrell, University of California, Berkeley
Ian Goodfellow, Google
Roger Grosse, University of Toronto
Nicolas Le Roux, Criteo
Honglak Lee, University of Michigan
Julien Mairal, INRIA
Chris Manning, Stanford University
Roland Memisevic, Université de Montréal
Joelle Pineau, McGill University
John Platt, Google
Marc'Aurelio Ranzato, Facebook
Tara Sainath, Google
Ruslan Salakhutdinov, University of Toronto
Raquel Urtasun, University of Toronto

Contact

iclr2016.programchairs@gmail.com

Conference Schedule

Date	Start	End	Event	Details
May 2	7:30	8:50	breakfast	San Cristobal Ballroom [Sponsored by Baidu Research]
	8:50	12:30		Oral Session - Los Rosales Grand Salon A&B
	8:50	9:00	opening	Opening remarks
	9:00	9:40	keynote	Sergey Levine (University of Washington): Deep Robotic Learning
	9:40	10:00	oral	Neural Programmer-Interpreters by Scott Reed, Nando de Freitas (Best Paper Award Recipient)
	10:00	10:20	oral	Regularizing RNNs by Stabilizing Activations by David Krueger, Roland Memisevic
	10:20	10:50	coffee break
	10:50	11:30	keynote	Chris Dyer (CMU): Should Model Architecture Reflect Linguistic Structure?
	11:30	11:50	oral	BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies by Shihao Ji, Swaminathan Vishwanathan, Nadathur Satish, Michael Anderson, Pradeep Dubey
	11:50	12:10	oral	The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations by Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston
	12:10	12:30	oral	Towards Universal Paraphrastic Sentence Embeddings by John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
	12:30	14:00	lunch	On your own
	14:00	17:00	posters	Workshop Track Posters (May 2nd) - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Facebook]
	17:30	19:00	dinner	San Cristobal Ballroom

May 3	7:30	9:00	breakfast	Las Olas Terrace [Sponsored by NVIDIA]
	9:00	12:30		Oral Session - Los Rosales Grand Salon A&B
	9:00	9:40	keynote	Anima Anandkumar (UC Irvine): Guaranteed Non-convex Learning Algorithms through Tensor Factorization
	9:40	10:00	oral	Convergent Learning: Do different neural networks learn the same representations? by Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft
	10:00	10:20	oral	Net2Net: Accelerating Learning via Knowledge Transfer by Tianqi Chen, Ian Goodfellow, Jon Shlens
	10:20	10:50	coffee break
	10:50	11:30	keynote	Neil Lawrence (University of Sheffield): Beyond Backpropagation: Uncertainty Propagation
	11:30	11:50	oral	Variational Gaussian Process by Dustin Tran, Rajesh Ranganath, David Blei
	11:50	12:10	oral	The Variational Fair Autoencoder by Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
	12:10	12:30	oral	A note on the evaluation of generative models by Lucas Theis, Aäron van den Oord, Matthias Bethge
	12:30	14:00	lunch	On your own
	14:00	17:00	posters	Workshop Track Posters (May 3rd) - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Google]
	17:30	19:00	dinner	San Cristobal Ballroom [Sponsored by Intel]

May 4	7:30	9:00	breakfast	Las Olas Terrace
	9:00	12:30		Oral Session - Los Rosales Grand Salon A&B
	9:00	9:30	town hall	ICLR town hall meeting (open discussion)
	9:30	9:40	break
	9:40	10:00	oral	Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding by Song Han, Huizi Mao, Bill Dally (Best Paper Award Recipient)
	10:00	10:20	oral	Neural Networks with Few Multiplications by Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
	10:20	10:50	coffee break
	10:50	11:30	keynote	Raquel Urtasun (University of Toronto): Incorporating Structure in Deep Learning
	11:30	11:50	oral	Order-Embeddings of Images and Language by Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun
	11:50	12:10	oral	Generating Images from Captions with Attention by Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
	12:10	12:30	oral	Density Modeling of Images using a Generalized Normalization Transformation by Johannes Ballé, Valero Laparra, Eero Simoncelli
	12:30	14:00	lunch	On your own
	14:00	17:00	posters	Conference Track Posters - Los Rosales Grand Salon C,D,E and Garita & Cariba Salons [Sponsored by Twitter]
	17:30	19:00	dinner	San Cristobal Ballroom

Keynote Talks

Sergey Levine

Deep Robotic Learning

The problem of building an autonomous robot has traditionally been viewed as one of integration: connecting together modular components, each one designed to handle some portion of the perception and decision making process. For example, a vision system might be connected to a planner that might in turn provide commands to a low-level controller that drives the robot's motors. In this talk, I will discuss how ideas from deep learning can allow us to build robotic control mechanisms that combine both perception and control into a single system. This system can then be trained end-to-end on the task at hand. I will show how this end-to-end approach actually simplifies the perception and control problems, by allowing the perception and control mechanisms to adapt to one another and to the task. I will also present some recent work on scaling up deep robotic learning on a cluster consisting of multiple robotic arms, and demonstrate results for learning grasping strategies that involve continuous feedback and hand-eye coordination using deep convolutional neural networks.

BIO: Sergey Levine is an assistant professor at the University of Washington. His research focuses on robotics and machine learning. In his PhD thesis, he developed a novel guided policy search algorithm for learning complex neural network control policies, which was later applied to enable a range of robotic tasks, including end-to-end training of policies for perception and control. He has also developed algorithms for learning from demonstration, inverse reinforcement learning, efficient training of stochastic neural networks, computer vision, and data-driven character animation.

Chris Dyer

Should Model Architecture Reflect Linguistic Structure?

Sequential recurrent neural networks (RNNs) over finite alphabets are remarkably effective models of natural language. RNNs now obtain language modeling results that substantially improve over long-standing state-of-the-art baselines, as well as in various conditional language modeling tasks such as machine translation, image caption generation, and dialogue generation. Despite these impressive results, such models are a priori inappropriate models of language. One point of criticism is that language users create and understand new words all the time, challenging the finite vocabulary assumption. A second is that relationships among words are computed in terms of latent nested structures rather than sequential surface order (Chomsky, 1957; Everaert, Huybregts, Chomsky, Berwick, and Bolhuis, 2015).

In this talk I discuss two models that explore the hypothesis that more (a priori) appropriate models of language will lead to better performance on real-world language processing tasks. The first composes subword units (bytes, characters, or morphemes) into lexical representations, enabling more naturalistic interpretation and generation of novel word forms. The second, which we call recurrent neural network grammars (RNNGs), is a new generative model of sentences that explicitly models nested, hierarchical relationships among words and phrases. RNNGs operate via a recursive syntactic process reminiscent of probabilistic context-free grammar generation, but decisions are parameterized using RNNs that condition on the entire (top-down, left-to-right) syntactic derivation history, greatly relaxing context-free independence assumptions. Experimental results show that RNNGs obtain better results in generating language than models that don’t exploit linguistic structures.

BIO: Chris Dyer is an assistant professor in the Language Technologies Institute and Machine Learning Department at Carnegie Mellon University. He obtained his PhD in Linguistics at the University of Maryland under Philip Resnik in 2010. His work has been nominated for—and occasionally received—best paper awards at EMNLP, NAACL, and ACL.

Anima Anandkumar

Guaranteed Non-convex Learning Algorithms through Tensor Factorization

Modern machine learning involves massive datasets of text, images, videos, biological data, and so on. Most learning tasks can be framed as optimization problems which turn out to be non-convex and NP-hard to solve. This hardness barrier can be overcome by: (i) focusing on conditions which make learning tractable, (ii) replacing the given optimization objective with better behaved ones, and (iii) exploiting non-obvious connections that abound in learning problems.

I will discuss the above in the context of: (i) unsupervised learning of latent variable models and (ii) training multi-layer neural networks, through a novel framework involving spectral decomposition of moment matrices and tensors. Tensors are rich structures that can encode higher order relationships in data. Despite being non-convex, tensor decomposition can be solved optimally using simple iterative algorithms under mild conditions. In practice, tensor methods yield enormous gains both in running times and learning accuracy over traditional methods for training probabilistic models such as variational inference. These positive results demonstrate that many challenging learning tasks can be solved efficiently, both in theory and in practice.

BIO: Anima Anandkumar has been a faculty member in the EECS Dept. at U.C. Irvine since August 2010. Her research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms for a variety of learning problems. She is the recipient of the Alfred P. Sloan Fellowship, Microsoft Faculty Fellowship, Google research award, ARO and AFOSR Young Investigator Awards, NSF CAREER Award, Early Career Excellence in Research Award at UCI, Best Thesis Award from the ACM SIGMETRICS society, IBM Fran Allen PhD fellowship, and best paper awards from the ACM SIGMETRICS and IEEE Signal Processing societies. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, and a visiting faculty member at Microsoft Research New England in 2012 and 2014.

Neil Lawrence

Beyond Backpropagation: Uncertainty Propagation

Deep learning is founded on composable functions that are structured to capture regularities in data and can have their parameters optimized by backpropagation (differentiation via the chain rule). Their recent success is founded on the increased availability of data and computational power. However, they are not very data efficient. In low data regimes parameters are not well determined and severe overfitting can occur. The solution is to explicitly handle the indeterminacy by converting it to parameter uncertainty and propagating it through the model. Uncertainty propagation is more involved than backpropagation because it involves convolving the composite functions with probability distributions and integration is more challenging than differentiation.

We will present one approach to fitting such models using Gaussian processes. The resulting models perform very well in both supervised and unsupervised learning on small data sets. The remaining challenge is to scale the algorithms to much larger data.

BIO: Neil Lawrence is Professor of Machine Learning at the University of Sheffield. His expertise is in probabilistic modelling with a particular focus on Gaussian processes and a strong interest in bridging the worlds of mechanistic and empirical models.

Raquel Urtasun

Incorporating Structure in Deep Learning

Deep learning algorithms attempt to model high-level abstractions of the data using architectures composed of multiple non-linear transformations. A multiplicity of variants have been proposed and shown to be extremely successful in a wide variety of applications including computer vision, speech recognition as well as natural language processing. In this talk I’ll show how to make these representations more powerful by exploiting structure in the outputs, the loss function as well as in the learned embeddings.

Many problems in real-world applications involve predicting several random variables that are statistically related. Graphical models have been typically employed to represent and exploit the output dependencies. However, most current learning algorithms assume that the models are log linear in the parameters. In the first part of the talk I’ll show a variety of algorithms that can learn arbitrary functions while exploiting the output dependencies, unifying deep learning and graphical models.

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application domain. In the second part of the talk I’ll show a direct loss minimization approach to train deep neural networks, which provably minimizes the task loss. This is often non-trivial, since these loss functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. I’ll demonstrate the applicability of this general framework in the context of maximizing average precision, a structured loss commonly used to evaluate ranking problems.

Deep learning has become a very popular approach to learn word, sentence and/or image embeddings. Neural embeddings have shown great performance in tasks such as image captioning, machine translation and paraphrasing. In the last part of my talk I’ll show how to exploit the partial order structure of the visual semantic hierarchy over words, sentences and images to learn order embeddings. I’ll demonstrate the utility of these new representations for hypernym prediction and image-caption retrieval.

BIO: Raquel Urtasun is an Assistant Professor in the Department of Computer Science at the University of Toronto and a Canada Research Chair in Machine Learning and Computer Vision. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She received her Ph.D. degree from the Computer Science department at Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. Her research interests include machine learning, computer vision and robotics. Her recent work involves perception algorithms for self-driving cars, deep structured models and exploring problems at the intersection of vision and language. She is a recipient of a Ministry of Education and Innovation Early Researcher Award, two Google Faculty Research Awards, a Connaught New Researcher Award and a Best Paper Runner-up Prize awarded at the Conference on Computer Vision and Pattern Recognition (CVPR). She is also Program Chair of CVPR 2018, an Editor of the International Journal of Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (i.e., NIPS, UAI, ICML, ICLR, CVPR, ECCV, ICCV).

Best Paper Awards

This year, the program committee has decided to grant two Best Paper Awards to papers that were singled out for their impressive and original scientific contributions.

The recipients of a Best Paper Award for ICLR 2016 are:

Neural Programmer-Interpreters
Scott Reed, Nando de Freitas
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, Bill Dally

The selection was made by the program chairs and was informed by the feedback from the reviewers and area chairs.

Workshop Track Posters (May 2nd)

Deep Motif: Visualizing Genomic Sequence Classifications
Jack Lanchantin, Ritambhara Singh, Zeming Lin, Yanjun Qi
Lookahead Convolution Layer for Unidirectional Recurrent Neural Networks
Chong Wang, Dani Yogatama, Adam Coates, Tony Han, Awni Hannun, Bo Xiao
Joint Stochastic Approximation Learning of Helmholtz Machines
Haotian Xu, Zhijian Ou
A Minimalistic Approach to Sum-Product Network Learning for Real Applications
Viktoriya Krakovna, Moshe Looks
Hardware-Oriented Approximation of Convolutional Neural Networks
Philipp Gysel, Mohammad Motamedi, Soheil Ghiasi
Neurogenic Deep Learning
Timothy J. Draelos, Nadine E. Miner, Jonathan A. Cox, Christopher C. Lamb, Conrad D. James, James B. Aimone
Deep Bayesian Neural Nets as Deep Matrix Gaussian Processes
Christos Louizos, Max Welling
Neural Network Training Variations in Speech and Subsequent Performance Evaluation
Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny
Neural Variational Random Field Learning
Volodymyr Kuleshov, Stefano Ermon
Improving Variational Inference with Inverse Autoregressive Flow
Diederik P. Kingma, Tim Salimans, Max Welling
Learning Genomic Representations to Predict Clinical Outcomes in Cancer
Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
Understanding Very Deep Networks via Volume Conservation
Thomas Unterthiner, Sepp Hochreiter
Fixed Point Quantization of Deep Convolutional Networks
Darryl D. Lin, Sachin S. Talathi, V. Sreekanth Annapureddy
CMA-ES for Hyperparameter Optimization of Deep Neural Networks
Ilya Loshchilov, Frank Hutter
Understanding Visual Concepts with Continuation Learning
William F. Whitney, Michael Chang, Tejas Kulkarni, Joshua B. Tenenbaum
Input-Convex Deep Networks
Brandon Amos, J. Zico Kolter (moved to May 3rd)
Learning to SMILE(S)
Stanisław Jastrzębski, Damian Leśniak, Wojciech Marian Czarnecki
Learning Retinal Tiling in a Model of Visual Attention
Brian Cheung, Eric Weiss, Bruno Olshausen
Hardware-Friendly Convolutional Neural Network with Even-Number Filter Size
Song Yao, Song Han, Kaiyuan Guo, Jianqiao Wangni, Yu Wang
Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)?
Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson
Generative Adversarial Metric
Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, Roland Memisevic
Revise Saturated Activation Functions
Bing Xu, Ruitong Huang, Mu Li
Multi-layer Representation Learning for Medical Concepts
Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Elizabeth Searles, Catherine Coffey
Alternative structures for character-level RNNs
Piotr Bojanowski, Armand Joulin, Tomas Mikolov
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke
Revisiting Distributed Synchronous SGD
Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
A Differentiable Transition Between Additive and Multiplicative Neurons
Wiebke Koepp, Patrick van der Smagt, Sebastian Urban
Deep Autoresolution Networks
Gabriel Pereyra, Christian Szegedy
Unsupervised Learning with Imbalanced Data via Structure Consolidation Latent Variable Model
Fariba Yousefi, Zhenwen Dai, Carl Henrik Ek, Neil Lawrence
Robust Convolutional Neural Networks under Adversarial Noise
Jonghoon Jin, Aysegul Dundar, Eugenio Culurciello
GradNets: Dynamic Interpolation Between Neural Architectures
Diogo Almeida, Nate Sauder
Resnet in Resnet: Generalizing Residual Architectures
Sasha Targ, Diogo Almeida, Kevin Lyman
Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, Joshua C. Denny, Bradley A. Malin, Jimeng Sun
On-the-fly Network Pruning for Object Detection
Marc Masana, Joost van de Weijer, Andrew D. Bagdanov
Deep Directed Generative Models with Energy-Based Probability Estimation
Taesup Kim, Yoshua Bengio
Rectified Factor Networks for Biclustering
Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
Joseph Paul Cohen, Henry Z. Lo, Wei Ding
Persistent RNNs: Stashing Weights on Chip
Greg Diamos, Shubho Sengupta, Bryan Catanzaro, Mike Chrzanowski, Adam Coates, Erich Elsen, Jesse Engel, Awni Hannun, Sanjeev Satheesh
Scale Normalization
Henry Z Lo, Kevin Amaral, Wei Ding
Close-to-clean regularization relates virtual adversarial training, ladder networks and others
Mudassar Abbas, Jyri Kivinen, Tapani Raiko
Guided Sequence-to-Sequence Learning with External Rule Memory
Jiatao Gu, Baotian Hu, Zhengdong Lu, Hang Li, Victor O.K. Li
Neural Text Understanding with Attention Sum Reader
Rudolf Kadlec, Martin Schmid, Ondřej Bajgar, Jan Kleindienst
Incorporating Nesterov Momentum into Adam
Timothy Dozat
Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
Maximilian Sölch, Justin Bayer, Marvin Ludersdorfer, Patrick van der Smagt
Sequence-to-Sequence RNNs for Text Summarization
Ramesh Nallapati, Bing Xiang, Bowen Zhou
Neural Generative Question Answering
Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li
Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews
Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao
Autoencoding for Joint Relation Factorization and Discovery from Text
Diego Marcheggiani, Ivan Titov
Adaptive Natural Gradient Learning Based on Riemannian Metric of Score Matching
Ryo Karakida, Masato Okada, Shun-ichi Amari
Neural Enquirer: Learning to Query Tables in Natural Language
Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao
End to end speech recognition in English and Mandarin
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
Lessons from the Rademacher Complexity for Deep Learning
Jure Sokolic, Raja Giryes, Guillermo Sapiro, Miguel R. D. Rodrigues
Coverage-based Neural Machine Translation
Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li
Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems
Colin Raffel, Daniel P. W. Ellis
Learning stable representations in a changing world with on-line t-SNE: proof of concept in the songbird
Stéphane Deny, Emily Mackevicius, Tatsuo Okubo, Gordon Berman, Joshua Shaevitz, Michale Fee

Workshop Track Posters (May 3rd)

Mixtures of Sparse Autoregressive Networks
Marc Goessling, Yali Amit
Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning
Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah
Action Recognition using Visual Attention
Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov
Improving performance of recurrent neural network with relu nonlinearity
Sachin S. Talathi, Aniket Vartak
Visualizing and Understanding Recurrent Networks
Andrej Karpathy, Justin Johnson, Li Fei-Fei
Learning to Decompose for Object Detection and Instance Segmentation
Eunbyung Park, Alexander C. Berg
Learning visual groups from co-occurrences in space and time
Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson
Spatio-Temporal Video Autoencoder with Differentiable Memory
Viorica Patraucean, Ankur Handa, Roberto Cipolla
Task Loss Estimation for Structured Prediction
Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio
Conditional computation in neural networks for faster models
Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup
A metric learning approach for graph-based label propagation
Pauline Wauquier, Mikaela Keller
Bidirectional Helmholtz Machines
Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio
A Controller-Recognizer Framework: How Necessary is Recognition for Control?
Marcin Moczulski, Kelvin Xu, Aaron Courville, Kyunghyun Cho
Online Batch Selection for Faster Training of Neural Networks
Ilya Loshchilov, Frank Hutter
Nonparametric Canonical Correlation Analysis
Tomer Michaeli, Weiran Wang, Karen Livescu
Document Context Language Models
Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein
Unsupervised Learning of Visual Structure using Predictive Generative Networks
William Lotter, Gabriel Kreiman, David Cox
Convolutional Clustering for Unsupervised Learning
Aysegul Dundar, Jonghoon Jin, Eugenio Culurciello
ParseNet: Looking Wider to See Better
Wei Liu, Andrew Rabinovich, Alexander C. Berg
Why are deep nets reversible: A simple theory, with implications for training
Sanjeev Arora, Yingyu Liang, Tengyu Ma
Binding via Reconstruction Clustering
Klaus Greff, Rupesh Srivastava, Jürgen Schmidhuber
Dynamic Capacity Networks
Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville
Learning Representations of Affect from Speech
Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer
Neural Variational Inference for Text Processing
Yishu Miao, Lei Yu, Phil Blunsom
Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition
Suyoun Kim, Ian Lane
A Deep Memory-based Architecture for Sequence-to-Sequence Learning
Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu
Deconstructing the Ladder Network Architecture
Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio
Neural network-based clustering using pairwise constraints
Yen-Chang Hsu, Zsolt Kira
LSTM-based Deep Learning Models for non-factoid answer selection
Ming Tan, Cicero dos Santos, Bing Xiang, Bowen Zhou
Using Deep Learning to Predict Demographics from Mobile Phone Metadata
Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, Yves-Alexandre de Montjoye
Efficient Inference in Occlusion-Aware Generative Models of Images
Jonathan Huang, Kevin Murphy
Convolutional Models for Joint Object Categorization and Pose Estimation
Mohamed Elhoseiny, Tarek El-Gaaly, Amr Bakry, Ahmed Elgammal
Basic Level Categorization Facilitates Visual Object Recognition
Panqu Wang, Garrison Cottrell
Learning to Represent Words in Context with Multilingual Supervision
Kazuya Kawakami, Chris Dyer
Fine-grained pose prediction, normalization, and recognition
Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
Jelena Luketina, Mathias Berglund, Tapani Raiko
Unitary Evolution Recurrent Neural Networks
Martin Arjovsky, Amar Shah, Yoshua Bengio
Temporal Convolutional Neural Networks for Diagnosis from Lab Tests
Narges Razavian, David Sontag
PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
Michael Figurnov, Dmitry Vetrov, Pushmeet Kohli
How far can we go without convolution: Improving fully-connected networks
Zhouhan Lin, Roland Memisevic, Kishore Konda
Learning Dense Convolutional Embeddings for Semantic Segmentation
Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
Generating Sentences from a Continuous Space
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
Stacked What-Where Auto-encoders
Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
Yarin Gal, Zoubin Ghahramani
Blending LSTMs into CNNs
Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, Charles Sutton
Empirical performance upper bounds for image and video captioning
Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio
Adversarial Autoencoders
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow
Deep Reinforcement Learning with an Action Space Defined by Natural Language
Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf
Universum Prescription: Regularization using Unlabeled Data
Xiang Zhang, Yann LeCun
Variance Reduction in SGD by Distributed Importance Sampling
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, Yoshua Bengio
Adding Gradient Noise Improves Learning for Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens
Black Box Variational Inference for State Space Models
Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski
Input-Convex Deep Networks
Brandon Amos, J. Zico Kolter

Accepted Papers (Conference Track)

Multi-Scale Context Aggregation by Dilated Convolutions
Fisher Yu, Vladlen Koltun
The Variational Fair Autoencoder
Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
A note on the evaluation of generative models
Lucas Theis, Aäron van den Oord, Matthias Bethge
Learning to Diagnose with LSTM Recurrent Neural Networks
Zachary Lipton, David Kale, Charles Elkan, Randall Wetzel
Prioritized Experience Replay
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
Importance Weighted Autoencoders
Yuri Burda, Ruslan Salakhutdinov, Roger Grosse
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Song Han, Huizi Mao, Bill Dally
Variationally Auto-Encoded Deep Gaussian Processes
Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence
Training Convolutional Neural Networks with Low-rank Filters for Efficient Image Classification
Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi
Neural Networks with Few Multiplications
Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio
Reducing Overfitting in Deep Networks by Decorrelating Representations
Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
Pushing the Boundaries of Boundary Detection using Deep Learning
Iasonas Kokkinos
Generating Images from Captions with Attention
Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
Reasoning about Entailment with Neural Attention
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom
Convolutional Neural Networks With Low-rank Regularization
Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E
Unifying distillation and privileged information
David Lopez-Paz, Leon Bottou, Bernhard Schölkopf, Vladimir Vapnik
Particular object retrieval with integral max-pooling of CNN activations [code]
Giorgos Tolias, Ronan Sicre, Hervé Jégou
All you need is a good init [code]
Dmytro Mishkin, Jiri Matas
Bayesian Representation Learning with Oracle Constraints
Theofanis Karaletsos, Serge Belongie, Gunnar Rätsch
Neural Programmer: Inducing Latent Programs with Gradient Descent
Arvind Neelakantan, Quoc Le, Ilya Sutskever
Towards Universal Paraphrastic Sentence Embeddings [code]
John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu
Regularizing RNNs by Stabilizing Activations
David Krueger, Roland Memisevic
SparkNet: Training Deep Networks in Spark
Philipp Moritz, Robert Nishihara, Ion Stoica, Michael Jordan
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
Jost Tobias Springenberg
The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston
MuProp: Unbiased Backpropagation For Stochastic Neural Networks
Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih
Data Representation and Compression Using Linear-Programming Approximations
Hristo Paskov, John Mitchell, Trevor Hastie
Diversity Networks
Zelda Mariet, Suvrit Sra
Deep Reinforcement Learning in Parameterized Action Space [code] [data]
Matthew Hausknecht, Peter Stone
Learning Visual Predictive Models of Physics for Playing Billiards
Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks [code] [data]
Jason Weston, Antoine Bordes, Sumit Chopra, Sasha Rush, Bart van Merrienboer, Armand Joulin, Tomas Mikolov
Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems [data]
Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston
Better Computer Go Player with Neural Network and Long-term Prediction
Yuandong Tian, Yan Zhu
Distributional Smoothing with Virtual Adversarial Training [code]
Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii
Multi-task Sequence to Sequence Learning
Minh-Thang Luong, Quoc Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser
A Test of Relative Similarity for Model Selection in Generative Models
Eugene Belilovsky, Wacha Bounliphone, Matthew Blaschko, Ioannis Antonoglou, Arthur Gretton
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, Dongjun Shin
Neural Programmer-Interpreters
Scott Reed, Nando de Freitas
Session-based recommendations with recurrent neural networks [code]
Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk
Continuous control with deep reinforcement learning
Timothy Lillicrap, Jonathan Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Recurrent Gaussian Processes
César Lincoln Mattos, Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme Barreto, Neil Lawrence
Modeling Visual Representations: Defining Properties and Deep Approximations
Stefano Soatto, Alessandro Chiuso
Auxiliary Image Regularization for Deep CNNs with Noisy Labels
Samaneh Azadi, Jiashi Feng, Stefanie Jegelka, Trevor Darrell
Convergent Learning: Do different neural networks learn the same representations?
Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft
Policy Distillation
Andrei Rusu, Sergio Gomez, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell
Neural Random-Access Machines
Karol Kurach, Marcin Andrychowicz, Ilya Sutskever
Gated Graph Sequence Neural Networks
Yujia Li, Daniel Tarlow, Marc Brockschmidt, Richard Zemel
Metric Learning with Adaptive Density Discrimination
Oren Rippel, Manohar Paluri, Piotr Dollar, Lubomir Bourdev
Censoring Representations with an Adversary
Harrison Edwards, Amos Storkey
Order-Embeddings of Images and Language [code]
Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun
Variable Rate Image Compression with Recurrent Neural Networks
George Toderici, Sean O'Malley, Damien Vincent, Sung Jin Hwang, Michele Covell, Shumeet Baluja, Rahul Sukthankar, David Minnen
Delving Deeper into Convolutional Networks for Learning Video Representations
Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville
8-Bit Approximations for Parallelism in Deep Learning
Tim Dettmers
Data-dependent initializations of Convolutional Neural Networks [code]
Philipp Kraehenbuehl, Carl Doersch, Jeff Donahue, Trevor Darrell
Order Matters: Sequence to sequence for sets
Oriol Vinyals, Samy Bengio, Manjunath Kudlur
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel
BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies [code]
Shihao Ji, Swaminathan Vishwanathan, Nadathur Satish, Michael Anderson, Pradeep Dubey
Deep Multi-Scale Video Prediction Beyond Mean Square Error
Michael Mathieu, Camille Couprie, Yann LeCun
Grid Long Short-Term Memory
Nal Kalchbrenner, Alex Graves, Ivo Danihelka
Net2Net: Accelerating Learning via Knowledge Transfer
Tianqi Chen, Ian Goodfellow, Jon Shlens
Predicting distributions with Linearizing Belief Networks
Yann Dauphin, David Grangier
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
Segmental Recurrent Neural Networks
Lingpeng Kong, Chris Dyer, Noah Smith
Deep Linear Discriminant Analysis [code]
Matthias Dorfer, Rainer Kelz, Gerhard Widmer
Large-Scale Approximate Kernel Canonical Correlation Analysis
Weiran Wang, Karen Livescu
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala
Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks [code]
Pouya Bashivan, Irina Rish, Mohammed Yeasin, Noel Codella
Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance
Amr Bakry, Mohamed Elhoseiny, Tarek El-Gaaly, Ahmed Elgammal
An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family
Alexandre De Brébisson, Pascal Vincent
Data-Dependent Path Normalization in Neural Networks
Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
Reasoning in Vector Space: An Exploratory Study of Question Answering
Moontae Lee, Xiaodong He, Wen-tau Yih, Jianfeng Gao, Li Deng, Paul Smolensky
Neural GPUs Learn Algorithms [code] [video]
Lukasz Kaiser, Ilya Sutskever
ACDC: A Structured Efficient Linear Layer
Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas
Density Modeling of Images using a Generalized Normalization Transformation
Johannes Ballé, Valero Laparra, Eero Simoncelli
Adversarial Manipulation of Deep Representations [code]
Sara Sabour, Yanshuai Cao, Fartash Faghri, David Fleet
Geodesics of learned representations
Olivier Hénaff, Eero Simoncelli
Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
Super-resolution with deep convolutional sufficient statistics
Joan Bruna, Pablo Sprechmann, Yann LeCun
Variational Gaussian Process
Dustin Tran, Rajesh Ranganath, David Blei

Presentation Guidelines

Conference Orals

Talks should be no longer than 17 minutes, leaving 2-3 minutes for audience questions. The presenting author must find the oral session chair in advance to test their personal laptop for presenting the slides.

Presenters whose talks are scheduled before the morning coffee break should test their laptops before the morning session starts; all other presenters can run their tests during the coffee break.

Poster Presentations

The poster boards are 4 ft. high by 8 ft. wide. Poster presenters are encouraged to put up their posters as early as the day's morning coffee break (10:20 to 10:50).

Each poster is assigned a number, shown above. Presenters should use the poster board corresponding to the number for their work.

Once the poster session is over, presenters have until the end of the day to take down their posters from their assigned poster boards.