Spotlights Session 1
Workshop: S2D-OLAD: From shallow to deep, overcoming limited and adverse data

Data-Efficient Training of Autoencoders for Mildly Non-Linear Problems

Muhammad Al-Digeil

Fri 7 May 6:30 a.m. PDT — 6:34 a.m. PDT


"Principal Component Analysis (PCA) provides reliable dimensionality reduction (DR) when the data is linear in structure, even for small datasets. However, on data that exhibits non-linear behaviour, PCA performs worse than non-linear DR methods such as AutoEncoders. AutoEncoders, in turn, typically require much larger training datasets than PCA. This data requirement is a critical impediment in applications where samples are scarce and expensive to obtain. One such area is nanophotonic component design, where generating a single data point might involve running optimization methods that rely on computationally demanding solvers.

We propose Guided AutoEncoders (G-AE) of nearly arbitrary architecture: standard AutoEncoders that are initialized, using a numerically stable procedure, to replicate PCA behaviour before training. Our results show that this approach markedly reduces the amount of data needed to train the network while also capturing non-linearity during dimensionality reduction, and thus performs better than PCA alone."
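
The abstract does not spell out the initialization procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single linear encoder layer and a single linear decoder layer, and sets their weights from an SVD-based PCA so that the untrained network reproduces the rank-k PCA reconstruction. The names `pca_init`, `encode`, and `decode` are hypothetical, and how G-AE handles non-linear activations and deeper architectures is not covered here.

```python
import numpy as np

def pca_init(X, k):
    """Return (W_enc, b_enc, W_dec, b_dec) for a linear autoencoder
    whose output equals the rank-k PCA reconstruction of X.
    Illustrative assumption: one linear layer on each side."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]                  # shape (k, d)
    W_enc = components                   # z = W_enc x + b_enc
    b_enc = -components @ mean           # folds mean-centering into the bias
    W_dec = components.T                 # x_hat = W_dec z + b_dec
    b_dec = mean                         # adds the mean back
    return W_enc, b_enc, W_dec, b_dec

def encode(X, W_enc, b_enc):
    return X @ W_enc.T + b_enc

def decode(Z, W_dec, b_dec):
    return Z @ W_dec.T + b_dec

# Small synthetic dataset: 50 samples lying near a 2-D subspace of R^6.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6)) + 3.0
X += 0.01 * rng.normal(size=X.shape)

W_enc, b_enc, W_dec, b_dec = pca_init(X, k=2)
X_hat = decode(encode(X, W_enc, b_enc), W_dec, b_dec)

# Reference PCA reconstruction computed directly, for comparison.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
X_pca = (X - mean) @ Vt[:2].T @ Vt[:2] + mean

max_diff = float(np.abs(X_hat - X_pca).max())
print("max |autoencoder - PCA| before training:", max_diff)
```

In this sketch, the network starts at the PCA solution, so any subsequent training (with non-linearities introduced in whatever way the architecture allows) would begin from PCA-level reconstruction error rather than from a random initialization.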