Spotlights Session 1, in Workshop: S2D-OLAD: From shallow to deep, overcoming limited and adverse data
Data-Efficient Training of Autoencoders for Mildly Non-Linear Problems
Muhammad Al-Digeil
"Principal Component Analysis (PCA) provides reliable dimensionality reduction (DR) for data with approximately linear structure, even when datasets are small. However, for data that exhibit non-linear behaviour, PCA cannot match the performance of non-linear DR methods such as AutoEncoders. AutoEncoders, on the other hand, typically require much larger training datasets than PCA. This data requirement is a critical impediment in applications where samples are scarce and expensive to obtain. One such area is nanophotonic component design, where generating a single data point may require running optimization methods that rely on computationally demanding solvers.
We propose Guided AutoEncoders (G-AE): standard AutoEncoders of nearly arbitrary architecture that are initialized with a numerically stable procedure to replicate PCA behaviour before training. Our results show that this approach markedly reduces the amount of data needed to train the network, while also capturing non-linearity during dimensionality reduction and thus outperforming PCA alone."
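The idea of starting an AutoEncoder from PCA behaviour can be illustrated with a minimal NumPy sketch. This is not the authors' procedure, which the abstract does not spell out: the residual architecture (a linear PCA path plus tanh branches whose output weights start at zero) and every name below (pca_basis, init_guided_ae, encode, decode) are hypothetical. Because the non-linear branches contribute nothing at initialization, the untrained network reproduces the PCA reconstruction exactly; training (omitted here, e.g. with any autodiff framework) would then move those output weights away from zero to capture non-linearity.

```python
import numpy as np


def pca_basis(X, k):
    """Return the data mean and the top-k principal directions as columns."""
    mu = X.mean(axis=0)
    # SVD of the centred data gives the principal directions without
    # forming the covariance matrix, which is the numerically safer route.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # shapes (d,) and (d, k)


def init_guided_ae(X, k, hidden=16, seed=0):
    """Hypothetical PCA-guided initialization of a small residual autoencoder.

    Encoder: z     = V^T (x - mu) + A2 tanh(A1 x + a1)
    Decoder: x_hat = V z + mu     + B2 tanh(B1 z + b1)
    Inner weights A1, B1 are random; output weights A2, B2 start at zero,
    so before training x_hat equals the PCA reconstruction of x.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu, V = pca_basis(X, k)
    return {
        "mu": mu,
        "V": V,
        "A1": rng.normal(scale=1 / np.sqrt(d), size=(hidden, d)),
        "a1": np.zeros(hidden),
        "A2": np.zeros((k, hidden)),
        "B1": rng.normal(scale=1 / np.sqrt(k), size=(hidden, k)),
        "b1": np.zeros(hidden),
        "B2": np.zeros((d, hidden)),
    }


def encode(p, X):
    """Map (n, d) inputs to (n, k) codes: PCA projection plus non-linear branch."""
    lin = (X - p["mu"]) @ p["V"]
    nonlin = np.tanh(X @ p["A1"].T + p["a1"]) @ p["A2"].T
    return lin + nonlin


def decode(p, Z):
    """Map (n, k) codes back to (n, d): PCA back-projection plus non-linear branch."""
    lin = Z @ p["V"].T + p["mu"]
    nonlin = np.tanh(Z @ p["B1"].T + p["b1"]) @ p["B2"].T
    return lin + nonlin


# Usage: mildly non-linear toy data (3-D points near a curved 1-D manifold).
rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, size=(40, 1))
X = np.hstack([t, 0.5 * t + 0.1 * t**2, -t]) + 0.01 * rng.normal(size=(40, 3))
p = init_guided_ae(X, k=1)
mu, V = pca_basis(X, 1)
X_pca = (X - mu) @ V @ V.T + mu
print(np.allclose(decode(p, encode(p, X)), X_pca))  # True: untrained AE == PCA
```

With a mean-squared-error loss, the decoder's output weights B2 generally receive non-zero gradients from the first step whenever PCA leaves residual reconstruction error, so training in this sketch is not stuck at the PCA solution.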