

Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Stability Analysis of Various Symbolic Rule Extraction Methods from Recurrent Neural Network

Neisarg Dave · Daniel Kifer · C. Giles · Ankur Mali


Abstract: This paper analyzes two competing rule extraction methodologies: quantization and equivalence query. Previous research has shown that equivalence query (L*) is a robust DFA extraction method and that LSTMs are capable of learning counter languages. However, we empirically show significant instabilities in both cases. We trained 3600 RNN models, extracting 18000 DFAs with a quantization approach (k-means and SOM) and 3600 DFAs with the equivalence query (L*) method across 10 initialization seeds. We sampled the datasets from 7 Tomita and 4 Dyck grammars and trained 4 RNN cells on them: LSTM, GRU, O2RNN, and MIRNN. The observations from our experiments establish the superior performance of O2RNN and quantization-based rule extraction over the alternatives. L*, primarily proposed for regular grammars, performs similarly to quantization methods on Tomita languages when the neural networks are perfectly trained. However, for partially trained RNNs, L* shows instability in the number of DFA states; e.g., for the Tomita 5 and Tomita 6 languages, L* produced more than 100 states. In contrast, quantization methods yield rules with a number of states very close to that of the ground-truth DFA. Among RNN cells, O2RNN produces stable DFAs more consistently than the other cells. For Dyck languages, we observe that although GRU outperforms the other RNNs in network performance, the DFAs extracted from O2RNN have higher performance and better stability. Stability is computed as the standard deviation of test-set accuracy over networks trained across 10 seeds. On Dyck languages, quantization methods outperformed L*, with better stability in both accuracy and the number of states. L* often showed instability in accuracy on the order of 16% for GRU and MIRNN, while the deviation for quantization methods was around 5%. In many instances with LSTM and GRU, DFAs extracted by L* even failed to beat chance accuracy (50%), while those extracted by quantization methods had a standard deviation in the 7% range. For O2RNN, both rule extraction methods had a deviation in the 0.5% range.
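
The stability measure described in the abstract, the standard deviation of test accuracy across seeds, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `stability`, the sample accuracies, and the choice of the population standard deviation are all assumptions, since the abstract does not say which estimator was used.

```python
import statistics


def stability(accuracies):
    """Spread of test accuracy across seeds; lower means more stable.

    Assumption: population standard deviation. The abstract does not
    state whether the population or the sample estimator was used.
    """
    return statistics.pstdev(accuracies)


# Hypothetical test accuracies (in %) of DFAs extracted from networks
# trained with 10 different seeds. These numbers are made up for
# illustration and are not results from the paper.
stable_run = [99.0, 99.5, 98.5, 99.0, 99.5, 99.0, 98.5, 99.5, 99.0, 99.0]
unstable_run = [50.0, 95.0, 62.0, 88.0, 49.0, 97.0, 71.0, 55.0, 93.0, 60.0]

print(f"stable:   {stability(stable_run):.2f}")
print(f"unstable: {stability(unstable_run):.2f}")
```

A smaller value means the extracted DFAs behave similarly regardless of the initialization seed, which is the sense in which the abstract calls one method more stable than another.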
