Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
Lekai Chen · Ashutosh Trivedi · Alvaro Velasquez
Abstract
Large Language Models (LLMs) are increasingly deployed in human–AI collaborative decision-making systems, where they are expected to align precise formal representations with ambiguous natural language. However, their ad hoc strategies for resolving ambiguity often lead to hallucinations and inconsistencies. We formalize this setting via probabilistic Minimally Adequate Teachers (pMATs) that (i) answer membership queries with fixed but possibly flipped labels, and (ii) return valid counterexamples to hypothesis equivalence. We present **CAPAL** (**C**lass-query **A**ctive, **P**ersistent-noise-**A**ware **L**earning), an active algorithm for learning deterministic finite automata (DFAs) that remains correct under persistent membership noise without demonstrations. CAPAL augments the classic $L^\star$ loop with two components grounded in our implementation: (1) a *class query* realized as a statistical same-state test that compares disagreements between two prefixes against a noise-floor estimate $\hat{\eta}$ with Hoeffding tolerances; (2) a *discrimination tree* that selects a near-minimal discriminator, keeping the core suffix set small. An efficient micro-bootstrap and cache-reuse scheme estimates $\hat{\eta}$ with few new queries. We prove convergence given a perfect language-equivalence oracle and show substantial membership-query savings in practice. Our evaluation across multiple benchmarks, including RegexLib and KB13, demonstrates that this approach enhances both the efficiency and robustness of DFA learning under noisy oracles, supporting the view of LLMs as fallible yet useful collaborators for synthesizing verifiable formal artifacts.
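The class query described in the abstract can be sketched as a Hoeffding-style same-state test: two prefixes are judged to reach the same state when their empirical label disagreement over a shared suffix set is statistically indistinguishable from the persistent-noise floor $\hat{\eta}$. The sketch below is illustrative only; the function name, the exact threshold form, and the confidence parameter `delta` are our assumptions, not the paper's implementation.

```python
import math

def same_state(labels_u, labels_v, eta_hat, delta=0.05):
    """Illustrative same-state test under persistent membership noise.

    labels_u, labels_v: observed (possibly flipped) membership labels for
        the same suffix set, queried from prefixes u and v.
    eta_hat: estimated noise floor (probability a label is flipped).
    delta: confidence parameter for the Hoeffding tolerance (assumed).
    Returns True when the empirical disagreement rate is within a
    Hoeffding tolerance of the rate expected for a single state.
    """
    n = len(labels_u)
    assert n == len(labels_v) and n > 0
    disagreement = sum(a != b for a, b in zip(labels_u, labels_v)) / n
    # Hoeffding bound: with probability >= 1 - delta, the empirical
    # disagreement deviates from its expectation by at most eps.
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    # Under independent persistent flips at rate eta_hat, labels from the
    # same state disagree at rate about 2 * eta_hat * (1 - eta_hat);
    # distinct states disagree strictly more on some discriminator.
    threshold = 2.0 * eta_hat * (1.0 - eta_hat) + eps
    return disagreement <= threshold
```

With identical label vectors the test accepts (disagreement 0 is below any positive threshold), while fully disagreeing vectors of moderate length are rejected at a small $\hat{\eta}$.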