Poster in Workshop: Workshop on Logical Reasoning of Large Language Models

Entropy Jurisprudence: Auditing Procedural Fidelity in LLM Normative Reasoning

Xiwei Chen


Abstract

Reasoning that reaches the correct outcome through an inconsistent procedure poses deployment risks for LLM-based agents. We introduce Entropy Jurisprudence, a procedural audit framework that tests whether LLMs faithfully execute formal normative rules. Using a minimal harm formula ($E = H \times R$), we measure parameter stability across 720 trials on six models. The results reveal a strong empirical tension between alignment and reasoning: instruction-faithful models (Qwen3) execute rules reliably but may follow harmful logic; prior-dominant models (Gemma3) maintain safety but ignore parameters entirely (97.5% Guilty); context-sensitive models (Llama3) reconcile conflicts through scale hallucination, generating out-of-distribution numeric values (RI=328). Notably, all models achieve identical ETHICS-style accuracy (50%) while differing sharply in procedural fidelity, which demonstrates that outcome-based evaluation alone is insufficient. Our framework provides a minimal methodology for auditing procedural fidelity before deploying LLM-based agents that can take irreversible real-world actions.
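As a rough illustration of what a procedural check on the rule $E = H \times R$ might look like, the sketch below recomputes $E$ from the parameters a model reports and compares the model's verdict with the one the rule implies. It is not the authors' implementation: the parameter scale (`SCALE_MAX`), the decision threshold (`GUILTY_THRESHOLD`), the "Not Guilty" label, and the names `Trial`, `audit`, and `guilty_rate` are all assumptions made for this example, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical values: the abstract states neither the scale nor the threshold.
SCALE_MAX = 10.0          # assumed upper bound for each of H and R
GUILTY_THRESHOLD = 25.0   # assumed cutoff on E for a "Guilty" verdict


@dataclass
class Trial:
    h: float        # H as reported by the model
    r: float        # R as reported by the model
    verdict: str    # "Guilty" or "Not Guilty" (second label is an assumption)


def audit(trial: Trial) -> dict:
    """Recompute E = H * R and check the verdict against the rule."""
    e = trial.h * trial.r
    # Flag parameters outside the assumed scale (out-of-distribution values).
    in_scale = 0 <= trial.h <= SCALE_MAX and 0 <= trial.r <= SCALE_MAX
    expected = "Guilty" if e > GUILTY_THRESHOLD else "Not Guilty"
    return {
        "e": e,
        "in_scale": in_scale,
        "rule_consistent": trial.verdict == expected,
    }


def guilty_rate(trials: list[Trial]) -> float:
    """Fraction of trials with a "Guilty" verdict, regardless of parameters."""
    return sum(t.verdict == "Guilty" for t in trials) / len(trials)


if __name__ == "__main__":
    trials = [
        Trial(h=2, r=3, verdict="Guilty"),    # verdict ignores a low E
        Trial(h=50, r=8, verdict="Guilty"),   # parameters exceed the assumed scale
    ]
    for t in trials:
        print(audit(t))
    print(guilty_rate(trials))
```

Under these assumptions, a model whose verdicts stay "Guilty" whatever $H$ and $R$ are shows up as a high `guilty_rate` with many `rule_consistent` failures, while a model that inflates its parameters shows up through `in_scale` being false even when the verdict matches the rule.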
