While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software engineer understand a new code base, (2) helping a software engineer during code review sessions, and (3) ensuring that the code generated by an automated code generation system meets certain requirements (e.g. fixes a bug, improves readability, implements a feature). As a result of this lack of trustworthiness, the agent’s answers need to be manually verified before they can be trusted. Manually confirming responses from a code reasoning agent requires human effort and can result in slower developer productivity, which weakens the assistance benefits of the agent. In this paper, we describe a method to automatically validate the answers provided by a code reasoning agent by verifying its reasoning steps. At a very high level, the method consists of extracting a formal representation of the agent’s response and, subsequently, using formal verification and program analysis tools to verify the agent’s reasoning steps. We applied this approach to a benchmark set of 20 uninitialized variable errors detected by sanitizers and 20 program equivalence queries. For the uninitialized variable errors, the formal verification step was able to validate the agent’s reasoning on 13/20 examples, and for the program equivalence queries, the formal verification step successfully caught 6/8 incorrect judgments made by the agent.
@article{sistla2025towards,title={Towards Verified Code Reasoning by LLMs},author={Sistla, Meghana and Balakrishnan, Gogul and Rondon, Pat and Cambronero, Jos{\'e} and Tufano, Michele and Chandra, Satish},journal={arXiv preprint arXiv:2509.26546},year={2025},}
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Amitayush
Thakur, Jasper
Lee, George
Tsoukalas, Meghana
Sistla, Matthew
Zhao, and
4 more authors
We introduce CLEVER, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks, CLEVER avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean’s type checker to ensure machine-checkable correctness. We use CLEVER to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning.
@article{thakur2025clever,title={CLEVER: A Curated Benchmark for Formally Verified Code Generation},author={Thakur, Amitayush and Lee, Jasper and Tsoukalas, George and Sistla, Meghana and Zhao, Matthew and Zetzsche, Stefan and Durrett, Greg and Yue, Yisong and Chaudhuri, Swarat},journal={arXiv preprint arXiv:2505.13938},year={2025},}
This paper presents a new data structure, called Weighted Context-Free-Language Ordered BDDs (WCFLOBDDs), which are a hierarchically structured decision diagram, akin to Weighted BDDs (WBDDs) enhanced with a procedure-call mechanism. For some functions, WCFLOBDDs are exponentially more succinct than WBDDs. They are potentially beneficial for representing functions of type B^n -> D, when a function’s image V ⊂D has many different values. We apply WCFLOBDDs in quantum-circuit simulation, and find that they perform better than WBDDs on certain benchmarks. With a 15-minute timeout, the number of qubits that can be handled by WCFLOBDDs is 1-64× that of WBDDs(and 1-128× that of CFLOBDDs, which are an unweighted version of WCFLOBDDs). These results support the conclusion that for this application—from the standpoint of problem size, measured as the number of qubits—WCFLOBDDs provide the best of both worlds: performance roughly matches whichever of WBDDs and CFLOBDDs is better.(From the standpoint of running time, the results are more nuanced.)
@article{sistla2024weighted,title={Weighted context-free-language ordered binary decision diagrams},author={Sistla, Meghana and Chaudhuri, Swarat and Reps, Thomas},journal={Proceedings of the ACM on Programming Languages},volume={8},number={OOPSLA2},pages={1390--1419},year={2024},publisher={ACM New York, NY, USA},}
This article presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence are useful for representing certain classes of functions, matrices, graphs, relations, and so forth in a highly compressed fashion. CFLOBDDs share many of the good properties of BDDs, but—in the best case—the CFLOBDD for a Boolean function can be exponentially smaller than any BDD for that function. Compared with the size of the decision tree for a function, a CFLOBDD—again, in the best case—can give a double-exponential reduction in size. They have the potential to permit applications to (i) execute much faster and (ii) handle much larger problem instances than has been possible heretofore.
We applied CFLOBDDs in quantum-circuit simulation and found that for several standard problems, the improvement in scalability, compared to BDDs, is quite dramatic. With a 15-minute timeout, the number of qubits that CFLOBDDs can handle are 65,536 for Greenberger-Horne-Zellinger, 524,288 for Bernstein-Vazirani, 4,194,304 for Deutsch-Jozsa, and 4,096 for Grover’s algorithm, besting BDDs by factors of 128×, 1,024×, 8,192×, and 128×, respectively.
@article{sistla2024cflobdds,title={CFLOBDDs: Context-free-language ordered binary decision diagrams},author={Sistla, Meghana Aparna and Chaudhuri, Swarat and Reps, Thomas},journal={ACM Transactions on Programming Languages and Systems},volume={46},number={2},pages={1--82},year={2024},publisher={ACM New York, NY},}
2023
Symbolic quantum simulation with quasimodo
Meghana
Sistla, Swarat
Chaudhuri, and Thomas
Reps
In International Conference on Computer Aided Verification , 2023
The simulation of quantum circuits on classical computers is an important problem in quantum computing. Such simulation requires representations of distributions over very large sets of basis vectors, and recent work has used symbolic data-structures such as Binary Decision Diagrams (BDDs) for this purpose. In this tool paper, we present Quasimodo, an extensible, open-source Python library for symbolic simulation of quantum circuits. Quasimodo is specifically designed for easy extensibility to other backends. Quasimodo allows simulations of quantum circuits, checking properties of the outputs of quantum circuits, and debugging quantum circuits. It also allows the user to choose from among several symbolic data-structures—both unweighted and weighted BDDs, and a recent structure called Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs)—and can be easily extended to support other symbolic data-structures.
@inproceedings{sistla2023symbolic,title={Symbolic quantum simulation with quasimodo},author={Sistla, Meghana and Chaudhuri, Swarat and Reps, Thomas},booktitle={International Conference on Computer Aided Verification},pages={213--225},year={2023},organization={Springer},}
2019
Graph coloring using GPUs
Meghana Aparna
Sistla, and V Krishna
Nandivada
In European Conference on Parallel Processing , 2019
Graph coloring is a widely studied problem that is used in a variety of applications, such as task scheduling, register allocation, eigenvalue computations, social network analysis, and so on. Many of the modern day applications deal with large graphs (with millions of vertices and edges) and researchers have exploited the parallelism provided by multi-core systems to efficiently color such large graphs. GPUs provide a promising parallel infrastructure to run large applications. In this paper, we present new schemes to efficiently color large graphs on GPUs.
We extend the algorithm of Rokos et al. [21] to efficiently color graphs using GPUs. Their approach has to continually resolve conflicts for color assignment. We present a data driven variation of their algorithm and use an improved scheme for conflict resolution. We also propose two optimizations for our algorithm to reduce both the execution time and memory requirements. We have evaluated our scheme (called SIRG) against the NVIDIA cuSPARSE library and the state of the art work, and show that SIRG runs significantly faster: geomean 3.42
and 1.76, respectively. We have also compared SIRG against the another scheme for CPUs and show that SIRG performs faster on most input graphs: geomean 10.37.
@inproceedings{sistla2019graph,title={Graph coloring using GPUs},author={Sistla, Meghana Aparna and Nandivada, V Krishna},booktitle={European Conference on Parallel Processing},pages={377--390},year={2019},organization={Springer},}