SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation
Abstract
"The challenge of automated unit test generation for C programming stems from the inherent semantic gap between the high-level intent of a programmer and the low-level syntactic requirements imposed by C's pointer arithmetic and manual memory management. This paper introduces SPARC, a neuro-symbolic framework designed to address these challenges by enhancing the capabilities of Large Language Models (LLMs) in generating meaningful unit tests. The traditional approach of intent-to-code synthesis often leads to issues such as non-compilable tests, hallucinated function signatures, and low coverage metrics due to the leap-to-code failure mode, where LLMs prematurely generate code without adequate grounding in the underlying program structure and semantics. SPARC operates through a four-stage process: 1. **Control Flow Graph (CFG) Analysis**: This initial stage involves analyzing the program's control flow to understand the logical structure and paths that can be taken during execution. 2. **Operation Map**: This component grounds the reasoning of LLMs in validated utility helpers, ensuring that the generated tests are relevant and applicable to the program's context. 3. **Path-targeted Test Synthesis**: In this stage, the framework synthesizes tests that are specifically targeted at identified paths within the CFG, enhancing the likelihood of meaningful test coverage. 4. **Iterative Self-correction Validation Loop**: Finally, SPARC employs a validation loop that utilizes feedback from both the compiler and runtime to iteratively refine the generated tests, correcting any issues that may arise during initial test synthesis. The evaluation of SPARC was conducted on 59 real-world and algorithmic subjects, demonstrating significant improvements over traditional prompt generation baselines. Specifically, SPARC achieved a 31.36% increase in line coverage, a 26.01% increase in branch coverage, and a 20.78% improvement in mutation score compared to baseline methods. 
Notably, SPARC's performance was comparable to or exceeded that of the established symbolic execution tool KLEE, particularly on more complex subjects. Furthermore, the framework retained 94.3% of the tests through its iterative repair process, indicating a robust approach to test generation. The generated code also received higher ratings for readability and maintainability from developers, suggesting that SPARC not only improves test coverage but also enhances the quality of the generated tests. In conclusion, SPARC represents a significant advancement in the field of automated unit testing for legacy C codebases. By effectively aligning LLM reasoning with the program structure, SPARC provides a scalable solution that addresses the pressing need for improved testing methodologies in the software development industry, particularly for legacy systems that continue to be critical in various applications."
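To make the CFG-driven stages concrete, the following is a minimal sketch, not taken from the paper, of how entry-to-exit paths could be enumerated over a control flow graph so that each path can later be targeted by a synthesized test. The CFG encoding (a dict of successor lists) and the node names are illustrative assumptions.

```python
# Hypothetical sketch of path enumeration over a CFG (stages 1 and 3).
# The adjacency-dict representation and node names are assumptions for
# illustration, not SPARC's actual data structures.

def enumerate_paths(cfg, entry, exit_node):
    """Enumerate all acyclic entry-to-exit paths via depth-first search."""
    paths = []

    def dfs(node, path):
        if node == exit_node:
            paths.append(path + [node])
            return
        for succ in cfg.get(node, []):
            if succ not in path:  # skip back-edges to keep paths acyclic
                dfs(succ, path + [node])

    dfs(entry, [])
    return paths

# Toy CFG for: if (x > 0) { A } else { B }; return
cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
}
paths = enumerate_paths(cfg, "entry", "exit")
# Each path (e.g. entry -> then -> exit) becomes one test target.
```

In a real pipeline the CFG would come from a C front end and loops would need bounded unrolling, but the per-path targeting idea is the same.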
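The iterative self-correction loop can likewise be sketched in a few lines. Here `generate_test` stands in for the LLM synthesis call and `compile_and_run` for the compiler/runtime harness; both names, the round limit, and the stubbed demo are assumptions for illustration, not SPARC's actual interfaces.

```python
# Minimal sketch of the iterative self-correction validation loop (stage 4),
# assuming hypothetical helpers `generate_test` (LLM call) and
# `compile_and_run` (compiler + runtime harness).

def repair_loop(generate_test, compile_and_run, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        test_src = generate_test(feedback)        # synthesis / repair step
        ok, feedback = compile_and_run(test_src)  # compiler + runtime check
        if ok:
            return test_src                       # keep the validated test
    return None                                   # discard after max_rounds

# Stubbed demo: the first draft "fails to compile"; the repaired draft passes.
drafts = iter(["draft_v1", "draft_v2"])
gen = lambda feedback: next(drafts)
check = lambda src: (src == "draft_v2", "error: implicit declaration of foo")
result = repair_loop(gen, check)  # -> "draft_v2"
```

Feeding the compiler diagnostic back as `feedback` is what lets the next round repair the specific failure rather than regenerate blindly.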