SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation
Abstract
"The challenge of automated unit test generation for C programming stems from the inherent semantic gap between the high-level intent of a programmer and the low-level syntactic requirements imposed by C's pointer arithmetic and manual memory management. This paper introduces SPARC, a neuro-symbolic framework designed to address these challenges by enhancing the capabilities of Large Language Models (LLMs) in generating meaningful unit tests. The traditional approach of intent-to-code synthesis often leads to issues such as non-compilable tests, hallucinated function signatures, and low coverage metrics due to the leap-to-code failure mode, where LLMs prematurely generate code without adequate grounding in the underlying program structure and semantics. SPARC operates through a four-stage process: 1. **Control Flow Graph (CFG) Analysis**: This initial stage involves analyzing the program's control flow to understand the logical structure and paths that can be taken during execution. 2. **Operation Map**: This component grounds the reasoning of LLMs in validated utility helpers, ensuring that the generated tests are relevant and applicable to the program's context. 3. **Path-targeted Test Synthesis**: In this stage, the framework synthesizes tests that are specifically targeted at identified paths within the CFG, enhancing the likelihood of meaningful test coverage. 4. **Iterative Self-correction Validation Loop**: Finally, SPARC employs a validation loop that utilizes feedback from both the compiler and runtime to iteratively refine the generated tests, correcting any issues that may arise during initial test synthesis. The evaluation of SPARC was conducted on 59 real-world and algorithmic subjects, demonstrating significant improvements over traditional prompt generation baselines. Specifically, SPARC achieved a 31.36% increase in line coverage, a 26.01% increase in branch coverage, and a 20.78% improvement in mutation score compared to baseline methods. 
Notably, SPARC's performance was comparable to or exceeded that of the established symbolic execution tool KLEE, particularly on more complex subjects. Furthermore, the framework retained 94.3% of the tests through its iterative repair process, indicating a robust approach to test generation. The generated code also received higher ratings for readability and maintainability from developers, suggesting that SPARC not only improves test coverage but also enhances the quality of the generated tests. In conclusion, SPARC represents a significant advancement in the field of automated unit testing for legacy C codebases. By effectively aligning LLM reasoning with the program structure, SPARC provides a scalable solution that addresses the pressing need for improved testing methodologies in the software development industry, particularly for legacy systems that continue to be critical in various applications."
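To make the CFG-driven stages concrete, the following is a minimal sketch, not taken from the paper, of how entry-to-exit paths could be enumerated over a control flow graph so that each path can later be targeted by a synthesized test. The CFG encoding (a dict of successor lists) and the node names are illustrative assumptions.

```python
# Hypothetical sketch of path enumeration over a CFG (stages 1 and 3).
# The adjacency-dict representation and node names are assumptions for
# illustration, not SPARC's actual data structures.

def enumerate_paths(cfg, entry, exit_node):
    """Enumerate all acyclic entry-to-exit paths via depth-first search."""
    paths = []

    def dfs(node, path):
        if node == exit_node:
            paths.append(path + [node])
            return
        for succ in cfg.get(node, []):
            if succ not in path:  # skip back-edges to keep paths acyclic
                dfs(succ, path + [node])

    dfs(entry, [])
    return paths

# Toy CFG for: if (x > 0) { A } else { B }; return
cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
}
paths = enumerate_paths(cfg, "entry", "exit")
# Each path (e.g. entry -> then -> exit) becomes one test target.
```

In a real pipeline the CFG would come from a C front end and loops would need bounded unrolling, but the per-path targeting idea is the same.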
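The iterative self-correction loop can likewise be sketched in a few lines. Here `generate_test` stands in for the LLM synthesis call and `compile_and_run` for the compiler/runtime harness; both names, the round limit, and the stubbed demo are assumptions for illustration, not SPARC's actual interfaces.

```python
# Minimal sketch of the iterative self-correction validation loop (stage 4),
# assuming hypothetical helpers `generate_test` (LLM call) and
# `compile_and_run` (compiler + runtime harness).

def repair_loop(generate_test, compile_and_run, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        test_src = generate_test(feedback)        # synthesis / repair step
        ok, feedback = compile_and_run(test_src)  # compiler + runtime check
        if ok:
            return test_src                       # keep the validated test
    return None                                   # discard after max_rounds

# Stubbed demo: the first draft "fails to compile"; the repaired draft passes.
drafts = iter(["draft_v1", "draft_v2"])
gen = lambda feedback: next(drafts)
check = lambda src: (src == "draft_v2", "error: implicit declaration of foo")
result = repair_loop(gen, check)  # -> "draft_v2"
```

Feeding the compiler diagnostic back as `feedback` is what lets the next round repair the specific failure rather than regenerate blindly.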