Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION
Automated Code Review: Empirical Evidence from Experiments and Industry
Umut Cihan
M.S. Student
(Supervisor: Asst. Prof. Eray Tüzün)
Computer Engineering Department
Bilkent University
Abstract: Code reviews are essential for software quality. Advances in large language models (LLMs) have enabled AI-powered code reviews, but their reliability and impact in industry remain unclear. This study evaluates LLMs' ability to judge code correctness and suggest improvements, and assesses the adoption of AI-assisted code review tools in practice. The thesis consists of two studies: an experimental evaluation of LLMs on code review tasks and a case study of real-world AI-assisted code reviews. In the experiment, GPT-4o and Gemini 2.0 Flash were tested on 492 AI-generated code blocks and 164 HumanEval benchmark blocks. The models assessed correctness and suggested fixes; GPT-4o achieved 68.50% accuracy in classification and 67.83% in corrections, outperforming Gemini 2.0 Flash (63.89% and 54.26%, respectively). Performance dropped without problem descriptions and varied across code types. The case study examined an AI-assisted code review tool based on Qodo PR Agent, deployed to 238 practitioners across ten projects. The analysis covered 4,335 pull requests, of which 1,568 received automated reviews. Developers engaged with 73.8% of AI-generated comments, though pull request closure time increased. Surveys indicated minor improvements in code quality but highlighted issues such as faulty suggestions and increased review time. LLM-based code reviews help detect issues and improve code but risk introducing errors; a “human-in-the-loop” approach is proposed to balance automation with oversight. Despite these challenges, AI-assisted reviews enhance bug detection and code awareness, offering valuable, albeit imperfect, integration into software development workflows.
DATE: Thursday, April 10 @ 13:30
PLACE: EA 409