Bilkent University
Department of Computer Engineering
SEMINAR


LLMs sometimes make mistakes. When should I trust them?


Prof. Dr. Prem Devanbu

Abstract: LLMs make a lot of mistakes when generating software-related artifacts, but they always produce something. So what should a developer do with LLM output? We discuss some empirical findings, and some recent work on getting LLMs to provide a reliable indication of how confident they are in their output. If this indication is reliable, perhaps we can decide on a more rational way to use LLM outputs, balancing productivity gains against quality risk. We offer some background on the applicability of "calibration" concepts in this setting, and discuss a few different technical approaches, some of which are quite encouraging. Part of this work will be published and presented at ICSE 2025. Joint work with Toufique Ahmed, David Gros, Claudio Spiess, Kunal Pai, and Yuvraj Virk (all students at UC Davis), Michael Pradel at the University of Stuttgart, Amin Alipour at the University of Houston, and Susmit Jha at SRI. The work was supported by a grant from NSF, and was partially conducted at the University of Stuttgart during 2023, while Devanbu was supported by a Forschungspreis from the Alexander von Humboldt Stiftung in Bonn, Germany.

Bio: Prem Devanbu holds a B.Tech. from IIT Madras (India) and a Ph.D. from Rutgers University. After decades as a research staff member at Bell Labs in New Jersey, he joined UC Davis, where he conducts research in software engineering. He was awarded the ACM SIGSOFT Outstanding Research Award in 2021, the Alexander von Humboldt Research Award in 2022, and the IEEE Computer Society Harlan Mills Award in 2024. He serves as co-chair of the Research Articles track of Communications of the ACM, and is an ACM Fellow.


Date: Tuesday, February 25 @ 18:00

Place: EE 01