Seminar in Computer Engineering

Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

PoseVLM – Enhancing Vision-Language Models for Manipulation in Constrained Settings

Huzaifa Salahuddin
Master's Student
(Supervisor: Asst. Prof. Özgür Salih Öğüz)
Computer Engineering Department
Bilkent University

Abstract: We present a novel Vision-Language Model (VLM) that estimates precise placement poses for constrained manipulation tasks, such as placing a book on a shelf or sorting objects into the correct slots, from a combination of scene point clouds and textual instructions. The model builds on a state-of-the-art VLM backbone, which we enhance with a new loss function that improves performance over the baseline. We also introduce a saliency-based filtering mechanism that refines the point cloud after an initial prediction, further reducing error and improving task precision. Our results show that the model achieves the precise geometric understanding that constrained manipulation requires.

DATE: April 14, Thursday @ 14:10
PLACE: EA 502