I am a computer vision researcher. I earned my Master's degree from Koç University in 2025, advised by Fatma Güney (see my thesis). I received my Bachelor's degree from Middle East Technical University in 2022.
My research interests include point tracking, video understanding, and object-centric learning.
This paper introduces Track-On-R, a real-world adaptation of our Track-On family of online point trackers. Using a verifier-guided pseudo-labeling framework that selects reliable trajectories from multiple trackers, Track-On-R learns from unlabeled real-world videos and improves robustness across diverse tracking benchmarks.
This paper introduces Track-On2, which extends our prior Track-On model with architectural refinements, improved memory mechanisms, and enhanced synthetic training strategies. Track-On2 achieves higher FPS and lower memory usage, and despite being online and trained only on synthetic data, it sets a new state of the art across multiple benchmarks.
This paper presents Track-On, a simple transformer-based model for online long-term point tracking. It leverages spatial and context memory to track frame by frame without access to future frames, achieving state-of-the-art performance among all online and offline approaches across multiple datasets.
This paper introduces an approach to enhance Bird's Eye View (BEV) perception in autonomous driving by adapting DINOv2 with Low-Rank Adaptation (LoRA). The method improves robustness to environmental challenges like brightness changes, adverse weather, and camera failures.
We assess the geometric awareness of vision foundation models for long-term point tracking. Our results show that Stable Diffusion and DINOv2 excel in zero-shot settings, with DINOv2 matching supervised models after training in a lighter setup, highlighting its potential for correspondence learning.
This paper presents SOLV, the first fully unsupervised technique for segmenting multiple objects in real-world video sequences using an object-centric approach. Through a unique masking strategy and slot merging based on similarity, our method effectively segments varied object classes in YouTube videos.
This paper presents ADAPT, a method for predicting trajectories of all agents in complex traffic scenarios, ensuring both efficiency and accuracy. By utilizing dynamic weight learning and an adaptive head, ADAPT offers superior performance over existing multi-agent methods on the Interaction dataset, with reduced computational demands.
This paper introduces a temporal graph representation to improve predictions of future agent locations in dynamic traffic scenes for self-driving applications.