Publications

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation

Recent advances in multimodal large language models (MLLMs) have expanded research in video understanding, primarily focusing on …

Ali Athar, Xueqing Deng, Liang-Chieh Chen

4D-Former: Multimodal 4D Panoptic Segmentation

4D panoptic segmentation is a challenging but practically useful task that requires every point in a LiDAR point-cloud sequence to be …

Ali Athar, Enxu Li, Sergio Casas, Raquel Urtasun

TarViS: A Unified Approach for Target-based Video Segmentation

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid …

Ali Athar, Alexander Hermans, Jonathon Luiten, Deva Ramanan, Bastian Leibe

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

Multiple existing benchmarks involve tracking and segmenting objects in video e.g., Video Object Segmentation (VOS) and Multi-Object …

Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan

HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to …

Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

Differentiable Soft-Masked Attention

Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of …

Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

D^2-Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Despite receiving significant attention from the research community, the task of segmenting and tracking objects in monocular videos …

Christian Schmidt, Ali Athar, Sabarinath Mahadevan, Bastian Leibe

A Single-Stage, Bottom-up Approach for Occluded VIS using Spatio-temporal Embeddings

The task of Video Instance Segmentation (VIS) involves segmenting, tracking and classifying all object instances present in a given …

Ali Athar, Sabarinath Mahadevan, Aljosa Osep, Laura Leal-Taixe, Bastian Leibe

Making a Case for 3D Convolutions for Object Segmentation in Videos

The task of object segmentation in videos is usually accomplished by processing appearance and motion information separately using …

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Sebastian Hennen, Laura Leal-Taixe, Bastian Leibe

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection …

Ali Athar, Sabarinath Mahadevan, Aljosa Osep, Laura Leal-Taixe, Bastian Leibe

Whole-body motion and footstep planning for humanoid robots with multi-heuristic search

In this paper, we present a motion planning framework for humanoid robots that combines whole-body motions as well as footsteps under a …

Rizwan Asif, Ali Athar, Faisal Mehmood, Fahad Islam, Yasar Ayaz

Whole-body motion planning for humanoid robots with heuristic search

The task of whole-body motion planning for humanoid robots is challenging due to its high-DOF nature, stability constraints, and the …

Ali Athar, Abdul Moeed Zafar, Rizwan Asif, Armaghan Ahmad Khan, Fahad Islam, Yasar Ayaz