This paper shows how to do multi-object tracking end-to-end with transformers that reason over both space and time, rather than treating tracking as a two-stage detect-then-associate pipeline over pairs of frames. A spatial transformer encodes per-frame object features while a temporal transformer links them across a longer window, letting the network recover identities through long occlusions that frame-pair methods cannot bridge. The approach sets a new state of the art on MOT17 and MOT20.
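The factorized spatial-then-temporal attention pattern can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: it uses single-head attention with no learned projections, and all shapes and variable names (`T` frames, `N` object slots, `d`-dim features) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.
    x: (..., n, d) -> (..., n, d). Simplified: queries, keys, and values
    are all the raw features (no learned projections)."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

# Toy features: T frames, each with N object embeddings of dimension d.
T, N, d = 8, 4, 16
feats = np.random.default_rng(0).normal(size=(T, N, d))

# Spatial pass: objects attend to each other within each frame.
spatial = self_attention(feats)                       # (T, N, d)

# Temporal pass: swap axes so attention runs along time for each slot,
# linking the same object across the whole window.
temporal = self_attention(np.swapaxes(spatial, 0, 1)) # (N, T, d)
out = np.swapaxes(temporal, 0, 1)                     # (T, N, d)
```

Because the temporal attention spans the entire window rather than adjacent frame pairs, an object occluded for several frames can still attend to its earlier appearances.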
