Abstract
Existing visual tracking algorithms have made impressive progress by leveraging the powerful global modeling capabilities of Transformers. However, these approaches typically focus on designing complex network models while neglecting temporal information and scale variations. These limitations make them susceptible to tracking failures caused by target occlusion and deformation. Additionally, most trackers adopt ViT-based attention mechanisms that rely entirely on the input images and lack task-relevant prior knowledge about the target. To address these issues, this paper proposes SSTrack, a novel visual tracking algorithm that integrates scale-aware temporal prompts and a spatio-temporal prior Transformer. Specifically, a scale-aware temporal information propagation mechanism is first designed, which enables the tracker to learn the scale changes of the target between preceding and following frames by propagating multi-scale temporal prompts across consecutive frames. Furthermore, we introduce a spatio-temporal prior module, combined with the self-attention module, to provide the tracker with spatio-temporal prior knowledge of the target's location and appearance. Extensive experiments on seven benchmark datasets, including LaSOT, TrackingNet, and GOT-10k, demonstrate the superior tracking performance of SSTrack. The code and pre-trained models will be available here.
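The abstract does not specify how the spatio-temporal prior is combined with self-attention. As an illustration only, the sketch below assumes one common option: adding a log-prior bias to the attention logits before the softmax. The function names, the Gaussian form of the prior, and the grid layout are hypothetical choices for this example and are not taken from SSTrack.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_log_prior(grid_w, grid_h, center, sigma):
    # Hypothetical spatial prior: log of an unnormalized Gaussian over
    # grid cells (row-major order), peaked at the assumed target location.
    cx, cy = center
    return [-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)
            for y in range(grid_h) for x in range(grid_w)]

def prior_biased_attention(query, keys, values, log_prior):
    # Scaled dot-product attention for one query, with an additive
    # per-key bias (assumed here to carry the prior knowledge).
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
              for key, b in zip(keys, log_prior)]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Usage: a 3x3 grid of identical keys, so content alone gives uniform
# attention and any preference between cells comes from the prior.
keys = [[1.0, 0.0]] * 9
values = [[float(i)] for i in range(9)]
log_prior = gaussian_log_prior(3, 3, center=(2, 1), sigma=1.0)
output, weights = prior_biased_attention([1.0, 0.0], keys, values, log_prior)
```

With identical keys, the attention weights in this toy setting are largest at the cell the prior is centered on, which is the behavior an additive prior is meant to produce when image evidence is ambiguous, for example under occlusion.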
| Original language | English |
|---|---|
| Article number | 115370 |
| Journal | Knowledge-Based Systems |
| Volume | 337 |
| DOIs | |
| State | Published - 25 Mar 2026 |
Keywords
- Scale-aware
- Spatio-temporal prior
- Temporal prompts
- Transformer
- Visual object tracking
Fingerprint
Dive into the research topics of 'SSTrack: Joint scale-aware temporal prompts and spatio-temporal prior transformer for visual object tracking'. Together they form a unique fingerprint.