
SSTrack: Joint scale-aware temporal prompts and spatio-temporal prior transformer for visual object tracking

  • Sugang Ma
  • Zhen Wan
  • Bin Hu
  • Jinyu Zhang
  • Zhiqiang Hou
  • Xiangmo Zhao

Research output: Contribution to journal › Article › peer-review

Abstract

Existing visual tracking algorithms have made impressive progress by leveraging the powerful global modeling capabilities of Transformers. However, these approaches typically focus on designing complex network models while neglecting temporal information and scale variations. These limitations make them susceptible to tracking failures caused by target occlusion and deformation. Additionally, most trackers adopt ViT-based attention mechanisms, which rely entirely on the input images and lack task-relevant prior knowledge about the target. To address these issues, this paper proposes SSTrack, a novel visual tracking algorithm that integrates scale-aware temporal prompts and a spatio-temporal prior Transformer. Specifically, a scale-aware temporal information propagation mechanism is first designed, which enables the model to learn how the target's scale changes between preceding and following frames by propagating multi-scale temporal prompts across consecutive frames. Furthermore, we introduce a spatio-temporal prior module, combined with the self-attention module, to provide the tracker with spatio-temporal prior knowledge of the target's location and appearance. Extensive experiments on seven benchmark datasets, including LaSOT, TrackingNet, and GOT-10k, demonstrate the superior tracking performance of SSTrack. The code and pre-trained models will be made available here.
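
The abstract does not say how the spatio-temporal prior is combined with self-attention. As a rough illustration only, one generic way to inject prior knowledge into attention is to add the logarithm of a per-token prior weight to the attention logits before the softmax. The sketch below shows that generic technique in plain Python; it is not SSTrack's formulation, and the function name `attention_with_prior` and the `prior` argument are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_prior(queries, keys, values, prior):
    """Scaled dot-product attention with logits shifted by log(prior).

    prior[j] is a hypothetical positive weight for key/value token j,
    for example a guess of how likely that token covers the target.
    This is an assumed, generic mechanism, not the one used in the paper.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Dot-product similarity, scaled by sqrt(d), plus the log-prior bias.
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + math.log(p)
            for k, p in zip(keys, prior)
        ]
        weights = softmax(logits)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs
```

With a uniform prior the bias is the same for every token and cancels in the softmax, so the function reduces to ordinary attention; a non-uniform prior shifts attention mass toward the tokens it favors.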

Original language: English
Article number: 115370
Journal: Knowledge-Based Systems
Volume: 337
State: Published - 25 Mar 2026

Keywords

  • Scale-aware
  • Spatio-temporal prior
  • Temporal prompts
  • Transformer
  • Visual object tracking
