TY - JOUR
T1 - HFFTrack
T2 - Transformer tracking via hybrid frequency features
AU - Ma, Sugang
AU - Wan, Zhen
AU - Zhang, Licheng
AU - Hu, Bin
AU - Zhang, Jinyu
AU - Zhao, Xiangmo
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/6
Y1 - 2025/6
N2 - Numerous Transformer-based trackers have emerged due to the powerful global modeling capabilities of the Transformer. Nevertheless, the Transformer acts as a low-pass filter with insufficient capacity to extract the high-frequency features of the target, and these features are essential for target localization in tracking tasks. To address this issue, this paper proposes a tracking algorithm based on hybrid frequency features and explores how fusing the target's multi-frequency features can improve tracker performance. First, a novel feature extraction network is designed that uses a CNN and a Transformer to learn the multi-frequency features of the target in stages, taking advantage of both structures and balancing high- and low-frequency information. Second, a dual-branch encoder is designed so that the tracker captures global information in one branch while learning the local features of the target in the other. Finally, a multi-frequency feature fusion network is designed that uses the wavelet transform and convolution to fuse high-frequency and low-frequency features. Extensive experimental results demonstrate that the proposed tracker achieves superior tracking performance on six challenging benchmark datasets (LaSOT, TrackingNet, GOT-10k, TNL2K, UAV123, and OTB100).
KW - Dual-branch encoder
KW - Hybrid frequency features
KW - Transformer
KW - Visual object tracking
KW - Wavelet transform
UR - http://www.scopus.com/inward/record.url?scp=85218417986&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2025.107269
DO - 10.1016/j.neunet.2025.107269
M3 - Article
C2 - 39999533
AN - SCOPUS:85218417986
SN - 0893-6080
VL - 186
JO - Neural Networks
JF - Neural Networks
M1 - 107269
ER -