TY - JOUR
T1 - DLNet
T2 - A Dual-Level Network with Self- and Cross-Attention for High-Resolution Remote Sensing Segmentation
AU - Meng, Weijun
AU - Shan, Lianlei
AU - Ma, Sugang
AU - Liu, Dan
AU - Hu, Bin
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/4
Y1 - 2025/4
N2 - With advancements in remote sensing technologies, high-resolution imagery has become increasingly accessible, supporting applications in urban planning, environmental monitoring, and precision agriculture. However, semantic segmentation of such imagery remains challenging due to complex spatial structures, fine-grained details, and land cover variations. Existing methods often struggle with ineffective feature representation, suboptimal fusion of global and local information, and high computational costs, limiting segmentation accuracy and efficiency. To address these challenges, we propose the dual-level network (DLNet), an enhanced framework incorporating self-attention and cross-attention mechanisms for improved multi-scale feature extraction and fusion. The self-attention module captures long-range dependencies to enhance contextual understanding, while the cross-attention module facilitates bidirectional interaction between global and local features, improving spatial coherence and segmentation quality. Additionally, DLNet optimizes computational efficiency by balancing feature refinement and memory consumption, making it suitable for large-scale remote sensing applications. Extensive experiments on benchmark datasets, including DeepGlobe and Inria Aerial, demonstrate that DLNet achieves state-of-the-art segmentation accuracy while maintaining computational efficiency. On the DeepGlobe dataset, DLNet achieves a (Formula presented.) mean intersection over union (mIoU), outperforming existing models such as GLNet ( (Formula presented.) ) and EHSNet ( (Formula presented.) ), while requiring lower memory (1443 MB) and maintaining a competitive inference speed of 518.3 ms per image. On the Inria Aerial dataset, DLNet attains an mIoU of (Formula presented.), surpassing GLNet ( (Formula presented.) ) while reducing computational cost and achieving an inference speed of 119.4 ms per image. These results highlight DLNet's effectiveness in achieving precise and efficient segmentation in high-resolution remote sensing imagery.
AB - With advancements in remote sensing technologies, high-resolution imagery has become increasingly accessible, supporting applications in urban planning, environmental monitoring, and precision agriculture. However, semantic segmentation of such imagery remains challenging due to complex spatial structures, fine-grained details, and land cover variations. Existing methods often struggle with ineffective feature representation, suboptimal fusion of global and local information, and high computational costs, limiting segmentation accuracy and efficiency. To address these challenges, we propose the dual-level network (DLNet), an enhanced framework incorporating self-attention and cross-attention mechanisms for improved multi-scale feature extraction and fusion. The self-attention module captures long-range dependencies to enhance contextual understanding, while the cross-attention module facilitates bidirectional interaction between global and local features, improving spatial coherence and segmentation quality. Additionally, DLNet optimizes computational efficiency by balancing feature refinement and memory consumption, making it suitable for large-scale remote sensing applications. Extensive experiments on benchmark datasets, including DeepGlobe and Inria Aerial, demonstrate that DLNet achieves state-of-the-art segmentation accuracy while maintaining computational efficiency. On the DeepGlobe dataset, DLNet achieves a (Formula presented.) mean intersection over union (mIoU), outperforming existing models such as GLNet ( (Formula presented.) ) and EHSNet ( (Formula presented.) ), while requiring lower memory (1443 MB) and maintaining a competitive inference speed of 518.3 ms per image. On the Inria Aerial dataset, DLNet attains an mIoU of (Formula presented.), surpassing GLNet ( (Formula presented.) ) while reducing computational cost and achieving an inference speed of 119.4 ms per image. These results highlight DLNet's effectiveness in achieving precise and efficient segmentation in high-resolution remote sensing imagery.
KW - cross-attention
KW - high-resolution imagery
KW - remote sensing
KW - self-attention
KW - semantic segmentation
UR - https://www.scopus.com/pages/publications/105002278269
U2 - 10.3390/rs17071119
DO - 10.3390/rs17071119
M3 - Article
AN - SCOPUS:105002278269
SN - 2072-4292
VL - 17
JO - Remote Sensing
JF - Remote Sensing
IS - 7
M1 - 1119
ER -