Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection

Qianjin Du; Shiji Zhou; Xiaohui Kuang; Gang Zhao; Jidong Zhai

doi:10.18653/v1/2023.emnlp-main.788

Abstract

In code vulnerability detection tasks, a detector trained on a label-rich source domain fails to make accurate predictions on new or unseen target domains, because labeled training data are lacking in those domains. Previous studies mainly use domain adaptation to perform cross-domain vulnerability detection. However, they ignore the negative effect that the private semantic characteristics of the target domain have on domain alignment, which easily leads to negative transfer. In addition, these methods forcibly reduce the distribution discrepancy between domains without accounting for the interference of irrelevant target instances in distributional domain alignment, which leads to excessive alignment. To address these issues, we propose a novel cross-domain code vulnerability detection framework named MNCRI. Specifically, we introduce mutual nearest neighbor contrastive learning to align the source and target domains geometrically, so that the semantic characteristics common to both domains are aligned while the private semantic characteristics of each domain are separated out. Furthermore, we introduce an instance re-weighting scheme to alleviate excessive alignment. This scheme dynamically assigns different weights to instances, reducing the contribution of irrelevant instances so as to achieve better domain alignment. Finally, extensive experiments demonstrate that MNCRI outperforms state-of-the-art cross-domain code vulnerability detection methods by a large margin.
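The abstract names mutual nearest neighbor contrastive learning as the geometric alignment step but gives no formula. The following is a minimal sketch, assuming cosine similarity between encoder embeddings and an InfoNCE-style loss (both are assumptions, not confirmed by the abstract). A source/target pair counts as matched only when each is the other's nearest neighbor, and the remaining source instances act as negatives. The function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def mutual_nearest_neighbors(src: np.ndarray, tgt: np.ndarray) -> list[tuple[int, int]]:
    """Return (i, j) pairs where source i and target j are each other's nearest neighbor."""
    sim = l2_normalize(src) @ l2_normalize(tgt).T  # shape (n_src, n_tgt)
    nn_of_src = sim.argmax(axis=1)  # closest target for each source instance
    nn_of_tgt = sim.argmax(axis=0)  # closest source for each target instance
    return [(i, int(j)) for i, j in enumerate(nn_of_src) if nn_of_tgt[j] == i]


def mnn_contrastive_loss(
    src: np.ndarray, tgt: np.ndarray, temperature: float = 0.1
) -> tuple[float, list[tuple[int, int]]]:
    """Assumed InfoNCE-style loss: for each MNN pair, the target embedding is the
    anchor, its mutual source partner is the positive, and all other source
    embeddings are negatives. Not the paper's confirmed formulation."""
    pairs = mutual_nearest_neighbors(src, tgt)
    if not pairs:
        return 0.0, pairs
    logits = (l2_normalize(tgt) @ l2_normalize(src).T) / temperature  # (n_tgt, n_src)
    losses = []
    for i, j in pairs:
        row = logits[j]
        m = row.max()
        # Numerically stable log-softmax of the positive (source i) over all sources.
        log_prob = row[i] - m - np.log(np.exp(row - m).sum())
        losses.append(-log_prob)
    return float(np.mean(losses)), pairs


# Toy embeddings: the third source vector sits between both targets.
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[0.9, 0.1], [0.1, 0.9]])
loss, pairs = mnn_contrastive_loss(src, tgt)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy input, the third source vector is not the nearest neighbor of either target, so it gets no partner and is left out of the alignment term. This is one plausible way that unmatched, domain-private instances could avoid being forced onto the other domain.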
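The abstract does not say how instance weights are computed or which distribution discrepancy is minimized. The sketch below assumes a weighted squared maximum mean discrepancy (MMD) with a Gaussian kernel, plus a hypothetical weighting rule in which target instances far from every source instance receive small weights. It only illustrates how down-weighting irrelevant target instances changes a distributional alignment objective; it is not MNCRI's actual scheme, and `relevance_weights`, `sigma`, and `temperature` are illustrative names.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """RBF kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def relevance_weights(src: np.ndarray, tgt: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical rule (assumption): softmax over negative distance to the
    closest source instance, so far-away target instances get small weights.
    The weights sum to 1."""
    d_min = np.sqrt(((tgt[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    scores = -d_min / temperature
    w = np.exp(scores - scores.max())
    return w / w.sum()


def weighted_mmd2(src: np.ndarray, tgt: np.ndarray, w_tgt: np.ndarray, sigma: float = 1.0) -> float:
    """Squared MMD between uniformly weighted source and re-weighted target samples:
    a^T K_ss a + b^T K_tt b - 2 a^T K_st b."""
    w_src = np.full(len(src), 1.0 / len(src))
    k_ss = gaussian_kernel(src, src, sigma)
    k_tt = gaussian_kernel(tgt, tgt, sigma)
    k_st = gaussian_kernel(src, tgt, sigma)
    return float(w_src @ k_ss @ w_src + w_tgt @ k_tt @ w_tgt - 2.0 * w_src @ k_st @ w_tgt)


src = np.array([[0.0, 0.0], [1.0, 0.0]])
tgt = np.array([[0.1, 0.0], [0.9, 0.0], [8.0, 8.0]])  # last row: an irrelevant outlier
w = relevance_weights(src, tgt)
uniform = np.full(len(tgt), 1.0 / len(tgt))
print(weighted_mmd2(src, tgt, w) < weighted_mmd2(src, tgt, uniform))  # True
```

Because the distant outlier row receives a near-zero weight, the weighted discrepancy is much smaller than the uniformly weighted one, so an optimizer minimizing it would not pull the encoder toward that instance. In a trained model the weights would be recomputed as the embeddings change, which is one possible reading of "dynamically" in the abstract.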

Anthology ID: 2023.emnlp-main.788
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12791–12800
URL: https://aclanthology.org/2023.emnlp-main.788
DOI: 10.18653/v1/2023.emnlp-main.788
Cite (ACL): Qianjin Du, Shiji Zhou, Xiaohui Kuang, Gang Zhao, and Jidong Zhai. 2023. Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12791–12800, Singapore. Association for Computational Linguistics.
Cite (Informal): Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection (Du et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.788.pdf
Video: https://aclanthology.org/2023.emnlp-main.788.mp4
