Abstract: Obfuscation is rampant in both benign and malicious JavaScript (JS) code. It produces obscure code that evades detection and hinders comprehension and analysis. Accurate detection of JS code that masquerades as innocuous scripts is therefore vital. Existing deobfuscation methods assume that a specific tool can recover the original JS code entirely. For multi-layer obfuscation, however, general-purpose tools produce formatted JS code in which some sections remain encoded. To detect such code, this study performs Deobfuscation, Unpacking, and Decoding (DUD-preprocessing) by function redefinition using a Virtual Machine (VM), a JS code editor, and a Python int_to_str() function, in order to facilitate feature learning by the FastText model. The learned feature vectors are passed to a classifier model that judges the maliciousness of the JS code. For performance evaluation, the authors use Hynek Petrak's dataset for obfuscated malicious JS code, and the SRILAB dataset and the top 10,000 websites from the Majestic Million service for obfuscated benign JS code. They then compare the performance against other models on the detection of DUD-preprocessed obfuscated malicious JS code. Their experimental results show that the proposed approach enhances feature learning and improves accuracy in the detection of obfuscated malicious JS code.
Funding: This research was supported by the Ministry of Education, Science, Sports, and Culture, Japan, Grant-in-Aid for Scientific Research (B) 16H02874, and by the Commissioned Research of the National Institute of Information and Communications Technology (NICT), Japan, Grant Number 190.
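The abstract mentions a Python int_to_str() function for the decoding step but does not describe how it works. The following is only a minimal sketch of what such a step might look like, assuming the encoded sections are numeric character codes passed to String.fromCharCode(). The body of int_to_str() and the helper decode_char_codes() are hypothetical illustrations, not the authors' implementation.

```python
import json
import re


def int_to_str(codes):
    """Join integer character codes into a string (assumed behaviour)."""
    return "".join(chr(code) for code in codes)


# Matches calls such as String.fromCharCode(97, 108, 101) with literal integers only.
_FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\(([\d,\s]+)\)")


def decode_char_codes(js_source):
    """Replace literal String.fromCharCode(...) calls with the decoded string literal.

    Hypothetical helper: it covers only one simple encoding pattern.
    """

    def _replace(match):
        codes = [int(tok) for tok in match.group(1).split(",") if tok.strip()]
        # json.dumps yields a quoted, properly escaped string literal.
        return json.dumps(int_to_str(codes))

    return _FROM_CHAR_CODE.sub(_replace, js_source)


if __name__ == "__main__":
    sample = "eval(String.fromCharCode(97,108,101,114,116))"
    print(decode_char_codes(sample))
```

Decoding such fragments into readable tokens before feature extraction is the kind of normalisation the abstract credits with helping the FastText model learn features. Real multi-layer obfuscation would need more patterns than this single regular expression handles.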