Open source intelligence is one of the most important public data sources for strategic information analysis. Because information perception is a primary, core issue in strategic information research, this paper expounds a method for strategic information perception in the open source intelligence environment, together with the framework and basic process of information perception. To match the information perception result with the information depiction result, the paper explores in practice the results of information acquisition, perception, depiction, and analysis. It also introduces and develops a monitoring platform for information perception. The results show that the proposed method is feasible.
Law enforcement agencies have a restricted area in which their powers apply, which is called their jurisdiction. These restrictions also apply to the Internet. However, on the Internet, the physical borders of the jurisdiction, typically country borders, are hard to discover. In our case, it is hard to establish whether someone involved in criminal online behavior is indeed a Dutch citizen. We propose a way to overcome the arduous task of manually investigating whether a user on an Internet forum is Dutch or not. More precisely, we aim to detect whether a given English text was written by a native Dutch author. To develop a detector, we follow a machine learning approach, which requires a specific training corpus. To obtain a corpus that is representative of online forums, we collected a large number of English forum posts from Dutch and non-Dutch authors on Reddit. To learn a detection model, we used a bag-of-words representation to capture potential misspellings, grammatical errors, or unusual turns of phrase that are characteristic of the authors' mother tongue. For this learning task, we compare the linear support vector machine and regularized logistic regression using the performance metrics F1 score, precision, and average precision. Our results show that logistic regression with frequency-based feature selection performs best at predicting Dutch natives. Further study should address the general applicability of the results, that is, whether the developed models apply to other forums with comparably high performance.
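The pipeline described in this abstract (bag-of-words features, frequency-based feature selection, regularized logistic regression, precision and F1) can be illustrated with a minimal sketch. This is not the authors' implementation: the four toy sentences, the `min_doc_freq` threshold, the learning rate, and all function names are assumptions made for illustration, and the linear SVM comparison and average precision are left out for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; misspellings are kept as distinct tokens.
    return text.lower().split()

def build_vocabulary(texts, min_doc_freq=2):
    # Frequency-based feature selection (assumed form): keep only words
    # that occur in at least `min_doc_freq` documents.
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))
    kept = sorted(w for w, c in doc_freq.items() if c >= min_doc_freq)
    return {w: i for i, w in enumerate(kept)}

def vectorize(text, vocab):
    # Bag-of-words count vector over the selected vocabulary.
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, l2=0.01, lr=0.5, epochs=200):
    # L2-regularized logistic regression trained by batch gradient descent.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def precision_and_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1

# Invented toy data (label 1 = hypothetical Dutch native author); not from the paper's corpus.
texts = [
    "i make a photo of the old house",
    "we make a photo in the city",
    "i take a photo of the house",
    "we take a photo in the city",
]
labels = [1, 1, 0, 0]

vocab = build_vocabulary(texts, min_doc_freq=2)  # "old" occurs once and is dropped
X = [vectorize(t, vocab) for t in texts]
w, b = train_logreg(X, labels)

train_pred = [1 if predict_proba(x, w, b) > 0.5 else 0 for x in X]
precision, f1 = precision_and_f1(labels, train_pred)
p_make = predict_proba(vectorize("they make a photo", vocab), w, b)
p_take = predict_proba(vectorize("they take a photo", vocab), w, b)
```

On this toy set the only words that separate the two classes are "make" and "take", so the model leans on them. A real experiment would need a held-out test split and far more data before any metric is meaningful.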
Funding: Supported by the National Social Science Fund Project (No. 18BTQ054)