A good machine learning model contributes greatly to accurate crime prediction, so researchers select advanced models more often than basic ones. To find out whether advanced models have a prominent advantage, this study shifts the focus from producing crime predictions to comparing the performance of these two types of models on crime prediction. We aimed to predict burglary occurrence in the City of Los Angeles and compared a basic model that uses only the prior year's burglary occurrence with two advanced models: a linear regressor and a random forest regressor. American Community Survey data were used to provide neighborhood-level socio-economic features. After data preprocessing steps that regularized the dataset, recursive feature elimination was used to determine the final features and the parameters of the two advanced models. To identify the best-fitting model, three metrics were used to evaluate model performance: R squared, adjusted R squared, and mean squared error. The results indicate that the linear regressor is the most suitable of the three models applied in the study, with a slightly smaller mean squared error than the basic model, whereas the random forest model performed worse than the basic model. Despite their much more complex learning steps, the advanced models did not show prominent advantages. Further research to extend the current study is discussed.
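The three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy burglary counts, and the choice to treat the prior-year baseline as a one-feature model when computing adjusted R squared are all assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, the share of variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """R squared penalized for the number of features used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical burglary counts for five neighborhoods (not data from the study).
prior_year = [12, 7, 30, 18, 5]
current_year = [10, 9, 28, 20, 4]

# Basic model: predict this year's count with last year's count.
baseline_mse = mean_squared_error(current_year, prior_year)
baseline_r2 = r_squared(current_year, prior_year)
# Assumption: the baseline is counted as using one feature (the prior-year count).
baseline_adj_r2 = adjusted_r_squared(baseline_r2, len(current_year), n_features=1)

print(baseline_mse, baseline_r2, baseline_adj_r2)
```

An advanced model would be scored with the same three functions on the same held-out neighborhoods, which makes a "slightly smaller mean squared error" directly comparable against the baseline.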
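Chinese judicial event detection faces three problems: the relationship between a trigger word and its context is often insufficient to determine an event instance, trigger words are phrased similarly across cases, and identifying and classifying multiple trigger words within a single case is ambiguous. To address them, this study proposes a judicial event detection method based on multi-head pointers. First, the method fuses contextual information with charge features as input and uses a bidirectional long short-term memory (BiLSTM) network to capture dependencies in the data and extract deep features. Next, a multi-head pointer network models the dependencies between characters to capture the trigger words in a sentence. Finally, pointer tagging is used to extract the trigger words and thereby detect judicial events. Experiments on the public judicial dataset LEVEN validate the method, which reaches micro-averaged and macro-averaged F1 scores of 87.53% and 78.05%, outperforming existing models. The method not only markedly improves the accuracy of trigger-word identification but also strengthens the grasp of event-context relationships in complex judicial texts.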
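The final pointer-tagging step can be illustrated with a small decoding sketch. It assumes the network outputs, for each event type, one start probability and one end probability per character; the function name, the threshold, the maximum span length, and the event-type labels are hypothetical, and the paper's actual multi-head pointer formulation may differ.

```python
def decode_triggers(start_probs, end_probs, threshold=0.5, max_len=4):
    """Pair each confident start position with the nearest confident end.

    start_probs / end_probs: dict mapping event type -> list of
    per-character probabilities (assumed output format, not from the paper).
    Returns a list of (event_type, start_index, end_index), end inclusive.
    """
    spans = []
    for event_type, starts in start_probs.items():
        ends = end_probs[event_type]
        for i, start_score in enumerate(starts):
            if start_score < threshold:
                continue
            # Look for the closest end pointer within max_len characters.
            for j in range(i, min(i + max_len, len(ends))):
                if ends[j] >= threshold:
                    spans.append((event_type, i, j))
                    break
    return spans


# Toy sentence with two triggers; the labels "theft" and "escape" are
# illustrative and not taken from the LEVEN schema.
text = "张三盗窃后逃跑"
start_probs = {
    "theft": [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0],
    "escape": [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0],
}
end_probs = {
    "theft": [0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0],
    "escape": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6],
}

triggers = decode_triggers(start_probs, end_probs)
trigger_words = [(etype, text[s : e + 1]) for etype, s, e in triggers]
print(trigger_words)
```

Keeping a separate pair of pointers per event type is what lets several triggers of different types be extracted from the same sentence without their labels competing for the same characters.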