4 articles found
A Development of a Knowledge Management System for Water Situation Reports in Thailand
1
Author: Prattana Deeprasertkul. Journal of Computer and Communications, 2024, Issue 10, pp. 24-36 (13 pages)
The development of a knowledge management system for the National Hydro Data Center of Thailand was described in this paper. The system was created after the major flood event in 2011 to improve water resource management. It addresses the need for easy access to water situation reports, which are crucial for informed decision-making on water usage, allocation, and reservoir management. The system utilizes Optical Character Recognition technique to convert scanned water situation reports into searchable text. It applied FastText and ElasticSearch for advanced search functionalities. FastText identified the documents related to the search query, even with typos or misspelled words. ElasticSearch allows for efficient searching of text data based on relevance. The system also integrates Google Search for additional information access. Therefore, this knowledge management system provides an efficient way to access and analyze water situation data in Thailand.
Keywords: Optical Character Recognition; text similarity; Fasttext; ElasticSearch
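The abstract above reports that documents were matched even when the query contained typos. One common reason subword-based methods tolerate misspellings is that a misspelled word still shares many character n-grams with the intended word. The following is a minimal sketch of that idea only, not the system described in the paper: it uses plain Jaccard overlap instead of learned FastText vectors, and the function names (`char_ngrams`, `ngram_similarity`, `rank_terms`) and the sample vocabulary are hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return the set of character n-grams of a word padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def ngram_similarity(a, b):
    """Jaccard overlap of the two words' character n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_terms(query, vocabulary):
    """Sort vocabulary terms from most to least similar to the query."""
    return sorted(vocabulary, key=lambda term: ngram_similarity(query, term), reverse=True)


# Hypothetical vocabulary; "resevoir" is a deliberate misspelling of "reservoir".
print(rank_terms("resevoir", ["rainfall", "reservoir", "runoff"]))
```

In this toy setting the misspelled query still ranks the intended term first because the two words share n-grams such as `<re` and `oir`, while the unrelated terms share none.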
An Enhanced Automatic Arabic Essay Scoring System Based on Machine Learning Algorithms
2
Authors: Nourmeen Lotfy, Abdulaziz Shehab, Mohammed Elhoseny, Ahmed Abu-Elfetouh. Computers, Materials & Continua (SCIE, EI), 2023, Issue 10, pp. 1227-1249 (23 pages)
Despite the extensive effort to improve intelligent educational tools for smart learning environments, automatic Arabic essay scoring remains a big research challenge. The nature of the writing style of the Arabic language makes the problem even more complicated. This study designs, implements, and evaluates an automatic Arabic essay scoring system. The proposed system starts with pre-processing the student answer and model answer dataset using data cleaning and natural language processing tasks. Then, it comprises two main components: the grading engine and the adaptive fusion engine. The grading engine employs string-based and corpus-based similarity algorithms separately. After that, the adaptive fusion engine aims to prepare students’ scores to be delivered to different feature selection algorithms, such as Recursive Feature Elimination and Boruta. Then, some machine learning algorithms such as Decision Tree, Random Forest, Adaboost, Lasso, Bagging, and K-Nearest Neighbor are employed to improve the suggested system’s efficiency. The experimental results in the grading engine showed that Extracting DIStributionally similar words using the CO-occurrences similarity measure achieved the best correlation values. Furthermore, in the adaptive fusion engine, the Random Forest algorithm outperforms all other machine learning algorithms using the (80%–20%) splitting method on the original dataset. It achieves 91.30%, 94.20%, 0.023, 0.106, and 0.153 in terms of Pearson’s Correlation Coefficient, Willmot’s Index of Agreement, Mean Square Error, Mean Absolute Error, and Root Mean Square Error metrics, respectively.
Keywords: ARABIC; corpus-based similarity; CORRELATION; machine learning; string-based similarity; text similarity
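The five evaluation metrics named in the abstract above have standard textbook definitions, which can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the function name `regression_metrics` and the sample scores are hypothetical, and the index of agreement uses the commonly cited form d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where O are reference scores, P are predicted scores, and Ō is the mean of O.

```python
import math


def regression_metrics(observed, predicted):
    """Compute Pearson's r, index of agreement, MSE, MAE, and RMSE for paired scores."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    sq_error_sum = sum(e * e for e in errors)
    mse = sq_error_sum / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    # Pearson's r: covariance divided by the product of standard deviations.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    spread = math.sqrt(var_o * var_p)
    pearson = cov / spread if spread else float("nan")  # undefined for constant input

    # Index of agreement: 1 minus squared error over "potential error".
    potential = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(observed, predicted)
    )
    agreement = 1 - sq_error_sum / potential if potential else float("nan")

    return {
        "pearson": pearson,
        "agreement": agreement,
        "mse": mse,
        "mae": mae,
        "rmse": rmse,
    }


# Hypothetical human scores versus system scores.
print(regression_metrics([1, 2, 3], [2, 3, 4]))
```

The example also shows why correlation alone is not enough: a system that is consistently one point too high still has a Pearson correlation of 1.0, while the error metrics and the index of agreement expose the offset.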
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (cited by: 11)
3
Authors: Ming Liu, Bo Lang, Zepeng Gu, Ahmed Zeeshan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, Issue 6, pp. 619-632 (14 pages)
Long-document semantic measurement has great significance in many applications such as semantic searches, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model for incorporating a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
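The abstract above says that concepts in two semantic profiles are compared through word vectors. A generic way to do this is cosine similarity between vectors, aggregated over the concepts of a profile. The sketch below is only an assumption about how such a comparison could look, not the method of the paper: the best-match averaging rule, the function names, and the two-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def profile_similarity(profile_a, profile_b, vectors):
    """Average, over concepts in profile_a, of the best cosine match in profile_b."""
    if not profile_a or not profile_b:
        return 0.0
    best_matches = [
        max(cosine(vectors[concept], vectors[other]) for other in profile_b)
        for concept in profile_a
    ]
    return sum(best_matches) / len(best_matches)


# Toy two-dimensional vectors; real word embeddings have hundreds of dimensions.
TOY_VECTORS = {
    "retrieval": [1.0, 0.0],
    "search": [0.9, 0.1],
    "protein": [0.0, 1.0],
}

print(profile_similarity(["retrieval"], ["search"], TOY_VECTORS))
print(profile_similarity(["retrieval"], ["protein"], TOY_VECTORS))
```

Note that this aggregation is asymmetric: swapping the two profiles can change the score, so a symmetric measure would need to average both directions.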
BDGOA: A bot detection approach for GitHub OAuth Apps
4
Authors: Zhifang Liao, Xuechun Huang, Bolin Zhang, Jinsong Wu, Yu Cheng. Intelligent and Converged Networks (EI), 2023, Issue 3, pp. 181-197 (17 pages)
As various software bots are widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading empirical studies of software engineering researchers. Several techniques have been proposed by researchers to perform bot detection, but most of them are limited to identifying bots performing specific activities, let alone distinguishing between GitHub App and OAuth App. In this paper, we propose a bot detection technique for OAuth App, named BDGOA. 24 features are used in BDGOA, which can be divided into three dimensions: account information, account activity, and text similarity. To better explore the behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of behavioral sequence. We leverage five machine learning classifiers on the benchmark dataset to conduct bot detection, and finally choose random forest as the classifier, which achieves the highest F1-score of 95.83%. The experimental results comparing with the state-of-the-art approaches also demonstrate the superiority of BDGOA.
Keywords: Github; DevBots; machine learning; text similarity