The eXtreme gradient boosting (XGBoost) algorithm is used to identify abnormal users. First, the raw data were cleaned. Then, user power characteristics were extracted from different aspects. Finally, the XGBoost classifier was used to identify abnormal users in both a balanced sample set and an unbalanced sample set. For comparison, under the same features, the k-nearest neighbor (KNN) classifier, back-propagation (BP) neural network classifier, and random forest classifier were used to identify abnormal users in the two sample sets. The experimental results show that the XGBoost classifier has a higher recognition rate and a faster running speed. The performance improvement is especially evident on the imbalanced data set.
Purpose: This research aims to identify product search tasks in online shopping and analyze the characteristics of consumer multi-tasking search sessions. Design/methodology/approach: The experimental dataset contains 8,949 queries of 582 users from 3,483 search sessions. A sequential comparison of the Jaccard similarity coefficient between two adjacent search queries, together with hierarchical clustering of queries, is used to identify search tasks. Findings: (1) Users issued a similar number of queries (1.43 to 1.47) with similar lengths (7.3-7.6 characters) per task in mono-tasking and multi-tasking sessions, and (2) users spent more time on average in sessions with more tasks, but spent less time on each task as the number of tasks in a session increased. Research limitations: The task identification method relies only on query terms and therefore does not completely reflect the complex nature of consumer shopping behavior. Practical implications: These results provide an exploratory understanding of the relationships among multiple shopping tasks, and can be useful for product recommendation and shopping task prediction. Originality/value: The originality of this research is its use of query clustering for online shopping task identification and analysis, and its analysis of product search session characteristics.
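The sequential Jaccard comparison mentioned in the abstract above can be illustrated with a minimal Python sketch. It covers only the adjacent-query comparison, not the hierarchical clustering step, and it rests on assumptions that are not in the abstract: queries are split into terms on whitespace, the 0.2 threshold is an arbitrary illustrative value, and the function names and sample queries are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def segment_session(queries, threshold=0.2):
    """Group consecutive queries into tasks.

    A query joins the current task when its Jaccard similarity with the
    immediately preceding query reaches the threshold; otherwise it starts
    a new task. The threshold is an illustrative assumption.
    """
    tasks = []
    prev_terms = None
    for query in queries:
        terms = set(query.lower().split())  # assumed whitespace tokenization
        if prev_terms is not None and jaccard(prev_terms, terms) >= threshold:
            tasks[-1].append(query)
        else:
            tasks.append([query])
        prev_terms = terms
    return tasks


# Hypothetical session with two shopping tasks.
session = ["red running shoes", "running shoes nike", "usb cable", "usb c cable 2m"]
tasks = segment_session(session)
```

With these sample queries, the first two queries share two of four distinct terms and fall into one task, while the switch to "usb cable" shares no terms with its predecessor and opens a second task.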
Information on the species composition of an urban forest is essential for its management. However, obtaining this information is becoming increasingly difficult due to limited taxonomic expertise. In this study, we tested the possibility of using plant identification applications running on mobile platforms to fill this gap. Five plant identification apps were compared for their potential in identifying urban tree species in China. An online survey was conducted to determine which app features contributed to users’ satisfaction. The results show that identification accuracy varied significantly among the apps. The best performer achieved an accuracy of 74.6% at the species level, which is comparable to the accuracy of professionals in field surveys. Among the app features, identification accuracy was the most important factor contributing to users’ satisfaction. However, plant identification apps did not perform well when used on rare species or outside the regions where they were developed. The results indicate that plant identification apps have great potential in urban forest studies and management, but users need to be cautious when deciding which one to use.
Smartphones have become ubiquitous in our home and work environments. However, users normally rely on explicit but inefficient identification processes in a controlled environment. Therefore, when a device is stolen, a thief can gain access to the owner’s personal information and services using the stored passwords. In response to this scenario, this work proposes an automatic legitimate user identification system based on gait biometrics extracted from user walking patterns captured by smartphone sensors. A set of preprocessing schemes is applied to calibrate noisy and invalid samples and to augment the gait-induced time- and frequency-domain features, which are then further optimized using a non-linear unsupervised feature selection method. The selected features create an underlying gait biometric representation able to discriminate among individuals and identify them uniquely. Different classifiers are adopted to achieve accurate legitimate user identification. Extensive experiments on a group of 16 individuals in an indoor environment show the effectiveness of the proposed solution: with 5 to 70 samples per window, KNN and bagging classifiers achieve 87–99% accuracy, ELM achieves 82–98%, and SVM achieves 81–94%. The proposed pipeline achieves a 100% true-positive rate and a 0% false-negative rate for almost all classifiers.
In digital fingerprinting, preventing piracy of images by colluders is an important and tedious issue. Each image is embedded with a unique User IDentification (UID) code that serves as the fingerprint for tracking the authorized user. The proposed hiding scheme uses a random number generator to scramble two copies of a UID, which are then hidden in randomly selected medium-frequency coefficients of the host image. A linear support vector machine (SVM) is used to train classifiers by calculating the normalized correlation (NC) for the two-class UID codes. The trained classifiers are the models used for identifying unreadable UID codes. Experimental results showed that the success rate of predicting unreadable UID codes can be increased by applying the SVM. The proposed scheme can be used to protect the intellectual property rights of digital images and to keep track of users to prevent collaborative piracy.
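The abstract above does not give its formula for normalized correlation (NC). As a hedged illustration only, the Python sketch below uses one common watermarking definition: bits are mapped to ±1 and NC is the dot product divided by the product of the vector norms, which for ±1 sequences of length n is simply n. The function name and the sample UID bits are hypothetical, and the paper's own NC definition may differ.

```python
def normalized_correlation(original_bits, extracted_bits):
    """NC between two equal-length bit sequences, using a bipolar (+1/-1) mapping.

    Assumed definition: NC = sum(a_i * b_i) / (||a|| * ||b||).
    For +1/-1 sequences of length n, the norm product equals n.
    """
    if len(original_bits) != len(extracted_bits) or not original_bits:
        raise ValueError("sequences must be non-empty and of equal length")
    a = [1 if bit else -1 for bit in original_bits]
    b = [1 if bit else -1 for bit in extracted_bits]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / len(a)


# Hypothetical 8-bit UID and a copy extracted with two flipped bits.
uid = [1, 0, 1, 1, 0, 0, 1, 0]
damaged = [1, 0, 1, 1, 0, 0, 0, 1]
nc_same = normalized_correlation(uid, uid)
nc_damaged = normalized_correlation(uid, damaged)
```

Under this definition an identical copy scores 1.0, a fully inverted copy scores -1.0, and each flipped bit lowers the score by 2/n, so NC values of this kind could plausibly serve as classifier input features.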
Machine learning has evolved with a variety of algorithms to enable state-of-the-art computer vision applications. In particular, the need to automate real-time food item identification has driven a surge of research aimed at making refrigerators smarter. According to a survey by the Food and Agriculture Organization of the United Nations (FAO), 1.3 billion tons of food is wasted by consumers around the world due to either spoilage or expiry, and a large amount of food is wasted by homes and restaurants alone. Smart refrigerators have played a pivotal role in mitigating this problem of food wastage. However, major issues are the high cost of available smart refrigerators and the lack of accurate design algorithms that can help achieve computer vision in any ordinary refrigerator. To address these issues, this work proposes an automated identification algorithm for computer vision in smart refrigerators using the InceptionV3 and MobileNet Convolutional Neural Network (CNN) architectures. The designed module and algorithm are described in detail and are evaluated for accuracy using test images from standard fruit and vegetable datasets. A total of eight test cases are considered, with accuracy and training time as the performance metrics. Finally, real-time testing results are presented, which validate the system’s performance.
An approach to generating and optimizing test cases using a genetic algorithm is proposed for Web application testing based on user sessions. A large volume of meaningful user sessions is obtained by analyzing user logs on the Web server and purging irrelevant information. Most of the redundant user sessions are also removed by a reduction process. For test reuse and test concurrency, the approach divides the user sessions obtained into different groups, each of which is called a test suite, and then prioritizes the test suites and the test cases within each test suite. This yields the initial test suites and test cases and their initial execution sequences. However, the test scheme generated by this elementary prioritization is not close to the optimal one. Therefore, a genetic algorithm is employed to optimize the results of grouping and prioritization. In addition, an approach to generating new test cases using crossover is presented. The new test cases can detect faults caused by the use of possibly conflicting data shared by different users.
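The crossover step in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' operator: a user session is modeled as a list of request strings, plain single-point crossover is used, and the URLs, user names, and cut points are all hypothetical.

```python
import random


def crossover(session_a, session_b, cut_a=None, cut_b=None, rng=random):
    """Single-point crossover of two user sessions (lists of requests).

    Each child keeps the head of one parent and the tail of the other, so
    requests from different users end up interleaved in one test case.
    Cut points are drawn at random when not supplied.
    """
    if len(session_a) < 2 or len(session_b) < 2:
        raise ValueError("each session needs at least two requests")
    if cut_a is None:
        cut_a = rng.randint(1, len(session_a) - 1)
    if cut_b is None:
        cut_b = rng.randint(1, len(session_b) - 1)
    child_1 = session_a[:cut_a] + session_b[cut_b:]
    child_2 = session_b[:cut_b] + session_a[cut_a:]
    return child_1, child_2


# Hypothetical sessions: one user adds item 42 to a cart, another deletes it.
alice = ["GET /login", "GET /cart", "POST /cart/add?item=42", "POST /checkout"]
bob = ["GET /login", "GET /item/42", "POST /item/42/delete"]
child_1, child_2 = crossover(alice, bob, cut_a=3, cut_b=2)
```

Here `child_1` adds item 42 to the cart and then deletes that same item, a sequence neither parent session contains. It is the kind of shared-data conflict that the abstract says the new test cases are meant to expose.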
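In research on model representation for Web application software, the objects of study are mainly applications that do not use Ajax technology. The few model-construction efforts that target Ajax (Asynchronous JavaScript and XML) use the traditional FSM model representation, which cannot describe parameter passing after a client-side message is triggered. Introducing a UML layered model representation on top of the FSM model requires manual intervention, which hinders the automatic generation of test cases. To address these problems, this work draws on the EFSM model, an important software description model. Starting from users' session data, it analyzes user behavior through log data and records client-side operation events. By matching user behavior against client-side operation events, complete user sessions are generated, from which the EFSM model is built. Experimental results show that the EFSM model can effectively represent the states of a Web application and the changes between them, and can effectively support the automatic generation of test cases.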
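Accurate records of the topology of low-voltage transformer areas are the basis for tasks such as line-loss analysis and three-phase imbalance management. To address the high cost and low efficiency of current topology record checking, a topology identification method for low-voltage transformer areas is proposed, based on adaptive k nearest neighbor (AKNN) anomaly detection and adaptive density peaks clustering (ADPC). The method uses the dynamic time warping (DTW) distance to measure the similarity of voltage sequences between users in a low-voltage transformer area, and applies the AKNN anomaly detection algorithm to check and correct abnormal relationships between users and transformers ("user-transformer relationships" for short). Once the correct user-transformer relationships have been obtained, the ADPC clustering algorithm is used to identify the phase of each user in the area. Finally, analysis of a real transformer area case verifies that the method requires no manually set parameters, can effectively identify the topology of low-voltage transformer areas, and has high applicability and accuracy.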
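The DTW distance used above to compare voltage sequences can be sketched with the textbook dynamic-programming recurrence. This Python version is a minimal illustration: the absolute difference as the local cost, the absence of a warping window, and the sample voltage readings are assumptions, and the AKNN and ADPC steps are not shown.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    The local cost is the absolute difference (an assumption).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical voltage readings: user_b lags user_a by one sample.
user_a = [220.0, 221.0, 223.0, 222.0]
user_b = [220.0, 220.0, 221.0, 223.0, 222.0]
shifted = dtw_distance(user_a, user_b)
offset = dtw_distance([0, 0], [1, 1])
```

Because DTW lets one sample align with several samples of the other sequence, the time-shifted pair gets distance 0, whereas a point-by-point comparison would not even be defined for sequences of different lengths.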
Funding (XGBoost abnormal user identification study): National Natural Science Foundation of China (No. 61262044)
Funding (product search task identification study): Supported by the National Science Foundation of China (NSFC) grant (No. 71373015)
Funding (plant identification apps study): Supported financially by the China National Natural Science Foundation (grant number 31570458) and Microsoft Research Lab-Asia (grant number 041902008).
Funding (smart refrigerator food identification study): This work was supported by the Taif University Researchers Supporting Project (TURSP) under number TURSP-2020/10, Taif University, Taif, Saudi Arabia.