As an introductory course for the emerging major of big data management and application, “Introduction to Big Data” has not yet formed a curriculum standard and implementation plan that is widely accepted and used. To this end, we discuss some of our explorations and attempts in the construction and teaching of big data courses for the big data management and application major from the perspectives of course planning, course implementation, and course summary. Interviews with students and questionnaire feedback indicate that students are highly satisfied with some of the teaching measures and programs currently adopted.
The Internet of Vehicles (IoV) is extensively deployed in outdoor and open environments to address traffic efficiency and safety issues by connecting vehicles to the network. However, due to the open and variable nature of its network topology, vehicles frequently engage in cross-domain interactions. During such processes, directly uploading sensitive information to roadside units may expose it to malicious tampering or interception by attackers, thus compromising the security of the cross-domain authentication process. Additionally, IoV imposes high real-time requirements, and existing cross-domain authentication schemes for IoV often encounter efficiency issues. To mitigate these challenges, we propose CAIoV, a blockchain-based efficient cross-domain authentication scheme for IoV. The scheme integrates zero-knowledge proofs, smart contracts, and Merkle hash tree structures. It divides the cross-domain process into an anonymous cross-domain authentication phase and a safe cross-domain authentication phase to balance efficiency and security. Finally, we evaluate the performance of CAIoV. Experimental results demonstrate that the proposed scheme reduces computational overhead by approximately 20%, communication overhead by around 10%, and storage overhead by nearly 30%.
Social networks are the mainstream medium of current information dissemination, so accurately predicting how information propagates through them is particularly important. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Finally, we use the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
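The pipeline described above (a regression score feeding the transition of an epidemic model) can be sketched roughly as follows. This is a minimal illustration under strong assumptions: the abstract does not state which infectious disease model or features are used, so the sketch assumes a discrete-time SIR model on population fractions, and the weights, features, and recovery rate are hypothetical values.

```python
def predict_influence(weights, bias, features):
    """Multiple linear regression prediction: bias + sum of weight * feature."""
    return bias + sum(w * x for w, x in zip(weights, features))


def simulate_sir(beta, gamma, s0, i0, r0, steps):
    """Discrete-time SIR on population fractions; returns a list of (S, I, R)."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_spreaders = beta * s * i   # susceptible users who start spreading
        new_recovered = gamma * i      # spreaders who lose interest
        s -= new_spreaders
        i += new_spreaders - new_recovered
        r += new_recovered
        history.append((s, i, r))
    return history


# Hypothetical regression weights and node features (not from the paper).
influence = predict_influence(weights=[0.5, 0.2], bias=0.1, features=[0.4, 0.5])
# Assumption of this sketch: the clamped influence score acts as the infection rate.
beta = min(max(influence, 0.0), 1.0)
trend = simulate_sir(beta=beta, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=30)
print(max(i for _, i, _ in trend))  # peak fraction of active spreaders
```

Using the predicted influence as the infection rate is only one plausible reading of "state transition"; a per-node transition probability on an actual graph would be another.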
Near-infrared (NIR) spectrum analysis is rapid, nondestructive, and pollution-free, and is widely used in the food, pharmaceutical, petrochemical, and agricultural production and testing industries. The convolutional neural network (CNN) is one of the most successful methods in big data analysis because of its powerful feature extraction and abstraction ability, and it is especially suitable for multi-class classification problems. CNN-based transfer learning is a machine learning technique that migrates the parameters of a trained model to a new one to improve performance. A transfer learning strategy speeds up learning compared with training from scratch. In view of the difficulty of acquiring drug NIR spectral data and the high labeling cost, this paper proposes three simple but effective transfer learning methods for multi-manufacturer identification of drugs based on a one-dimensional CNN. Compared with the original CNN, the transfer learning methods achieve better classification performance with fewer NIR spectral data, which greatly reduces the dependence on labeled NIR spectral data. This paper also compares and discusses the three transfer learning methods and selects the most suitable transfer learning model for drug NIR spectral data analysis. Compared with currently popular methods such as SVM, BP, AE, and ELM, the proposed method achieves higher classification accuracy and scalability in multi-variety and multi-manufacturer NIR spectrum classification experiments.
Drug supervision methods based on near-infrared spectroscopy analysis depend heavily on the chemometrics model that characterizes the relationship between spectral data and drug categories. Preliminary applications of convolutional neural networks in spectral analysis demonstrate excellent end-to-end prediction ability, but they are sensitive to the hyperparameters of the network. The transformer is a deep-learning model based on the self-attention mechanism that is comparable to convolutional neural networks (CNNs) in predictive performance and has an easy-to-design model structure. Hence, a novel calibration model named SpectraTr, based on the transformer structure, is proposed and used for the qualitative analysis of drug spectra. Experimental results on datasets with seven and 18 drug classes show that the proposed SpectraTr model can automatically extract features from a huge number of spectra, does not depend on pre-processing algorithms, and is insensitive to model hyperparameters. When the ratio of the training set to the test set is 8:2, the prediction accuracy of the SpectraTr model reaches 100% and 99.52%, respectively, which outperforms PLS-DA, SVM, SAE, and CNN. The model is also tested on a public drug dataset and achieves a classification accuracy of 96.97% without a preprocessing algorithm, which is 34.85%, 28.28%, 5.05%, and 2.73% higher than PLS-DA, SVM, SAE, and CNN, respectively. The research shows that the SpectraTr model performs exceptionally well in spectral analysis and is expected to be a novel deep calibration model after autoencoder networks (AEs) and CNNs.
Building a model that predicts users' repeat purchase behavior after promotional activities, based on historical user behavior data obtained from those activities, is an effective way for merchants to carry out precision marketing and improve ROI. Most existing prediction models rely on supervised learning, which does not work well with a small amount of labeled data. This paper proposes a BERT-MLP prediction model that uses “unsupervised pre-training on large-scale data + fine-tuning on a small amount of labeled data.” Experimental results on a real Alibaba dataset show that the accuracy of the BERT-MLP model is better than that of the baseline model.
An unmanned aerial vehicle (UAV) self-organizing network is composed of multiple UAVs with autonomous capabilities, arranged according to a certain structure and scale, which can quickly and accurately complete complex tasks such as path planning, situational awareness, and information transmission. Due to the openness of the network, the UAV cluster is vulnerable to passive eavesdropping, active interference, and other attacks, which exposes the system to serious security threats. This paper proposes a Blockchain-Based Data Acquisition (BDA) scheme with privacy protection to address the data privacy and identity authentication problems in the UAV-assisted data acquisition scenario. Each UAV cluster has an aggregate unmanned aerial vehicle (AGV) that can batch-verify the acquisition reports within its administrative domain. After successful verification, the AGV adds its signcrypted ciphertext to the aggregation and uploads it to the blockchain for storage. The blockchain contains two chains that store the public key information of registered entities and the aggregated reports, respectively. The security analysis shows that the BDA construction can protect the privacy and authenticity of acquisition data and effectively resist a malicious key generation center and the public-key substitution attack. It also provides unforgeability of acquisition reports under the Elliptic Curve Discrete Logarithm Problem (ECDLP) assumption. The performance analysis demonstrates that, compared with other schemes, the proposed BDA construction has lower computational complexity and is more suitable for UAV cluster networks with limited computing power and storage capacity.
Many researchers have applied clustering to handle semi-supervised classification of data streams with concept drifts. However, the generalization ability for each specific concept cannot be steadily improved, and concept drift detection methods that ignore the local structural information of the data cannot accurately detect concept drifts. This paper proposes to solve these problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy computes the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.
The proliferation of the global datasphere has forced cloud storage systems to evolve more complex architectures for different applications. The emergence of application session requests and system daemon services has created large persistent flows with diverse performance requirements that need to coexist with other types of traffic. Current routing methods such as equal-cost multipath (ECMP) and Hedera consider neither specific traffic characteristics nor performance requirements, which makes it difficult for them to meet the quality of service (QoS) required by high-priority flows. In this paper, we formulate the selection of the best routing for different kinds of cloud storage flows as an integer programming problem and use grey relational analysis (GRA) to solve this optimization problem. The resulting method is a GRA-based service-aware flow scheduling (GRSA) framework that considers requested flow types and network status to select appropriate routing paths for flows in cloud storage datacenter networks. Results from experiments carried out on a real traffic trace show that the proposed GRSA method can better balance traffic loads, conserve table space, and reduce the average transmission delay for high-priority flows compared with ECMP and Hedera.
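Grey relational analysis, which the abstract above uses to rank routing options, can be illustrated with the textbook grey relational grade. The sketch below is a generic version, not the GRSA formulation: the path metrics, their normalization to [0, 1] with larger values being better, and the distinguishing coefficient `rho = 0.5` are assumptions of this example.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Return one grey relational grade per candidate sequence.

    reference  : ideal values, one per criterion (already normalized)
    candidates : list of sequences with the same length as `reference`
    rho        : distinguishing coefficient, commonly set to 0.5
    """
    # Absolute deviation of every candidate from the reference, per criterion.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    grades = []
    for row in deltas:
        # Grey relational coefficient for each criterion, then the mean as grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalized path metrics: (free bandwidth, inverse delay, free table space).
ideal = [1.0, 1.0, 1.0]
paths = [[0.9, 0.8, 1.0], [0.2, 0.5, 0.4], [0.6, 0.9, 0.7]]
grades = grey_relational_grades(ideal, paths)
best = max(range(len(paths)), key=lambda k: grades[k])
print(best, grades)
```

A scheduler built on this idea would pick the path with the highest grade; weighting criteria differently per flow type is a natural extension that this sketch leaves out.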
In the Internet of Things (IoT), various battery-powered wireless devices are connected to collect and exchange data, and typical traffic is periodic and heterogeneous. Polling with power management is a very promising technique that can be used for communication among these devices in the IoT. In this paper, we propose a novel and scalable model to study the delay and the power consumption performance of polling schemes with power management under heterogeneous settings (particularly heterogeneous sleeping intervals). In our model, by introducing the concept of a virtual polling interval, we convert the considered energy-efficient polling scheme into an equivalent purely-limited vacation system. Thus, we can easily evaluate the mean and variance of the delay and the power consumption by applying existing queueing formulae, without developing a new theoretical model as required in previous works. Extensive simulations show that our analytical results are very accurate for both homogeneous and heterogeneous settings.
Users sometimes need to run a high-bandwidth application over a low-bandwidth network. This is not easy to implement, because a traditional network transmits data over a single path whose bandwidth is lower than the demand. Although current network technologies such as SDN can precisely control data transmission in the network, the standard OpenFlow protocol does not yet support splitting one flow into multiple flows. In this paper, a flow splitting algorithm is proposed. The algorithm splits a data flow into multiple sub-flows by extending the OpenFlow protocol. A multi-path routing algorithm is also proposed to implement multi-path parallel transmission. The algorithm selects multiple paths and minimizes the cost of transmission under constraints on maximum delay and delay variance. Simulations show that the algorithms can significantly improve transmission performance.
Closely related to the safety and stability of power grids, stability analysis has long been a core topic in the electric industry. Conventional approaches employ computational simulation to make quantitative judgements of grid stability under distinctive conditions. The lack of in-depth data analysis tools makes analytical tasks such as situation-aware analysis, instability reasoning, and pattern recognition difficult. To facilitate visual exploration and reasoning on the simulation data, we introduce WaveLines, a visual analysis approach that supports the supervisory control of multivariate simulation time series of power grids. We design and implement an interactive system that supports a set of analytical tasks proposed by domain experts and experienced operators. Experiments have been conducted with domain experts to illustrate the usability and effectiveness of WaveLines.
Existing interference protection systems lack automatic evaluation methods that provide scientific, objective, and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that a hand-held full-band spectrum analyzer can collect signal field strength values in complex indoor scenes. An improved prediction algorithm based on K-nearest neighbor non-parametric kernel regression is proposed to predict the signal field strengths for the whole plane before and after shielding. The most accurate set of data can then be selected by comparison. The experimental results show that the improved prediction algorithm can scientifically and objectively predict the signal strength in complex indoor scenes and evaluate the interference protection with high accuracy.
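The abstract above does not spell out its improved algorithm, so the following is only a baseline sketch of K-nearest neighbor kernel regression: a Nadaraya-Watson estimate restricted to the k nearest measurement points, with the bandwidth set to the distance of the k-th neighbor. The Gaussian kernel, the bandwidth rule, and the sample coordinates and field strengths are assumptions of this example.

```python
import math


def knn_kernel_regress(points, values, query, k=3):
    """Predict the value at `query` from the k nearest measured points.

    points : list of coordinate tuples where the signal was measured
    values : measured field strengths, aligned with `points`
    query  : coordinate tuple where a prediction is wanted
    """
    # Sort measurements by distance to the query and keep the k closest.
    nearest = sorted((math.dist(p, query), v) for p, v in zip(points, values))[:k]
    bandwidth = nearest[-1][0]  # adaptive bandwidth: distance to the k-th neighbour
    if bandwidth == 0:
        # All k neighbours coincide with the query; fall back to a plain average.
        return sum(v for _, v in nearest) / len(nearest)
    # Gaussian kernel weights: closer measurements count more.
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Hypothetical grid of measurement positions (metres) and field strengths (dBm).
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
strengths = [-40.0, -50.0, -60.0, -70.0]
print(knn_kernel_regress(positions, strengths, (0.5, 0.5), k=4))
```

Running the same predictor on the measurements taken before and after shielding gives two predicted planes whose difference can serve as a shielding-effect estimate.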
Remote data integrity auditing can guarantee the integrity of data outsourced to clouds. Users can periodically run an integrity auditing protocol with the cloud server to verify the latest status of the outsourced data. Integrity auditing requires users to perform many time-consuming computations, which weak devices cannot afford. In this paper, we propose a privacy-preserving TPA-aided remote data integrity auditing scheme based on Li et al.’s data integrity auditing scheme without bilinear pairings, in which a third party auditor (TPA) is employed to perform integrity auditing on outsourced data on behalf of users. The privacy of outsourced data is guaranteed against the TPA in the sense that the TPA cannot infer the data contents from the proofs returned in the integrity auditing phase. Our construction is as efficient as Li et al.’s scheme, that is, each procedure requires the same time-consuming operations in both schemes, and our solution does not increase the sizes of the processed data, challenge, and proof.
Retinal images play an essential role in the early diagnosis of ophthalmic diseases. Automatic segmentation of retinal vessels in color fundus images is challenging due to the morphological differences between the retinal vessels and the low-contrast background. At the same time, automated models struggle to capture representative and discriminative retinal vascular features. To fully utilize the structural information of the retinal blood vessels, we propose a novel deep learning network called Pre-Activated Convolution Residual and Triple Attention Mechanism Network (PCRTAM-Net). PCRTAM-Net uses the pre-activated dropout convolution residual method to improve the feature learning ability of the network. In addition, the residual atrous convolution spatial pyramid is integrated into both ends of the network encoder to extract multiscale information and improve blood vessel information flow. A triple attention mechanism is proposed to extract the structural information between vessel contexts and to learn long-range feature dependencies. We evaluate the proposed PCRTAM-Net on four publicly available datasets: DRIVE, CHASE_DB1, STARE, and HRF. Our model achieves state-of-the-art performance of 97.10%, 97.70%, 97.68%, and 97.14% for ACC and 83.05%, 82.26%, 84.64%, and 81.16% for F1, respectively.
Recently, Massive Open Online Courses (MOOCs) have become a major online learning methodology for millions of people worldwide. However, the dropout rates from several current MOOCs are high. Usually, dropout prediction aims to predict whether a learner will exhibit learning behaviors during several consecutive days in the future. Therefore, the information related to the learning behaviors of a learner in several consecutive days should be considered. After in-depth analysis of the learning behavior patterns of MOOC learners, this study reports that learners often exhibit similar learning behaviors on several consecutive days, i.e., the learning status of a learner on a given day is likely to be similar to that on the previous day. Based on this characteristic of MOOC learning, this study proposes a new simple feature matrix for keeping information related to the local correlation of learning behaviors and a new Convolutional Neural Network (CNN) model for predicting dropout. Extensive experimental validations illustrate that the local correlation of learning behaviors should not be neglected. The proposed CNN model considers this characteristic and improves the dropout prediction accuracy. Furthermore, the proposed model can be used to predict dropout temporally and early when sufficient data are collected.
Intelligent control of the greenhouse planting environment plays an important role in improving planting efficiency and guaranteeing the quality of precious flowers. The key problem of intelligent control is how to adapt the air humidity, temperature, and light intensity in greenhouses to the different needs of the flower growth cycle. Therefore, an intelligent flower planting environment monitoring and control system model (named) based on the Internet of Things and fuzzy-GRU network adaptive learning is proposed. The three parameters above are used as the model inputs. The model determines the optimal growth humidity, temperature, and illumination intensity of the flowers, and a single-chip microcomputer applies the output temperature, humidity, and illumination intensity to the actuators of the greenhouse. The model was evaluated using field greenhouse crops. The results show that this model performs better than the PID model and the fuzzy control model in simulation experiments and in actual scene control. Compared with flowers grown in the natural state, flowers under systematic control were approximately 6 cm taller on average, their blooming time was approximately two days longer, and their quality was stable.
With the development of information technology, a mass of data is generated every day. Collecting and analysing these data helps service providers improve their services and gain an advantage in fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users’ data, which discloses users’ private information. Local differential privacy has attracted a lot of attention recently due to its strong privacy guarantee and has been applied to clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot achieve an ideal clustering result because of the large amount of noise added to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making users’ records indistinguishable from each other in high-dimensional space, we map each user’s record into a one-dimensional distance space and make the records indistinguishable from each other in that distance space. Specifically, we first generate a noisy distance and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfy the local differential privacy guarantee and the local dE-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation shows that our methods significantly outperform the traditional methods.
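The idea of perturbing a one-dimensional distance while keeping it inside a valid range can be sketched with a generic bounded Laplace sampler. This is not the paper's BLM: the sketch only shows rejection sampling of Laplace noise within assumed bounds, and it does not derive the noise scale needed for a formal privacy guarantee. The function name, bounds, and scale are hypothetical.

```python
import random


def bounded_laplace(value, scale, lower, upper, rng=random):
    """Add Laplace noise to `value`, resampling until the result lies in [lower, upper].

    The difference of two independent exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    if not (lower < upper and lower <= value <= upper):
        raise ValueError("need lower < upper and lower <= value <= upper")
    if scale <= 0:
        raise ValueError("scale must be positive")
    while True:
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = value + noise
        if lower <= noisy <= upper:  # reject samples that leave the valid range
            return noisy


# Hypothetical example: a record lies at distance 2.5 from a reference point,
# and valid distances are assumed to lie in [0, 10].
rng = random.Random(0)
print(bounded_laplace(2.5, scale=1.0, lower=0.0, upper=10.0, rng=rng))
```

Bounding the output keeps the noisy distance meaningful (for instance, never negative), but truncation changes the output distribution, so the scale has to be calibrated separately before any privacy claim is made.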
Coded apertures with random patterns are extensively used in compressive spectral imagers to sample the incident scene in the image plane. Random sampling, however, is inadequate to capture the structural characteristics of the underlying signal due to the sparse and structured nature of the sensing matrices in spectral imagers. This paper proposes a new approach for super-resolution compressive spectral imaging via adaptive coding. In this method, coded apertures are optimally designed based on a two-tone adaptive compressive sensing (CS) framework to improve the reconstruction resolution and accuracy of the hyperspectral imager. A liquid crystal tunable filter (LCTF) is used to scan the incident scene in the spectral domain to successively select different spectral channels. The output of the LCTF is modulated by the adaptive coded aperture patterns and then projected onto a low-resolution detector array. The coded aperture patterns are implemented by a digital micromirror device (DMD) with higher resolution than that of the detector. Due to the strong correlation across the spectra, the recovered images from previous spectral channels can be used as a priori information to design the adaptive coded apertures for sensing subsequent spectral channels. In particular, the coded apertures are constructed from the a priori spectral images via a two-tone hard thresholding operation that respectively extracts the structural characteristics of bright and dark regions in the underlying scenes. A super-resolution image within a spectral channel can be recovered from a few snapshots of low-resolution measurements. Since no additional side information about the spectral scene is needed, the proposed method does not increase the system complexity. Based on the mutual-coherence criterion, the proposed adaptive CS framework is proved theoretically to improve the sensing efficiency for spectral images. Simulations and experiments are provided to demonstrate and assess the proposed adaptive coding method. Finally, the underlying concepts are extended to a multi-channel method to compress the hyperspectral data cube in the spatial and spectral domains simultaneously.
Sparse representation is a mathematical model for data representation that has proved to be a powerful tool for solving problems in various fields such as pattern recognition, machine learning, and computer vision. As one of the building blocks of the sparse representation method, dictionary learning plays an important role in minimizing the reconstruction error between the original signal and its sparse representation in the space of the learned dictionary. Although using training samples directly as dictionary bases can achieve good performance, the main drawback of this method is that it may result in a very large and inefficient dictionary due to noisy training instances. To obtain a smaller and more representative dictionary, in this paper, we propose an approach called Laplacian sparse dictionary (LSD) learning. Our method is based on manifold learning and double sparsity. We incorporate the Laplacian weighted graph in the sparse representation model and impose ℓ1-norm sparsity on the dictionary. An LSD is a sparse overcomplete dictionary that can preserve the intrinsic structure of the data and learn a smaller dictionary for each class. The learned LSD can be easily integrated into a classification framework based on sparse representation. We compare the proposed method with other methods using three benchmark controlled face image databases, Extended Yale B, ORL, and AR, and one uncontrolled person image dataset, i-LIDS-MA. Results show the advantages of the proposed LSD algorithm over state-of-the-art sparse representation based classification methods.
Abstract: As an introductory course for the emerging major of big data management and application, “Introduction to Big Data” has not yet formed a curriculum standard and implementation plan that is widely accepted and used. To this end, we discuss some of our explorations and attempts in the construction and teaching of big data courses for the big data management and application major from the perspectives of course planning, course implementation, and course summary. Interviews with students and questionnaire feedback indicate that students are highly satisfied with some of the teaching measures and programs currently adopted.
Funding: Supported by the National Natural Science Foundation of China (62362013) and the Guangxi Natural Science Foundation (2023GXNSFAA026294).
Abstract: The Internet of Vehicles (IoV) is extensively deployed in outdoor and open environments to address traffic efficiency and safety issues by connecting vehicles to the network. However, due to the open and variable nature of its network topology, vehicles frequently engage in cross-domain interactions. During such processes, directly uploading sensitive information to roadside units may expose it to malicious tampering or interception by attackers, thus compromising the security of the cross-domain authentication process. Additionally, IoV imposes high real-time requirements, and existing cross-domain authentication schemes for IoV often encounter efficiency issues. To mitigate these challenges, we propose CAIoV, a blockchain-based efficient cross-domain authentication scheme for IoV. The scheme integrates zero-knowledge proofs, smart contracts, and Merkle hash tree structures. It divides the cross-domain process into an anonymous cross-domain authentication phase and a safe cross-domain authentication phase to balance efficiency and security. Finally, we evaluate the performance of CAIoV. Experimental results demonstrate that the proposed scheme reduces computational overhead by approximately 20%, communication overhead by around 10%, and storage overhead by nearly 30%.
Funding: This work was supported by the 2021 Project of the “14th Five-Year Plan” of Shaanxi Education Science, “Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: Taking the Course of ‘Computer Application Technology’ as an Example” (SGH21Y0403); the Teaching Reform and Research Projects for Practical Teaching in 2022, “Research on Practical Teaching of Applied Undergraduate Projects Based on ‘Combination of Courses and Certificates’: Taking Computer Application Technology Courses as an Example” (SJJG02012); and the 11th batch of Teaching Reform Research Projects of Xi’an Jiaotong University City College, “Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times: Taking Computer Application Technology Course Teaching as an Example” (111001).
Abstract: Social networks are the mainstream medium of current information dissemination, and it is particularly important to predict accurately how information propagates through them. In this paper, we introduce a social network propagation model that integrates multiple linear regression with an infectious disease model. First, we propose features that affect social network communication along three dimensions. Then, we predict node influence via multiple linear regression. Lastly, we use the node influence to drive the state transitions of the infectious disease model and predict the trend of information dissemination in social networks. Experimental results on a real social network dataset show that the predictions of the model are consistent with the actual information dissemination trends.
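The two-stage idea in this abstract (regress an influence score, then feed it into an epidemic model) can be sketched as follows. This is a minimal mean-field SIR illustration under stated assumptions, not the authors' model: the feature values, regression weights, recovery rate `gamma`, and both function names are hypothetical, and the influence score is simply clamped to [0, 1] and used as the infection rate.

```python
def predict_influence(features, weights, bias):
    """Linear-regression prediction of node influence, clamped to [0, 1]."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return min(1.0, max(0.0, score))


def simulate_sir(influence, gamma, i0, steps):
    """Discrete-time mean-field SIR model.

    `influence` plays the role of the infection (spreading) rate,
    `gamma` is the recovery rate, and `i0` is the initial spreading fraction.
    Returns a list of (susceptible, infected, recovered) fractions per step.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infected = influence * s * i
        new_recovered = gamma * i
        s = s - new_infected
        i = i + new_infected - new_recovered
        r = r + new_recovered
        history.append((s, i, r))
    return history
```

For example, `simulate_sir(predict_influence([0.5, 0.2, 0.1], [0.4, 0.3, 0.2], 0.05), 0.1, 0.01, 50)` yields a 51-point trajectory whose infected fraction can be compared against an observed dissemination curve.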
Funding: Supported by the National Key R&D Program (Grant No. 2018AAA0102600); the National Natural Science Foundation of China (Grant No. 61906050); the Guangxi Technology R&D Program (Grant No. 2018AD11018); and the Guangxi University Young and Middle-aged Teachers' Research Ability Improvement Project (Grant No. 2020KY05034).
Abstract: Near-infrared (NIR) spectrum analysis technology offers outstanding advantages such as being rapid, nondestructive, and pollution-free, and is widely used in the food, pharmaceutical, petrochemical, and agricultural production and testing industries. The convolutional neural network (CNN) is one of the most successful methods in big data analysis because of its powerful feature extraction and abstraction ability, and it is especially suitable for solving multi-classification problems. CNN-based transfer learning is a machine learning technique that migrates the parameters of a trained model to a new one to improve performance. A transfer learning strategy can speed up model training compared with learning from scratch. In view of the difficulty of acquiring drug NIR spectral data and the high labeling cost, this paper proposes three simple but very effective transfer learning methods for multi-manufacturer identification of drugs based on a one-dimensional CNN. Compared with the original CNN, the transfer learning methods achieve better classification performance with fewer NIR spectral data, which greatly reduces the dependence on labeled NIR spectral data. This paper also compares and discusses the three transfer learning methods and selects the most suitable transfer learning model for drug NIR spectral data analysis. Compared with currently popular methods such as SVM, BP, AE, and ELM, the proposed method achieves higher classification accuracy and scalability in multi-variety and multi-manufacturer NIR spectrum classification experiments.
Funding: Supported by the National Natural Science Foundation of China (61906050, 21365008); the Guangxi Technology R&D Program (2018AD11018); and the Innovation Project of GUET Graduate Education (2021YCXS050).
Abstract: Drug supervision methods based on near-infrared spectroscopy analysis depend heavily on the chemometrics model that characterizes the relationship between spectral data and drug categories. Preliminary applications of convolutional neural networks in spectral analysis demonstrate excellent end-to-end prediction ability, but such networks are sensitive to their hyperparameters. The transformer is a deep learning model based on the self-attention mechanism that is comparable to convolutional neural networks (CNNs) in predictive performance and has an easy-to-design model structure. Hence, a novel calibration model named SpectraTr, based on the transformer structure, is proposed and used for the qualitative analysis of drug spectra. Experimental results on a seven-class and an 18-class drug dataset show that the proposed SpectraTr model can automatically extract features from a huge number of spectra, does not depend on pre-processing algorithms, and is insensitive to model hyperparameters. When the ratio of the training set to the test set is 8:2, the prediction accuracy of the SpectraTr model reaches 100% and 99.52%, respectively, which outperforms PLS-DA, SVM, SAE, and CNN. The model is also tested on a public drug dataset and achieves a classification accuracy of 96.97% without any pre-processing algorithm, which is 34.85%, 28.28%, 5.05%, and 2.73% higher than PLS-DA, SVM, SAE, and CNN, respectively. The research shows that the SpectraTr model performs exceptionally well in spectral analysis and is expected to be a novel deep calibration model after autoencoder networks (AEs) and CNNs.
Funding: Shaanxi Provincial Education Science Regulations "Fourteenth Five-Year Plan" Project, "Research on the Application of Educational Data Mining in Applied Undergraduate Teaching: A Case Study of 'Computer Application Technology' Course" (Project Number: SGH21Y0403); the 2020 Bureau of Shaanxi Provincial Sports Regular Project (Project Number: 2021392); the Special Research Project of Xi'an Jiaotong University City College (Project Number: KCSZ01005).
Abstract: Merchants can carry out precision marketing and improve ROI by using historical user behavior data obtained from promotional activities to build a model that predicts users' repeat purchase behavior after those activities. Most existing prediction models rely on supervised learning, which does not work well with a small amount of labeled data. This paper proposes a BERT-MLP prediction model that uses "large-scale unsupervised pre-training + fine-tuning on a small amount of labeled data." Experimental results on a real Alibaba dataset show that the accuracy of the BERT-MLP model is better than that of the baseline model.
Funding: Supported in part by the National Key R&D Program of China under Project 2020YFB1006004; the Guangxi Natural Science Foundation under Grants 2019GXNSFFA245015 and 2019GXNSFGA245004; the National Natural Science Foundation of China under Projects 62162017, 61862012, 61962012, and 62172119; the Major Key Project of PCL under Grants PCL2021A09, PCL2021A02, and PCL2022A03; and the Innovation Project of Guangxi Graduate Education YCSW2021175.
Abstract: The unmanned aerial vehicle (UAV) self-organizing network is composed of multiple UAVs with autonomous capabilities, organized according to a certain structure and scale, which can quickly and accurately complete complex tasks such as path planning, situational awareness, and information transmission. Due to the openness of the network, the UAV cluster is more vulnerable to passive eavesdropping, active interference, and other attacks, which exposes the system to serious security threats. This paper proposes a Blockchain-Based Data Acquisition (BDA) scheme with privacy protection to address the data privacy and identity authentication problems in the UAV-assisted data acquisition scenario. Each UAV cluster has an aggregate unmanned aerial vehicle (AGV) that can batch-verify the acquisition reports within its administrative domain. After successful verification, the AGV adds its signcrypted ciphertext to the aggregation and uploads it to the blockchain for storage. The blockchain contains two chains, which store the public key information of registered entities and the aggregated reports, respectively. The security analysis shows that the BDA construction can protect the privacy and authenticity of acquisition data and effectively resist a malicious key generation center and the public-key substitution attack. It also provides unforgeability of acquisition reports under the Elliptic Curve Discrete Logarithm Problem (ECDLP) assumption. The performance analysis demonstrates that, compared with other schemes, the proposed BDA construction has lower computational complexity and is more suitable for UAV cluster networks with limited computing power and storage capacity.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61866007; the Natural Science Foundation of Guangxi Zhuang Autonomous Region of China under Grant No. 2018GXNSFDA138006; and the Humanities and Social Sciences Research Projects of the Ministry of Education of China under Grant No. 17JDGC022.
Abstract: Many researchers have applied clustering to handle semi-supervised classification of data streams with concept drifts. However, the generalization ability for each specific concept cannot be steadily improved, and drift detection methods that ignore the local structural information of the data cannot detect concept drifts accurately. This paper proposes to solve these problems with a BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) ensemble and local structure mapping. The local structure mapping strategy is used to compute the local similarity around each sample and is combined with a semi-supervised Bayesian method to perform concept detection. If a recurrent concept is detected, a historical BIRCH ensemble classifier is selected and incrementally updated; otherwise, a new BIRCH ensemble classifier is constructed and added to the classifier pool. Extensive experiments on several synthetic and real datasets demonstrate the advantage of the proposed algorithm.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61861013, 61662018); the Science and Technology Major Project of Guangxi (No. AA18118031); the Guangxi Natural Science Foundation of China (No. 2018GXNSFAA050028); the Doctoral Research Foundation of Guilin University of Electronic Science and Technology (No. UF19033Y); and the Director Fund project of the Key Laboratory of Cognitive Radio and Information Processing of the Ministry of Education (No. CRKL190102).
Abstract: The proliferation of the global datasphere has forced cloud storage systems to evolve more complex architectures for different applications. The emergence of these application session requests and system daemon services has created large persistent flows with diverse performance requirements that need to coexist with other types of traffic. Current routing methods such as equal-cost multipath (ECMP) and Hedera consider neither specific traffic characteristics nor performance requirements, which makes it difficult for them to meet the quality of service (QoS) required by high-priority flows. In this paper, we formulate the selection of the best routing for different kinds of cloud storage flows as an integer programming problem and use grey relational analysis (GRA) to solve this optimization problem. The resulting method is a GRA-based service-aware flow scheduling (GRSA) framework that considers requested flow types and network status to select appropriate routing paths for flows in cloud storage datacenter networks. Results from experiments carried out on a real traffic trace show that the proposed GRSA method can better balance traffic loads, conserve table space, and reduce the average transmission delay for high-priority flows compared with ECMP and Hedera.
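To make the grey relational analysis step concrete, the sketch below ranks candidate paths by their grey relational grade against an ideal reference. It uses the standard GRA coefficient with distinguishing coefficient rho = 0.5 and is only an illustration: the metric columns (for example normalized bandwidth, delay, and loss scores), the function name, and the assumption that all inputs are pre-normalized to [0, 1] are not taken from the GRSA paper.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate row against the reference row.

    All values are assumed to be normalized to [0, 1] beforehand, with
    larger meaning better. `rho` is the distinguishing coefficient.
    """
    # Absolute deviation of every candidate metric from the reference.
    deltas = [[abs(r - c) for r, c in zip(reference, row)] for row in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the reference exactly.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades
```

A scheduler built on this idea would pick the path with the highest grade; per-metric weights for different flow types could replace the plain average, which is a design choice the abstract does not spell out.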
Funding: Supported by Macao FDCT-MOST grant 001/2015/AMJ; Macao FDCT grants 013/2014/A1 and 005/2016/A1; the National Natural Science Foundation of China (Nos. 61373027 and 61672321); and the Natural Science Foundation of Shandong Province (No. ZR2012FM023).
Abstract: In the Internet of Things (IoT), various battery-powered wireless devices are connected to collect and exchange data, and typical traffic is periodic and heterogeneous. Polling with power management is a very promising technique for communication among these devices. In this paper, we propose a novel and scalable model to study the delay and power consumption performance of polling schemes with power management under heterogeneous settings (particularly heterogeneous sleeping intervals). In our model, by introducing the concept of a virtual polling interval, we convert the energy-efficient polling scheme under consideration into an equivalent purely-limited vacation system. Thus, we can easily evaluate the mean and variance of the delay and the power consumption by applying existing queueing formulae, without developing a new theoretical model as required in previous works. Extensive simulations show that our analytical results are very accurate for both homogeneous and heterogeneous settings.
Funding: Supported by the National Science Foundation of China (Nos. 61772385, 61373040, and 61572370).
Abstract: Users sometimes need to run a high-bandwidth application over a low-bandwidth network. This is not easy to implement, because a traditional network transmits data over a single path whose bandwidth is lower than the demand. Although current network technology such as SDN can precisely control data transmission in the network, the standard OpenFlow protocol does not yet support splitting one flow into multiple flows. In this paper, a flow splitting algorithm is proposed. The algorithm splits a data flow into multiple sub-flows by extending the OpenFlow protocol. A multipath routing algorithm is also proposed to implement multi-path parallel transmission. The algorithm selects multiple paths and minimizes the cost of transmission under constraints on maximum delay and delay variance. Simulations show that the algorithms can significantly improve transmission performance.
Funding: The authors would also like to thank all collaborators from China Electric Power Research Institute (CEPRI). This work was supported by the National Key Research and Development Program (2018YFB0904503) and the National Natural Science Foundation of China (Grant Nos. 61772456, 61761136020).
Abstract: Closely related to the safety and stability of power grids, stability analysis has long been a core topic in the electric industry. Conventional approaches employ computational simulation to make quantitative judgements of grid stability under distinct conditions. The lack of in-depth data analysis tools has made analytical tasks such as situation-aware analysis, instability reasoning, and pattern recognition difficult. To facilitate visual exploration and reasoning on the simulation data, we introduce WaveLines, a visual analysis approach that supports the supervisory control of multivariate simulation time series of power grids. We design and implement an interactive system that supports a set of analytical tasks proposed by domain experts and experienced operators. Experiments have been conducted with domain experts to illustrate the usability and effectiveness of WaveLines.
Funding: The National Natural Science Foundation of China under projects 61772150 and 61862012; the Guangxi Key R&D Program under project AB17195025; the Guangxi Natural Science Foundation under grants 2018GXNSFDA281054 and 2018GXNSFAA281232; the National Cryptography Development Fund of China under project MMJJ20170217; the Guangxi Science and Technology Base and Special Talents Program AD18281044; the Innovation Project of GUET Graduate Education under project 2017YJCX46; the Guangxi Young Teachers' Basic Ability Improvement Program under Grant 2018KY0194; and the open program of Guangxi Key Laboratory of Cryptography and Information Security under projects GCIS201621 and GCIS201702.
Abstract: Existing interference protection systems lack automatic evaluation methods that provide scientific, objective, and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that a hand-held full-band spectrum analyzer can collect signal field strength values in complex indoor scenes. An improved prediction algorithm based on K-nearest-neighbor non-parametric kernel regression is proposed to predict the signal field strengths over the whole plane before and after shielding. The data set with the highest accuracy can then be selected by comparison. Experimental results show that the improved prediction algorithm can scientifically and objectively predict the signal strength of complex indoor scenes and evaluate the interference protection with high accuracy.
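The core of K-nearest-neighbor kernel regression can be sketched as a Nadaraya-Watson estimate restricted to the k closest measurement points. The version below is a plain baseline, not the improved algorithm the abstract refers to; the Gaussian kernel, the `bandwidth` parameter, and the function name are assumptions chosen for illustration.

```python
import math


def knn_kernel_regress(points, values, query, k=3, bandwidth=1.0):
    """Nadaraya-Watson estimate at `query` using only its k nearest samples.

    `points` are measurement coordinates, `values` the measured field
    strengths at those coordinates (for example in dBm).
    """
    # Keep the k samples closest to the query location.
    nearest = sorted(
        (math.dist(p, query), v) for p, v in zip(points, values)
    )[:k]
    # Gaussian kernel weight for each retained neighbor.
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed; fall back to a plain k-NN average.
        return sum(v for _, v in nearest) / len(nearest)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total
```

Evaluating the function on a grid of query points before and after shielding gives two predicted field-strength maps whose difference indicates the shielding effect.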
Funding: The National Natural Science Foundation of China under projects 61772150 and 61862012; the Guangxi Key R&D Program under project AB17195025; the Guangxi Natural Science Foundation under grants 2018GXNSFDA281054 and 2018GXNSFAA281232; the National Cryptography Development Fund of China under project MMJJ20170217; the Guangxi Young Teachers' Basic Ability Improvement Program under Grant 2018KY0194; and the open program of Guangxi Key Laboratory of Cryptography and Information Security under projects GCIS201621 and GCIS201702.
Abstract: Remote data integrity auditing technology can guarantee the integrity of data outsourced to clouds. Users can periodically run an integrity auditing protocol with the cloud server to verify the latest status of their outsourced data. Integrity auditing requires the user to perform many time-consuming computations, which weak devices cannot afford. In this paper, we propose a privacy-preserving TPA-aided remote data integrity auditing scheme based on Li et al.'s data integrity auditing scheme without bilinear pairings, in which a third-party auditor (TPA) is employed to perform integrity auditing on outsourced data on behalf of users. The privacy of outsourced data is guaranteed against the TPA in the sense that the TPA cannot infer the data contents from the proofs returned in the integrity auditing phase. Our construction is as efficient as Li et al.'s scheme, that is, each procedure takes the same time-consuming operations in both schemes, and our solution does not increase the sizes of the processed data, challenge, or proof.
Funding: Supported by the Open Funds from Guangxi Key Laboratory of Image and Graphic Intelligent Processing under Grant No. GIIP2209; the National Natural Science Foundation of China under Grant Nos. 62172120 and 62002082; and the Natural Science Foundation of Guangxi Province of China under Grant Nos. 2019GXNSFAA245014 and 2020GXNSFBA238014.
Abstract: Retinal images play an essential role in the early diagnosis of ophthalmic diseases. Automatic segmentation of retinal vessels in color fundus images is challenging due to the morphological differences between the retinal vessels and the low-contrast background. At the same time, automated models struggle to capture representative and discriminative retinal vascular features. To fully utilize the structural information of the retinal blood vessels, we propose a novel deep learning network called Pre-Activated Convolution Residual and Triple Attention Mechanism Network (PCRTAM-Net). PCRTAM-Net uses the pre-activated dropout convolution residual method to improve the feature learning ability of the network. In addition, the residual atrous convolution spatial pyramid is integrated into both ends of the network encoder to extract multiscale information and improve blood vessel information flow. A triple attention mechanism is proposed to extract the structural information between vessel contexts and to learn long-range feature dependencies. We evaluate the proposed PCRTAM-Net on four publicly available datasets: DRIVE, CHASE_DB1, STARE, and HRF. Our model achieves state-of-the-art performance of 97.10%, 97.70%, 97.68%, and 97.14% for ACC and 83.05%, 82.26%, 84.64%, and 81.16% for F1, respectively.
Funding: Partially supported by the National Natural Science Foundation of China (Nos. 61866007, 61363029, 61662014, 61763007, and U1811264); the Natural Science Foundation of Guangxi District (No. 2018GXNSFDA138006); the Guangxi Key Laboratory of Trusted Software (No. KX201721); the Humanities and Social Sciences Research Projects of the Ministry of Education (No. 17JDGC022); and the Chongqing Higher Education Reform Project (No. 183137).
Abstract: Recently, Massive Open Online Courses (MOOCs) have become a major online learning methodology for millions of people worldwide. However, the dropout rates of several current MOOCs are high. Usually, dropout prediction aims to predict whether a learner will exhibit learning behaviors during several consecutive days in the future. Therefore, information about a learner's behaviors over several consecutive days should be considered. After in-depth analysis of the learning behavior patterns of MOOC learners, this study reports that learners often exhibit similar learning behaviors on several consecutive days, i.e., the learning status of a learner on a given day is likely to be similar to that on the previous day. Based on this characteristic of MOOC learning, this study proposes a new, simple feature matrix that preserves information about the local correlation of learning behaviors and a new Convolutional Neural Network (CNN) model for predicting dropout. Extensive experimental validations illustrate that the local correlation of learning behaviors should not be neglected. The proposed CNN model takes this characteristic into account and improves dropout prediction accuracy. Furthermore, the proposed model can be used to predict dropout temporally and early once sufficient data are collected.
Funding: Supported by the Guangxi Key Research and Development Program [Grant No. AB21196063]; the Major Achievement Transformation Foundation of Guilin [Grant No. 20192013-1]; and the Innovation and Entrepreneurship Training Program for College Students of Guilin University of Electronic Technology [Grant No. 202010595031].
Abstract: Intelligent control of the greenhouse planting environment plays an important role in improving planting efficiency and guaranteeing the quality of precious flowers. The key problem of intelligent control is how to adapt the air humidity, temperature, and light intensity in greenhouses to the different needs of the flower growth cycle. Therefore, an intelligent flower planting environment monitoring and control system model (named) based on the Internet of Things and fuzzy-GRU network adaptive learning is proposed. These three greenhouse parameters are used as the model inputs. The model determines the optimal humidity, temperature, and illumination intensity for flower growth, and a single-chip microcomputer applies the output values to the actuators of the greenhouse. The model was evaluated using field greenhouse crops. The results show that this model performs better than the PID model and the fuzzy control model in both simulation experiments and actual scene control. Flowers grown under system control were on average approximately 6 cm taller than those in the natural state, their blooming time was approximately two days longer, and their quality was stable.
Abstract: With the development of information technology, a mass of data is generated every day. Collecting and analysing these data helps service providers improve their services and gain an advantage in fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users' data, which discloses users' privacy. Local differential privacy has attracted much attention recently due to its strong privacy guarantee and has been applied to clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot obtain an ideal clustering result because of the large amount of noise introduced into the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users' records indistinguishable from each other in high-dimensional space, we map each user's record into a one-dimensional distance space and make the records indistinguishable from each other in that distance space. Specifically, we first generate a noisy distance and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfy the local differential privacy guarantee and the local dE-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation shows that our methods significantly outperform the traditional methods.
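One common way to keep Laplace noise inside a valid range is to resample until the noisy value falls within the bounds. The sketch below shows that idea for a one-dimensional distance; it is a generic illustration rather than the BLM defined in the paper, and the function name and parameters are hypothetical. Note the assumption that `scale` has already been calibrated for the bounded domain, since simply truncating a standard Laplace mechanism does not by itself preserve the original privacy budget.

```python
import random


def bounded_laplace(value, scale, lower, upper, rng=random):
    """Return value + Laplace(0, scale) noise, resampled until it lies in [lower, upper].

    `rng` may be the `random` module or a `random.Random` instance.
    """
    if not lower <= value <= upper:
        raise ValueError("value must lie inside [lower, upper]")
    while True:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = value + noise
        if lower <= noisy <= upper:
            return noisy
```

Passing a seeded `random.Random` instance makes runs reproducible, which is convenient when comparing clustering quality across noise settings.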
Funding: National Natural Science Foundation of China (61371132, 61471043, 61527802); International S&T Cooperation Program of China (2014DFR10960).
Abstract: Coded apertures with random patterns are extensively used in compressive spectral imagers to sample the incident scene in the image plane. Random samplings, however, are inadequate to capture the structural characteristics of the underlying signal due to the sparse and structured nature of sensing matrices in spectral imagers. This paper proposes a new approach for super-resolution compressive spectral imaging via adaptive coding. In this method, coded apertures are optimally designed based on a two-tone adaptive compressive sensing (CS) framework to improve the reconstruction resolution and accuracy of the hyperspectral imager. A liquid crystal tunable filter (LCTF) is used to scan the incident scene in the spectral domain to successively select different spectral channels. The output of the LCTF is modulated by the adaptive coded aperture patterns and then projected onto a low-resolution detector array. The coded aperture patterns are implemented by a digital micromirror device (DMD) with higher resolution than that of the detector. Due to the strong correlation across the spectra, the recovered images from previous spectral channels can be used as a priori information to design the adaptive coded apertures for sensing subsequent spectral channels. In particular, the coded apertures are constructed from the a priori spectral images via a two-tone hard thresholding operation that extracts the structural characteristics of bright and dark regions in the underlying scenes, respectively. A super-resolution image within a spectral channel can be recovered from a few snapshots of low-resolution measurements. Since no additional side information about the spectral scene is needed, the proposed method does not increase the system complexity. Based on the mutual-coherence criterion, the proposed adaptive CS framework is proved theoretically to improve the sensing efficiency of the spectral images. Simulations and experiments are provided to demonstrate and assess the proposed adaptive coding method. Finally, the underlying concepts are extended to a multi-channel method to compress the hyperspectral data cube in the spatial and spectral domains simultaneously.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61272304 and 61363029) and the Guangxi Key Laboratory of Trusted Software (No. kx201313).
Abstract: Sparse representation is a mathematical model for data representation that has proved to be a powerful tool for solving problems in various fields such as pattern recognition, machine learning, and computer vision. As one of the building blocks of the sparse representation method, dictionary learning plays an important role in minimizing the reconstruction error between the original signal and its sparse representation in the space of the learned dictionary. Although using training samples directly as dictionary bases can achieve good performance, the main drawback of this method is that it may result in a very large and inefficient dictionary due to noisy training instances. To obtain a smaller and more representative dictionary, in this paper, we propose an approach called Laplacian sparse dictionary (LSD) learning. Our method is based on manifold learning and double sparsity. We incorporate the Laplacian weighted graph in the sparse representation model and impose l1-norm sparsity on the dictionary. An LSD is a sparse overcomplete dictionary that can preserve the intrinsic structure of the data and learn a smaller dictionary for each class. The learned LSD can be easily integrated into a classification framework based on sparse representation. We compare the proposed method with other methods using three benchmark controlled face image databases, Extended Yale B, ORL, and AR, and one uncontrolled person image dataset, i-LIDS-MA. Results show the advantages of the proposed LSD algorithm over state-of-the-art sparse-representation-based classification methods.