期刊文献+
共找到2,825篇文章
< 1 2 142 >
每页显示 20 50 100
The accessible seismological dataset of a high-density 2D seismic array along Anninghe fault
1
作者 Weifan Lu Zeyan Zhao +3 位作者 Han Yue Shiyong Zhou Jianping Wu Xiaodong Song 《Earthquake Science》 2024年第1期67-77,共11页
The scientific goal of the Anninghe seismic array is to investigate the detailed geometry of the Anninghe fault and the velocity structure of the fault zone.This 2D seismic array is composed of 161 stations forming su... The scientific goal of the Anninghe seismic array is to investigate the detailed geometry of the Anninghe fault and the velocity structure of the fault zone.This 2D seismic array is composed of 161 stations forming sub-rectangular geometry along the Anninghe fault,which covers 50 km and 150 km in the fault normal and strike directions,respectively,with~5 km intervals.The data were collected between June 2020 and June 2021,with some level of temporal gaps.Two types of instruments,i.e.QS-05A and SmartSolo,are used in this array.Data quality and examples of seismograms are provided in this paper.After the data protection period ends(expected in June 2024),researchers can request a dataset from the National Earthquake Science Data Center. 展开更多
关键词 Anninghe fault seismological dataset data share
下载PDF
Optimizing Enterprise Conversational AI: Accelerating Response Accuracy with Custom Dataset Fine-Tuning
2
作者 Yash Kishore 《Intelligent Information Management》 2024年第2期65-76,共12页
As the realm of enterprise-level conversational AI continues to evolve, it becomes evident that while generalized Large Language Models (LLMs) like GPT-3.5 bring remarkable capabilities, they also bring forth formidab... As the realm of enterprise-level conversational AI continues to evolve, it becomes evident that while generalized Large Language Models (LLMs) like GPT-3.5 bring remarkable capabilities, they also bring forth formidable challenges. These models, honed on vast and diverse datasets, have undoubtedly pushed the boundaries of natural language understanding and generation. However, they often stumble when faced with the intricate demands of nuanced enterprise applications. This research advocates for a strategic paradigm shift, urging enterprises to embrace a fine-tuning approach as a means to optimize conversational AI. While generalized LLMs are linguistic marvels, their inability to cater to the specific needs of businesses across various industries poses a critical challenge. This strategic shift involves empowering enterprises to seamlessly integrate their own datasets into LLMs, a process that extends beyond linguistic enhancement. The core concept of this approach centers on customization, enabling businesses to fine-tune the AI’s functionality to fit precisely within their unique business landscapes. By immersing the LLM in industry-specific documents, customer interaction records, internal reports, and regulatory guidelines, the AI transcends its generic capabilities to become a sophisticated conversational partner aligned with the intricacies of the enterprise’s domain. The transformative potential of this fine-tuning approach cannot be overstated. It enables a transition from a universal AI solution to a highly customizable tool. The AI evolves from being a linguistic powerhouse to a contextually aware, industry-savvy assistant. As a result, it not only responds with linguistic accuracy but also with depth, relevance, and resonance, significantly elevating user experiences and operational efficiency. In the subsequent sections, this paper delves into the intricacies of fine-tuning, exploring the multifaceted challenges and abundant opportunities it presents. It addresses the technical intricacies of data integration, ethical considerations surrounding data usage, and the broader implications for the future of enterprise AI. The journey embarked upon in this research holds the potential to redefine the role of conversational AI in enterprises, ushering in an era where AI becomes a dynamic, deeply relevant, and highly effective tool, empowering businesses to excel in an ever-evolving digital landscape. 展开更多
关键词 Fine-Tuning dataset AI CONVERSATIONAL ENTERPRISE LLM
下载PDF
SciCN:A Scientific Dataset for Chinese Named Entity Recognition
3
作者 Jing Yang Bin Ji +2 位作者 Shasha Li Jun Ma Jie Yu 《Computers, Materials & Continua》 SCIE EI 2024年第3期4303-4315,共13页
Named entity recognition(NER)is a fundamental task of information extraction(IE),and it has attracted considerable research attention in recent years.The abundant annotated English NER datasets have significantly prom... Named entity recognition(NER)is a fundamental task of information extraction(IE),and it has attracted considerable research attention in recent years.The abundant annotated English NER datasets have significantly promoted the NER research in the English field.By contrast,much fewer efforts are made to the Chinese NER research,especially in the scientific domain,due to the scarcity of Chinese NER datasets.To alleviate this problem,we present aChinese scientificNER dataset–SciCN,which contains entity annotations of titles and abstracts derived from 3,500 scientific papers.We manually annotate a total of 62,059 entities,and these entities are classified into six types.Compared to English scientific NER datasets,SciCN has a larger scale and is more diverse,for it not only contains more paper abstracts but these abstracts are derived from more research fields.To investigate the properties of SciCN and provide baselines for future research,we adapt a number of previous state-of-theart Chinese NER models to evaluate SciCN.Experimental results show that SciCN is more challenging than other Chinese NER datasets.In addition,previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models.Motivated by this fact,we provide a scientific domain-specific lexicon.Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains.We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task regarding the Chinese scientific domain and make progress for future research.The dataset and lexicon are available at:https://github.com/yangjingla/SciCN.git. 展开更多
关键词 Named entity recognition dataset scientific information extraction LEXICON
下载PDF
Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets
4
作者 Shuo Xu Yuefu Zhang +1 位作者 Xin An Sainan Pi 《Journal of Data and Information Science》 CSCD 2024年第2期81-103,共23页
Purpose:Many science,technology and innovation(STI)resources are attached with several different labels.To assign automatically the resulting labels to an interested instance,many approaches with good performance on t... Purpose:Many science,technology and innovation(STI)resources are attached with several different labels.To assign automatically the resulting labels to an interested instance,many approaches with good performance on the benchmark datasets have been proposed for multi-label classification task in the literature.Furthermore,several open-source tools implementing these approaches have also been developed.However,the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones.Therefore,the main purpose of this paper is to evaluate comprehensively seven multi-label classification methods on real-world datasets.Research limitations:Three real-world datasets differ in the following aspects:statement,data quality,and purposes.Additionally,open-source tools designed for multi-label classification also have intrinsic differences in their approaches for data processing and feature selection,which in turn impacts the performance of a multi-label classification approach.In the near future,we will enhance experimental precision and reinforce the validity of conclusions by employing more rigorous control over variables through introducing expanded parameter settings.Practical implications:The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets,underscoring the complexity of real-world multi-label classification tasks.Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels.With ongoing enhancements in deep learning algorithms and large-scale models,it is expected that the efficacy of multi-label classification tasks will be significantly improved,reaching a level of practical utility in the foreseeable future.Originality/value:(1)Seven multi-label classification methods are comprehensively compared on three real-world datasets.(2)The TextCNN and TextRCNN models perform better on small-scale datasets with more complex hierarchical structure of labels and more balanced document-label distribution.(3)The MLkNN method works better on the larger-scale dataset with more unbalanced document-label distribution. 展开更多
关键词 Multi-label classification Real-World datasets Hierarchical structure Classification system Label correlation Machine learning
下载PDF
A Method of Generating Semi-Experimental Biomedical Datasets
5
作者 Jing Wang Naike Du +1 位作者 Zi He Xiuzhu Ye 《Journal of Beijing Institute of Technology》 EI CAS 2024年第3期219-226,共8页
This paper proposed a method to generate semi-experimental biomedical datasets based on full-wave simulation software.The system noise such as antenna port couplings is fully considered in the proposed datasets,which ... This paper proposed a method to generate semi-experimental biomedical datasets based on full-wave simulation software.The system noise such as antenna port couplings is fully considered in the proposed datasets,which is more realistic than synthetical datasets.In this paper,datasets containing different shapes are constructed based on the relative permittivities of human tissues.Then,a back-propagation scheme is used to obtain the rough reconstructions,which will be fed into a U-net convolutional neural network(CNN)to recover the high-resolution images.Numerical results show that the network trained on the datasets generated by the proposed method can obtain satisfying reconstruction results and is promising to be applied in real-time biomedical imaging. 展开更多
关键词 electromagnetic imaging dataset biomedical imaging
下载PDF
CNN Channel Attention Intrusion Detection SystemUsing NSL-KDD Dataset
6
作者 Fatma S.Alrayes Mohammed Zakariah +2 位作者 Syed Umar Amin Zafar Iqbal Khan Jehad Saad Alqurni 《Computers, Materials & Continua》 SCIE EI 2024年第6期4319-4347,共29页
Intrusion detection systems(IDS)are essential in the field of cybersecurity because they protect networks from a wide range of online threats.The goal of this research is to meet the urgent need for small-footprint,hi... Intrusion detection systems(IDS)are essential in the field of cybersecurity because they protect networks from a wide range of online threats.The goal of this research is to meet the urgent need for small-footprint,highly-adaptable Network Intrusion Detection Systems(NIDS)that can identify anomalies.The NSL-KDD dataset is used in the study;it is a sizable collection comprising 43 variables with the label’s“attack”and“level.”It proposes a novel approach to intrusion detection based on the combination of channel attention and convolutional neural networks(CNN).Furthermore,this dataset makes it easier to conduct a thorough assessment of the suggested intrusion detection strategy.Furthermore,maintaining operating efficiency while improving detection accuracy is the primary goal of this work.Moreover,typical NIDS examines both risky and typical behavior using a variety of techniques.On the NSL-KDD dataset,our CNN-based approach achieves an astounding 99.728%accuracy rate when paired with channel attention.Compared to previous approaches such as ensemble learning,CNN,RBM(Boltzmann machine),ANN,hybrid auto-encoders with CNN,MCNN,and ANN,and adaptive algorithms,our solution significantly improves intrusion detection performance.Moreover,the results highlight the effectiveness of our suggested method in improving intrusion detection precision,signifying a noteworthy advancement in this field.Subsequent efforts will focus on strengthening and expanding our approach in order to counteract growing cyberthreats and adjust to changing network circumstances. 展开更多
关键词 Intrusion detection system(IDS) NSL-KDD dataset deep-learning MACHINE-LEARNING CNN channel Attention network security
下载PDF
Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection
7
作者 Ankan Kar Nirjhar Nath +1 位作者 Utpalraj Kemprai   Aman 《International Journal of Communications, Network and System Sciences》 2024年第2期11-29,共19页
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to... This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus. 展开更多
关键词 Support Vector Machine Challenging datasets Forest Fire Detection CLASSIFICATION
下载PDF
Rock mass quality prediction on tunnel faces with incomplete multi-source dataset via tree-augmented naive Bayesian network
8
作者 Hongwei Huang Chen Wu +3 位作者 Mingliang Zhou Jiayao Chen Tianze Han Le Zhang 《International Journal of Mining Science and Technology》 SCIE EI CAS CSCD 2024年第3期323-337,共15页
Rock mass quality serves as a vital index for predicting the stability and safety status of rock tunnel faces.In tunneling practice,the rock mass quality is often assessed via a combination of qualitative and quantita... Rock mass quality serves as a vital index for predicting the stability and safety status of rock tunnel faces.In tunneling practice,the rock mass quality is often assessed via a combination of qualitative and quantitative parameters.However,due to the harsh on-site construction conditions,it is rather difficult to obtain some of the evaluation parameters which are essential for the rock mass quality prediction.In this study,a novel improved Swin Transformer is proposed to detect,segment,and quantify rock mass characteristic parameters such as water leakage,fractures,weak interlayers.The site experiment results demonstrate that the improved Swin Transformer achieves optimal segmentation results and achieving accuracies of 92%,81%,and 86%for water leakage,fractures,and weak interlayers,respectively.A multisource rock tunnel face characteristic(RTFC)dataset includes 11 parameters for predicting rock mass quality is established.Considering the limitations in predictive performance of incomplete evaluation parameters exist in this dataset,a novel tree-augmented naive Bayesian network(BN)is proposed to address the challenge of the incomplete dataset and achieved a prediction accuracy of 88%.In comparison with other commonly used Machine Learning models the proposed BN-based approach proved an improved performance on predicting the rock mass quality with the incomplete dataset.By utilizing the established BN,a further sensitivity analysis is conducted to quantitatively evaluate the importance of the various parameters,results indicate that the rock strength and fractures parameter exert the most significant influence on rock mass quality. 展开更多
关键词 Rock mass quality Tunnel faces Incomplete multi-source dataset Improved Swin Transformer Bayesian networks
下载PDF
A LiDAR Point Clouds Dataset of Ships in a Maritime Environment
9
作者 Qiuyu Zhang Lipeng Wang +2 位作者 Hao Meng Wen Zhang Genghua Huang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第7期1681-1694,共14页
For the first time, this article introduces a LiDAR Point Clouds Dataset of Ships composed of both collected and simulated data to address the scarcity of LiDAR data in maritime applications. The collected data are ac... For the first time, this article introduces a LiDAR Point Clouds Dataset of Ships composed of both collected and simulated data to address the scarcity of LiDAR data in maritime applications. The collected data are acquired using specialized maritime LiDAR sensors in both inland waterways and wide-open ocean environments. The simulated data is generated by placing a ship in the LiDAR coordinate system and scanning it with a redeveloped Blensor that emulates the operation of a LiDAR sensor equipped with various laser beams. Furthermore,we also render point clouds for foggy and rainy weather conditions. To describe a realistic shipping environment, a dynamic tail wave is modeled by iterating the wave elevation of each point in a time series. Finally, networks serving small objects are migrated to ship applications by feeding our dataset. The positive effect of simulated data is described in object detection experiments, and the negative impact of tail waves as noise is verified in single-object tracking experiments. The Dataset is available at https://github.com/zqy411470859/ship_dataset. 展开更多
关键词 3D point clouds dataset dynamic tail wave fog simulation rainy simulation simulated data
下载PDF
Terrorism Attack Classification Using Machine Learning: The Effectiveness of Using Textual Features Extracted from GTD Dataset
10
作者 Mohammed Abdalsalam Chunlin Li +1 位作者 Abdelghani Dahou Natalia Kryvinska 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第2期1427-1467,共41页
One of the biggest dangers to society today is terrorism, where attacks have become one of the most significantrisks to international peace and national security. Big data, information analysis, and artificial intelli... One of the biggest dangers to society today is terrorism, where attacks have become one of the most significantrisks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) havebecome the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management,medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related,initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terroristattacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database(GTD) can influence the accuracy of the model’s classification of terrorist attacks, where each part of the datacan provide vital information to enrich the ability of classifier learning. Each data point in a multiclass taxonomyhas one or more tags attached to it, referred as “related tags.” We applied machine learning classifiers to classifyterrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts andlearns contextual features from text attributes to acquiremore information from text data. The extracted contextualfeatures are combined with the “key features” of the dataset and used to perform the final classification. Thestudy explored different experimental setups with various classifiers to evaluate the model’s performance. Theexperimental results show that the proposed framework outperforms the latest techniques for classifying terroristattacks with an accuracy of 98.7% using a combined feature set and extreme gradient boosting classifier. 展开更多
关键词 Artificial intelligence machine learning natural language processing data analytic DistilBERT feature extraction terrorism classification GTD dataset
下载PDF
KurdSet: A Kurdish Handwritten Characters Recognition Dataset Using Convolutional Neural Network
11
作者 Sardar Hasen Ali Maiwan Bahjat Abdulrazzaq 《Computers, Materials & Continua》 SCIE EI 2024年第4期429-448,共20页
Handwritten character recognition(HCR)involves identifying characters in images,documents,and various sources such as forms surveys,questionnaires,and signatures,and transforming them into a machine-readable format fo... Handwritten character recognition(HCR)involves identifying characters in images,documents,and various sources such as forms surveys,questionnaires,and signatures,and transforming them into a machine-readable format for subsequent processing.Successfully recognizing complex and intricately shaped handwritten characters remains a significant obstacle.The use of convolutional neural network(CNN)in recent developments has notably advanced HCR,leveraging the ability to extract discriminative features from extensive sets of raw data.Because of the absence of pre-existing datasets in the Kurdish language,we created a Kurdish handwritten dataset called(KurdSet).The dataset consists of Kurdish characters,digits,texts,and symbols.The dataset consists of 1560 participants and contains 45,240 characters.In this study,we chose characters only from our dataset.We utilized a Kurdish dataset for handwritten character recognition.The study also utilizes various models,including InceptionV3,Xception,DenseNet121,and a customCNNmodel.To show the performance of the KurdSet dataset,we compared it to Arabic handwritten character recognition dataset(AHCD).We applied the models to both datasets to show the performance of our dataset.Additionally,the performance of the models is evaluated using test accuracy,which measures the percentage of correctly classified characters in the evaluation phase.All models performed well in the training phase,DenseNet121 exhibited the highest accuracy among the models,achieving a high accuracy of 99.80%on the Kurdish dataset.And Xception model achieved 98.66%using the Arabic dataset. 展开更多
关键词 CNN models Kurdish handwritten recognition KurdSet dataset Arabic handwritten recognition DenseNet121 model InceptionV3 model Xception model
下载PDF
CREDIT-X1local:A reference dataset for machine learning seismology from ChinArray in Southwest China
12
作者 Lu Li Weitao Wang +1 位作者 Ziye Yu Yini Chen 《Earthquake Science》 2024年第2期139-157,共19页
High-quality datasets are critical for the development of advanced machine-learning algorithms in seismology.Here,we present an earthquake dataset based on the ChinArray Phase I records(X1).ChinArray Phase I was deplo... High-quality datasets are critical for the development of advanced machine-learning algorithms in seismology.Here,we present an earthquake dataset based on the ChinArray Phase I records(X1).ChinArray Phase I was deployed in the southern north-south seismic zone(20°N-32°N,95°E-110°E)in 2011-2013 using 355 portable broadband seismic stations.CREDIT-X1local,the first release of the ChinArray Reference Earthquake Dataset for Innovative Techniques(CREDIT),includes comprehensive information for the 105,455 local events that occurred in the southern north-south seismic zone during array observation,incorporating them into a single HDF5 file.Original 100-Hz sampled three-component waveforms are organized by event for stations within epicenter distances of 1,000 km,and records of≥200 s are included for each waveform.Two types of phase labels are provided.The first includes manually picked labels for 5,999 events with magnitudes≥2.0,providing 66,507 Pg,42,310 Sg,12,823 Pn,and 546 Sn phases.The second contains automatically labeled phases for 105,442 events with magnitudes of−1.6 to 7.6.These phases were picked using a recurrent neural network phase picker and screened using the corresponding travel time curves,resulting in 1,179,808 Pg,884,281 Sg,176,089 Pn,and 22,986 Sn phases.Additionally,first-motion polarities are included for 31,273 Pg phases.The event and station locations are provided,so that deep learning networks for both conventional phase picking and phase association can be trained and validated.The CREDIT-X1local dataset is the first million-scale dataset constructed from a dense seismic array,which is designed to support various multi-station deep-learning methods,high-precision focal mechanism inversion,and seismic tomography studies.Additionally,owing to the high seismicity in the southern north-south seismic zone in China,this dataset has great potential for future scientific discoveries. 展开更多
关键词 earthquake dataset machine learning Pg/Sg/Pn/Sn phase picking P-wave first-motion polarity
下载PDF
Deep Learning Recognition for Arabic Alphabet Sign Language RGB Dataset
13
作者 Rabie El Kharoua Xiaoming Jiang 《Journal of Computer and Communications》 2024年第3期32-51,共20页
This paper introduces a Convolutional Neural Network (CNN) model for Arabic Sign Language (AASL) recognition, using the AASL dataset. Recognizing the fundamental importance of communication for the hearing-impaired, e... This paper introduces a Convolutional Neural Network (CNN) model for Arabic Sign Language (AASL) recognition, using the AASL dataset. Recognizing the fundamental importance of communication for the hearing-impaired, especially within the Arabic-speaking deaf community, the study emphasizes the critical role of sign language recognition systems. The proposed methodology achieves outstanding accuracy, with the CNN model reaching 99.9% accuracy on the training set and a validation accuracy of 97.4%. This study not only establishes a high-accuracy AASL recognition model but also provides insights into effective dropout strategies. The achieved high accuracy rates position the proposed model as a significant advancement in the field, holding promise for improved communication accessibility for the Arabic-speaking deaf community. 展开更多
关键词 Convolutional Neural Network (CNN) AASL dataset DROPOUT Deep Learning Communication Technology
下载PDF
M3SC:A Generic Dataset for Mixed Multi-Modal(MMM)Sensing and Communication Integration 被引量:3
14
作者 Xiang Cheng Ziwei Huang +6 位作者 Lu Bai Haotian Zhang Mingran Sun Boxun Liu Sijiang Li Jianan Zhang Minson Lee 《China Communications》 SCIE CSCD 2023年第11期13-29,共17页
The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication ... The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research.This paper develops a novel simulation dataset,named M3SC,for mixed multi-modal(MMM)sensing-communication integration,and the generation framework of the M3SC dataset is further given.To obtain multimodal sensory data in physical space and communication data in electromagnetic space,we utilize Air-Sim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data.Furthermore,the in-depth integration and precise alignment of AirSim,WaveFarer,andWireless InSite are achieved.The M3SC dataset covers various weather conditions,multiplex frequency bands,and different times of the day.Currently,the M3SC dataset contains 1500 snapshots,including 80 RGB images,160 depth maps,80 LiDAR point clouds,256 sets of mmWave waveforms with 8 radar point clouds,and 72 channel impulse response(CIR)matrices per snapshot,thus totaling 120,000 RGB images,240,000 depth maps,120,000 LiDAR point clouds,384,000 sets of mmWave waveforms with 12,000 radar point clouds,and 108,000 CIR matrices.The data processing result presents the multi-modal sensory information and communication channel statistical properties.Finally,the MMM sensing-communication application,which can be supported by the M3SC dataset,is discussed. 展开更多
关键词 multi-modal sensing RAY-TRACING sensing-communication integration simulation dataset
下载PDF
DiTing:A large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology 被引量:4
15
作者 Ming Zhao Zhuowei Xiao +1 位作者 Shi Chen Lihua Fang 《Earthquake Science》 2023年第2期84-94,共11页
In recent years,artificial intelligence technology has exhibited great potential in seismic signal recognition,setting off a new wave of research.Vast amounts of high-quality labeled data are required to develop and a... In recent years,artificial intelligence technology has exhibited great potential in seismic signal recognition,setting off a new wave of research.Vast amounts of high-quality labeled data are required to develop and apply artificial intelligence in seismology research.In this study,based on the 2013–2020 seismic cataloging reports of the China Earthquake Networks Center,we constructed an artificial intelligence seismological training dataset(“DiTing”)with the largest known total time length.Data were recorded using broadband and short-period seismometers.The obtained dataset included 2,734,748 threecomponent waveform traces from 787,010 regional seismic events,the corresponding P-and S-phase arrival time labels,and 641,025 P-wave first-motion polarity labels.All waveforms were sampled at 50 Hz and cut to a time length of 180 s starting from a random number of seconds before the occurrence of an earthquake.Each three-component waveform contained a considerable amount of descriptive information,such as the epicentral distance,back azimuth,and signal-to-noise ratios.The magnitudes of seismic events,epicentral distance,signal-to-noise ratio of P-wave data,and signal-to-noise ratio of S-wave data ranged from 0 to 7.7,0 to 330 km,–0.05 to 5.31 dB,and–0.05 to 4.73 dB,respectively.The dataset compiled in this study can serve as a high-quality benchmark for machine learning model development and data-driven seismological research on earthquake detection,seismic phase picking,first-motion polarity determination,earthquake magnitude prediction,early warning systems,and strong ground-motion prediction.Such research will further promote the development and application of artificial intelligence in seismology. 展开更多
关键词 artificial intelligence benchmark dataset earthquake detection seismic phase identification first-motion polarity
下载PDF
Pavement Cracks Coupled With Shadows:A New Shadow-Crack Dataset and A Shadow-Removal-Oriented Crack Detection Approach 被引量:2
16
作者 Lili Fan Shen Li +3 位作者 Ying Li Bai Li Dongpu Cao Fei-Yue Wang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023年第7期1593-1607,共15页
Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,whi... Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,which interfere with the crack detection performance.Till to the present,there still lacks efficient algorithm models and training datasets to deal with the interference brought by the shadows.To fill in the gap,we made several contributions as follows.First,we proposed a new pavement shadow and crack dataset,which contains a variety of shadow and pavement pixel size combinations.It also covers all common cracks(linear cracks and network cracks),placing higher demands on crack detection methods.Second,we designed a two-step shadow-removal-oriented crack detection approach:SROCD,which improves the performance of the algorithm by first removing the shadow and then detecting it.In addition to shadows,the method can cope with other noise disturbances.Third,we explored the mechanism of how shadows affect crack detection.Based on this mechanism,we propose a data augmentation method based on the difference in brightness values,which can adapt to brightness changes caused by seasonal and weather changes.Finally,we introduced a residual feature augmentation algorithm to detect small cracks that can predict sudden disasters,and the algorithm improves the performance of the model overall.We compare our method with the state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset,and the experimental results demonstrate the superiority of our method. 展开更多
关键词 Automatic pavement crack detection data augmentation compensation deep learning residual feature augmentation shadow removal shadow-crack dataset
下载PDF
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data 被引量:1
17
作者 Mini Prince P.M.Joe Prathap 《Computer Systems Science & Engineering》 SCIE EI 2023年第2期1009-1024,共16页
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba... Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time. 展开更多
关键词 Imbalanced dataset class overlapping SMOTE MAPREDUCE parallel programming OVERSAMPLING
下载PDF
AWSD:An Aircraft Wing Dataset Created by an Automatic Workflow for Data Mining in Geometric Processing
18
作者 Xiang Su Nan Li +1 位作者 Yuedi Hu Haisheng Li 《Computer Modeling in Engineering & Sciences》 SCIE EI 2023年第9期2935-2952,共18页
This paper introduces an aircraft wing simulation data set(AWSD)created by an automatic workflow based on creating models,meshing.simulating the wing flight flow field solution,and parameterizing solution results.AWSD... This paper introduces an aircraft wing simulation data set(AWSD)created by an automatic workflow based on creating models,meshing.simulating the wing flight flow field solution,and parameterizing solution results.AWSD is a flexible,independent wing collection of simulations with specific engineering requirements.The data set is applicable to handle computer geometry processing tasks.In contrast to the existing 3D model data set,there are some advantages the scale of this data set is not limited by the collection source,the data files have high quality,no defects,redundancy,and other problems,and the models and simulation are all designed for the specific actual engineering demand.Moreover,AWSD has the characteristics of rich information and a similar model structure,which contributes to the construction of the surrogate model.On the other hand,this data set is suitable for advancing research of data mining in computational geometry graphics.To solve the problem that the CFD flows field results are not intuitive,this paper used the resampling method of surface data to sample the result to the model surface,then segmented the re sampled 3D mesh surface,and compared with the diferences among K-means algorithm,Mini_Batch K means algorithm,and Spectral Clustering algorithm.AWSD provides 300 sets of models,meshes,CFD simulation results,and parametric results based on ARAP(As-Rigid-As Posible)and Harmonic mapping for advancing the construction of engineering surrogate models,3D mesh segmentation,surface resampling,and related geometric processing tasks. 展开更多
关键词 dataset CFD mesh PARAMETERIZATION parametric modeling
下载PDF
MBB-IoT:Construction and Evaluation of IoT DDoS Traffic Dataset from a New Perspective
19
作者 Yi Qing Xiangyu Liu Yanhui Du 《Computers, Materials & Continua》 SCIE EI 2023年第8期2095-2119,共25页
Distributed Denial of Service(DDoS)attacks have always been a major concern in the security field.With the release of malware source codes such as BASHLITE and Mirai,Internet of Things(IoT)devices have become the new ... Distributed Denial of Service(DDoS)attacks have always been a major concern in the security field.With the release of malware source codes such as BASHLITE and Mirai,Internet of Things(IoT)devices have become the new source of DDoS attacks against many Internet applications.Although there are many datasets in the field of IoT intrusion detection,such as Bot-IoT,ConstrainedApplication Protocol–Denial of Service(CoAPDoS),and LATAM-DDoS-IoT(some of the names of DDoS datasets),which mainly focus on DDoS attacks,the datasets describing new IoT DDoS attack scenarios are extremely rare,and only N-BaIoT and IoT-23 datasets used IoT devices as DDoS attackers in the construction process,while they did not use Internet applications as victims either.To supplement the description of the new trend of DDoS attacks in the dataset,we built an IoT environment with mainstream DDoS attack tools such as Mirai and BASHLITE being used to infect IoT devices and implement DDoS attacks against WEB servers.Then,data aggregated into a dataset namedMBB-IoTwere captured atWEBservers and IoT nodes.After the MBB-IoT dataset was split into a training set and a test set,it was applied to the training and testing of the Random Forests classification algorithm.The multi-class classification metrics were good and all above 90%.Secondly,in a cross-evaluation experiment based on Support Vector Machine(SVM),Light Gradient Boosting Machine(LightGBM),and Long Short Term Memory networks(LSTM)classification algorithms,the training set and test set were derived from different datasets(MBB-IoT or IoT-23),and the test performance is better when MBB-IoT is used as the training set. 展开更多
关键词 Intrusion detection IOT MALWARE BOTNET DDOS dataset
下载PDF
Active Learning Strategies for Textual Dataset-Automatic Labelling
20
作者 Sher Muhammad Daudpota Saif Hassan +2 位作者 Yazeed Alkhurayyif Abdullah Saleh Alqahtani Muhammad Haris Aziz 《Computers, Materials & Continua》 SCIE EI 2023年第8期1409-1422,共14页
The Internet revolution has resulted in abundant data from various sources,including social media,traditional media,etcetera.Although the availability of data is no longer an issue,data labelling for exploiting it in ... The Internet revolution has resulted in abundant data from various sources,including social media,traditional media,etcetera.Although the availability of data is no longer an issue,data labelling for exploiting it in supervised machine learning is still an expensive process and involves tedious human efforts.The overall purpose of this study is to propose a strategy to automatically label the unlabeled textual data with the support of active learning in combination with deep learning.More specifically,this study assesses the performance of different active learning strategies in automatic labelling of the textual dataset at sentence and document levels.To achieve this objective,different experiments have been performed on the publicly available dataset.In first set of experiments,we randomly choose a subset of instances from training dataset and train a deep neural network to assess performance on test set.In the second set of experiments,we replace the random selection with different active learning strategies to choose a subset of the training dataset to train the same model and reassess its performance on test set.The experimental results suggest that different active learning strategies yield performance improvement of 7% on document level datasets and 3%on sentence level datasets for auto labelling. 展开更多
关键词 Active learning automatic labelling textual datasets
下载PDF
上一页 1 2 142 下一页 到第
使用帮助 返回顶部