Funding: Sponsored by the China Spark Program of Earthquake Science and Technology (XH23051B).
Abstract: We have developed an automatic regional focal mechanism inversion system based on the Earthquake Rapid Report (ERR) system and the real-time three-component seismic waveform stream from 1,000 broadband seismic stations provided by the China Earthquake Networks Center (CENC). Data processing is triggered by earthquake information received from the ERR system, and the system delivers a double-couple solution and centroid depth within 5–15 min of receiving that information. The system can determine the focal mechanism of all shallow earthquakes in the Chinese mainland with a magnitude of 5.5–6.5, using waveform data recorded by seismic stations within 500 km of the epicenter. It also assigns a quality grade (A, B, or C) to each focal mechanism solution. We processed a total of 301 earthquakes that occurred from 2021 to June 2022; 166 of them passed quality control. These solutions were manually checked, and 160 of them were compiled into a focal mechanism catalog, which can be downloaded online. The automatic focal mechanism solutions for earthquakes in eastern China agree well with those provided by the Global Centroid Moment Tensor (GCMT) project, when available. The average Kagan angle between this catalog and GCMT is 22°, and the average difference in M_W is 0.17. Furthermore, compared with GCMT, the minimum magnitude of our catalog has been reduced from approximately 5.0 to 4.0. The correlation between centroid depth and crustal thickness in the Chinese mainland supports the distribution of centroid depths.
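The 500 km station-selection step mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the haversine great-circle distance, the function names, and the station coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_stations(epicenter, stations, max_km=500.0):
    """Return the names of stations no farther than max_km from the epicenter."""
    lat0, lon0 = epicenter
    return [name for name, (lat, lon) in stations.items()
            if epicentral_distance_km(lat0, lon0, lat, lon) <= max_km]


# Hypothetical station coordinates (latitude, longitude), for illustration only.
stations = {"STA1": (30.0, 103.0), "STA2": (31.0, 104.0), "STA3": (40.0, 116.0)}
print(select_stations((30.5, 103.5), stations))
```

With these made-up coordinates, the two nearby stations are kept and the distant one is dropped.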
Funding: This work is supported by the project "Research on Methods and Technologies of Scientific Researcher Entity Linking and Subject Indexing" (Grant No. G190091) from the National Science Library, Chinese Academy of Sciences, and the project "Design and Research on a Next Generation of Open Knowledge Services System and Key Technologies" (2019XM55).
Abstract: Purpose: Automatic keyphrase extraction (AKE) is an important task for grasping the main points of a text. In this paper, we combine the benefits of the sequence labeling formulation and a pretrained language model to propose an automatic keyphrase extraction model for Chinese scientific research. Design/methodology/approach: We treat AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers, and we initialize our model with the pretrained language model BERT, which was released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset from the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set, and 3,094 abstracts as the test set. As baselines, we use unsupervised keyphrase extraction methods, including term frequency (TF), TF-IDF, and TextRank, and supervised machine learning methods, including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM), and BiLSTM-CRF. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Findings: Compared with character-level BiLSTM-CRF, the best baseline model with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, an absolute improvement of 9.64 percentage points. Research limitations: We consider only the automatic keyphrase extraction task rather than the keyphrase generation task, so only keyphrases that occur in the given text can be extracted. In addition, our proposed dataset is not suitable for dealing with nested keyphrases. Practical implications: We make our character-level IOB-format dataset of Chinese Automatic Keyphrase Extraction from scientific Chinese medical abstracts (CAKE) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-Keyphrase-Extraction. Originality/value: Through comparative experiments, our study demonstrates that the character-level formulation is more suitable for the Chinese automatic keyphrase extraction task under the general trend of pretrained language models. Our proposed dataset also provides a unified basis for model evaluation and can promote the development of Chinese automatic keyphrase extraction to some extent.
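To make the character-level IOB formulation concrete, the following minimal sketch tags each character of a text as B (first character of a keyphrase), I (inside a keyphrase), or O (outside). It is an illustration under assumptions: the function name and the longest-match-first rule are not taken from the paper or the CAKE dataset, and the sample sentence is invented.

```python
def char_level_iob(text, keyphrases):
    """Tag every character as B (begins a keyphrase), I (inside one), or O (outside)."""
    tags = ["O"] * len(text)
    # Longest phrases first, so a longer match is not pre-empted by a shorter one.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        if not phrase:
            continue
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Only tag spans that are still untagged; overlapping matches are skipped.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = "B"
                tags[start + 1:end] = ["I"] * (len(phrase) - 1)
            start = text.find(phrase, start + 1)
    return list(zip(text, tags))


# Invented example sentence with two keyphrases.
for char, tag in char_level_iob("高血压患者的治疗", ["高血压", "治疗"]):
    print(char, tag)
```

Because each character receives exactly one tag, this flat scheme cannot represent nested keyphrases, which matches the limitation stated in the abstract.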
Abstract: This paper presents an automatic programming system for NC laser cutting of Chinese characters, which combines AutoCAD with NC under the Windows environment. The system generates numerical control programs by extracting the outline (stroke) figures of Chinese characters from an outline character library, optimizing the cutting routes, and transferring them to AutoCAD for editing operations such as rotation, translation, enlargement, and mirroring. The resulting programs can be transferred to a laser cutting machine through an RS-232 serial interface to carry out laser cutting of large Chinese characters.
Abstract: Until now, most kitchen work has been done manually, which is often tedious and exposes people to cooking oil smoke pollution. With the development of robotic technology, there is a growing need for automatic cooking machines that can take over most of this work. With this aim in mind, this paper presents an automatic cooking robot that consists mainly of five parts: the wok mechanism, the stir-frying and dispersing mechanism, the feeding mechanism, the fire control system, and the auxiliary ingredient processing mechanism. Experimental results show that, in the judgment of a master chef, the robot achieves this goal with respect to the appearance, smell, and taste of the dishes it cooks.
Abstract: Automatic word segmentation is widely used for ambiguity resolution when processing large-scale real text, but during unknown word detection in Chinese word segmentation, many of the detected word candidates are invalid. These false unknown word candidates lower the overall segmentation accuracy because they also affect the segmentation accuracy of known words. In this paper, we propose several methods for reducing the difficulty and improving the accuracy of word segmentation of written Chinese, such as full segmentation of a sentence, processing of reduplicated words and idioms, and statistical identification of unknown words. A simulation shows the feasibility of the proposed methods in improving the accuracy of Chinese word segmentation.
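One common reading of "full segmentation of a sentence" is to enumerate every way of splitting the sentence into dictionary words, leaving the choice among candidates to a later step. The sketch below follows that reading; it is an assumption rather than the authors' algorithm, and the tiny dictionary is invented for the example.

```python
def full_segmentation(sentence, dictionary):
    """Enumerate every split of `sentence` into dictionary words.

    Single characters are always accepted, so an unknown character never
    blocks a segmentation.
    """
    results = []

    def extend(start, path):
        if start == len(sentence):
            results.append(path)
            return
        for end in range(start + 1, len(sentence) + 1):
            piece = sentence[start:end]
            if end - start == 1 or piece in dictionary:
                extend(end, path + [piece])

    extend(0, [])
    return results


# Invented dictionary; the sentence is a classic ambiguous string.
for candidate in full_segmentation("研究生命", {"研究", "研究生", "生命"}):
    print("/".join(candidate))
```

The candidate list contains both 研究/生命 and 研究生/命, which is exactly the kind of ambiguity a later disambiguation step has to resolve. The number of candidates grows quickly with sentence length, so practical systems usually store them in a word lattice instead of listing them.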
Funding: Supported by the Shanghai International Studies University (Grant No. 2011114061).
Abstract: Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control", and "feed-back control", and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method. Findings: The method distinguishes Chinese and English documents by grammatical structure and word formation rules. Applying the method in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10%, and recall improved from 97.37% to 99.47%. Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching is based on a brute-force (BF) approach, indexing efficiency is reduced to some extent. Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not confined to Chinese-English mixed documents). The proposed method will benefit the indexing of life science documents as well as documents in other subject areas. Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
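The efficiency limitation attributed to brute-force (BF) pattern matching is easier to see with the algorithm written out. The sketch below is the textbook naive matcher, not the authors' indexing code; the function name and the sample text are assumptions.

```python
def brute_force_find(text, pattern):
    """Return every start index at which `pattern` occurs in `text`.

    Each alignment is checked character by character, so the worst case
    costs O(len(text) * len(pattern)) comparisons.
    """
    n, m = len(text), len(pattern)
    hits = []
    if m == 0:
        return hits
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


# Invented mixed Chinese-English text and index terms.
mixed_text = "基因表达与gene expression分析"
for term in ["基因表达", "gene expression"]:
    print(term, brute_force_find(mixed_text, term))
```

Running one such scan per index term over every document is what makes the approach slow for large term lists; multi-pattern algorithms such as Aho-Corasick are a standard alternative, although the abstract does not say whether the authors considered one.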
Funding: Supported primarily by the National Basic Research Program of China (2013CBA01806) and the National Natural Sciences Foundation of China (41671029, 41690141, 41401040, and 41501040).
Abstract: With automatic precipitation gauges now widely used in national weather stations, testing their performance and adjusting their measurements are top priorities. Because different climatic conditions may affect gauge performance differently, it is also necessary to test the gauges in different areas. This study analyzed precipitation measurements from the single-Alter-shielded TRwS204 automatic weighing gauge (TRwS_SA) relative to the adjusted manual measurements (reference precipitation) from the Chinese standard precipitation gauge in a double-fence wind shield (CSPG_DF) in the Hulu watershed in the Qilian Mountains, China. The measurements were compared over the period from August 2014 to July 2017, and the transfer function derived from the work of Kochendorfer et al. (2017a) for correcting wind-induced losses was applied to the TRwS_SA measurements. The results show that, after adjustment, the average loss of TRwS_SA measurements relative to the reference precipitation decreased from 0.55 mm (10.7%) to 0.51 mm (9.9%) for rainfall events, from 0.35 mm (8.5%) to 0.22 mm (5.3%) for sleet events, and from 0.49 mm (18.9%) to 0.33 mm (12.7%) for snowfall events. The large biases that remain uncorrected in the TRwS_SA measurements are attributed mainly to errors specific to the TRwS_SA, differences in gauge orifice area, and random errors. These types of errors must be considered when comparing precipitation measurements from different gauge types, especially in mountainous areas.
Funding: Supported by the National Key R&D Program of China (Grant No. 2019YFC1709803) and the National Natural Science Foundation of China (Grant No. 81873183).
Abstract: Objectives: The aim of this study was to investigate and develop a data storage and exchange format for automatic systematic reviews (ASR) of traditional Chinese medicine (TCM). Methods: A lightweight and commonly used data format, JavaScript Object Notation (JSON), was adopted in this study. We designed a fully described data structure based on JSON syntax to collect TCM clinical trial information. Results: A data format, JSON-ASR, was developed. JSON-ASR is a plain-text format built from key/value pairs and consists of six sections and more than 80 preset pairs. It uses extensible structured arrays to support trials with multiple groups and multiple outcomes. Conclusion: JSON-ASR is lightweight, flexible, and scalable, which makes it suitable for the complex data of clinical evidence.
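The idea of key/value pairs combined with extensible arrays for multiple groups and outcomes can be sketched as follows. Every key name in this record is hypothetical and is not taken from the JSON-ASR specification, whose six sections and preset pairs are not listed in the abstract.

```python
import json

# Hypothetical record: all key names are invented for illustration and do not
# reproduce the actual JSON-ASR sections or preset pairs.
record = {
    "study": {"title": "Example trial", "year": 2020},
    "groups": [
        {"name": "treatment", "n": 60},
        {"name": "control", "n": 58},
    ],
    "outcomes": [
        {
            "name": "response rate",
            "type": "dichotomous",
            "results": [
                {"group": "treatment", "events": 45},
                {"group": "control", "events": 30},
            ],
        },
    ],
}

# A third arm is added by appending to the array; no other key has to change.
record["groups"].append({"name": "placebo", "n": 59})

# Serialize to plain text for exchange, then read the text back.
text = json.dumps(record, ensure_ascii=False, indent=2)
restored = json.loads(text)
print(len(restored["groups"]), "groups,", len(restored["outcomes"]), "outcome(s)")
```

Storing groups and outcomes as arrays rather than as fixed keys such as a first and second group is what lets one structure describe two-arm and multi-arm trials alike.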
Funding: Supported by the National Key Basic Research and Development (973) Program of China (No. 2006CB705701) and the National Natural Science Foundation of China (No. 60373000).
Abstract: In traditional Chinese medicine, the coating on the tongue is considered to reflect various pathologic factors. However, the conventional method of examining the tongue lacks an accepted standard and provides no means of sharing information. This paper describes a segmentation method to extract tongue coatings. First, the tongue body is extracted from the original image using the watershed transform. Then, a threshold method is applied to the image to eliminate the reflections caused by the camera flash. Finally, a threshold method using the Otsu model in combination with a splitting-merging method is applied in the red, green, and blue (RGB) space to extract the thin coating, and the same combination of methods is applied in the hue, saturation, and value (HSV) space to extract the thick coating. The feasibility of the method was tested experimentally, and the segmentation accuracy is 95.9%.
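The Otsu step chooses the threshold that maximizes the between-class variance of the pixel intensities. The sketch below shows that computation for a single 8-bit channel in plain Python; it is a generic illustration, not the authors' pipeline, and it omits the watershed transform, the flash-removal step, and the splitting-merging method. The sample pixel values are made up.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Made-up channel values: a dark cluster and a bright cluster.
channel = [10, 10, 10, 10, 200, 200, 200, 200]
t = otsu_threshold(channel)
mask = [p > t for p in channel]
print(t, mask)
```

For a colour image, the same function would be applied to one channel at a time (for example R, G, or B, or the H, S, or V values after conversion); how the per-channel results are combined is a design decision the abstract does not specify.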
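Abstract: To address the high cost of building the item bank for the Medical Chinese Test (MCT) and the scarcity of source material, this paper proposes a design for an intelligent question-generation system for MCT dialogue-flow cloze tests, which aims to generate dialogue flows efficiently and to create cloze items by blanking out words. The system consists of two main parts: intelligent dialogue-flow generation and intelligent cloze question generation. In the dialogue-flow generation part, the system first builds a knowledge graph from electronic medical records and then uses a graph neural network to generate questions and question chains from the knowledge graph, thereby obtaining dialogue-flow data. In the cloze question generation part, the system first performs discourse-level semantic parsing of the dialogue-flow text based on multidimensional, complex medical constraint knowledge and selects the dialogue flows that meet the requirements as source material; it then annotates the dialogue-flow text with knowledge, and finally blanks out words and generates distractors. The results show that the system can effectively generate a large number of MCT dialogue-flow cloze questions, and manual inspection and evaluation indicate that the generated questions are of high quality.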
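The final step, blanking out a term and generating distractors, can be sketched in a few lines. This is a simplified illustration under assumptions: the rule of drawing distractors from terms with the same annotated category, the function name, and the sample dialogue and term list are all hypothetical and are not described in the abstract.

```python
def make_cloze(text, answer, term_categories, k=3, blank="____"):
    """Blank out the first occurrence of `answer` and pick up to k distractors.

    Distractors are terms annotated with the same category as the answer that
    do not already appear in the text (an assumed selection rule).
    """
    if answer not in text:
        raise ValueError("answer term does not occur in the text")
    stem = text.replace(answer, blank, 1)
    category = term_categories[answer]
    distractors = [term for term, cat in term_categories.items()
                   if cat == category and term != answer and term not in text]
    return {"stem": stem, "answer": answer, "distractors": distractors[:k]}


# Hypothetical knowledge annotations: term -> category.
term_categories = {
    "头痛": "symptom",
    "咳嗽": "symptom",
    "发烧": "symptom",
    "阿司匹林": "drug",
    "腹泻": "symptom",
}
dialogue = "医生:你哪里不舒服?患者:我头痛,还发烧。"
item = make_cloze(dialogue, "头痛", term_categories)
print(item["stem"])
print(item["answer"], item["distractors"])
```

Excluding terms that already occur in the dialogue keeps a distractor such as 发烧 from being an arguably correct answer.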