Funding: Funded by the Informatization Plan of the Chinese Academy of Sciences (Grant No. CASWX2021SF-0102), the National Key R&D Program of China (Grant Nos. 2022YFA1603903, 2022YFA1403800, and 2021YFA0718700), the National Natural Science Foundation of China (Grant Nos. 11925408, 11921004, and 12188101), and the Chinese Academy of Sciences (Grant No. XDB33000000).
Abstract: The exponential growth of the literature is constraining researchers' access to comprehensive information in related fields. While natural language processing (NLP) may offer an effective solution to literature classification, it remains hindered by the lack of labelled datasets. In this article, we introduce a novel method for generating literature classification models through semi-supervised learning, which can generate a labelled dataset iteratively with limited human input. We apply this method to train NLP models for classifying literature related to several research directions, i.e., batteries, superconductors, topological materials, and artificial intelligence (AI) in materials science. The trained NLP ‘battery’ model, applied to a larger dataset distinct from the training and testing datasets, achieves an F1 score of 0.738, which indicates the accuracy and reliability of this scheme. Furthermore, our approach demonstrates that, even with insufficient data, the partially trained model from the first few cycles can identify the relationships among different research fields and facilitate the discovery and understanding of interdisciplinary directions.
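The iterative labelling loop described in this abstract can be sketched as a standard self-training procedure. The snippet below is a minimal illustration under assumptions of my own, not the authors' actual pipeline: it uses TF-IDF features and a logistic-regression classifier (both hypothetical choices), and in each cycle it promotes only high-confidence predictions into the labelled pool, leaving borderline items for human review.

```python
# Hedged sketch of iterative semi-supervised labelling (self-training).
# TF-IDF + logistic regression are illustrative choices, not the paper's setup.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(labelled_texts, labels, unlabelled_texts, threshold=0.95, cycles=5):
    """Iteratively grow the labelled set from the model's confident predictions."""
    vec = TfidfVectorizer(max_features=20000, stop_words="english")
    vec.fit(list(labelled_texts) + list(unlabelled_texts))
    X_lab = vec.transform(labelled_texts)
    y_lab = np.asarray(labels)
    pool = list(unlabelled_texts)
    clf = None
    for _ in range(cycles):
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        if not pool:
            break
        proba = clf.predict_proba(vec.transform(pool))
        idx = np.where(proba.max(axis=1) >= threshold)[0]
        if idx.size == 0:
            break  # nothing confident left: hand the remainder to a human annotator
        X_lab = vstack([X_lab, vec.transform([pool[i] for i in idx])])
        y_lab = np.concatenate([y_lab, clf.classes_[proba.argmax(axis=1)[idx]]])
        pool = [t for j, t in enumerate(pool) if j not in set(idx)]
    return clf, vec, pool  # pool holds the still-unlabelled remainder

```

Because only predictions above the confidence threshold are promoted, the amount of human input required shrinks in each cycle as the model improves, which matches the limited-human-input property the abstract claims.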
Abstract: This article discusses the current status and development strategies of computer science and technology in the context of big data. It first explains the relationship between big data and computer science and technology, focusing on the current applications of computer science and technology in big data, including data storage, data processing, and data analysis. It then proposes development strategies for big data processing. Computer science and technology play a vital role in big data processing by providing strong technical support.
Abstract: Case-file backlogs were identified as one of the causal factors affecting the competitiveness of the forensic science laboratory (FSL). Backlogs represent case-files that remain unprocessed or unreported within a selected time interval (year, week, or month), which leads to increased customer complaints, rework, cost of analysis, degradation of biological samples, etc. Case-file backlogging was quantified over three consecutive years (2014 to 2016) using the following parameters: case-files received and case-files processed, the difference of which gives case-files backlogged. It was necessary to define the time interval after which a case-file is regarded as backlogged (here, one week), the results of which can be translated into backlogged case-files per month or year. A data collection tool was established and used for three work stations (forensic chemistry, biology/DNA, and toxicology laboratories). The tool records the starting and ending date of each time interval, within which the numbers of case-files received and processed were entered, followed by computing the backlogs. It was observed that case-files reported increased between 2014 and 2016, leading to a decrease in backlogged case-files. The annual percentage of case-files backlogged was highest for forensic toxicology. The highest number of case-files backlogged was observed for forensic chemistry, followed by forensic biology/DNA. The number of case-files backlogged per analyst per year was highest in 2014 and dropped continuously towards 2016, being comparably higher in forensic biology/DNA and chemistry. Probability density functions (PDFs) and cumulative distribution functions (CDFs) of the backlog data indicated that a large number of backlogs created in previous weeks were eliminated. It was concluded that the effect of case-file backlogging on FSL competitiveness can be minimized by continued management effort in backlog elimination.
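The bookkeeping described here reduces to a simple per-interval recurrence: the backlog at week t is the previous backlog plus case-files received minus case-files processed, floored at zero. The sketch below shows that recurrence and the empirical CDF of weekly backlogs; the weekly counts are invented for illustration and are not the FSL's data.

```python
# Hedged sketch: weekly backlog recurrence and its empirical CDF.
# All counts below are made up for illustration only.
import numpy as np

received  = np.array([12, 15, 9, 14, 11, 16, 10, 13])   # case-files received per week
processed = np.array([10, 12, 11, 13, 12, 14, 12, 13])  # case-files processed per week

# backlog_t = max(0, backlog_{t-1} + received_t - processed_t)
backlog, b = [], 0
for r, p in zip(received, processed):
    b = max(0, b + r - p)        # a backlog cannot go below zero
    backlog.append(b)
backlog = np.array(backlog)
print("weekly backlog:", backlog)

# Empirical CDF of the weekly backlog values
values = np.sort(backlog)
cdf = np.arange(1, len(values) + 1) / len(values)
for v, p in zip(values, cdf):
    print(f"P(backlog <= {v}) = {p:.2f}")
```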
Funding: This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and by the TIB Leibniz Information Centre for Science and Technology.
Abstract: Purpose: This work aims to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper). Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. Specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings: The application of NLPCONTRIBUTIONGRAPH to the 50 articles finally resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that the annotation decision variance grows with the granularity of the information. Research limitations: NLPCONTRIBUTIONGRAPH has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, following which the same annotator re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model for capturing NLP contributions from scholarly articles. This would entail a larger initiative enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is being proposed for the first time, and given the complexity of the annotation task within a realistic timeframe, our intra-annotation procedure is well suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews; this is planned as future work to produce a robust model. Practical implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Originality/value: NLPCONTRIBUTIONGRAPH is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
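The inter-stage agreement reported above can be read as treating one annotation round as "gold" and the other as "predicted", then scoring their overlap with an F1 measure. A minimal sketch of that computation over sets of annotated items (sentence IDs, phrase spans, or triples) might look like the following; the item representation and scoring protocol are assumptions for illustration, not necessarily the paper's exact procedure.

```python
# Hedged sketch: F1 agreement between two annotation stages, treating
# stage-1 annotations as "gold" and stage-2 annotations as "predicted".
# Items may be sentence IDs, phrase spans, or (subject, predicate, object) triples.

def agreement_f1(stage1, stage2):
    s1, s2 = set(stage1), set(stage2)
    if not s1 or not s2:
        return 0.0
    tp = len(s1 & s2)                 # items annotated identically in both stages
    precision = tp / len(s2)
    recall = tp / len(s1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative triples only, not from the actual dataset
stage1 = {("model", "achieves", "SOTA"), ("corpus", "has", "50 articles")}
stage2 = {("model", "achieves", "SOTA"), ("corpus", "comprises", "50 articles")}
print(f"triple-level agreement F1 = {agreement_f1(stage1, stage2):.2f}")  # 0.50
```

Under this reading, the drop from 67.92% (sentences) to 22.31% (triples) follows naturally: finer-grained items have more ways to differ between stages, so exact-match overlap shrinks.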
Funding: Funded by the National Natural Science Foundation of China (Grant Nos. 51274110, 51304108, and U1361211).
Abstract: The rock mass engineering system (RMES) basically consists of rock mass engineering (RME), the water system, the surrounding ecological environment, etc. The RMES is characterized by nonlinearity, the occurrence of chaos, and self-organization (Tazaka, 1998; Tsuda, 1998; Kishida, 2000). From the construction to the abandonment of RME, the RMES passes through four stages, i.e. an initial phase, a development phase, a declining phase, and a failure phase. Accordingly, the RMES boundary conditions, structural safety, and surrounding environment vary at each phase, and so do the evolution characteristics and disasters (Wang et al., 2014).
Abstract: This paper is dedicated to a new retrospective view of the history of the natural sciences in the 20th and 21st centuries, partially including the philosophy of science (mainly the problems of scientific realism, i.e. the correspondence of science to reality), and also to a novel scheme for different classes of sciences with different objects and paradigms. It analyzes selected “great” and “grand” problems of physics (including the comprehension of quantum mechanics, with a recently elaborated new chapter connected with time as a quantum observable and the time analysis of quantum processes) and of the natural sciences as a whole. Particular attention is paid to questions of interpretation and, to a lesser degree, to aspects inevitably connected with the worldviews of the researchers (which often constitute a part of the interpretation questions).
Abstract: A process for converting methanol or dimethyl ether to olefins, developed by the Dalian Institute of Chemical Physics (DICP) and designated the DMTO process, has attained great success in industrial scale-up testing. DICP, in collaboration with the Xinxing Coal Chemical Co., Ltd. of Shaanxi Province and the Luoyang Petrochemical Engineering Co. of the SINOPEC Group, successfully operated a 50 t(methanol)/d unit for the conversion of methanol to lower olefins, with a methanol conversion of close to 100% and a selectivity to lower olefins (ethylene, propylene, and butylenes) of higher than 90%. On 23rd August, the industrial test project passed a state appraisal. The experts of the Appraisal Group, headed by Prof. YUAN Qingtang, academician of the Chinese Academy of Engineering, concluded that the DMTO process, by utilizing a proprietary SAPO-34 catalyst system and a circulating fluidized-bed reaction system for the production of lower olefins from methanol, is the first unit in the world with a capacity of producing nearly ten thousand tons of lower olefins per year, and that the technological level of the industrial test is at a leading position internationally. This accomplishment will provide a sound basis for the subsequent commercialization of the DMTO process.
Funding: Supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2019R1A2C1006159 and NRF-2021R1A6A1A03039493), and by the 2021 Yeungnam University Research Grant.
Abstract: Networks are fundamental to our modern world, and they appear throughout science and society. Access to massive amounts of data presents a unique opportunity to the research community. As networks grow in size, their complexity increases, and our ability to analyze them with the current state of the art is at severe risk of failing to keep pace. Therefore, this paper initiates a discussion on graph signal processing for large-scale data analysis. We first provide a comprehensive overview of core ideas in graph signal processing (GSP) and their connection to conventional digital signal processing (DSP). We then summarize recent developments in basic GSP tools, including methods for graph filtering, graph learning, graph signals, the graph Fourier transform (GFT), spectra, graph frequencies, etc. Graph filtering is a basic task that allows for isolating the contribution of individual frequencies and therefore enables the removal of noise. We then consider a graph filter as a model that helps extend the application of GSP methods to large datasets. To show its suitability and effectiveness, we first created a noisy graph signal and then applied the filter to it. After several rounds of simulation, we see that the filtered signal is smoother and closer to the original noise-free distance-based signal. By means of this example application, we demonstrate that graph filtering is efficient for big data analytics.
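The denoising experiment described here can be reproduced in miniature with an eigendecomposition of the graph Laplacian: the GFT is projection onto the Laplacian eigenvectors, and a low-pass graph filter keeps only the low-frequency (small-eigenvalue) components. The sketch below builds a small random geometric graph with a distance-based signal; this setup is an assumed stand-in for the paper's simulation, not a reproduction of it.

```python
# Hedged sketch: low-pass graph filtering via the graph Fourier transform.
# The graph construction and signal are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n = 60
pts = rng.random((n, 2))                        # node coordinates in the unit square

# Adjacency: connect nodes within a radius, weighted by a Gaussian kernel
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
W = np.exp(-(d ** 2) / 0.02) * (d < 0.25)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W                  # combinatorial graph Laplacian

# GFT basis: Laplacian eigenvectors, ordered by eigenvalue (graph frequency)
lam, U = np.linalg.eigh(L)

# Smooth distance-based signal plus additive noise
x = np.linalg.norm(pts - 0.5, axis=1)           # distance from the center of the square
x_noisy = x + 0.3 * rng.standard_normal(n)

# Ideal low-pass filter: keep only the k lowest graph frequencies
k = 10
h = (np.arange(n) < k).astype(float)            # spectral response of the filter
x_hat = U @ (h * (U.T @ x_noisy))               # GFT -> attenuate -> inverse GFT

print("error before filtering:", np.linalg.norm(x_noisy - x))
print("error after filtering: ", np.linalg.norm(x_hat - x))
```

Because the distance-based signal is smooth over the graph, its energy concentrates in the low graph frequencies, while the noise spreads over all of them; discarding the high frequencies therefore removes mostly noise, which is exactly the behavior the abstract reports.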
Abstract: With this work, we introduce a novel method for the unsupervised learning of conceptual hierarchies, or concept maps as they are sometimes called, aimed specifically at literary texts. This distinguishes it from the majority of the research literature on the topic, which focuses primarily on building ontologies from a vast array of data sources, both structured and unstructured, to support various forms of AI, in particular the Semantic Web as envisioned by Tim Berners-Lee. We first elaborate on the mutually informing disciplines of philosophy and computer science, or more specifically the relationships among metaphysics, epistemology, ontology, computing, and AI. This is followed by a technically in-depth discussion of DEBRA, our dependency-tree-based concept hierarchy constructor, which, as its name alludes, constructs a conceptual map in the form of a directed graph illustrating the concepts, their respective relations, and the implied ontological structure of the concepts as encoded in the text, decoded with standard Python NLP libraries such as spaCy and NLTK. With this work we hope both to augment the Knowledge Representation literature with opportunities for intellectual advancement in AI through more intuitive, less analytical, and well-known forms of knowledge representation from the cognitive science community, and to open up new areas of research between Computer Science and the Humanities with respect to applying the latest NLP tools and techniques to literature of cultural significance, shedding light on existing methods of computation over documents in semantic space that allow, at the very least, the comparison and evolution of texts through time using vector-space math.
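DEBRA's algorithm is not given in this abstract, but its core move, reading concept relations off a dependency parse, can be illustrated with spaCy alone. The sketch below extracts naive (subject, verb, object) edges into a directed graph; the extraction rules and graph representation are assumptions for illustration, not DEBRA's actual method.

```python
# Hedged sketch in the spirit of dependency-tree concept extraction
# (not DEBRA's actual algorithm): pull (subject, verb, object) edges
# from a spaCy parse into a directed concept graph.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")

def concept_edges(text):
    """Yield (subject, relation, object) triples from simple SVO clauses."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    yield (s.lemma_, token.lemma_, o.lemma_)

# Adjacency list: concept -> [(relation, concept)], i.e. a directed graph
graph = defaultdict(list)
for s, r, o in concept_edges("Ishmael boards the ship. The whale destroys the ship."):
    graph[s].append((r, o))

for concept, edges in graph.items():
    print(concept, "->", edges)
```

On this toy Moby-Dick fragment the output links "ishmael" and "whale" to "ship" through their respective verbs, a miniature of the directed concept graph the abstract describes; a fuller system would also need coreference resolution and hypernym detection to recover hierarchy rather than flat relations.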