21,960 articles found
Functions of Karez to Xinjiang Agriculture in the Qing Dynasty from the Perspective of Historical Documents
1
Author: Danyang GONG 《Asian Agricultural Research》 2023, Issue 3, pp. 70-71 (2 pages)
Desertification is increasingly serious in Xinjiang, and the construction of water conservancy is a precondition for the development of agriculture. The main project for the development of agriculture and water conservancy in Xinjiang is to build Karez, which played a vital role in the development of Xinjiang agriculture in the Qing Dynasty. It has been recorded many times in historical documents of the Qing Dynasty, such as Lin Zexu's Diary, Tao Baolian's Diary, Xinjiang Atlas, and Zuo Zongtang's Memorial to the Emperor, which recorded the situation and historical origin of Karez. Karez made a significant contribution to the development of agriculture in the Qing Dynasty. It increased the cultivated land in Xinjiang at that time, and increased the types and yields of crops. It is conducive to the stability and development of Xinjiang's economy. Until today, Karez is still an important water source for agricultural irrigation in Xinjiang.
Keywords: KAREZ Historical documents in the Qing Dynasty Xinjiang agriculture
Human Rights in Civil Judicial Documents:Conception and Function
2
Author: 郑若瀚 《The Journal of Human Rights》 2023, Issue 4, pp. 851-868 (18 pages)
Traditional human rights theory tends to hold that human rights should be aimed at defending public authority and that the legal issue of human rights is a matter of public law. However, the development of human rights concepts and practices is not just confined to this. A textual search shows that the term “human rights” exists widely in China’s civil judicial documents. Among the 3,412 civil judicial documents we researched, the concept of “human rights” penetrates all kinds of disputes in lawsuits, ranging from property rights, contracts, labor, and torts to marital property, which is embedded in both the claims of the parties concerned and the reasoning of judges. Human rights have become the discourse and yardstick for understanding and evaluating social behavior. The widespread use of the term “human rights” in civil judicial documents reflects at least three concepts related to human rights: first, the rights to subsistence and development are the primary basic human rights; second, the judicial protection of human rights is a bottom-line guarantee; third, the protection of human rights aims to achieve equal rights. Today, judges quote the theory of human rights in judicial judgments from time to time, evidencing that human rights have a practical function in judicial adjudication activities, and in practice this is mainly manifested in declaring righteous values and strengthening arguments with the values and ideas related to human rights, using the provisions concerning human rights in the Constitution to interpret the constitutionality, and using the principles of human rights to interpret blurred rules and rank the importance of different rights.
Keywords: human rights concept of human rights civil judicature judicial documents judicial reasons
An explorative study on document type assignment of review articles in Web of Science,Scopus and journals’websites
3
Authors: Manman Zhu Xinyue Lu Fuyou Chen Liying Yang Zhesi Shen 《Journal of Data and Information Science》 CSCD 2024, Issue 1, pp. 11-36 (26 pages)
Purpose: Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignation of review articles in Web of Science, Scopus and Publisher’s websites on a large scale. Design/methodology/approach: 27,616 papers from 160 journals from 10 review journal series indexed in SCI are analyzed. The document types of these papers labeled on journals’ websites, and assigned by WoS and Scopus, are retrieved and compared to determine the assigning accuracy and identify the possible reasons for wrong assignments. For the document type labeled on the website, we further differentiate them into explicit review and implicit review based on whether the website directly indicates it is a review or not. Findings: Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assigning accuracy of WoS and Scopus for implicit reviews dropped significantly, especially for Scopus. Research limitations: The document types we used as the gold standard were based on the journal websites’ labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. Whether this conclusion can be extended to review articles published in non-review journals and to the most current situation is not very clear. Practical implications: This study provides a reference for the accuracy of document type assigning of review articles in WoS and Scopus, and the identified pattern for assigning implicit reviews may be helpful for better labeling on websites, WoS and Scopus. Originality/value: This study investigated the assigning accuracy of document type of reviews and identified some patterns of wrong assignments.
Keywords: document type Web of Science SCOPUS Review article
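The precision and recall figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the gold standard (journal-website labels) and a database's "review" assignments are both available as sets of paper IDs; the function name and the toy IDs are hypothetical and are not taken from the paper.

```python
def precision_recall(gold, predicted):
    """Precision and recall of `predicted` review labels against `gold` labels.

    Both arguments are sets of paper IDs labeled as review articles.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): five reviews on the journal site, four found by the database.
gold = {"p1", "p2", "p3", "p4", "p5"}
database = {"p1", "p2", "p3", "p4"}
p, r = precision_recall(gold, database)
```

In this toy case every database label is correct but one review is missed, the same high-precision, lower-recall shape the abstract reports.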
Hybrid Optimization Algorithm for Handwritten Document Enhancement
4
Authors: Shu-Chuan Chu Xiaomeng Yang Li Zhang Václav Snášel Jeng-Shyang Pan 《Computers, Materials & Continua》 SCIE EI 2024, Issue 3, pp. 3763-3786 (24 pages)
The Gannet Optimization Algorithm (GOA) and the Whale Optimization Algorithm (WOA) demonstrate strong performance; however, there remains room for improvement in convergence and practical applications. This study introduces a hybrid optimization algorithm, named the adaptive inertia weight whale optimization algorithm and gannet optimization algorithm (AIWGOA), which addresses challenges in enhancing handwritten documents. The hybrid strategy integrates the strengths of both algorithms, significantly enhancing their capabilities, whereas the adaptive parameter strategy mitigates the need for manual parameter setting. By amalgamating the hybrid strategy and parameter-adaptive approach, the Gannet Optimization Algorithm was refined to yield the AIWGOA. Through a performance analysis of the CEC2013 benchmark, the AIWGOA demonstrates notable advantages across various metrics. Subsequently, an evaluation index was employed to assess the enhanced handwritten documents and images, affirming the superior practical application of the AIWGOA compared with other algorithms.
Keywords: Metaheuristic algorithm gannet optimization algorithm hybrid algorithm handwritten document enhancement
Multimodal Deep Neural Networks for Digitized Document Classification
5
Authors: Aigerim Baimakhanova Ainur Zhumadillayeva Bigul Mukhametzhanova Natalya Glazyrina Rozamgul Niyazova Nurseit Zhunissov Aizhan Sambetbayeva 《Computer Systems Science & Engineering》 2024, Issue 3, pp. 793-811 (19 pages)
As digital technologies have advanced more rapidly, the number of paper documents recently converted into a digital format has exponentially increased. To respond to the urgent need to categorize the growing number of digitized documents, the classification of digitized documents in real time has been identified as the primary goal of our study. A paper classification is the first stage in automating document control and efficient knowledge discovery with no or little human involvement. Artificial intelligence methods such as Deep Learning are now combined with segmentation to study and interpret those traits, which were not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes may be predicted. The segmentation process divides the input image into separate segments for a more thorough image study. This study proposes a deep learning-enabled framework for automated document classification, which can be implemented in higher education. To further this goal, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. Subsequently, a deep learning model based on Conv2D layers is proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine-learning techniques. The results demonstrate that the proposed deep learning model shows high results in document categorization, overtaking the other machine learning models by reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. The achieved results prove that the proposed deep model is acceptable to use in practice as an assistant to an office worker.
Keywords: document categorization deep learning machine learning CLASSIFICATION DIGITIZATION
Impact of Laboratory Value Flowsheet in Electronic Health Record (EHR) Documentation Time
6
Author: Isabel Rosado Pogozelski 《Open Journal of Nursing》 2024, Issue 1, pp. 40-50 (11 pages)
Research on the use of EHR presents contradicting results regarding the time spent documenting. There is research that supports the use of electronic records as a tool to speed documentation, and research that found that it is time consuming. The purpose of this quantitative retrospective before-after project was to measure the impact of using the laboratory value flowsheet within the EHR on documentation time. The research question was: “Does the use of a laboratory value flowsheet in the EHR impact documentation time by primary care providers (PCPs)?” The theoretical framework utilized in this project was the Donabedian Model. The population in this research was the two PCPs in a small primary care clinic in the northwest of Puerto Rico. The sample was composed of all the encounters during the months of October 2019 and December 2019. The data was obtained through data mining and analyzed using SPSS 27. The evaluative outcome of this project is that there is a decrease in documentation time after implementation of the use of the laboratory value flowsheet in the EHR. However, patients per day increase, therefore having an impact on the number of patients seen per day/week/month. The implications for clinical practice include the use of templates to improve workflow and documentation as well as decreasing documentation time while also increasing the number of patients seen per day.
Keywords: Electronic Health Record EHR Laboratory Results Template documentation Time
A Framework Based on the DAO and NFT in Blockchain for Electronic Document Sharing
7
Authors: Lin Chen Jiaming Zhu Yuting Xu Huanqin Zheng Shen Su 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024, Issue 9, pp. 2373-2395 (23 pages)
In the information age, electronic documents (e-documents) have become a popular alternative to paper documents due to their lower costs, higher dissemination rates, and ease of knowledge sharing. However, digital copyright infringements occur frequently due to the ease of copying, which not only infringes on the rights of creators but also weakens their creative enthusiasm. Therefore, it is crucial to establish an e-document sharing system that enforces copyright protection. However, the existing centralized system has outstanding vulnerabilities, and the plagiarism detection algorithm used cannot fully detect the context, semantics, style, and other factors of the text. Digital watermark technology is only used as a means of infringement tracing. This paper proposes a decentralized framework for e-document sharing based on decentralized autonomous organization (DAO) and non-fungible token (NFT) in blockchain. The use of blockchain as a distributed credit base resolves the vulnerabilities inherent in traditional centralized systems. The e-document evaluation and plagiarism detection mechanisms based on the DAO model effectively address challenges in comprehensive text information checks, thereby promoting the enhancement of e-document quality. The mechanism for protecting and circulating e-document copyrights using NFT technology ensures effective safeguarding of users’ e-document copyrights and facilitates e-document sharing. Moreover, recognizing the security issues within the DAO governance mechanism, we introduce an innovative optimization solution. Through experimentation, we validate the enhanced security of the optimized governance mechanism, reducing manipulation risks by up to 51%. Additionally, by utilizing evolutionary game analysis to deduce the equilibrium strategies of the framework, we discovered that adjusting the reward and penalty parameters of the incentive mechanism motivates creators to generate superior quality and unique e-documents, while evaluators are more likely to engage in assessments.
Keywords: Electronic document sharing blockchain DAO NFT evolutionary game CMES 2024 vol.140 no.3
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering (Cited by: 4)
8
Authors: Sahand Vahidnia Alireza Abbasi Hussein A.Abbass 《Journal of Data and Information Science》 CSCD 2021, Issue 3, pp. 99-122 (24 pages)
Purpose: Detection of research fields or topics and understanding the dynamics help the scientific community in their decisions regarding the establishment of scientific fields. This also helps in having a better collaboration with governments and businesses. This study aims to investigate the development of research fields over time, translating it into a topic detection problem. Design/methodology/approach: To achieve the objectives, we propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are utilized to transform documents into vector-based representations. The proposed method is evaluated by comparing it with a combination of different embedding and clustering approaches and the classical topic modeling algorithms (i.e. LDA) against a benchmark dataset. A case study is also conducted exploring the evolution of Artificial Intelligence (AI), detecting the research topics or sub-fields in related AI publications. Findings: Evaluating the performance of the proposed method using clustering performance indicators reflects that our proposed method outperforms similar approaches against the benchmark dataset. Using the proposed method, we also show how the topics have evolved in the period of the recent 30 years, taking advantage of a keyword extraction method for cluster tagging and labeling, demonstrating the context of the topics. Research limitations: We noticed that it is not possible to generalize one solution for all downstream tasks. Hence, it is required to fine-tune or optimize the solutions for each task and even datasets. In addition, interpretation of cluster labels can be subjective and vary based on the readers’ opinions. It is also very difficult to evaluate the labeling techniques, rendering the explanation of the clusters further limited. Practical implications: As demonstrated in the case study, we show, in a real-world example, how the proposed method would enable the researchers and reviewers of the academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents. This helps the scientific community and all related organizations in fast and effective analysis of the fields, by establishing and explaining the topics. Originality/value: In this study, we introduce a modified and tuned deep embedding clustering coupled with Doc2Vec representations for topic extraction. We also use a concept extraction method as a labeling approach in this study. The effectiveness of the method has been evaluated in a case study of AI publications, where we analyze the AI topics during the past three decades.
Keywords: Dynamics of science Science mapping document clustering Artificial intelligence Deep learning
Hadoop-Based Similarity Computation System for Composed Documents (Cited by: 1)
9
Authors: Xiaoming Zhang Zhipeng Qin Xuwei Liu Qianyun Hou Baishuang Zhang Jie Wu 《Journal of Computer and Communications》 2015, Issue 5, pp. 196-202 (7 pages)
Universities produce a large number of composed documents in the teaching process, and most of them must be checked for similarity as part of validation. A similarity computation system is constructed for composed documents that contain both images and text. First, each document is split into two parts: images and text. Then, the documents are compared by computing the similarities of the images and of the text contents independently. Through the Hadoop system, the text contents are easily and quickly separated. Experimental results show that the proposed system is efficient and practical.
Keywords: SIMILARITY COMPUTATION Composed documents Map REDUCE SYSTEM Integration
Automatic Table Recognition and Extraction from Heterogeneous Documents (Cited by: 1)
10
Authors: Florence Folake Babatunde Bolanle Adefowoke Ojokoh Samuel Adebayo Oluwadare 《Journal of Computer and Communications》 2015, Issue 12, pp. 100-110 (11 pages)
This paper examines automatic recognition and extraction of tables from a large collection of heterogeneous documents. The heterogeneous documents are initially pre-processed and converted to HTML codes, after which an algorithm recognises the table portion of the documents. Hidden Markov Model (HMM) is then applied to the HTML code in order to extract the tables. The model was trained and tested with five hundred and twenty-six (526) self-generated tables (three hundred and twenty-one (321) tables for training and two hundred and five (205) tables for testing). Viterbi algorithm was implemented for the testing part. The system was evaluated in terms of accuracy, precision, recall and F-measure. The overall evaluation results show 88.8% accuracy, 96.8% precision, 91.7% recall and 88.8% F-measure, revealing that the method is good at solving the problem of table extraction.
Keywords: Hidden MARKOV Model Table Recognition and EXTRACTION HYPERTEXT MARKUP Language HETEROGENEOUS documents
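Since the abstract names the Viterbi algorithm without describing it, here is a minimal sketch of Viterbi decoding over a two-state HMM. The states, HTML-tag observations, and every probability below are hypothetical values chosen for illustration; they are not the trained model from the paper.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `observations`."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        row = {}
        for s in states:
            row[s] = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: is each HTML tag inside a table or not?
states = ("table", "other")
start_p = {"table": 0.2, "other": 0.8}
trans_p = {
    "table": {"table": 0.8, "other": 0.2},
    "other": {"table": 0.2, "other": 0.8},
}
emit_p = {
    "table": {"td": 0.6, "tr": 0.3, "p": 0.1},
    "other": {"td": 0.05, "tr": 0.05, "p": 0.9},
}
path = viterbi(["p", "tr", "td", "td", "p"], states, start_p, trans_p, emit_p)
```

For long sequences a practical implementation would work with log-probabilities to avoid floating-point underflow; plain products are kept here for readability.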
Performance Assessment of Nanocellulose Hydroxypropyl Methyl Cellulose Composite on Role of Nano-CaCO₃ for the Preservation of Paper Documents (Cited by: 3)
11
Authors: Xiaochun Ma Altaf Halim Xiaohong Li Huiming Fan Shiyu Fu 《Paper And Biomaterials》 CAS 2022, Issue 2, pp. 1-9 (9 pages)
Deacidification and self-cleaning are important for the preservation of paper documents. In this study, nano-CaCO₃ was used as a deacidification agent and stabilized by nanocellulose (CNC) and hydroxypropyl methylcellulose (HPMC) to form a uniform dispersion. Followed by polydimethylsiloxane (PDMS) treatment and chemical vapor deposition (CVD) of methyltrimethoxysilane (MTMS), a hydrophobic coating was constructed for self-cleaning purposes. The pH value of the treated paper was approximately 8.20, and the static contact angle was as high as 152.29°. Compared to the untreated paper, the tensile strength of the treated paper increased by 12.6%. This treatment method endows the paper with a good deacidification effect and self-cleaning property, which are beneficial for its long-term preservation.
Keywords: paper documents NANOCELLULOSE self-cleaning nano-CaCO₃ superhydrophobicity DEACIDIFICATION
Stochastic Model for Multiple Classes and Subclasses Simple Documents Processing (Cited by: 1)
12
Authors: Pierre Moukeli Mbindzoukou Arsène Roland Moukoukou Marius Massala 《Intelligent Information Management》 2021, Issue 2, pp. 124-140 (17 pages)
The issue of document management has been raised for a long time, especially with the appearance of office automation in the 1980s, which led to dematerialization and Electronic Document Management (EDM). In the same period, workflow management has experienced significant development, but has become more focused on the industry. However, it seems to us that document workflows have not attracted the same interest from the scientific community. Nowadays, the emergence and supremacy of the Internet in electronic exchanges are leading to a massive dematerialization of documents, which requires a conceptual reconsideration of the organizational framework for the processing of said documents in both public and private administrations. This problem seems open to us and deserves the interest of the scientific community. Indeed, EDM has mainly focused on the storage (referencing) and circulation of documents (traceability). It paid little attention to the overall behavior of the system in processing documents. The purpose of our research is to model document processing systems. In previous works, we proposed a general model and its specialization in the case of small documents (any document processed by a single person at a time during its processing life cycle), which represent 70% of documents processed by administrations, according to our study. In this contribution, we extend the model for processing small documents to the case where they are managed in a system comprising document classes organized in subclasses, which is the case for most administrations. We have thus observed that this model is a network of Markovian M^(L×K)/M^(L×K)/1 queues. We have analyzed the constraints of this model and deduced certain characteristics and metrics. In fine, the ultimate objective of our work is to design a document workflow management system, integrating a component of global behavior prediction.
Keywords: document Processing WORKFLOW Hierarchic Chart Counting Processes Stochastic Models Waiting Lines Markov Processes Priority Queues Multiple Class and Subclass Queues
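As background for the queueing terminology, the sketch below computes the textbook steady-state metrics of a single plain M/M/1 queue. It is an assumption-laden simplification: the paper's network of multi-class M^(L×K)/M^(L×K)/1 queues is not reproduced here, and the function name and the rates used are hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of a single M/M/1 queue (standard textbook formulas)."""
    if arrival_rate >= service_rate:
        raise ValueError("the queue is unstable unless arrival_rate < service_rate")
    utilization = arrival_rate / service_rate           # rho = lambda / mu
    mean_in_system = utilization / (1.0 - utilization)  # L = rho / (1 - rho)
    mean_time = 1.0 / (service_rate - arrival_rate)     # W = 1 / (mu - lambda)
    return {
        "utilization": utilization,
        "mean_in_system": mean_in_system,
        "mean_time_in_system": mean_time,
    }


# Hypothetical rates: 8 documents arrive per hour, one clerk processes 10 per hour.
metrics = mm1_metrics(8.0, 10.0)
```

The two derived quantities satisfy Little's law, L = λW, which gives a quick sanity check on any such calculation.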
Supporting B2B Business Documents in XML Web Services (Cited by: 3)
13
Author: KIM Hyoungdo 《Journal of Electronic Science and Technology of China》 2004, Issue 3, pp. 53-57, 73 (6 pages)
While XML web services have become recognized as a solution to business-to-business transactions, there are many problems that should be solved. For example, it is not easy to manipulate business documents of existing standards such as RosettaNet and UN/EDIFACT EDI, traditionally regarded as an important resource for managing B2B relationships. As a starting point for the complete implementation of B2B web services, this paper deals with how to support B2B business documents in XML web services. In the first phase, basic requirements for driving XML web services by business documents are introduced. As a solution, this paper presents how to express B2B business documents in WSDL, a core standard for XML web services. This kind of approach facilitates the reuse of existing business documents and enhances interoperability between implemented web services. Furthermore, it suggests how to link with other conceptual modeling frameworks such as ebXML/UMM, built on a rich heritage of electronic business experience.
Keywords: business document XML web service EBXML
Deep Learning Multimodal for Unstructured and Semi-Structured Textual Documents Classification (Cited by: 1)
14
Authors: Nany Katamesh Osama Abu-Elnasr Samir Elmougy 《Computers, Materials & Continua》 SCIE EI 2021, Issue 7, pp. 589-606 (18 pages)
Due to the availability of a huge number of electronic text documents from a variety of sources representing unstructured and semi-structured information, the document classification task becomes an interesting area for controlling data behavior. This paper presents a document classification multimodal for categorizing textual semi-structured and unstructured documents. The multimodal implements several individual deep learning models such as Deep Neural Networks (DNN), Recurrent Convolutional Neural Networks (RCNN) and Bidirectional-LSTM (Bi-LSTM). The Stacked Ensemble based meta-model technique is used to combine the results of the individual classifiers to produce better results, compared to those reached by any of the above mentioned models individually. A series of textual preprocessing steps are executed to normalize the input corpus, followed by text vectorization techniques. These techniques include using Term Frequency-Inverse Document Frequency (TFIDF) or Continuous Bag of Words (CBOW) to convert text data into a numeric form that deep learning models can process. Moreover, this proposed model is validated using a dataset collected from several spaces with a huge number of documents in every class. In addition, the experimental results prove that the proposed model has achieved effective performance. Besides, upon investigating the PDF Documents classification, the proposed model has achieved accuracy up to 0.9045 and 0.959 for the TFIDF and CBOW features, respectively. Moreover, concerning the JSON Documents classification, the proposed model has achieved accuracy up to 0.914 and 0.956 for the TFIDF and CBOW features, respectively. Furthermore, as for the XML Documents classification, the proposed model has achieved accuracy values up to 0.92 and 0.959 for the TFIDF and CBOW features, respectively.
Keywords: document classification deep learning text vectorization convolutional neural network bi-directional neural network stacked ensemble
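The TFIDF vectorization step mentioned in this abstract can be sketched in a few lines. This is one common unsmoothed variant (relative term frequency times log(N / document frequency)), not necessarily the weighting used in the paper; the function name and the toy corpus are hypothetical, and every document is assumed to be a non-empty token list.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs: list of non-empty token lists.
    weight = (count / document length) * log(N / number of documents with the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus (hypothetical tokens from three small documents).
docs = [
    ["invoice", "total", "due"],
    ["invoice", "xml", "schema"],
    ["json", "schema", "field"],
]
vectors = tfidf(docs)
```

Terms that occur in fewer documents receive larger weights, so in the first document "total" outweighs "invoice", which also appears in the second document.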
Study on Multi-Label Classification of Medical Dispute Documents (Cited by: 2)
15
Authors: Baili Zhang Shan Zhou Le Yang Jianhua Lv Mingjun Zhong 《Computers, Materials & Continua》 SCIE EI 2020, Issue 12, pp. 1975-1986 (12 pages)
The Internet of Medical Things (IoMT) will come to be of great importance in the mediation of medical disputes, as it is emerging as the core of intelligent medical treatment. First, IoMT can track the entire medical treatment process in order to provide detailed trace data in medical dispute resolution. Second, IoMT can infiltrate the ongoing treatment and provide timely intelligent decision support to medical staff. This information includes recommendation of similar historical cases, guidance for medical treatment, alerting of hired dispute profiteers, etc. The multi-label classification of medical dispute documents (MDDs) plays an important role as a front-end process for intelligent decision support, especially in the recommendation of similar historical cases. However, MDDs usually appear as long texts containing a large amount of redundant information, and there is a serious distribution imbalance in the dataset, which directly leads to weaker classification performance. Accordingly, in this paper, a multi-label classification method based on key sentence extraction is proposed for MDDs. The method is divided into two parts. First, the attention-based hierarchical bi-directional long short-term memory (BiLSTM) model is used to extract key sentences from documents; second, random comprehensive sampling Bagging (RCS-Bagging), which is an ensemble multi-label classification model, is employed to classify MDDs based on key sentence sets. The use of this approach greatly improves the classification performance. Experiments show that the performance of the two models proposed in this paper is remarkably better than that of the baseline methods.
Keywords: Internet of Medical Things (IoMT) medical disputes medical dispute document (MDD) multi-label classification (MLC) key sentence extraction class imbalance
INFORMATION RETRIEVAL FOR SHORT DOCUMENTS (Cited by: 2)
16
作者 Qi Haoliang Li Mu +1 位作者 Gao Jianfeng Li Sheng 《Journal of Electronics(China)》 2006年第6期933-936,共4页
The major problem of the most current approaches of information models lies in that individual words provide unreliable evidence about the content of the texts. When the document is short, e.g. only the ab-stract is a... The major problem of the most current approaches of information models lies in that individual words provide unreliable evidence about the content of the texts. When the document is short, e.g. only the ab-stract is available, the word-use variability problem will have substantial impact on the Information Retrieval (IR) performance. To solve the problem, a new technology to short document retrieval named Reference Document Model (RDM) is put forward in this letter. RDM gets the statistical semantic of the query/document by pseudo feedback both for the query and document from reference documents. The contributions of this model are three-fold: (1) Pseudo feedback both for the query and the document; (2) Building the query model and the document model from reference documents; (3) Flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, term or character. For short document retrieval, RDM achieves significant improvements over the classical probabilistic models on the task of ad hoc retrieval on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better the RDM performance. 展开更多
Keywords: information retrieval; short documents; Reference Document Model (RDM); information theory
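The idea of applying pseudo feedback to both the query and the document can be sketched in a few lines. This is only an illustration under stated assumptions, not the RDM of the letter: the function names (`unigram`, `cosine`, `expand`, `score`), the interpolation weight `alpha`, the cutoff `k`, and the use of cosine similarity over unigram models are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def unigram(text):
    """Relative term frequencies of a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def expand(text, reference_docs, k=1, alpha=0.5):
    """Mix a short text's model with its k most similar reference documents.

    alpha and k are illustrative assumptions, not values from the letter.
    """
    original = unigram(text)
    ref_models = [unigram(doc) for doc in reference_docs]
    top = sorted(ref_models, key=lambda m: cosine(original, m), reverse=True)[:k]
    expanded = Counter()
    for term, w in original.items():
        expanded[term] += alpha * w
    for model in top:
        for term, w in model.items():
            expanded[term] += (1 - alpha) * w / len(top)
    return dict(expanded)


def score(query, document, reference_docs, k=1):
    """Compare query and document after both received pseudo feedback."""
    return cosine(expand(query, reference_docs, k), expand(document, reference_docs, k))


references = ["car automobile vehicle engine motor", "fruit apple banana"]
print(cosine(unigram("car engine"), unigram("automobile motor")))  # no shared words
print(score("car engine", "automobile motor", references))         # overlap via reference
```

The two short texts share no words, so a direct comparison gives zero; after both are expanded from the same reference document they overlap, which is the word-use variability effect the abstract describes.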
EDCMS: A Content Management System for Engineering Documents
17
Authors: Chris McMahon, Mansur Darlington, Steve Culley, Peter Wild. 《International Journal of Automation and Computing》 (EI), 2007, No. 1, pp. 56-70 (15 pages)
Engineers often need to find the right pieces of information by sifting through long engineering documents, which is a tiring and time-consuming job. To address this issue, researchers are increasingly devoting their attention to new ways of helping information users, including engineers, to access and retrieve document content. The research reported in this paper explores how to use the key technologies of document decomposition (the study of document structure), document mark-up (with EXtensible Mark-up Language (XML), HyperText Mark-up Language (HTML), and Scalable Vector Graphics (SVG)), and a facetted classification mechanism. Document content extraction is implemented in Java. An Engineering Document Content Management System (EDCMS) developed in this research demonstrates that information providers can make document content more accessible to information users, including engineers. The main features of the EDCMS are: 1) EDCMS enables users, especially engineers, to access and retrieve information at content level rather than document level. In other words, it provides the right pieces of information that answer specific questions, so that engineers do not need to waste time sifting through the whole document to obtain the required piece of information. 2) Users can access engineering document content through both the data and the metadata of a document. 3) Users can access and retrieve content objects, i.e. text, images and graphics (including engineering drawings), via multiple views and at different granularities based on decomposition schemes.
Experiments with the EDCMS have been conducted on semi-structured documents, a CAD/CAM textbook, and a set of project posters in the Engineering Design domain. Experimental results show that the system provides information users with a powerful solution for accessing document content.
Keywords: document content management; engineering design; decomposition schemes; document mark-up; facetted classification
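Content-level access to a marked-up document, as opposed to document-level access, can be illustrated with a small sketch. The abstract states that the EDCMS itself is implemented in Java; the Python below is not that system. The XML vocabulary (`document`, `section`, `paragraph`, `figure`, the `topic` attribute) and the helper `find_objects` are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up of a decomposed document; the element and
# attribute names are assumptions, not the EDCMS schema.
SAMPLE = """
<document title="CAD/CAM textbook">
  <section id="s1" topic="geometry">
    <paragraph>Splines approximate free-form curves.</paragraph>
    <figure src="spline.svg" caption="A cubic spline"/>
  </section>
  <section id="s2" topic="machining">
    <paragraph>Tool paths are derived from the surface model.</paragraph>
  </section>
</document>
"""


def find_objects(xml_text, kind, topic=None):
    """Return (section id, content) pairs for one kind of content object.

    kind selects the object type (e.g. "paragraph" or "figure");
    topic optionally restricts the search to sections with that facet value.
    """
    root = ET.fromstring(xml_text.strip())
    results = []
    for section in root.iter("section"):
        if topic is not None and section.get("topic") != topic:
            continue
        for obj in section.iter(kind):
            # Text objects carry their content as text, figures as a caption.
            results.append((section.get("id"), obj.text or obj.get("caption")))
    return results


print(find_objects(SAMPLE, "paragraph"))
print(find_objects(SAMPLE, "figure", topic="geometry"))
```

Here the `topic` attribute stands in for a single facet and the element type for the granularity of the content object; a real decomposition scheme would be considerably richer.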
Study on Documents of Campus Network
18
Authors: 万伟太, 杨林, 宋为. 《International Journal of Mining Science and Technology》 (SCIE, EI), 1997, No. 2, pp. 66-69 (4 pages)
Establishing a campus network is a system engineering task that requires cooperation among departments. Standardization is the key to solving this problem, and its core is the standardization of documents. This paper therefore concentrates on the relevant problems, drawing on the authors' own campus network practice.
Keywords: CAMPUS NETWORK ENGINEERING document
Research of Web Documents Clustering Based on Dynamic Concept
19
Authors: WANG Yun-hua, CHEN Shi-hong. 《Wuhan University Journal of Natural Sciences》 (EI, CAS), 2004, No. 5, pp. 547-552 (6 pages)
Conceptual clustering is mainly used to address deficient and incomplete domain knowledge. Based on conceptual clustering technology and aimed at the institutional framework and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyses the superior performance of the clustering algorithm in efficiency and clustering accuracy. CLC number: TP 311. Foundation item: Supported by the National "863" Program of China (2002AA111010, 2003AA001032). Biography: WANG Yun-hua (1979-), male, Master candidate, research direction: knowledge engineering and data mining.
Keywords: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering
Word Segmentation for Chinese Judicial Documents (cited by: 1)
20
Authors: Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang. 《国际计算机前沿大会会议论文集》, 2019, No. 1, pp. 476-478 (3 pages)
Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods have problems when applied to Chinese judicial documents: (1) existing methods rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. In this paper, a word segmentation method is proposed for Chinese judicial documents. The proposed method consists of two steps: (1) automatically generating some labeled data as legal dictionaries, and (2) applying a hybrid multilayer neural network that incorporates the legal dictionaries to perform word segmentation. Experiments conducted on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
Keywords: Chinese word segmentation; knowledge discovery; judicial documents
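How a domain dictionary can steer segmentation is easiest to see with a classical baseline. The sketch below uses forward maximum matching, which is a different and much simpler technique than the hybrid neural network of the paper; the three-word legal dictionary and the function name `forward_max_match` are assumptions made for illustration.

```python
def forward_max_match(text, dictionary):
    """Segment text by greedily taking the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical miniature legal dictionary.
legal_dictionary = {"被告人", "判处", "有期徒刑"}
print(forward_max_match("被告人被判处有期徒刑", legal_dictionary))
# → ['被告人', '被', '判处', '有期徒刑']
```

Without the dictionary every character would become its own token, which shows why domain terms such as 有期徒刑 ("fixed-term imprisonment") are worth supplying even to a more capable model.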