Journal Articles
546 articles found
Functions of Karez to Xinjiang Agriculture in the Qing Dynasty from the Perspective of Historical Documents
1
Author: Danyang GONG. Asian Agricultural Research, 2023, No. 3, pp. 70-71 (2 pages)
Desertification is increasingly serious in Xinjiang, and water conservancy construction is a precondition for agricultural development. The main water conservancy project for agricultural development in Xinjiang is the construction of Karez, which played a vital role in the development of Xinjiang agriculture in the Qing Dynasty. Karez are recorded many times in Qing Dynasty historical documents, such as Lin Zexu's Diary, Tao Baolian's Diary, the Xinjiang Atlas, and Zuo Zongtang's Memorial to the Emperor, which describe their condition and historical origin. Karez made a significant contribution to agricultural development in the Qing Dynasty: they increased the cultivated land in Xinjiang at that time, as well as the variety and yields of crops, and contributed to the stability and development of Xinjiang's economy. Today, Karez remain an important water source for agricultural irrigation in Xinjiang.
Keywords: Karez; historical documents in the Qing Dynasty; Xinjiang agriculture
Human Rights in Civil Judicial Documents:Conception and Function
2
Author: Zheng Ruohan (郑若瀚). The Journal of Human Rights, 2023, No. 4, pp. 851-868 (18 pages)
Traditional human rights theory tends to hold that human rights are aimed at defending against public authority and that the legal issue of human rights is a matter of public law. However, the development of human rights concepts and practices is not confined to this. A textual search shows that the term "human rights" appears widely in China's civil judicial documents. Among the 3,412 civil judicial documents we examined, the concept of "human rights" runs through all kinds of disputes, ranging from property rights, contracts, labor, and torts to marital property, and is embedded both in the claims of the parties concerned and in the reasoning of judges. Human rights have become a discourse and yardstick for understanding and evaluating social behavior. The widespread use of the term "human rights" in civil judicial documents reflects at least three concepts related to human rights: first, the rights to subsistence and development are the primary basic human rights; second, the judicial protection of human rights is a bottom-line guarantee; third, the protection of human rights aims to achieve equal rights. Today, judges quote human rights theory in judicial judgments from time to time, evidencing that human rights have a practical function in judicial adjudication. In practice, this function mainly manifests in declaring righteous values and strengthening arguments with human rights values and ideas, using the human rights provisions of the Constitution to interpret constitutionality, and using human rights principles to interpret ambiguous rules and rank the importance of different rights.
Keywords: human rights; concept of human rights; civil judicature; judicial documents; judicial reasons
Performance Assessment of Nanocellulose Hydroxypropyl Methyl Cellulose Composite on Role of Nano-CaCO₃ for the Preservation of Paper Documents (times cited: 3)
3
Authors: Xiaochun Ma, Altaf Halim, Xiaohong Li, Huiming Fan, Shiyu Fu. Paper And Biomaterials (indexed: CAS), 2022, No. 2, pp. 1-9 (9 pages)
Deacidification and self-cleaning are important for the preservation of paper documents. In this study, nano-CaCO₃ was used as a deacidification agent and stabilized by nanocellulose (CNC) and hydroxypropyl methylcellulose (HPMC) to form a uniform dispersion. Subsequent polydimethylsiloxane (PDMS) treatment and chemical vapor deposition (CVD) of methyltrimethoxysilane (MTMS) produced a hydrophobic coating for self-cleaning. The pH of the treated paper was approximately 8.20, and its static contact angle was as high as 152.29°. Compared with the untreated paper, the tensile strength of the treated paper increased by 12.6%. This treatment gives the paper a good deacidification effect and self-cleaning property, both of which benefit its long-term preservation.
Keywords: paper documents; nanocellulose; self-cleaning; nano-CaCO₃; superhydrophobicity; deacidification
Hadoop-Based Similarity Computation System for Composed Documents (times cited: 1)
4
Authors: Xiaoming Zhang, Zhipeng Qin, Xuwei Liu, Qianyun Hou, Baishuang Zhang, Jie Wu. Journal of Computer and Communications, 2015, No. 5, pp. 196-202 (7 pages)
Universities produce a large number of composed documents in the teaching process, and most of them must be checked for similarity to validate them. This paper builds a similarity computation system for composed documents that contain both images and text. First, each document is split into two parts: images and text. Then, documents are compared by computing the similarities of their images and their text contents independently. With Hadoop, the text contents are separated quickly and easily. Experimental results show that the proposed system is efficient and practical.
Keywords: similarity computation; composed documents; MapReduce; system integration
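The abstract does not say which text similarity measure the system uses. As a minimal sketch, assuming cosine similarity over word-count vectors (our choice, not necessarily the authors'), the text half of the comparison could look like the following. In a Hadoop deployment the counting would naturally sit in the map phase and the pairwise scoring in the reduce phase, but this single-process version leaves Hadoop out entirely; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Hypothetical map-side step: lowercase the text and count word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(doc1: str, doc2: str) -> float:
    """Hypothetical reduce-side step: compare the text parts of two documents."""
    return cosine_similarity(term_counts(doc1), term_counts(doc2))


if __name__ == "__main__":
    print(text_similarity("Hadoop splits the text", "the text is split by Hadoop"))
```

The `\w+` tokenizer only splits on whitespace and punctuation, so unsegmented Chinese text would come out as a few long tokens; a real system would need a word segmenter before counting.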
Automatic Table Recognition and Extraction from Heterogeneous Documents (times cited: 1)
5
Authors: Florence Folake Babatunde, Bolanle Adefowoke Ojokoh, Samuel Adebayo Oluwadare. Journal of Computer and Communications, 2015, No. 12, pp. 100-110 (11 pages)
This paper examines automatic recognition and extraction of tables from a large collection of heterogeneous documents. The heterogeneous documents are first pre-processed and converted to HTML code, after which an algorithm recognises the table portions of the documents. A Hidden Markov Model (HMM) is then applied to the HTML code to extract the tables. The model was trained and tested on 526 self-generated tables (321 for training and 205 for testing), and the Viterbi algorithm was implemented for the testing phase. The system was evaluated in terms of accuracy, precision, recall, and F-measure. The overall evaluation results show 88.8% accuracy, 96.8% precision, 91.7% recall, and 88.8% F-measure, indicating that the method is effective for table extraction.
Keywords: Hidden Markov Model; table recognition and extraction; Hypertext Markup Language; heterogeneous documents
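The abstract states that the Viterbi algorithm decodes the trained HMM at test time. A generic Viterbi sketch follows; the two hidden states (TEXT, TABLE), the HTML-tag observations, and every probability in the toy example are hypothetical values chosen for illustration, not the paper's trained model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Most likely hidden-state path for `observations` and its log-probability."""
    if not observations:
        return [], 0.0
    # best[t][s]: highest log-probability of any path ending in state s at step t
    best: List[Dict[str, float]] = [
        {s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(observations[0], 0.0)) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + _log(trans_p[r].get(s, 0.0)))
            scores[s] = (
                best[t - 1][prev]
                + _log(trans_p[prev].get(s, 0.0))
                + _log(emit_p[s].get(observations[t], 0.0))
            )
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Hypothetical toy model: TABLE favors <td>/<tr> tags, TEXT favors <p>.
    states = ("TEXT", "TABLE")
    start = {"TEXT": 0.7, "TABLE": 0.3}
    trans = {"TEXT": {"TEXT": 0.8, "TABLE": 0.2}, "TABLE": {"TEXT": 0.2, "TABLE": 0.8}}
    emit = {
        "TEXT": {"<p>": 0.7, "<tr>": 0.2, "<td>": 0.1},
        "TABLE": {"<p>": 0.1, "<tr>": 0.4, "<td>": 0.5},
    }
    print(viterbi(["<p>", "<tr>", "<td>", "<td>", "<p>"], states, start, trans, emit)[0])
```

With this toy model the three middle tags decode as TABLE and the outer `<p>` tags as TEXT. Working in log space avoids numerical underflow when many small probabilities are multiplied over a long tag sequence.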
Word Segmentation for Chinese Judicial Documents (times cited: 1)
6
Authors: Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang. Proceedings of the International Computer Frontiers Conference (国际计算机前沿大会会议论文集), 2019, No. 1, pp. 476-478 (3 pages)
Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods have problems when applied to Chinese judicial documents: (1) they rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. In this paper, a word segmentation method is proposed for Chinese judicial documents. It consists of two steps: (1) automatically generating labeled data in the form of legal dictionaries, and (2) applying a hybrid multilayer neural network that incorporates the legal dictionaries to perform word segmentation. Experiments on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
Keywords: Chinese word segmentation; knowledge discovery; judicial documents
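The abstract combines automatically generated legal dictionaries with a hybrid neural network, but does not describe the network. To illustrate only how a dictionary drives segmentation, here is classical forward maximum matching, a simpler swapped-in technique and not the authors' model; the tiny legal dictionary in the example is hypothetical.

```python
from typing import Iterable, List


def forward_max_match(text: str, dictionary: Iterable[str], max_len: int = 0) -> List[str]:
    """Greedy left-to-right segmentation that always takes the longest dictionary word.

    Characters not covered by any dictionary entry are emitted one at a time.
    """
    words = set(dictionary)
    if max_len <= 0:
        max_len = max((len(w) for w in words), default=1)
    max_len = max(max_len, 1)  # never let the matching window shrink to zero
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    legal_dict = {"判处", "被告人", "有期徒刑", "三年"}  # hypothetical legal dictionary
    print(forward_max_match("判处被告人有期徒刑三年", legal_dict))
```

Here the sentence splits into 判处 / 被告人 / 有期徒刑 / 三年, while an out-of-dictionary name such as 张三 would fall apart into single characters. That brittleness is one reason a learned model is layered on top of the dictionary.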
The Earliest Slavery Documents from Mesopotamia (times cited: 1)
7
Author: Wu Yuhong (IHAC, Northeast Normal University, Changchun). Journal of Ancient Civilizations, 2009, No. 1, pp. 1-33 (33 pages)
In his groundbreaking article "The Smell of the Cage" (Cuneiform Digital Library Journal 2009/4), Robert K. Englund discusses some archaic lists of slaves from Uruk III and Jemdet Nasr (ancient Ni-ru?), dating to about 3100-2900 B.C.
Keywords: the earliest slavery documents from Mesopotamia
Research of Web Documents Clustering Based on Dynamic Concept
8
Authors: WANG Yun-hua, CHEN Shi-hong. Wuhan University Journal of Natural Sciences (indexed: EI, CAS), 2004, No. 5, pp. 547-552 (6 pages)
Conceptual clustering is mainly used to address the deficiency and incompleteness of domain knowledge. Building on conceptual clustering technology and targeting the institutional framework and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyzes the superior efficiency and clustering accuracy of the clustering algorithm. CLC number: TP 311. Foundation item: Supported by the National "863" Program of China (2002AA111010, 2003AA001032). Biography: WANG Yun-hua (1979-), male, master's candidate; research interests: knowledge engineering and data mining.
Keywords: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering
A New Enhanced Arabic Light Stemmer for IR in Medical Documents
9
Authors: Ra'ed M. Al-Khatib, Taha Zerrouki, Mohammed M. Abu Shquier, Amar Balla, Asef Al-Khateeb. Computers, Materials & Continua (indexed: SCIE, EI), 2021, No. 7, pp. 1255-1269 (15 pages)
This paper introduces a new enhanced Arabic stemming algorithm for the information retrieval problem, especially in medical documents. The proposed algorithm is a light stemming algorithm that extracts stems and roots from the input data. One of the main challenges facing light stemming is cutting the input word to extract its initial segments: when the light stemmer starts from strong initial segments, the final extracted stems and roots are more accurate. Therefore, a new enhanced segmentation based on a Directed Acyclic Graph (DAG) model is used. In addition to extracting strong initial segments, the two main procedures (stem extraction and root extraction) are reinforced with more efficient operators to improve the final outputs. Four data sets are used to validate the proposed enhanced stemmer. The stems and roots it produces are compared with those obtained from five other well-known Arabic light stemmers on the same data sets, and this evaluation showed that the proposed enhanced stemmer outperformed the other stemmers.
Keywords: machine learning; information retrieval systems; medical documents; stemming algorithms; Arabic light stemmer; natural language processing
Mathematical Expression Extraction in Text Fields of Documents Based on HMM
10
Authors: Xuedong Tian, Ruihan Bai, Fang Yang, Jinyuan Bai, Xinfu Li. Journal of Computer and Communications, 2017, No. 14, pp. 1-13 (13 pages)
Mathematical expressions embedded in the unstructured text fields of documents are hard to extract automatically, quickly, and effectively. To address this problem, a method based on a Hidden Markov Model (HMM) is proposed. First, the HMM is trained using the symbol combination features of mathematical expressions. Then, preprocessing steps such as removing labels and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence and fed to the HMM, which determines which parts are mathematical expressions and extracts them. Experimental results show that the proposed method effectively extracts mathematical expressions from the text fields of documents, with relatively high accuracy and recall.
Keywords: mathematical expression extraction; Hidden Markov Model; text fields; documents; symbol combination features
On the Combination of "The Textual Research on Historical Documents" and "The Comparative Study of Historical Data", and a Discussion on "The Law of Quan-ma and Gui-mei" in Chinese Language Studies
11
Author: Lu Guoyao. Macrolinguistics (宏观语言学), 2007, No. 1, pp. 46-59 (14 pages)
In Chinese language studies, "The Textual Research on Historical Documents" and "The Comparative Study of Historical Data" are both traditional methodologies, and both deserve to be treasured, passed on, and further developed. Giving either method unreasonable priority will certainly harm the development of academic research. The author argues that the best, or one of the best, methodologies for the historical study of the Chinese language is the combination of the two, hence a new interpretation of "The Double-proof Method". This essay also puts forward "The Law of Quan-ma and Gui-mei" in Chinese language studies, holding that in linguistic research it is inadvisable either to treat Gui-mei as Quan-ma or vice versa. It is crucial always to respect the language facts first, which are considered the very soul of linguistics.
Keywords: the history of the Chinese language; methodology; The Textual Research on Historical Documents; The Comparative Study of Historical Data; Double-proof method; the Law of Quan-ma and Gui-mei
Measuring Qualities of XML Schema Documents
12
Authors: Tin Zar Thaw, Mie Mie Khin. Journal of Software Engineering and Applications, 2013, No. 9, pp. 458-469 (12 pages)
The Extensible Markup Language (XML) is becoming a de facto standard for exchanging information among web applications. Efficient implementation of web applications requires efficient implementation of XML and XML schema documents. The quality of an XML document has a great impact on the design quality of its schema document. Therefore, the design of XML schema documents plays an important role in the web engineering process and requires many schema qualities: functionality, extensibility, reusability, understandability, maintainability, and so on. Three schema metrics, the Reusable Quality metric (RQ), the Extensible Quality metric (EQ), and the Understandable Quality metric (UQ), are proposed to measure, respectively, the reusability, extensibility, and understandability of XML schema documents in the web engineering process. The base attributes are selected according to the XML Quality Assurance Design Guidelines, and the metrics are formulated using the Binary Entropy Function and the Rank Order Centroid method. To validate the proposed metrics empirically and analytically, the self-organizing feature map (SOM) and Weyuker's 9 properties are used.
Keywords: Extensible Markup Language; XML schema documents; web engineering process; XML Quality Assurance Design Guidelines; schema qualities
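The abstract names the Binary Entropy Function and the Rank Order Centroid (ROC) method but not how RQ, EQ, and UQ combine them. The two building blocks are standard on their own: for n criteria ranked 1 (most important) to n, the ROC weight of rank i is w_i = (1/n) * sum_{k=i}^{n} 1/k, and the binary entropy is H(p) = -p log2 p - (1 - p) log2(1 - p). A minimal sketch of both follows; it makes no assumption about the paper's final metric formulas.

```python
import math
from typing import List


def rank_order_centroid(n: int) -> List[float]:
    """ROC weights for n criteria ranked 1 (most important) to n; they sum to 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


def binary_entropy(p: float) -> float:
    """H(p) in bits, using the convention 0 * log2(0) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


if __name__ == "__main__":
    print([round(w, 4) for w in rank_order_centroid(3)])  # three ranked attributes
    print(binary_entropy(0.5), binary_entropy(0.1))
```

For three ranked attributes the ROC weights are 11/18, 5/18, and 1/9 (about 0.6111, 0.2778, 0.1111), and H(p) peaks at exactly 1 bit when p = 0.5.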
Establish Evidence Chain Model on Chinese Criminal Judgment Documents Using Text Similarity Measure
13
Authors: Yixuan Dong, Yemao Zhou, Chuanyi Li, Jidong Ge, Yali Han, Mengting He, Dekuan Liu, Xiaoyu Zhou, Bin Luo. Proceedings of the International Computer Frontiers Conference (国际计算机前沿大会会议论文集), 2018, No. 2, p. 4 (1 page)
Keywords: criminal judgment documents; judgment documents reasoning; big data; evidence chain; text similarity measure; Word2vec; weight of evidence chain
Evaluation System for Reasoning Description of Judgment Documents Based on TensorFlow CNN
14
Authors: Mengting He, Zhongyue Li, Yanshu Wei, Jidong Ge, Peitang Ling, Chuanyi Li, Ting Lei, Bin Luo. Proceedings of the International Computer Frontiers Conference (国际计算机前沿大会会议论文集), 2019, No. 1, pp. 531-533 (3 pages)
To improve the quality of judgment documents, the state and government have introduced laws and regulations. However, the number of cases in China's courts is very large. Using a system to verify the documents can reduce the burden on judges and help ensure the accuracy of judgments. This paper describes an evaluation system for the reasoning description of judgment documents. The main evaluation steps are: segmenting the text before and after the cited law; extracting key information from the document with XML parsing; building a dedicated legal stop-word library and preprocessing the input text; feeding the text into the model to obtain text-matching results; when evaluating "law and conclusion", applying the idea of "match keywords, compare sentencing degree" to judge whether the logic is consistent; and integrating the results of each evaluation subject to return clear, concise results to the system user. Real application scenarios were simulated to test whether the reasoning lacks key links or is insufficient, or whether the judgment result is unreasonable. The results show that each document is evaluated relatively quickly and that evaluation accuracy on the nine common types of criminal cases is high.
Keywords: judgment documents; reasoning description; evaluation; text matching; correlation calculation
The Origin Review and A Case Study of Plain Language in Federal Documents
15
Author: Zhu Huizi (朱慧子). Overseas English (海外英语), 2017, No. 19, pp. 217-219 (3 pages)
Plain Language has made a great difference today, proving effective for expressing ideas clearly, concisely, and systematically. However, contemporary practitioners need to review the origin and development of the Plain Language Movement and to examine whether Plain Language policies have been thoroughly implemented in every federal document. Examining a contemporary federal document against the Guidelines for Document Designers reveals existing problems to address in further development.
Keywords: Plain Language; federal documents; development history; document analysis
Research on the Complication of Documents for Large-scale Infrastructure Projects in Colleges and Universities
16
Author: Meili Wu. Journal of World Architecture, 2021, No. 2, pp. 36-39 (4 pages)
Drawing on practical experience in preparing construction bidding documents, and focusing on bidding documents for large infrastructure construction projects in colleges and universities, this paper discusses construction technology, qualification and performance requirements, the bill of quantities, and the terms of the contract, and puts forward practical measures and methods that can serve as a reference for preparing bidding documents for similar construction projects.
Keywords: large infrastructure; tender documents; bill of quantities; contract
A method for improving the accuracy of automatic indexing of Chinese-English mixed documents
17
Authors: Yan ZHAO, Hui SHI. Chinese Journal of Library and Information Science, 2012, No. 4, pp. 77-92 (16 pages)
Purpose: This paper presents a method for improving the accuracy of automatic indexing of Chinese-English mixed documents. Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we propose an integrated control method for indexing documents. It consists of 'feed-forward control', 'in-progress control' and 'feed-back control', and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method. Findings: The method distinguishes between Chinese and English text by their grammatical structures and word-formation rules. Applying it in the three phases of automatic indexing of Chinese-English mixed documents gave encouraging results: precision increased from 88.54% to 97.10%, and recall improved from 97.37% to 99.47%. Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching is based on a brute-force (BF) approach, indexing efficiency is reduced to some extent. Practical implications: The research has both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not confined to Chinese-English mixed documents). The proposed method will benefit not only the indexing of life science documents but also the indexing of documents in other subject areas. Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study provides insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
Keywords: Chinese-English mixed documents; string matching; accuracy of automatic indexing; cybernetics; dedicated hepatitis B virus (HBV) database
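The abstract attributes part of the efficiency loss to brute-force (BF) pattern matching. For reference, a minimal brute-force matcher is sketched below; it tries every alignment of the pattern against the text, so its worst case is O(n·m) for text length n and pattern length m, which is why linear-time algorithms such as Knuth-Morris-Pratt are usually preferred. The function name is hypothetical.

```python
from typing import List


def brute_force_find_all(text: str, pattern: str) -> List[int]:
    """Return every start index where `pattern` occurs in `text`, overlaps included."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    hits: List[int] = []
    for i in range(n - m + 1):
        # Compare character by character until a mismatch or a full match.
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(brute_force_find_all("乙型肝炎病毒 HBV and HBV DNA", "HBV"))
```

Python strings index by code point, so the same matcher handles the Chinese and English parts of a mixed string uniformly; in the example the two occurrences of "HBV" start at positions 7 and 15.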
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering (times cited: 4)
18
Authors: Sahand Vahidnia, Alireza Abbasi, Hussein A. Abbass. Journal of Data and Information Science (indexed: CSCD), 2021, No. 3, pp. 99-122 (24 pages)
Purpose: Detecting research fields or topics and understanding their dynamics helps the scientific community make decisions about establishing scientific fields. It also supports better collaboration with governments and businesses. This study investigates the development of research fields over time by translating it into a topic detection problem. Design/methodology/approach: We propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are used to transform documents into vector representations. The proposed method is evaluated on a benchmark dataset by comparing it with combinations of different embedding and clustering approaches and with classical topic modeling algorithms (i.e., LDA). A case study is also conducted on the evolution of Artificial Intelligence (AI), detecting the research topics or sub-fields in AI publications. Findings: Clustering performance indicators show that the proposed method outperforms similar approaches on the benchmark dataset. Using the proposed method, together with a keyword extraction method for tagging and labeling clusters, we also show how the topics have evolved over the past 30 years and illustrate the context of each topic. Research limitations: No single solution generalizes to all downstream tasks, so solutions must be fine-tuned or optimized for each task and even for each dataset. In addition, the interpretation of cluster labels can be subjective and vary with the reader. Labeling techniques are also very difficult to evaluate, which further limits the explanation of the clusters. Practical implications: The case study shows, with a real-world example, how the proposed method enables researchers and reviewers of academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents. This helps the scientific community and related organizations analyze fields quickly and effectively by establishing and explaining their topics. Originality/value: We introduce a modified and tuned deep embedding clustering coupled with Doc2Vec representations for topic extraction, and we use a concept extraction method for labeling. The effectiveness of the method is evaluated in a case study of AI publications covering the past three decades.
Keywords: dynamics of science; science mapping; document clustering; artificial intelligence; deep learning
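The abstract does not specify the modified deep embedding clustering. Deep embedded clustering methods commonly start from a k-means partition of the document vectors before refining it, so as a hedged illustration of that first step only, here is plain k-means over precomputed vectors (for example, Doc2Vec outputs). The deterministic first-k initialization is our simplification; practical implementations use random or k-means++ seeding.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Assign each document vector to one of k clusters (Lloyd's algorithm)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]  # simplification: first k points as seeds
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    doc_vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]  # toy 2-D "embeddings"
    print(kmeans(doc_vectors, 2))
```

On the toy vectors the two nearby pairs end up in separate clusters. A deep clustering method would go further and jointly refine the embedding and these assignments.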
Study on Multi-Label Classification of Medical Dispute Documents (times cited: 2)
19
Authors: Baili Zhang, Shan Zhou, Le Yang, Jianhua Lv, Mingjun Zhong. Computers, Materials & Continua (indexed: SCIE, EI), 2020, No. 12, pp. 1975-1986 (12 pages)
The Internet of Medical Things (IoMT) is becoming very important in the mediation of medical disputes, as it is emerging as the core of intelligent medical treatment. First, IoMT can track the entire treatment process and provide detailed trace data for medical dispute resolution. Second, IoMT can be involved in ongoing treatment and give medical staff timely intelligent decision support, including recommendations of similar historical cases, guidance for medical treatment, and alerts about hired dispute profiteers. Multi-label classification of medical dispute documents (MDDs) is an important front-end process for intelligent decision support, especially for recommending similar historical cases. However, MDDs are usually long texts containing a large amount of redundant information, and the dataset is seriously imbalanced, which directly weakens classification performance. Accordingly, this paper proposes a multi-label classification method for MDDs based on key sentence extraction. The method has two parts. First, an attention-based hierarchical bi-directional long short-term memory (BiLSTM) model extracts key sentences from documents; second, random comprehensive sampling Bagging (RCS-Bagging), an ensemble multi-label classification model, classifies MDDs based on the key sentence sets. This approach greatly improves classification performance. Experiments show that the two proposed models perform remarkably better than the baseline methods.
Keywords: Internet of Medical Things (IoMT); medical disputes; medical dispute document (MDD); multi-label classification (MLC); key sentence extraction; class imbalance
Deep Learning Multimodal for Unstructured and Semi-Structured Textual Documents Classification (times cited: 1)
20
Authors: Nany Katamesh, Osama Abu-Elnasr, Samir Elmougy. Computers, Materials & Continua (indexed: SCIE, EI), 2021, No. 7, pp. 589-606 (18 pages)
Because a huge number of electronic text documents representing unstructured and semi-structured information are available from a variety of sources, document classification has become an interesting area for controlling data behavior. This paper presents a document classification multimodal for categorizing semi-structured and unstructured textual documents. The multimodal implements several individual deep learning models, namely Deep Neural Networks (DNN), Recurrent Convolutional Neural Networks (RCNN), and Bidirectional LSTM (Bi-LSTM). A stacked-ensemble meta-model combines the results of the individual classifiers to produce better results than any of these models achieves individually. A series of textual preprocessing steps normalizes the input corpus, followed by text vectorization using Term Frequency-Inverse Document Frequency (TF-IDF) or Continuous Bag of Words (CBOW) to convert the text into a numeric form that the deep learning models can process. The proposed model is validated on a dataset collected from several sources with a large number of documents in every class, and the experimental results show that it performs effectively. For PDF document classification, the proposed model achieves accuracy of up to 0.9045 and 0.959 with TF-IDF and CBOW features, respectively; for JSON document classification, up to 0.914 and 0.956; and for XML document classification, up to 0.92 and 0.959.
Keywords: document classification; deep learning; text vectorization; convolutional neural network; bi-directional neural network; stacked ensemble
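The abstract lists TF-IDF as one of the two vectorization options. A minimal TF-IDF sketch over pre-tokenized documents follows, assuming the unsmoothed variant tf(t, d) * log(N / df(t)) with tf as the term's relative frequency in the document; libraries such as scikit-learn use a smoothed IDF by default, so their numbers will differ. The paper's exact variant is not stated.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors (term -> weight), one dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    corpus = [["pdf", "json"], ["pdf", "xml"]]
    for vec in tf_idf(corpus):
        print(vec)
```

A term that occurs in every document, like "pdf" in the toy corpus, gets weight 0 because log(N / N) = 0; this is the mechanism that down-weights uninformative terms before the vectors reach a classifier.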