12 articles found
Ensuring the Correctness of Regular Expressions: A Review (Cited by: 1)
1
Authors: Li-Xiao Zheng, Shuai Ma, Zu-Xi Chen, Xiang-Yu Luo. International Journal of Automation and Computing (EI, CSCD), 2021, No. 4, pp. 521-535, 15 pages
Regular expressions are widely used within and even outside of computer science due to their expressiveness and flexibility. However, regular expressions have a quite compact and rather tolerant syntax that makes them hard to understand, hard to compose, and error-prone. Faulty regular expressions may cause failures of the applications that use them. Therefore, ensuring the correctness of regular expressions is a vital prerequisite for their use in practical applications. The importance and necessity of ensuring correct definitions of regular expressions have attracted extensive attention from researchers and practitioners, especially in recent years. In this study, we provide a review of the recent works for ensuring the correct usage of regular expressions. We classify those works into different categories, including the empirical study, test string generation, automatic synthesis and learning, static checking and verification, visual representation and explanation, and repairing. For each category, we review the main results, compare different approaches, and discuss their advantages and disadvantages. We also discuss some potential future research directions.
Keywords: regular expressions; correctness; string generation; learning; static checking; verification; visualization; repairing
BSPM: A NEW MECHANISM FOR “OVERLAP-MATCHING EXPRESSIONS” IN DPI
2
Authors: Li Zheng, Yu Nenghai, Li Yang. Journal of Electronics (China), 2010, No. 3, pp. 289-297, 9 pages
Nowadays, using Deterministic Finite Automata (DFA) or Non-deterministic Finite Automata (NFA) to parse regular expressions is the most popular approach to Deep Packet Inspection (DPI), and research on DPI focuses on improving DFA to reduce memory. However, most of the existing literature ignores a special kind of "overlap-matching expression", which causes state explosion and accounts for a large share of DPI rules. To solve this problem, this paper proposes a new mechanism based on bitmaps. We start with a simple regular expression to describe "overlap-matching expressions" and state the problem. Then, after calculating the very large number of exploded states for this kind of expression, the procedure of the Bitmap-based Soft Parallel Mechanism (BSPM) is described. Based on BSPM, we discuss all the different types of "overlap-matching expressions" and give optimization suggestions for each of them. Finally, experimental results show that BSPM performs very well on the problem stated above, and that the optimization suggestions are also effective for memory reduction on all types of "overlap-matching expressions".
Keywords: Intrusion detection; Deep Packet Inspection (DPI); regular expressions; Bitmap-based Deterministic Finite Automata (DFA)
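The state explosion mentioned in the abstract above can be illustrated with a rough sketch. The textbook pattern (a|b)*a(a|b){n}, the function names, and the use of a single integer as a bitmap are all assumptions chosen for illustration; they are not taken from the paper, and the bitmap loop is a generic bit-parallel simulation, not BSPM itself.

```python
def count_dfa_states(n, alphabet="ab"):
    """Count DFA states reached by subset construction for (a|b)*a(a|b){n}."""
    # NFA states: 0 is the start state (self-loop on every symbol);
    # states 1..n+1 count symbols read since a candidate 'a'; n+1 accepts.
    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if ch == "a":
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


def bitmap_match(text, n):
    """Return the end indices of matches, tracking all active positions in one integer."""
    mask = (1 << (n + 1)) - 1
    bits = 0  # bit k is set when the symbol k positions back was 'a'
    hits = []
    for i, ch in enumerate(text):
        bits = ((bits << 1) | (ch == "a")) & mask
        if (bits >> n) & 1:
            hits.append(i)
    return hits
```

Under these assumptions, the number of DFA states doubles with each increment of n, whereas the bitmap version only ever stores one (n+1)-bit integer per input stream.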
Data Masking for Chinese Electronic Medical Records with Named Entity Recognition
3
Authors: Tianyu He, Xiaolong Xu, Zhichen Hu, Qingzhan Zhao, Jianguo Dai, Fei Dai. Intelligent Automation & Soft Computing (SCIE), 2023, No. 6, pp. 3657-3673, 17 pages
With the rapid development of information technology, the electronification of medical records has gradually become a trend. In China, the population base is huge and the supporting medical institutions are numerous, so this reality drives the conversion of paper medical records to electronic medical records. Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence, and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field. However, electronic medical records contain a large amount of private patient information, which must be desensitized before they are used as open resources. Therefore, to solve the above problems, data masking for Chinese electronic medical records with named entity recognition is proposed in this paper. Firstly, the text is vectorized to satisfy the required format of the model input. Secondly, since input sentences vary in length and the relationship between sentences in context is not negligible, a neural network model for named entity recognition based on bidirectional long short-term memory (BiLSTM) with conditional random fields (CRF) is constructed. Finally, the data masking operation is performed based on the named entity recognition results, mainly using regular expression filtering encryption and principal component analysis (PCA) word vector compression and replacement. In addition, comparison experiments with the hidden Markov model (HMM), the LSTM-CRF model, and the BiLSTM model are conducted in this paper. The experimental results show that the method used in this paper achieves 92.72% accuracy, 92.30% recall, and a 92.51% F1 score, which is higher than the other models.
Keywords: Named entity recognition; Chinese electronic medical records; data masking; principal component analysis; regular expression
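The masking step described in the abstract above, where recognized entities are replaced and regular expressions filter the remaining sensitive strings, might look roughly like the following. This is a minimal sketch and not the authors' implementation: the span format `(start, end, label)`, the placeholder style, the phone-number pattern, and all function names are hypothetical, and the PCA word-vector replacement is omitted.

```python
import re

# Hypothetical pattern: an 11-digit number starting with 1 (illustrative assumption).
PHONE = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def mask_entities(text, spans):
    """Replace each (start, end, label) span with a [LABEL] placeholder.

    Spans are assumed to be non-overlapping; they are applied from right
    to left so that earlier offsets remain valid.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def mask_record(text, spans):
    """Mask recognized entities first, then star out phone-like numbers."""
    masked = mask_entities(text, spans)
    # Keep the first three and last two digits, hide the middle six.
    return PHONE.sub(lambda m: m.group()[:3] + "*" * 6 + m.group()[-2:], masked)
```

In such a pipeline the spans would come from the named entity recognition model, while the regular expression catches structured identifiers that follow a fixed format.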
Optimized XML Storage in NXD Based on Tree-Structure Disassemble
4
Authors: LIU Yun-sheng, WANG Yi, ZHONG Hao. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, No. 1, pp. 93-99, 7 pages
Independent XML storage based on XSD (XML Schema Document) is adopted in NXD (Native XML Database). An XML storage structure based on tree-structure disassembly and an algorithm for dynamically updating XML documents are provided in this paper. The main idea is that, in terms of the data model of an XML document, the document is parsed into a Document Structure-Tree with a Hierarchical Model and Leaf-Data with a Relation Model for storage. A Proxy node is also introduced to solve the problem of XML data being stored across blocks. With XSD model information, a sparse index is constructed to save storage space. It is shown that this storage structure can improve the efficiency of XML document operations.
Keywords: XML storage; storage model; NXD (native XML database); document trees; regular expression
Research of testing method based on UML statecharts
5
Author: 占学德. Journal of Shanghai University (English Edition) (CAS), 2006, No. 5, pp. 469-470, 2 pages
Unified modeling language (UML) is a powerful graphical modeling language with intuitive meaning. It provides various diagrams to depict system characteristics and complex environments from different viewpoints and different application layers. UML-based software development and modeling environments have been widely accepted in industry, including areas in which safety is an important issue, such as spaceflight, defense, and automobiles. Ensuring and improving software quality has become a main concern in the field. As one of the key techniques for software quality, software testing can effectively detect system faults. UML-based software testing is an important research direction in software engineering. The key to software testing is the generation of test cases. This dissertation studies an approach to generating test cases from UML statecharts.
Keywords: unified modeling language (UML); statechart; formal semantics; flattened regular expression (FREE) model; specification-based software testing; test criteria; automatic generation of test cases
Semantic Recognition of a Data Structure in Big-Data
6
Authors: Aicha Ben Salem, Faouzi Boufares, Sebastiao Correia. Journal of Computer and Communications, 2014, No. 9, pp. 93-102, 10 pages
Data governance is a subject that is becoming increasingly important in business and government. In fact, good data governance allows improved interactions between employees of one or more organizations. Data quality represents a great challenge because the cost of non-quality can be very high. Therefore, the use of data quality becomes an absolute necessity within an organization. To improve the data quality in a Big-Data source, our purpose in this paper is to add semantics to data and help the user recognize the Big-Data schema. The originality of this approach lies in the semantic aspect it offers. It detects issues in data and proposes a data schema by applying semantic data profiling.
Keywords: Data Quality; Big-Data; Semantic Data Profiling; Data Dictionary; regular expressions; Ontology
IESRL: An information extraction system for research level
7
Authors: Fuhai LENG, Rujiang BAI, Qingsong ZHU. Chinese Journal of Library and Information Science, 2013, No. 4, pp. 16-27, 12 pages
Purpose: In order to annotate the semantic information and extract the research level information of research papers, we attempt to seek a method to develop an information extraction system.
Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information of research papers. Based on the annotation results, the research level information was extracted through regular expressions. All the functions were implemented on the Sybase platform.
Findings: According to the result of our experiment in carbon nanotube research, the precision and recall rates reached 65.13% and 57.75%, respectively, after the semantic properties of word class had been labeled, and the F-measure increased dramatically from less than 50% to 60.18% when semantic features were added. Our experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers rapidly and effectively.
Research limitations: Some text information, such as that of format and chart, might have been lost due to the extraction processing of text format from PDF to TXT files. Semantic labeling on sentences could be insufficient due to the rich meaning of lexicons in the semantic dictionary.
Research implications: The established system can help researchers rapidly compare the level of different research papers and find out their implicit innovation values. It could also be used as an auxiliary tool for analyzing research levels of various research institutions.
Originality/value: In this work, we have successfully established an information extraction system for research papers by a revised semantic annotation method based on CRFM and the semantic dictionary. Our system can analyze the information extraction problem at two levels, i.e., the sentence level and the noun (phrase) level of research papers. Compared with the extraction methods based on knowledge engineering and on machine learning, our system shows the advantages of both.
Keywords: Research papers; Information extraction; Semantic labeling; regular expression; Conditional random fields; Research level
Brane world black holes in teleparallel theory equivalent to general relativity and their Killing vectors, energy, momentum and angular momentum
8
Author: Gamal G. L. Nashed. Chinese Physics B (SCIE, EI, CAS, CSCD), 2010, No. 2, pp. 77-91, 15 pages
The energy-momentum tensor, which is coordinate-independent, is used to calculate the energy, momentum and angular momentum of two different tetrad fields. Although the two tetrad fields reproduce the same space-time, their energies are different. Therefore, a regularized expression of the gravitational energy-momentum tensor of the teleparallel equivalent of general relativity (TEGR) is used to make the energies of the two tetrad fields equal. The definition of the gravitational energy-momentum is used to investigate the energy within the external event horizon. The components of angular momentum associated with these space-times are calculated. In spite of using a static space-time, we get a non-zero component of angular momentum! Therefore, we derive the Killing vectors associated with these space-times using the definition of the Lie derivative of a second rank tensor in the framework of the TEGR to make the picture clearer.
Keywords: teleparallel equivalent of general relativity; brane world black holes; gravitational energy-momentum tensor; regularized expression of the gravitational energy-momentum
L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises (Cited by: 4)
9
Authors: 邓绪斌, 朱扬勇. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2005, No. 6, pp. 763-773, 11 pages
In this paper, a new method, named L-tree match, is presented for extracting data from complex data sources. Firstly, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Secondly, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Thirdly, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, this method is applied to extract accurate data from biological documents amounting to 100GB for the first online integrated biological data warehouse of China.
Keywords: data extraction; data model; extraction algorithm; regular expression; wrapper
Accelerating Application Identification with Two-Stage Matching and Pre-Classification (Cited by: 1)
10
Authors: 何飞, 项帆, 邵熠阳, 薛一波, 李军. Tsinghua Science and Technology (SCIE, EI, CAS), 2011, No. 4, pp. 422-431, 10 pages
Modern datacenter and enterprise networks require application identification to enable granular traffic control that either improves data transfer rates or ensures network security. Providing application visibility as a core network function is challenging due to its performance requirements, including high throughput, low memory usage, and high identification accuracy. This paper presents a payload-based application identification method using a signature matching engine utilizing characteristics of the application identification. The solution uses two-stage matching and pre-classification to simultaneously improve the throughput and reduce the memory. Compared to a state-of-the-art common regular expression engine, this matching engine achieves 38% memory use reduction and triples the throughput. In addition, the solution is orthogonal to most existing optimization techniques for regular expression matching, which means it can be leveraged to further increase the performance of other matching algorithms.
Keywords: application identification; deep inspection; regular expression; traffic classification
Behavior-Consistent Service Substitutions in Dynamic Environments
11
Authors: 陈俊清, 黄林鹏, 于程远. Journal of Shanghai Jiaotong University (Science) (EI), 2014, No. 1, pp. 17-27, 11 pages
In this paper, a novel approach for service substitutions based on the service type in terms of its interface type and behavior semantics is proposed. In order to analyze and verify behavior-consistent service substitutions in dynamic environments, we first present a formal language to describe services from a control-flow perspective, then introduce a type and effect system to infer conservative approximations of all possible behaviors of these services. The service behaviors are represented by concurrent behavior expressions (CBEs). Built upon the interpretation of CBEs, behavior-consistent service substitutions are defined and analyzed by subtyping technology. The correctness of the analysis approach is guaranteed by a type safety theorem, which is mechanically proved in the Coq proof assistant. Finally, applications in web services show that our method is effective and feasible.
Keywords: behavior consistency; concurrent regular expressions; subtyping technology; type and effect systems; service substitutions
RPE Query Processing and Optimization Techniques for XML Databases (Cited by: 7)
12
Authors: Guo-Ren Wang, Bing Sun, Jian-Hua Lv, Ge Yu. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2004, No. 2, pp. 224-237, 14 pages
An extent join to compute path expressions containing parent-children and ancestor-descendent operations and two path expression optimization rules, path-shortening and path-complementing, are presented in this paper. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes the path execution by using an equivalent complementary path expression to compute the original one. Experimental results show that the algorithms proposed are more efficient than traditional algorithms.
Keywords: XML; regular path expressions; query processing and optimization