The Extensible Markup Language (XML) files, widely used for storing and exchanging information on the web, require efficient parsing mechanisms to improve the performance of the applications. With the existing Document Object Model (DOM) based parsing, the performance degrades due to sequential processing and large memory requirements, thereby requiring an efficient XML parser to mitigate these issues. In this paper, we propose a Parallel XML Tree Generator (PXTG) algorithm for accelerating the parsing of XML files and a Regression-based XML Parsing Framework (RXPF) that analyzes and predicts performance through profiling, regression, and code generation for efficient parsing. The PXTG algorithm is based on dividing the XML file into n parts and producing n trees in parallel. The profiling phase of the RXPF framework produces a dataset by measuring the performance of various parsing models including StAX, SAX, DOM, JDOM, and PXTG on different cores by using multiple file sizes. The regression phase produces the prediction model, based on which the final code for efficient parsing of XML files is produced through the code generation phase. The RXPF framework has shown a significant improvement in performance varying from 9.54% to 32.34% over other existing models used for parsing XML files.
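To make the split-and-parse idea concrete, the following is a minimal Python sketch of parsing independent chunks of an XML file in parallel. The abstract does not specify how PXTG partitions the file, so the record tag, the single-file layout, and the use of multiprocessing here are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the split-and-parse idea behind PXTG, not the authors'
# implementation: the record tag name and file layout are assumptions.
import xml.etree.ElementTree as ET
from multiprocessing import Pool

RECORD_TAG = "record"  # hypothetical element delimiting independent chunks

def split_records(path, tag=RECORD_TAG):
    """Split the raw file text into self-contained <record>...</record> chunks."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    open_tag, close_tag = f"<{tag}", f"</{tag}>"
    chunks, start = [], text.find(open_tag)
    while start != -1:
        end = text.find(close_tag, start) + len(close_tag)
        chunks.append(text[start:end])
        start = text.find(open_tag, end)
    return chunks

def parse_chunk(chunk):
    """Each worker builds its own subtree from one chunk."""
    root = ET.fromstring(chunk)
    return root.tag, sum(1 for _ in root.iter())  # summary stats only

if __name__ == "__main__":
    chunks = split_records("data.xml")
    with Pool() as pool:            # n workers produce n trees in parallel
        stats = pool.map(parse_chunk, chunks)
    print(stats)
```

Each worker returns only summary statistics because the subtree it builds lives in its own process; a full PXTG-style generator would additionally merge or index the n trees.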
Due to the lack of long-range association and spatial location information, fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods. This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information. The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework. In addition, the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images. The experimental results based on the colorful fashion parsing dataset (CFPD) show that the proposed network structure achieves 53.68% mean intersection over union (mIoU) and has better performance on the clothing parsing task.
The present work aims to propose a solution for automating updates (MAJ) of the radio parameters of the ATOLL database from the OSS NetAct using parsing. This solution will be operated by the RAN (Radio Access Network) service of mobile operators, which ensures the planning and optimization of network coverage. The overall objective of this study is to synchronize the physical data of the sites deployed in the field with the ATOLL database, which contains all the coverage data of the operators' mobile networks. We have built an application that automates these updates with the following functionalities: import of radio parameters with the parsing method we have defined, visualization of the data, and its export to the Template of the ATOLL database. The tests and validations of our application, developed for a 4G network, have produced a solution that performs updates with a constraint on the size of the data to be imported. Our solution is a reliable resource for updating the databases containing the radio parameters of the network at all mobile operators, subject to a limitation in terms of the volume of data to be imported.
In this paper, we present a modular incremental statistical model for English full parsing. Unlike other full parsing approaches in which the analysis of the sentence is a uniform process, our model separates full parsing into shallow parsing and sentence skeleton parsing. In shallow parsing, we finish POS tagging, base NP identification, prepositional phrase attachment, and subordinate clause identification. In skeleton parsing, we use a layered feature-oriented statistical method. Modularity has the advantage of solving different problems in parsing with corresponding mechanisms. Feature-oriented rules are able to express complex lingual phenomena at the key points where needed. Evaluated on the Penn Treebank corpus, we obtained 89.2% precision and 89.8% recall.
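As a rough illustration of what the shallow-parsing stage produces, the sketch below runs POS tagging and base NP chunking with NLTK; the regular-expression grammar is a generic stand-in, since the paper's own statistical components are not described here in enough detail to reproduce.

```python
# A minimal sketch of the shallow-parsing stage (POS tagging + base NP
# chunking) using NLTK; the NP pattern is a simple assumption, not the
# paper's statistical model.
import nltk

# One-time setup: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
GRAMMAR = r"NP: {<DT>?<JJ>*<NN.*>+}"   # simple base-NP pattern, an assumption

def shallow_parse(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)        # step 1: POS tagging
    chunker = nltk.RegexpParser(GRAMMAR)
    return chunker.parse(tagged)         # step 2: base NP identification

print(shallow_parse("The quick statistical parser found the base noun phrases."))
```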
Information content security is a branch of cyberspace security. How to effectively manage and use Weibo comment information has become a research focus in the field of information content security. Three main tasks are involved: emotion sentence identification and classification, emotion tendency classification, and emotion expression extraction. Combined with the latent Dirichlet allocation (LDA) model, a Gibbs sampling implementation for inference of our algorithm is presented, which can be used to categorize emotion tendency automatically. To address the low recall of emotion expression extraction in Weibo, we use dependency parsing, divide expressions into two categories by subject and object, summarize six kinds of dependency models between evaluated objects and emotion words, and propose a merge algorithm for evaluated objects. Our method ranked among the best in the emotion expression extraction sub-task of a public bakeoff and its shared tasks, indicating that it is not only innovative but also practical.
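For readers unfamiliar with the inference step, the following toy collapsed Gibbs sampler shows how topic assignments are resampled in LDA; the corpus, topic count, and hyperparameters are invented for illustration and are unrelated to the paper's Weibo data.

```python
# A toy collapsed Gibbs sampler for LDA, sketching the inference idea the
# paper relies on; documents, K, alpha, and beta here are illustrative.
import numpy as np

def lda_gibbs(docs, V, K=3, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
    z = [[rng.integers(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):                     # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]; n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())       # resample topic
                z[d][i] = k; n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw

docs = [[0, 1, 2, 1], [3, 4, 3, 5], [0, 2, 4, 5]]      # word-id documents
print(lda_gibbs(docs, V=6)[0])                         # doc-topic counts
```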
Natural language parsing is a task of great importance and extreme difficulty. In this paper, we present a full Chinese parsing system based on a two-stage approach. Rather than identifying all phrases with a uniform model, we utilize a divide-and-conquer strategy. We propose an effective and fast method based on a Markov model to identify the base phrases. Then we make the first attempt to extend one of the best English parsing models, i.e., the head-driven model, to recognize Chinese complex phrases. Our two-stage approach is superior to the uniform approach in two aspects. First, it creates synergy between the Markov model and the head-driven model. Second, it reduces the complexity of full Chinese parsing and makes the parsing system space and time efficient. Evaluated in PARSEVAL measures on the open test set, the parsing system performs at 87.53% precision and 87.95% recall.
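The base-phrase stage can be pictured as sequence labeling with a Markov model. Below is a toy Viterbi decoder over BIO chunk tags; the transition and emission probabilities are invented, whereas a real system would estimate them from a treebank.

```python
# A toy Viterbi decoder over BIO chunk tags, sketching Markov-model
# base-phrase identification; all probabilities are invented.
import numpy as np

TAGS = ["B", "I", "O"]
trans = np.log(np.array([[0.3, 0.5, 0.2],          # P(next | B)
                         [0.3, 0.4, 0.3],          # P(next | I)
                         [0.5, 0.0001, 0.4999]]))  # P(next | O): O->I ~impossible

def viterbi(emissions):                  # emissions: (T, num_tags) log-probs
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + emissions[t]
        back[t] = cand.argmax(axis=0)    # best previous tag per current tag
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):        # follow back-pointers
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

emis = np.log(np.array([[0.7, 0.1, 0.2],           # toy per-word tag scores
                        [0.2, 0.6, 0.2],
                        [0.1, 0.2, 0.7]]))
print(viterbi(emis))                               # ['B', 'I', 'O']
```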
Head-driven statistical models for natural language parsing are the most representative lexicalized syntactic parsing models, but they only utilize semantic dependency between words and do not incorporate other semantic information such as semantic collocation and semantic category. Some improvements on this distinctive parser are presented. Firstly, valency is an essential semantic feature of words: once the valency of a word is determined, the collocation of the word is clear, and the sentence structure can be directly derived. Thus, a syntactic parsing model combining valence structure with semantic dependency is proposed on the basis of head-driven statistical syntactic parsing models. Secondly, semantic role labeling (SRL) is necessary for deep natural language processing, so an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Experiments are conducted for the refined statistical parser. The results show that 87.12% precision and 85.04% recall are obtained, and the F-measure is improved by 5.68% compared with the head-driven parsing model introduced by Collins.
This paper proposes a new way to improve the performance of dependency parsers: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into a lexicalized parsing model. Firstly, the scheme of verb subdivision is described. Secondly, a maximum entropy model is presented to distinguish verb subclasses. Finally, a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a good influence on parsing performance.
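Since logistic regression is the standard formulation of a maximum entropy classifier, the verb-subclass step can be sketched with scikit-learn as below; the context features and subclass labels are invented for illustration.

```python
# A hedged sketch of a maximum-entropy classifier for verb subclasses using
# scikit-learn; the features and subclass labels below are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy training set: contextual features -> hypothetical verb subclass
X = [
    {"verb": "give",  "next_pos": "PN", "has_obj": True},
    {"verb": "run",   "next_pos": "AD", "has_obj": False},
    {"verb": "put",   "next_pos": "NN", "has_obj": True},
    {"verb": "sleep", "next_pos": ".",  "has_obj": False},
]
y = ["ditransitive", "intransitive", "transitive", "intransitive"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([{"verb": "send", "next_pos": "PN", "has_obj": True}]))
```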
Clothing parsing, also known as clothing image segmentation, is the problem of assigning a clothing category label to each pixel in clothing images. To address the lack of positional and global prior in existing clothing parsing algorithms, this paper proposes an enhanced positional attention module (EPAM) to collect positional information in the vertical direction of each pixel, and an efficient global prior module (GPM) to aggregate contextual information from different sub-regions. The EPAM and GPM based residual network (EG-ResNet) could effectively exploit the intrinsic features of clothing images while capturing information between different scales and sub-regions. Experimental results show that the proposed EG-ResNet achieves promising performance in clothing parsing on the colorful fashion parsing dataset (CFPD) (51.12% mean Intersection over Union (mIoU) and 92.79% pixel-wise accuracy (PA)) compared with other state-of-the-art methods.
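The abstract does not give the GPM's exact architecture, but aggregating context from different sub-regions is commonly done with pyramid pooling, as in this hedged PyTorch sketch; channel sizes and bin counts are placeholders, not the paper's design.

```python
# A sketch of a global prior module in the spirit of pyramid pooling: average
# pool over several sub-region grids, project, upsample, and concatenate.
# Channel sizes and bins are illustrative, not the EG-ResNet configuration.
import torch
import torch.nn.functional as F
from torch import nn

class GlobalPrior(nn.Module):
    def __init__(self, channels=64, bins=(1, 2, 4)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(channels, channels // len(bins), 1))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        priors = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                align_corners=False) for stage in self.stages]
        return torch.cat([x, *priors], dim=1)  # original + sub-region context

feat = torch.randn(1, 64, 32, 32)
print(GlobalPrior()(feat).shape)               # torch.Size([1, 127, 32, 32])
```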
Currently, large amounts of information exist on Web sites and in various digital media. Most of it is in natural language: easy to browse, but difficult for a computer to understand. Chunk parsing and entity relation extraction are important for understanding information semantics in natural language processing. Chunk analysis is a shallow parsing method, and entity relation extraction is used to establish relationships between entities. Because full syntactic parsing of Chinese text is complex, many researchers are more interested in chunk analysis and relation extraction. The conditional random fields (CRFs) model is a valid probabilistic model for segmenting and labeling sequence data. This paper models chunk and entity relation problems in Chinese text; by transforming them into a sequence labeling problem, we can use CRFs to realize chunk analysis and entity relation extraction.
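The input format for such a CRF-based chunker can be sketched with the sklearn-crfsuite package as follows; the two toy sentences, features, and BIO labels are invented to show the shape of the data, not real training material.

```python
# A minimal sketch of casting chunking as sequence labeling with a CRF via
# sklearn-crfsuite; sentences, features, and labels are toy examples.
import sklearn_crfsuite

def word_features(sent, i):
    return {"word": sent[i], "is_first": i == 0, "is_last": i == len(sent) - 1}

sents = [["我", "喜欢", "自然", "语言", "处理"],
         ["他", "研究", "中文", "句法"]]
labels = [["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"],
          ["B-NP", "B-VP", "B-NP", "I-NP"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```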
Video events recognition is a challenging task for high-level understanding of video sequences. At present, there are two major limitations in existing methods for events recognition. One is that no algorithms are available to recognize events which happen alternately. The other is that the temporal relationship between atomic actions is not fully utilized. Aiming at these problems, an algorithm based on an extended stochastic context-free grammar (SCFG) representation is proposed for events recognition. Events are modeled by a series of atomic actions and represented by an extended SCFG. The extended SCFG can express the hierarchical structure of the events and the temporal relationship between the atomic actions. In comparison with previous work, the main contributions of this paper are as follows: ① Events (including alternating events) can be recognized by an improved stochastic parsing and shortest path finding algorithm. ② The algorithm can disambiguate the detection results of atomic actions by event context. Experimental results show that the proposed algorithm can recognize events accurately and most atomic action detection errors can be corrected simultaneously.
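The grammar-based representation can be illustrated with NLTK's probabilistic parsing tools: an event is a production over atomic-action terminals, and parsing a detected action sequence recovers the event structure and its probability. The actions and rule probabilities below are invented, and the paper's extensions for alternating events are not modeled here.

```python
# A toy illustration of representing an event as a stochastic context-free
# grammar and parsing a detected action sequence with NLTK; the atomic
# actions and probabilities are invented.
import nltk

grammar = nltk.PCFG.fromstring("""
    EVENT    -> APPROACH INTERACT LEAVE [1.0]
    APPROACH -> 'walk_in' [0.7] | 'run_in' [0.3]
    INTERACT -> 'pick_up' [0.6] | 'put_down' [0.4]
    LEAVE    -> 'walk_out' [1.0]
""")
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse(["walk_in", "pick_up", "walk_out"]):
    print(tree, tree.prob())   # event structure plus its parse probability
```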
A fast method for phrase structure grammar analysis is proposed based on conditional random fields (CRF). The method trains several CRF classifiers for recognizing the phrase nodes at different levels, and uses a bottom-up procedure to connect the recognized phrase nodes to construct the syntactic tree. On the basis of the Beijing Forest Studio Chinese tagged corpus, two experiments are designed to select the training parameters and verify the validity of the method. The results show that the method costs 78.98 ms to train and 4.63 ms to test on a Chinese sentence of 17.9 words. The method is a new way to parse phrase structure grammar for Chinese, and has good generalization ability and fast speed.
Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given that most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. From this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we outline several promising lines of research in fine-grained visual parsing for future study.
We present a novel framework, CLIP-SP, and a novel adaptive prompt method to leverage pre-trained knowledge from CLIP for scene parsing. Our approach addresses the limitations of DenseCLIP, which demonstrates the superior image segmentation provided by CLIP pre-trained models over ImageNet pre-trained models, but struggles with rough pixel-text score maps for complex scene parsing. We argue that, as they contain all textual information in a dataset, the pixel-text score maps, i.e., dense prompts, are inevitably mixed with noise. To overcome this challenge, we propose a two-step method. Firstly, we extract visual and language features and perform multi-label classification to identify the most likely categories in the input images. Secondly, based on the top-k categories and confidence scores, our method generates scene tokens which can be treated as adaptive prompts for implicit modeling of scenes, and incorporates them into the visual features fed into the decoder for segmentation. Our method imposes a constraint on prompts and suppresses the probability of irrelevant categories appearing in the scene parsing results. Our method achieves competitive performance, limited by the available visual-language pre-trained models. Our CLIP-SP performs 1.14% better (in terms of mIoU) than DenseCLIP on ADE20K, using a ResNet-50 backbone.
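A schematic PyTorch sketch of the two-step idea follows: multi-label scores select the top-k categories, whose embeddings, scaled by confidence, become scene tokens prepended to the visual features. All shapes and the embedding table are placeholders rather than the CLIP-SP implementation.

```python
# Schematic sketch of top-k scene tokens as adaptive prompts; the embedding
# table stands in for CLIP text features, and every dimension is a placeholder.
import torch

num_classes, dim, k = 150, 512, 5
cls_embed = torch.nn.Embedding(num_classes, dim)   # stand-in for text features

def scene_tokens(logits):                          # logits: (B, num_classes)
    conf, idx = torch.sigmoid(logits).topk(k, dim=-1)
    tokens = cls_embed(idx)                        # (B, k, dim)
    return tokens * conf.unsqueeze(-1)             # confidence-scaled prompts

visual = torch.randn(2, 196, dim)                  # (B, patches, dim)
logits = torch.randn(2, num_classes)               # multi-label classifier output
fused = torch.cat([scene_tokens(logits), visual], dim=1)  # fed to the decoder
print(fused.shape)                                 # torch.Size([2, 201, 512])
```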
Syntactic and semantic parsing has been investigated for decades and is one primary topic in the natural language processing community. This article aims to give a brief survey of this topic. The parsing community includes many tasks, which are difficult to cover fully. Here we focus on two of the most popular formalizations of parsing: constituent parsing and dependency parsing. Constituent parsing mainly targets syntactic analysis, while dependency parsing can handle both syntactic and semantic analysis. This article briefly reviews the representative models of constituent parsing and dependency parsing, as well as dependency graph parsing with rich semantics. Besides, we also review closely-related topics such as cross-domain, cross-lingual and joint parsing models, parser applications, and corpus development for parsing.
Logs contain runtime information for both systems and users. As many of them use natural language, a typical log-based analysis needs to parse logs into a structured format first. Existing parsing approaches often take two steps. The first step is to find similar words (tokens) or sentences. Second, parsers extract log templates by replacing different tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs, but do not have a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose our online log parsing approach, Cognition. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. Then, it applies our template correction technique to merge and absorb similar templates. It eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation on 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also saves up to 52.1% of the time cost on average compared with the other approaches.
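The template-extraction step that Cognition refines can be shown in a few lines: parameter-looking tokens are replaced by a placeholder and logs are grouped by the resulting template. The digit heuristic below is a deliberately naive stand-in for Cognition's strict lower bound on variable placeholders.

```python
# A bare-bones illustration of log template extraction: replace tokens that
# look like parameters with <*> and group logs by template. The digit
# heuristic is a naive stand-in, not Cognition's actual rule.
import re
from collections import defaultdict

def to_template(line):
    tokens = line.strip().split()
    return " ".join("<*>" if re.search(r"\d", t) else t for t in tokens)

logs = [
    "Connection from 10.0.0.1 port 22",
    "Connection from 10.0.0.9 port 443",
    "Disk usage at 91 percent",
]
groups = defaultdict(list)
for line in logs:
    groups[to_template(line)].append(line)
for template, members in groups.items():
    print(f"{template}  ({len(members)} logs)")
```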
Early studies on discourse rhetorical structure parsing mainly adopt bottom-up approaches, limiting the parsing process to local information. Although current top-down parsers can better capture global information and have achieved particular success, the importance of local and global information at various levels of discourse parsing differs. This paper argues that combining local and global information for discourse parsing is more sensible. To prove this, we introduce a top-down discourse parser with bidirectional representation learning capabilities. Existing corpora on Rhetorical Structure Theory (RST) are known to be much limited in size, which makes discourse parsing very challenging. To alleviate this problem, we leverage some boundary features and a data augmentation strategy to tap the potential of our parser. We use two methods for evaluation, and the experiments on the RST-DT corpus show that our parser improves performance primarily due to the effective combination of local and global information; the boundary features and the data augmentation strategy also play a role. Based on gold standard elementary discourse units (EDUs), our parser significantly advances the baseline systems in nuclearity detection, with the results on the other three indicators (span, relation, and full) being competitive. Based on automatically segmented EDUs, our parser still outperforms previous state-of-the-art work.
This paper puts forward and explores the problem of empty element (EE) recovery in Chinese from the syntactic parsing perspective, which has been largely ignored in the literature. First, we demonstrate why EEs play a critical role in syntactic parsing of Chinese and how EEs can better benefit syntactic parsing of Chinese via re-categorization from the syntactic perspective. Then, we propose two ways to automatically recover EEs: a joint constituent parsing approach and a chunk-based dependency parsing approach. Evaluation on the Chinese TreeBank (CTB) 5.1 corpus shows that integrating EE recovery into the Charniak parser achieves a significant performance improvement of 1.29 in F1-measure. To the best of our knowledge, this is the first close examination of EEs in syntactic parsing of Chinese, which deserves more attention in the future with regard to its specific importance.
This paper introduces Certis, a powerful framework that addresses the challenges of cloud asset tracking, management, and threat detection in modern cybersecurity landscapes. It enhances asset identification and anomaly detection through SSL certificate parsing, cloud service provider integration, and advanced fingerprinting techniques like JARM at the application layer. Current work will focus on cross-layer malicious behavior identification to further enhance its capabilities, including minimizing false positives through AI-based learning techniques. Certis promises to offer a powerful solution for organizations seeking proactive cybersecurity defenses in the face of evolving threats.
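The certificate-parsing building block can be sketched with the standard ssl module and the cryptography package, as below; JARM fingerprinting and the cloud-provider integrations that Certis layers on top are out of scope for this snippet.

```python
# A small sketch of SSL certificate parsing for asset identification, using
# the standard ssl module and the cryptography package; not Certis itself.
import ssl
from cryptography import x509

def describe_cert(host, port=443):
    pem = ssl.get_server_certificate((host, port))       # fetch the leaf cert
    cert = x509.load_pem_x509_certificate(pem.encode())
    return {
        "subject": cert.subject.rfc4514_string(),
        "issuer": cert.issuer.rfc4514_string(),
        "not_after": cert.not_valid_after.isoformat(),   # expiry timestamp
    }

print(describe_cert("example.com"))
```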
We utilized Raspberry Pi 4B to develop a microbial monitoring system to simplify the microbial image-capturing process and facilitate the informatization of microbial observation results. The Raspberry Pi 4B firmware, developed under Python on the Linux platform, achieves sum verification of serial data, file upload based on the TCP protocol, control of the sequence light source and light valve, real-time self-test based on multithreading, and an experiment-oriented file management method. The system demonstrated improved code logic, scheduling, exception handling, and code readability.
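Two of the firmware pieces mentioned, sum verification of serial frames and TCP file upload, can be sketched as follows; the frame layout, host, and port are assumptions, not the system's actual protocol.

```python
# A hedged sketch of an additive checksum over a serial frame and a TCP file
# upload; frame format, host, and port are illustrative assumptions.
import socket

def checksum_ok(frame: bytes) -> bool:
    """Assumed layout: last byte is the low 8 bits of the sum of the rest."""
    *payload, check = frame
    return sum(payload) & 0xFF == check

def upload(path, host="192.168.1.10", port=5000):
    """Stream a file's bytes to a hypothetical TCP receiver."""
    with open(path, "rb") as f, socket.create_connection((host, port)) as sock:
        sock.sendall(f.read())

frame = bytes([0x01, 0x02, 0x03, 0x06])      # 1+2+3 = 6 -> valid frame
print(checksum_ok(frame))
```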