期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
Analysis on n-gram statistics and linguistic features of whole genome protein sequences
1
作者 董启文 王晓龙 林磊 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2008年第5期694-698,共5页
To obtain the statistical sequence analysis on a large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their... To obtain the statistical sequence analysis on a large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests: Zipf power law and Shannon entropy, developed for analysis of natural languages and symbolic sequences. The natural genome proteins and the artificial genome proteins were compared with each other and some statistical features of n-grams were discovered. The results show that: the n-grams of whole genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple uni-gram model can distinguish different organisms; there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis on n-gram of whole genome protein sequences will result in a powerful model for mapping the relationship of protein sequence, structure and function. 展开更多
关键词 n-gram statistics protein sequence Zipf law
下载PDF
Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique
2
作者 Abey Bruck Tulu Tilahun 《Journal of Computer and Communications》 2015年第12期67-76,共10页
Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems... Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems. Even though different IR systems exist, they cannot meet all users’ expectations. A different level of users’ knowledge makes queries to be expressed in different ways. As a result, the system may miss the core meaning of users query and retrieve dissatisfactory results. This happens mainly because of the ambiguities of words involved in the natural languages and expression mismatch among users and authors. The existing ambiguities in Amharic language have negative impacts on the performance of Amharic IR system. Some of the ambiguities for this type of problem are: spelling variants of the same word, polysemous and synonymous terms. If users are not fully knowledgeable about the information domain area, they will mostly formulate weak queries to retrieve documents. Thus, they end up frustrated with the results found from an IR system. This research has been conducted, aiming at augmenting the recall of previous work. Statistical co-occurrence technique has been used in order to expand query terms. The main reason for performing query expansion is to provide relevant documents as per users’ query that can satisfy their information need. Statistical co-occurrence method considers, frequently appearing terms with the query term, regardless of their position. The efficiency of proposed technique has been tested on the prototype system and the result found compared with the result of previous study. Accordingly, 6% recall and 2% f-measure improvement has been made. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system. 展开更多
关键词 statistICAL co-occurrence Information RETRIEVAL QUERY EXPANSION Amharic
下载PDF
Statistical Language Model for Chinese Text Proofreading
3
作者 张仰森 曹元大 《Journal of Beijing Institute of Technology》 EI CAS 2003年第4期441-445,共5页
Statistical language modeling techniques are investigated so as to construct a language model for Chinese text proofreading. After the defects of n-gram model are analyzed, a novel statistical language model for Chine... Statistical language modeling techniques are investigated so as to construct a language model for Chinese text proofreading. After the defects of n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located before and after the target word wi, and the relationship between un-neighboring words w_i and w_j in linguistic environment(LE). First, the word association degree between w_i and w_j is defined by using the distance-weighted factor, w_j is l words apart from w_i in the LE, then Bayes formula is used to calculate the LE related degree of word w_i, and lastly, the LE related degree is taken as criterion to predict the reasonability of word w_i that appears in context. Comparing the proposed model with the traditional n-gram in a Chinese text automatic error detection system, the experiments results show that the error detection recall rate and precision rate of the system have been improved. 展开更多
关键词 statistical language model n-gram linguistic environment text proofreading
下载PDF
An Approach to Fault Diagnosis of Rotating Machinery Using the Second-Order Statistical Features of Thermal Images and Simplified Fuzzy ARTMAP
4
作者 Faisal Al Thobiani Van Tung Tran Tiedo Tinga 《Engineering(科研)》 2017年第6期524-539,共16页
Thermal image, or thermogram, becomes a new type of signal for machine condition monitoring and fault diagnosis due to the capability to display real-time temperature distribution and possibility to indicate the mach... Thermal image, or thermogram, becomes a new type of signal for machine condition monitoring and fault diagnosis due to the capability to display real-time temperature distribution and possibility to indicate the machine’s operating condition through its temperature. In this paper, an investigation of using the second-order statistical features of thermogram in association with minimum redundancy maximum relevance (mRMR) feature selection and simplified fuzzy ARTMAP (SFAM) classification is conducted for rotating machinery fault diagnosis. The thermograms of different machine conditions are firstly preprocessed for improving the image contrast, removing noise, and cropping to obtain the regions of interest (ROIs). Then, an enhanced algorithm based on bi-dimensional empirical mode decomposition is implemented to further increase the quality of ROIs before the second-order statistical features are extracted from their gray-level co-occurrence matrix (GLCM). The highly relevant features to the machine condition are selected from the total feature set by mRMR and are fed into SFAM to accomplish the fault diagnosis. In order to verify this investigation, the thermograms acquired from different conditions of a fault simulator including normal, misalignment, faulty bearing, and mass unbalance are used. This investigation also provides a comparative study of SFAM and other traditional methods such as back-propagation and probabilistic neural networks. The results show that the second-order statistical features used in this framework can provide a plausible accuracy in fault diagnosis of rotating machinery. 展开更多
关键词 Thermal Images SECOND-ORDER statistical Features Gray-Level co-occurrence Matrix Minimum Redundancy Maximum RELEVANCE Rotating Machinery Fault Diagnosis Simplified Fuzzy ARTMAP
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部