In the last two decades of the 20th century, there was growing interest in, and emphasis on, the study of Hong Kong literature among both academics and the general public in Hong Kong. Recognizing the emerging need for resources on Hong Kong literature, the University Library System of the Chinese University of Hong Kong set up the Hong Kong Literature Database (the “Database”) in 2000, the first Chinese literature database on the Internet. The paper examines how the Database is constructed using XML technology and a metadata schema. The Database also employs Unicode UTF-8 as its internal encoding. A mapping table for traditional and simplified Chinese characters, created from Unihan, works behind the scenes so that a user can enter either traditional or simplified Chinese characters and retrieve results in both. Currently, OCR technology is used for 65% of the journals so that full-text searching is possible. The Chinese OCR technology is examined in greater detail. Special features of the Database are described, such as page-by-page browse mode, position highlighting for full-page newspapers, and links to tables of contents and book jackets from the Library catalogue. The paper also raises the problem of massive downloading and compares state-of-the-art technologies and their shortcomings. It shows how the Hong Kong Literature Database facilitates future collaboration and data exchange by using open standards, a shareable structure and the latest technology.
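The traditional/simplified retrieval described above can be sketched as a character-folding step applied to both the query and the indexed text. This is a minimal illustration, not the Database's actual implementation: the three-entry `TRAD_TO_SIMP` table and the `normalize` and `search` functions are hypothetical, and a real table would be derived from the Unihan variant data. Folding towards simplified forms is an assumption made here because several traditional characters can share one simplified form (for example 發 and 髮 both map to 发).

```python
# Tiny illustrative subset of a traditional -> simplified mapping table.
# Hypothetical data: a production table would be generated from Unihan.
TRAD_TO_SIMP = {"學": "学", "國": "国", "書": "书"}


def normalize(text):
    """Fold each character to its simplified form so both scripts compare equal."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def search(query, documents):
    """Return the documents containing the query in either script."""
    key = normalize(query)
    return [doc for doc in documents if key in normalize(doc)]


if __name__ == "__main__":
    docs = ["香港文學", "香港文学", "中国图书"]
    print(search("文學", docs))  # matches both spellings of the first two titles
    print(search("文学", docs))  # same result from a simplified query
```

Because the original text is stored untouched and only the comparison key is folded, results can still be displayed in whichever script the source used.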
Interest in hydrogen bonding involving chalcogens has grown considerably since the IUPAC committee redefined hydrogen bonding. The focus is not only on unconventional acceptors but also on donors not discussed before. Previous studies have mentioned that the proton of the H-C group can be involved in hydrogen bonding, but only with conventional acceptors. In this study, we explored the ability of Se, S and Te acceptors to form hydrogen bonds with the H-C donor, using the Cambridge Structural Database (CSD) in conjunction with ab initio calculations. In the CSD, there are 256, 6249 and 11 R1,R2-C=Se, R1,R2-C=S and R1,R2-C=Te structures, respectively, that form hydrogen bonds, the majority of which carry N,N groups. Whereas the C=S acceptor can also form a hydrogen bond with its C,C group, the C=Se and C=Te acceptors form hydrogen bonds only with N,C and N,N groups. CSD analysis shows very similar d(norm) values around -0.04 Å, and the DFT-calculated interaction energies for the N,C and N,N groups are also similar. Both the interaction distances derived from CSD analysis and the DFT-calculated interaction energies demonstrate that the acceptors form stable complexes with H-CF3. Besides hydrogen bonds, dispersion interactions also stabilize the complexes, since their contribution can reach 50%. Analysis of intramolecular geometries and ab initio partial charges shows that this bonding stems from resonance-induced C<sup>δ+</sup>=X<sup>δ-</sup> dipoles. In many respects, C=Se and C=Te are similar to C=S, with similar d(norm) values and calculated interaction strengths.
The integration of remote sensing (RS) with geographical information systems (GIS) is a hot topic in geographical information science. A good database structure is important to this integration: it should support the complete integration of RS with GIS, handle the mismatch between the resolution of remote sensing images and the precision of GIS data, and aid knowledge discovery and exploitation. In this paper, a database structure that stores spatial data based on a semantic network is presented. This structure has several advantages. Firstly, the spatial data is stored as raster data with a spatial index, so image processing can be done directly on the GIS data, which is stored hierarchically according to the distinguishing precision. Secondly, simple objects are aggregated into complex ones. Thirdly, because an indexing tree depicts the aggregation relationships and indexing pictures expressed by 2-D strings describe the topological structure of the objects, the concepts of surrounding and region are expressed clearly and the semantic content of the landscape can be illustrated well. All the factors that affect the recognition of objects are depicted in the factor space, which provides a uniform mathematical framework for the fusion of semantic and non-semantic information. Lastly, the object node, knowledge node and indexing node are integrated into one node, which enhances the system's ability in knowledge expression, intelligent inference and association. The application shows that this database structure can benefit the interpretation of remote sensing images with GIS information.
A typical characteristic of the topology of Bayesian networks (BNs) is the interdependence among different nodes (variables), which makes it impossible to optimize one variable independently of the others, so learning BN structures with general genetic algorithms is liable to converge to a local extremum. To resolve this problem efficiently, a method based on a self-organizing genetic algorithm (SGA) for constructing BNs from databases is presented. The method uses a self-organizing mechanism to develop a genetic algorithm that extends the number of crossover operators from one to two, provides mutual competition between them, and even adjusts the number of parents in recombination (crossover/recomposition) schemes. Together with the K2 algorithm, the method also optimizes the genetic operators and makes adequate use of domain knowledge. As a result, the method is able to find a global optimum of the BN topology, avoiding premature convergence to a local extremum. Experimental results are reported, and the convergence of the SGA is discussed.
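The idea of two crossover operators competing with each other can be sketched as adaptive operator selection. The following is only an illustration under stated assumptions, not the authors' SGA: the operator names `one_point` and `uniform`, the `update_rates` rule and its `floor` parameter are all hypothetical, and a candidate BN structure is assumed to be encoded as a flat list of 0/1 adjacency entries.

```python
import random


def one_point(a, b, rng):
    """One-point crossover: prefix from parent a, suffix from parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def uniform(a, b, rng):
    """Uniform crossover: each gene is taken from a or b with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def update_rates(rates, gains, floor=0.1):
    """Shift selection probability toward the operator with the larger fitness gain.

    `gains` maps operator name -> accumulated fitness improvement of its offspring.
    `floor` keeps the weaker operator alive so the competition can continue.
    """
    total = sum(max(g, 0.0) for g in gains.values())
    if total == 0:
        return dict(rates)  # no information this generation: keep rates unchanged
    raw = {op: max(floor, max(gains[op], 0.0) / total) for op in rates}
    norm = sum(raw.values())
    return {op: value / norm for op, value in raw.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    rates = {"one_point": 0.5, "uniform": 0.5}
    child = one_point([0, 0, 0, 0], [1, 1, 1, 1], rng)
    print(child)
    print(update_rates(rates, {"one_point": 3.0, "uniform": 1.0}))
```

In a full algorithm, the fitness gain would come from a network scoring function evaluated on the database, and the updated rates would decide which operator produces each offspring in the next generation.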
Hirshfeld surface analysis has been widely used in recent years as a means to quantify and visualize various types of intermolecular interactions in molecular crystals. This review article introduces intermolecular interactions discussed with Hirshfeld surface analysis and 2D fingerprint plots. In addition, using CIF files obtained from our previous results, Hirshfeld surface analysis was newly performed, and the resulting 3D Hirshfeld surfaces, 2D fingerprint plots, molecular structural features and crystal structure relationships are described. Classification of the intermolecular interactions, a statistical discussion focused on crystalline water, and a perspective on ligand-protein docking are also presented.
In the soft sensor field, just-in-time learning (JITL) is an effective approach to modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed in the input space only and ignore important output information, which may lead to inaccurate construction of the relevant sample set. To solve this problem, we propose a novel supervised feature extraction method suitable for regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information can be easily and effectively incorporated through a newly defined similarity index. SLNSPP not only retains the virtue of locality preserving projections but also prevents faraway points from becoming close after projection, which endows it with powerful discriminating ability. These two properties are desirable for JITL, as they are expected to enhance the accuracy of similar-sample selection. Consequently, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the scale and update frequency of the database. Finally, two case studies are conducted with benchmark datasets to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
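For orientation, the baseline JITL loop that the abstract sets out to improve can be sketched as follows: for each query, pick the most similar stored samples and fit a local model on them alone. This sketch uses plain Euclidean distance in the input space, which is exactly the limitation the abstract criticizes, and a locally weighted average as the simplest possible local model. It is not SLNSPP; the function name `jitl_predict` and the parameters `k` and `bandwidth` are hypothetical.

```python
import math


def jitl_predict(query, X, y, k=3, bandwidth=1.0):
    """Predict y for `query` from its k nearest stored samples.

    X is a list of input vectors, y the matching list of outputs.
    Similarity is computed in the input space only (baseline JITL).
    """
    dists = [math.dist(query, x) for x in X]
    nearest = sorted(range(len(X)), key=lambda i: dists[i])[:k]
    weights = [math.exp(-((dists[i] / bandwidth) ** 2)) for i in nearest]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to an unweighted local mean.
        return sum(y[i] for i in nearest) / len(nearest)
    return sum(w * y[i] for w, i in zip(weights, nearest)) / total


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]
    y = [0.0, 1.0, 2.0, 10.0]
    print(jitl_predict([1.0], X, y, k=3))
```

A supervised method such as the one proposed would replace the distance computation, so that samples are ranked in a projected space shaped by both inputs and outputs.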
In accordance with previous reports, sequences related to phosphorylated protein segments occur in conserved variable domains of immunoglobulins, above all in certain N-terminally located segments. Consequently, we look here for sequences that are 1) components of human and mouse proteins other than antigen receptors, 2) identical with or highly similar to nucleotide sequence representatives of conserved variable immunoglobulin segments and 3) identical with or closely related to phosphorylation sites. More precisely, we searched for the corresponding actual pairs of DNA and protein sequence segments using a five-step bilingual approach employing, among others, a) different types of BLAST searches, b) two in-principle-different machine-learning methods predicting phosphorylated sites and c) two large databases recording existing phosphorylation sites. The approach identified seven existing phosphorylation sites and thirty-seven related human and mouse segments achieving limits for several predictions or phylogenetic parameters. Mostly serines phosphorylated by ataxia-telangiectasia-related kinase (involved in the regulation of DNA double-strand-break repair) were indicated or predicted in this study. Hypermutation motifs, located in effective positions of the selected sequence segments, occurred significantly less frequently in transcribed than in non-transcribed DNA strands, thus suggesting the incidence of mutation events. In addition, marked differences between the numbers and proportions of human and mouse cancer-related sequence items were found in different steps of the selection process. The possible role of hypermutation changes within the selected segments and the observed structural relationships are discussed here with respect to DNA damage, carcinogenesis, cancer vaccination, ageing and evolution. Taken together, our data represent additional and sometimes perhaps complementary information to the existing databases of empirically proven phosphorylation sites or pathogenically important spots.
Logic flaws within web applications allow malicious operations to be triggered against the back-end database. Existing approaches to identifying logic flaws in database accesses are strongly tied to structured query language (SQL) statement construction and cannot be applied to the new generation of web applications that use Not only SQL (NoSQL) databases as the storage tier. In this paper, we present Lom, a black-box approach for discovering many categories of logic flaws within MongoDB-based web applications. Our approach introduces a MongoDB operation model to support new features of MongoDB and models the application logic as a Mealy finite state machine. During the testing phase, test inputs that emulate state violation attacks are constructed to identify logic flaws at each application state. We apply Lom to several MongoDB-based web applications and demonstrate its effectiveness.
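Modeling application logic as a Mealy machine, and deriving state-violating test inputs from it, can be sketched as below. This is a toy illustration and not Lom itself: the states, actions and outputs in `TRANSITIONS` are invented, and the helper names `step` and `violation_inputs` are hypothetical. In a Mealy machine the output depends on both the current state and the input, which is why the table is keyed by (state, action) pairs.

```python
# (state, action) -> (next_state, output). Hypothetical login/cart workflow.
TRANSITIONS = {
    ("logged_out", "login"): ("logged_in", "session_created"),
    ("logged_in", "view_cart"): ("logged_in", "cart_shown"),
    ("logged_in", "logout"): ("logged_out", "session_closed"),
}


def step(state, action):
    """Apply one action; undefined transitions leave the state unchanged."""
    return TRANSITIONS.get((state, action), (state, "rejected"))


def violation_inputs(state):
    """Actions known to the model that are NOT permitted in `state`.

    Sending these to the real application emulates a state violation attack:
    a correct application should reject every one of them.
    """
    all_actions = {action for (_, action) in TRANSITIONS}
    allowed = {action for (s, action) in TRANSITIONS if s == state}
    return sorted(all_actions - allowed)


if __name__ == "__main__":
    for action in violation_inputs("logged_out"):
        print(action, "->", step("logged_out", action))
```

A tester would compare the application's observed response to each violating input against the model's "rejected" output; any response that triggers a database operation instead points to a candidate logic flaw.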
Funding (soft sensor SLNSPP-JITL study): Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).
Funding (Lom logic-flaw study): Supported by the China Scholarship Council; Tianjin Science and Technology Committee (No. 12JCZDJC20800); Science and Technology Planning Project of Tianjin (No. 13ZCZDGX01098); NSF TRUST (The Team for Research in Ubiquitous Secure Technology) Science and Technology Center (No. CCF-0424422); National High Technology Research and Development Program of China (863 Program) (No. 2013BAH01B05); National Natural Science Foundation of China (No. 61402264).