Pattern discovery from time series is of fundamental importance. Most pattern discovery algorithms for time series compare the values of the series using some similarity measure. Because values are affected by scale and baseline, value-based methods cause problems when the objective is to capture shape. Thus, a shape-based similarity measure, the Sh measure, is proposed, and its properties are given together with the corresponding proofs. A time series shape pattern discovery algorithm based on the Sh measure is then put forward. The proposed algorithm terminates in a finite number of iterations, and its computational and storage complexity are given. Finally, experiments on synthetic datasets and sunspot datasets demonstrate that the time series shape pattern discovery algorithm is valid.
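The scale and baseline problem described in the time series abstract can be illustrated with a small sketch. The abstract does not define the Sh measure, so the code below does not implement it; it uses z-normalization followed by Euclidean distance as a common stand-in for shape-oriented comparison. The function names and the sine-wave test series are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Plain value-based distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def z_normalize(series):
    """Remove baseline (mean) and scale (standard deviation) from a series.

    This is a stand-in for a shape-oriented comparison, NOT the Sh measure,
    which the abstract names but does not define.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series has no shape to compare
    return [(x - mean) / std for x in series]


# Hypothetical data: the same sine shape, then rescaled and shifted.
base = [math.sin(2 * math.pi * t / 20) for t in range(40)]
scaled_shifted = [5.0 * x + 10.0 for x in base]

# Value-based comparison sees two very different series.
raw_distance = euclidean(base, scaled_shifted)

# After removing scale and baseline, the two series coincide.
shape_distance = euclidean(z_normalize(base), z_normalize(scaled_shifted))

print(f"value-based distance: {raw_distance:.3f}")
print(f"shape-based distance: {shape_distance:.3e}")
```

The two series have an identical shape, yet the value-based distance is large, while the distance after normalization is zero up to floating-point error. This is the behavior a shape-based measure is meant to provide.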
Protein design has become a powerful method to expand the number of natural proteins and to design customized proteins on demand. Domain-based protein design spares the need to create novel elements from scratch, which makes it a more efficient strategy than design from scratch for multi-domain proteins, protein complexes and biomaterials. As surface shape plays a central role in domain-domain and protein-protein interactions, a global map of the surface shapes of all domains should be very beneficial for domain-based protein design. Therefore, in this study, we characterized the surface shapes of protein domains collected from the CATH and SCOP databases using their 3D-Zernike descriptors (3DZDs). Similarities between domain shape features were then identified, and all domains were classified accordingly. The preferences for combinations of domains from different clusters were analyzed in natural proteins from the Protein Data Bank. A user-friendly website, termed CPD3DS, was also developed for storage, retrieval, analysis and visualization of our results. This work not only provides an overall view of protein domain shapes by showing their variety and similarities, but also opens up a new avenue for understanding the properties of protein structural domains and the design principles of protein architectures.
Funding: Supported by the National Natural Science Foundation of China (Nos. 31971176 and 31800616) and the Fundamental Research Funds for the Central Universities (No. A03018023601045).
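The classification step in the protein domain abstract (describe each domain shape by a descriptor vector, then group domains with similar vectors) can be sketched as follows. The abstract does not state which distance or clustering method was used, so this is only an assumption-laden illustration: it uses Euclidean distance and a simple greedy threshold grouping. The function names, the threshold, and the short toy vectors are all hypothetical; real 3DZD vectors are much longer and would be computed from domain surfaces.

```python
import math


def distance(u, v):
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def group_by_shape(descriptors, threshold):
    """Greedy grouping of descriptor vectors.

    Each domain joins the first group whose representative vector lies
    within `threshold`; otherwise it starts a new group. This is an
    illustrative stand-in, not the classification used in the study.
    """
    groups = []  # list of (representative_vector, member_names)
    for name, vec in descriptors.items():
        for rep, members in groups:
            if distance(rep, vec) <= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]


# Hypothetical toy descriptors, invented purely for illustration.
toy = {
    "domain_a": [1.00, 0.20, 0.05, 0.01],
    "domain_b": [0.98, 0.22, 0.06, 0.01],
    "domain_c": [0.30, 0.90, 0.40, 0.10],
}

clusters = group_by_shape(toy, threshold=0.1)
print(clusters)
```

In this toy input, the first two vectors differ only slightly and end up in the same group, while the third forms its own group. The result of greedy grouping depends on input order and on the threshold, which is one reason a real analysis would likely use a more robust clustering method.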