Cloud storage is essential for storing and retrieving user data in distributed data centres. The storage service is provided as a paid service, charged according to the size of the data stored. Because data centres hold massive amounts of data with similar information and file structures kept in multiple copies, duplication increases storage consumption. Existing deduplication systems do not reduce data efficiently because they identify similar data inaccurately, which raises storage consumption and cost. To resolve this problem, this paper proposes an efficient storage reduction method called Hash-Indexing Block-based Deduplication (HIBD), based on Segmented Bind Linkage (SBL) methods, for reducing storage in a cloud environment. Initially, preprocessing is done using the sparse augmentation technique. The preprocessed files are then segmented into blocks to build a hash index. The block contents are compared with other files through Semantic Content Source Deduplication (SCSD), which identifies similar content between files. Based on the content presence count, the Distance Vector Weightage Correlation (DVWC) estimates the document similarity weight, and related files are grouped into a cluster. Finally, the segmented bind linkage compares documents to find duplicate content in the cluster, using the similarity weight based on the coefficient match case. This implementation helps identify data redundancy efficiently and reduces the service cost in distributed cloud storage.
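The block segmentation and hash-index step described in this abstract can be illustrated with a minimal sketch. It is not the authors' HIBD implementation: the fixed block size, the use of SHA-256, and the names `split_blocks`, `deduplicate` and `restore` are all assumptions made for illustration, and the SCSD, DVWC and SBL stages are not reproduced.

```python
import hashlib


def split_blocks(data: bytes, block_size: int = 4) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(files: dict[str, bytes], block_size: int = 4):
    """Store each distinct block once, keyed by its hash.

    Returns (store, recipes): `store` maps digest -> block bytes and
    `recipes` maps file name -> ordered list of digests.
    """
    store: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in files.items():
        digests = []
        for block in split_blocks(data, block_size):
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(name: str, store: dict[str, bytes], recipes: dict[str, list[str]]) -> bytes:
    """Rebuild a file from its list of block digests."""
    return b"".join(store[d] for d in recipes[name])
```

With two files that share their first two blocks, the shared blocks are stored once and both files can still be rebuilt exactly from their digest lists.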
Digitization has created an abundance of new information sources by altering how pictures are captured. Accessing large image databases from a web portal requires an appropriate indexing structure rather than reducing the contents of different kinds of databases for quick processing. This need has driven the development of efficient image retrieval techniques and much research on image indexing for large image datasets. Image retrieval usually encounters difficulties such as a) merging the diverse representations of images and their indexing, b) the low-level visual characteristics and semantic characteristics associated with an image being indirectly proportional, and c) noisy and less accurate extraction of image information (semantic and predicted attributes). This work builds on the concepts of reverse engineering and de-normalization by evaluating how data can be stored effectively, so that retrieval becomes straightforward and rapid. It also addresses deep root indexing with a multidimensional approach to how images can be indexed, and provides improved query processing performance and reduced maintenance and storage cost. We focus on schema design for a non-clustered index solution, especially for covering queries. The schema provides a filter predicate to build an index over a particular subset of rows, together with an index table, called filtered indexing. Finally, we include non-key columns in addition to the key columns. Experiments on two image data sets, with and without filtered indexing, show low query cost. We compare efficiency in terms of mean average precision to measure the accuracy of retrieval with the developed coherent semantic indexing. The results show that retrieval using deep root indexing is simple and fast.
In a very large digital library that supports computer-aided collaborative design, an indexing process is crucial whenever the retrieval process has to select among many possible designs. In this paper, we address the problem of retrieving important design and engineering information by structural indexing. A design is represented by a model dependency graph; therefore, the indexing problem is to determine whether a graph is present or absent in a database of model dependency graphs. We present a novel graph indexing method based on polynomial characterization of a model dependency graph and on hashing. Such an approach is able to create a highly efficient 3D solid digital library for retrieving and extracting solid geometric models and engineering information.
A new type of parallel indexing cam mechanism is discovered. The composition of the cam profile is proposed, and the formulas for the mechanism parameters are established by synthesis of its configuration and size. The equations for the working profile and pressure angle of the cam are derived based on the inverse model and the derivative of the pitch curve of the cams. An example is given, demonstrating that the new mechanism can be designed using the derived expressions.
The dynamic responses of a roller gear indexing cam mechanism are investigated. Applying the Lagrange equation and the Gear method, motion equations of the mechanism that include clearance, motor characteristics and torsional flexibility are developed and solved. The results show that clearance primarily affects the response of the turret and has little effect on the response of the rotary table. At the same time, the velocity fluctuation of the motor shaft is not serious because of the inertia of the reducer, and the high-frequency velocity fluctuation of the camshaft is related to the torsional stiffness of the shaft and the clearance between pairs.
The International Cricket Council (ICC) is the international cricket governing body that ranks all the cricket-playing nations. The ranking system followed by the ICC relies on the wins and defeats of the teams. The model used by the ICC is deficient in certain key respects: it ignores key factors such as winning margin and strength of the opposition. Various ranking measures are presented in this research. The proposed methods adopt the concepts of h-index and PageRank to produce more comprehensive ranking metrics. The proposed approaches not only rank the teams on their win/loss statistics but also take into consideration the margin of victory and the quality of the opposition. Three cricket team ranking techniques are presented: (1) Cricket Team-Index (ct-index), (2) Cricket Team Rank (CTR) and (3) Weighted Cricket Team Rank (WCTR). The proposed metrics are validated on a cricket dataset extracted from Cricinfo, with instances for all three formats of the game, i.e., T20 International (T20i), One Day International (ODI) and Test matches. A comparative analysis between the proposed and existing techniques, for all three formats, is presented as well.
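To make the PageRank idea in this abstract concrete, here is a rough sketch of a margin-weighted PageRank over match results. It is not the paper's CTR or WCTR formula, which the abstract does not give. The edge direction (loser points to winner), the use of the winning margin as edge weight, and the name `team_rank` are assumptions made for illustration.

```python
def team_rank(matches, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    `matches` is a list of (winner, loser, margin) tuples. Each match adds
    an edge loser -> winner whose weight is the winning margin, so beating
    a highly ranked team by a large margin transfers more rank.
    """
    teams = sorted({team for w, l, _ in matches for team in (w, l)})
    out_weight = {team: 0.0 for team in teams}
    edges = {}  # (loser, winner) -> accumulated margin
    for winner, loser, margin in matches:
        edges[(loser, winner)] = edges.get((loser, winner), 0.0) + margin
        out_weight[loser] += margin

    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Unbeaten teams have no outgoing edges; spread their rank evenly.
        dangling = sum(rank[t] for t in teams if out_weight[t] == 0.0)
        new_rank = {t: (1.0 - damping) / n + damping * dangling / n for t in teams}
        for (loser, winner), weight in edges.items():
            new_rank[winner] += damping * rank[loser] * weight / out_weight[loser]
        rank = new_rank
    return rank
```

In a three-team example where A beats B and C, and B beats C, the scores sum to one and order the teams A, B, C.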
In the world of science, recognition of scientific performance is strongly correlated with publication visibility and the interest generated among other researchers, as evidenced by downloads and citations. A published paper’s numbers of downloads and citations are the best indices of its importance and are useful measures of the researchers’ performance. However, the published paper should be evaluated and indexed independently, and the prestige of the journal in which it is published should not influence the value of the paper itself. By participating in and presenting at congresses and international meetings, scientists strongly increase the visibility of their results and the recognition of their research; this also promotes their publications. Status in Research Gate (RG), the so-called RG Score, the Percentile, and the h-index give researchers feedback about their performance, or their place and prestige within the scientific community. RG has become an excellent tool for disseminating scientific results and connecting researchers worldwide. RG also allows researchers to present achievements other than publications (e.g., membership in recognized associations such as the American Chemist Society, a biography in Marquis Who’s Who in the World, awards received, and/or ongoing projects). This paper discusses how the RG Score, Percentile, and h-index are calculated, whether these methods are correct, and alternative criteria. RG also lists papers with falsified results and the journals that publish them. Thus, it may be appropriate to reduce the indices for such journals, authors, and the institutions with which these authors are affiliated.
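For reference, the h-index mentioned here has a standard definition: it is the largest h such that the researcher has h papers with at least h citations each. A minimal sketch of that definition follows; the function name `h_index` is chosen for illustration, and the RG Score and Percentile are not shown because the abstract does not state how they are computed.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h
```

For citation counts 10, 8, 5, 4 and 3, four papers have at least four citations each, so the h-index is 4.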
Sound indexing and segmentation of digital documents, especially on the internet and in digital libraries, are very useful for simplifying and accelerating multimedia document retrieval. We can imagine extracting multimedia files not only by keywords but also by speech semantic content. The main difficulty of this operation is the parameterization and modelling of the sound track and the discrimination of speech, music and noise segments. In this paper, we present a Speech/Music/Noise indexing interface designed for audio discrimination in multimedia documents. The program uses a statistical method based on ANN and HMM classifiers. After pre-emphasis and segmentation, the audio segments are analysed by the cepstral acoustic analysis method. The developed system was evaluated on a database consisting of songs with Arabic speech segments under several noisy environments.
With the advent of single-cell technology and its rapid application in the biomedical field, it has become possible to identify, categorize and characterize each single cell in the brain. Shanghai Brain Bank is accumulating donor brains very fast (an estimated 80 brains so far) from people with normal aging or various neurodegenerative diseases. The basal ganglia are critical for movement control, behavior, cognition, emotion and reward. Many neurodegenerative diseases are related to abnormality or aging of the human basal ganglia. Single-nucleus transcriptomic profiles of both normal and diseased human basal ganglia are still sparse. In our study, we systematically examined different nuclei (caudate, putamen, globus pallidus, and substantia nigra) in human basal ganglia from 6 control and 5 diseased donors (2 psychiatric diseases, 2 PD and 1 AD) using combinatorial indexing single-nucleus RNA sequencing. Gene expression was measured in a total of 22309 control and 42590 diseased single cells.
Asian Agricultural Research (ISSN 1943-9903), founded in 2009, is a monthly comprehensive agricultural academic journal published and approved by the Library of Congress of the United States of America. · CNKI (China National Knowledge Infrastructure), China · CSA Illumina (Cambridge Scientific Abstracts), USA · EBSCO database.
Graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. To accelerate similarity join based on graph edit distance, in this paper we use a preprocessing strategy to remove mismatching graph pairs with significant differences. We then propose a novel method of building an index for each graph by grouping the nodes that can be reached within k hops of each key node while conserving structure, called the k-hop-tree based indexing method. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. In addition, the join process can be finished in polynomial time.
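The grouping of nodes reachable within k hops of a key node can be sketched with a breadth-first search that records each node's parent, giving a tree rooted at the key node. This is only an illustration of the k-hop idea. How the paper selects key nodes, and how it compares the resulting trees during the join, is not stated in the abstract, and the name `k_hop_tree` is an assumption.

```python
from collections import deque


def k_hop_tree(adjacency, root, k):
    """Breadth-first tree of all nodes within k hops of `root`.

    `adjacency` maps node -> iterable of neighbours. The result maps each
    reached node to its parent in the tree (the root maps to None).
    """
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in parent:
                parent[neighbour] = node
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return parent
```

On a small graph with edges a-b, a-c, b-d and d-e, the 2-hop tree rooted at `a` contains a, b, c and d but not e, which is three hops away.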
Recently, the indexed modulation (IM) technique, in conjunction with multi-carrier modulation, has gained increasing attention. It conveys additional information on the subcarrier indices by activating specific subcarriers in the frequency domain, in addition to the conventional amplitude-phase modulation of the activated subcarriers. Orthogonal frequency division multiplexing (OFDM) with IM (OFDM-IM) is compared in depth with classical OFDM; it leads to an attractive trade-off between spectral efficiency (SE) and energy efficiency (EE). In this paper, the concept of combinatorial modulation is introduced from a new point of view. Sparsity mapping is applied intentionally to enable the compressive sensing (CS) concept in the data recovery process, providing further performance and EE enhancement without SE loss. Generating artificial data sparsity in the frequency domain along with the naturally embedded channel sparsity in the time domain allows joint data recovery and channel estimation in a double sparsity framework. Based on simulation results, the performance of the proposed approach agrees with the predicted CS superiority even under low signal-to-noise ratio without channel coding. Moreover, the proposed sparsely indexed modulation system outperforms the conventional OFDM system and the OFDM-IM system in terms of error performance, peak-to-average power ratio (PAPR) and energy efficiency under the same spectral efficiency.
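The basic idea of carrying bits on subcarrier indices can be shown with a small look-up-table mapper for one OFDM-IM sub-block. This sketches generic index modulation only, not the paper's sparsity mapping or its compressive-sensing receiver. The sub-block size `n=4`, the number of active subcarriers `k=2`, and the name `im_map` are assumptions for illustration.

```python
from itertools import combinations
from math import comb, floor, log2


def im_map(index_bits, symbols, n=4, k=2):
    """Map index bits and k modulated symbols onto one n-subcarrier sub-block.

    The index bits select which k of the n subcarriers are active; the
    symbols are placed on the active subcarriers and the rest stay at zero.
    """
    patterns = list(combinations(range(n), k))      # all activation patterns
    usable = 2 ** floor(log2(comb(n, k)))           # patterns addressable by bits
    value = int("".join(str(bit) for bit in index_bits), 2)
    if value >= usable:
        raise ValueError("index bits exceed the number of usable patterns")
    block = [0j] * n
    for position, symbol in zip(patterns[value], symbols):
        block[position] = symbol
    return block
```

With n=4 and k=2 there are six activation patterns, of which four are addressable by two index bits; the bits `1 0` select the third pattern, which activates subcarriers 0 and 3.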
Generating baseline information about the ambient air quality of a region is significant when the area 1) is an active mine site, 2) is proposed to be mined in future, and 3) is industrializing at a fast pace. Ambient air quality monitoring (with respect to SPM, RPM, SO2, NOx and CO) was carried out in and around two mining complexes in the western part of Kachchh district in Gujarat to generate the baseline air quality status of the area. This area has two major mine complexes, and various large-scale industrial projects (thermal power plants, cement plants and several ports and jetties) are also in the pipeline. Ambient air sampling was carried out at eight locations within a five km radial distance of two major mine sites, i.e. Panandhro and Mata-na-Madh, with four locations for each mine site. Air Quality Indexing was done for all the locations, since it is the simplest way to predict the ambient air quality status of a region with respect to industrial, residential and rural areas. Of the eight locations studied, the air quality at six fell under the fairly clean (Light Air Pollution, AQI 25-50) category, while the rest (rural areas in the region) had relatively better air quality and fell under the clean (Clean Air, AQI 10-25) category.
In this paper, we propose a new index-based method to realize IR-style Chinese keyword search with ranking strategies in relational databases. The method creates an index from the related information of tuple words and presents a ranking strategy based on the nature of Chinese words. For a Chinese keyword query, the index is used to quickly match the query search words against the tuple words in the index and to compute similarities between the query and tuples using the ranking strategy; the set of identifiers of candidate tuples is then generated. We then retrieve the top-N results of the query using SQL selection statements and output the ranked answers according to the similarities. The experimental results show that our method is efficient and effective.
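The index-then-rank flow in this abstract can be sketched as an inverted index from words to tuple identifiers, followed by a top-N selection. The scoring used here (summed term frequency) is a stand-in, since the abstract does not specify the paper's ranking strategy. The input is assumed to be already segmented into words, and `build_index` and `top_n` are illustrative names.

```python
from collections import Counter, defaultdict


def build_index(tuples):
    """Build an inverted index: word -> {tuple_id: term frequency}.

    `tuples` maps a tuple identifier to its list of (already segmented) words.
    """
    index = defaultdict(dict)
    for tuple_id, words in tuples.items():
        for word, count in Counter(words).items():
            index[word][tuple_id] = count
    return index


def top_n(index, query_words, n):
    """Score candidate tuples by summed term frequency and return the best n."""
    scores = Counter()
    for word in query_words:
        for tuple_id, count in index.get(word, {}).items():
            scores[tuple_id] += count
    # Highest score first; ties broken by tuple identifier for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]
```

The returned identifiers would then be used to fetch the full rows from the database, for example with a SQL selection on the primary key.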
In recent years, a growing amount of mathematical content has become available on the Web. When conventional search engines deal with mathematical expressions, the two-dimensional structure of the expressions is lost, which results in low math retrieval performance, while retrieval technology specifically designed for mathematical expressions is not yet mature. To address these problems, an improved mathematical expression indexing and matching method is proposed that employs a full-text index method to deal with the two-dimensional structure of mathematical expressions. Firstly, by fully considering the characteristics of LaTeX formulae, a feature representation method for mathematical expressions and a clustering method for feature keywords are put forward. Then, an improved inter-relevant successive trees index model is applied to the construction of the mathematical expression index, in which the clustering algorithm for mathematical expression features is employed to solve the problem of growth in the number of trees when processing a large number of formulae. Finally, matching algorithms for mathematical expressions are given, which provide four query modes: exact matching, compatible matching, sub-expression matching and fuzzy matching. In browser/server mode, 110027 formulae were used as experimental samples. The index file size was 29.02 Mb. The average retrieval time was 1.092 seconds. The experimental results show the effectiveness of the method.
The exponential growth of bioinformatics tools in recent years has made it challenging for scientists to select the most suitable one for their data analysis tasks. Therefore, to help scientists make informed choices, a community-based platform that indexes and rates bioinformatics tools is urgently needed. In this study, we introduce BioTreasury (http://biotreasury.rjmart.cn), an integrated community-based repository that provides an interactive platform for users and developers to share their experiences with various bioinformatics tools. BioTreasury offers a comprehensive collection of well-indexed bioinformatics software, tools, and databases, totaling over 10,000 entries. Over the past two years, we have continuously improved and maintained BioTreasury, adding several features, including structured homepages for every tool and user, a hierarchical categorization of bioinformatics tools, and tool classification using a large language model (LLM). BioTreasury streamlines the tool submission process with intelligent auto-completion. Additionally, BioTreasury provides a wide range of social features, for example enabling users to participate in interactive discussions, rate tools, and build and share tool collections with the public. We believe BioTreasury can be a valuable resource and knowledge-sharing platform for the biomedical community. It empowers researchers to effectively discover and evaluate bioinformatics tools, fostering collaboration and advancing bioinformatics research.
A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitate research in ecological monitoring, land management, desertification, and other areas. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding fault tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. One of the most important techniques for using RS data is geospatial indexing. However, the large data volume makes such indices time-consuming to construct and leverage. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes a natural way to optimise efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified nature of RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS, with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data are stored inside it in a structured way for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.
文摘Cloud storage is essential for managing user data to store and retrieve from the distributed data centre.The storage service is distributed as pay a service for accessing the size to collect the data.Due to the massive amount of data stored in the data centre containing similar information and file structures remaining in multi-copy,duplication leads to increase storage space.The potential deduplication system doesn’t make efficient data reduction because of inaccuracy in finding similar data analysis.It creates a complex nature to increase the storage consumption under cost.To resolve this problem,this paper proposes an efficient storage reduction called Hash-Indexing Block-based Deduplication(HIBD)based on Segmented Bind Linkage(SBL)Methods for reducing storage in a cloud environment.Initially,preprocessing is done using the sparse augmentation technique.Further,the preprocessed files are segmented into blocks to make Hash-Index.The block of the contents is compared with other files through Semantic Content Source Deduplication(SCSD),which identifies the similar content presence between the file.Based on the content presence count,the Distance Vector Weightage Correlation(DVWC)estimates the document similarity weight,and related files are grouped into a cluster.Finally,the segmented bind linkage compares the document to find duplicate content in the cluster using similarity weight based on the coefficient match case.This implementation helps identify the data redundancy efficiently and reduces the service cost in distributed cloud storage.
文摘Digitization has created an abundance of new information sources by altering how pictures are captured.Accessing large image databases from a web portal requires an opted indexing structure instead of reducing the contents of different kinds of databases for quick processing.This approach paves a path toward the increase of efficient image retrieval techniques and numerous research in image indexing involving large image datasets.Image retrieval usually encounters difficulties like a)merging the diverse representations of images and their Indexing,b)the low-level visual characters and semantic characters associated with an image are indirectly proportional,and c)noisy and less accurate extraction of image information(semantic and predicted attributes).This work clearly focuses and takes the base of reverse engineering and de-normalizing concept by evaluating how data can be stored effectively.Thus,retrieval becomes straightforward and rapid.This research also deals with deep root indexing with a multidimensional approach about how images can be indexed and provides improved results in terms of good performance in query processing and the reduction of maintenance and storage cost.We focus on the schema design on a non-clustered index solution,especially cover queries.This schema provides a filter predication to make an index with a particular content of rows and an index table called filtered indexing.Finally,we include non-key columns in addition to the key columns.Experiments on two image data sets‘with and without’filtered indexing show low query cost.We compare efficiency as regards accuracy in mean average precision to measure the accuracy of retrieval with the developed coherent semantic indexing.The results show that retrieval by using deep root indexing is simple and fast.
文摘In a very large digital library that support computer aided collaborative design, an indexing process is crucial whenever the retrieval process has to select among many possible designs. In this paper, we address the problem of retrieving important design and engineering information by structural indexing. A design is represented by a model dependency graph, therefor, the indexing problem is to determine whether a graph is present or absent in a database of model dependency graphs. we present a novel graph indexing method using polynomial characterization of a model dependency graph and on hashing. Such an approach is able to create an high efficient 3D solid digital library for retrieving and extracting solid geometric model and engineering information.
文摘A new type of parallel indexing cam mechanism is discovered.The composition of the cam profile is proposed and the formulas of the mechanism parameters are established by synthesis of its configuration and size.The equations for working pro- file and pressure angle of the cam are derived based on the inverse model and the derivative of pitch curve of cams.An example is given and demonstrates that it is available to design the new mechanism using the derived expressions.
文摘The dynamic responses of roller gear indexing cam mechanism are investigated .With applying Lagarange equation and Gear method,motion equations of this mechanism including clearance,motor characteristic,torsion flexibility are developed and solved.The results show that clearance affects primarily the response on turret,and has little effects on the responses on rotary table.At the same time,the velocity fluctuation of motor shaft is not serious for the existence of inertia of reducer,and the high frequency of velocity fluctuation of camshaft is related with the torsion stiffness of shaft and the clearance between pairs.
文摘There is an international cricket governing body that ranks the expertise of all the cricket playing nations,known as the International Cricket Council(ICC).The ranking system followed by the ICC relies on the winnings and defeats of the teams.The model used by the ICC to implement rankings is deficient in certain key respects.It ignores key factors like winning margin and strength of the opposition.Various measures of the ranking concept are presented in this research.The proposed methods adopt the concepts of h-Index and PageRank for presenting more comprehensive ranking metrics.The proposed approaches not only rank the teams on their losing/winning stats but also take into consideration the margin of winning and the quality of the opposition.Three cricket team ranking techniques are presented i.e.,(1)Cricket Team-Index(ct-index),(2)Cricket Team Rank(CTR)and(3)Weighted Cricket Team Rank(WCTR).The proposed metrics are validated through the collection of cricket dataset,extracted from Cricinfo,having instances for all the three formats of the game i.e.,T20 International(T20i),One Day International(ODI)and Test matches.The comparative analysis between the proposed and existing techniques,for all the three formats,is presented as well.
文摘In the world of science, recognition of scientific performance is strongly correlated with publication visibility and interest generated among other researchers, which is evident by downloads and citations. A published paper’s number of downloads and citations are the best indices of its importance and are useful measures of the researchers’ performance. However, the published paper should be valuated and indexed independently, and the prestige of the journal in which it is published should not influence the value of the paper itself. By participating in and presenting at congresses and international meetings, scientists strongly increase the visibility of their results and recognition of their research;this also promotes their publications. Status in Research Gate (RG), the so-called RG Score, the Percentile, and the h-index give researchers feedback about their performance, or their place and prestige within the scientific community. RG has become an excellent tool for disseminating scientific results and connecting researchers worldwide. RG also allows researchers to present achievements other than publications (e.g., membership in recognized associations such as the American Chemist Society, a biography in Marquis Who’s Who in the World, awards received, and/or ongoing projects). This paper discusses questions regarding how the RG Score, Percentile, and h-index are calculated, whether these methods are correct, and alternative criteria. RG also lists papers with falsified results and the journals that publish them. Thus, it may be appropriate to reduce the indices for such journals, authors, and the institutions with which these authors are affiliated.
文摘Sound indexing and segmentation of digital documentsespecially in the internet and digital libraries are very useful tosimplify and to accelerate the multimedia document retrieval. Wecan imagine that we can extract multimedia files not only bykeywords but also by speech semantic contents. The maindifficulty of this operation is the parameterization and modellingof the sound track and the discrimination of the speech, musicand noise segments. In this paper, we will present aSpeech/Music/Noise indexing interface designed for audiodiscrimination in multimedia documents. The program uses astatistical method based on ANN and HMM classifiers. After preemphasisand segmentation, the audio segments are analysed bythe cepstral acoustic analysis method. The developed system wasevaluated on a database constituted of music songs with Arabicspeech segments under several noisy environments.
Abstract: With the advent of single-cell technology and its rapid application in the biomedical field, it has become possible to identify, categorize and characterize every single cell in the brain. The Shanghai Brain Bank is rapidly accumulating donor brains (an estimated 80 brains to date) from people with normal aging or various neurodegenerative diseases. The basal ganglia are critical for movement control, behavior, cognition, emotion and reward. Many neurodegenerative diseases are related to abnormality or aging of the human basal ganglia. Single-nuclei transcriptomic profiles of both normal and diseased human basal ganglia are still sparse. In our study, we systematically examined different nuclei (caudate, putamen, globus pallidus, and substantia nigra) in human basal ganglia from 6 control and 5 diseased donors (2 with psychiatric diseases, 2 with PD and 1 with AD) using combinatorial indexing single-nuclei RNA sequencing. Gene expression was measured in a total of 22309 control and 42590 diseased single cells.
Abstract: Asian Agricultural Research (ISSN 1943-9903), founded in 2009, is a monthly comprehensive agricultural academic journal published and approved by the Library of Congress of the United States of America. · CNKI (China National Knowledge Infrastructure), China · CSA Illumina (Cambridge Scientific Abstracts), USA · EBSCO database.
Abstract: Graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. Edit distance is commonly used to measure the similarity between graphs. To accelerate similarity join based on graph edit distance, in this paper we use a preprocessing strategy to remove mismatching graph pairs with significant differences. We then propose a novel method for building an index for each graph by grouping, for each key node, the nodes that can be reached within k hops while conserving structure; this is the k-hop-tree-based indexing method. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. In addition, the join process can be finished in polynomial time.
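The grouping step described above (collecting the nodes reachable within k hops of a key node) can be illustrated with a plain breadth-first search. This is only a sketch of that one idea, not the paper's k-hop-tree index: the adjacency-dictionary representation, the function name `k_hop_nodes`, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def k_hop_nodes(adjacency, key_node, k):
    """Return {node: hop_distance} for every node within k hops of key_node.

    `adjacency` maps each node to an iterable of its neighbours.
    """
    dist = {key_node: 0}
    queue = deque([key_node])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical undirected toy graph: a-b, a-c, b-d, d-e.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(k_hop_nodes(graph, "a", 2))
```

Recording the hop distance of each node keeps the layered structure around the key node, which is the kind of information a tree-shaped index over the neighbourhood would need.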
Abstract: Recently, the index modulation (IM) technique in conjunction with multi-carrier modulation has gained increasing attention. It conveys additional information on the subcarrier indices by activating specific subcarriers in the frequency domain, in addition to the conventional amplitude-phase modulation of the activated subcarriers. Orthogonal frequency division multiplexing (OFDM) with IM (OFDM-IM) is compared in depth with classical OFDM. It leads to an attractive trade-off between spectral efficiency (SE) and energy efficiency (EE). In this paper, the concept of combinatorial modulation is introduced from a new point of view. Sparsity mapping is introduced intentionally to enable the compressive sensing (CS) concept in the data recovery process, providing further performance and EE enhancement without SE loss. Generating artificial data sparsity in the frequency domain, along with the naturally embedded channel sparsity in the time domain, allows joint data recovery and channel estimation in a double sparsity framework. Based on simulation results, the performance of the proposed approach agrees with the predicted CS superiority even under low signal-to-noise ratio without channel coding. Moreover, the proposed sparsely indexed modulation system outperforms the conventional OFDM system and the OFDM-IM system in terms of error performance, peak-to-average power ratio (PAPR) and energy efficiency under the same spectral efficiency.
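To make the idea of conveying information through subcarrier indices concrete, the sketch below builds a look-up table that maps index bits to activation patterns for a group of n subcarriers with k active ones. It illustrates conventional OFDM-IM index mapping only, not the sparsity mapping or CS-based recovery proposed in the paper; the function names and the choice n = 4, k = 2 are assumptions.

```python
from itertools import combinations
from math import comb, floor, log2


def index_bits_per_group(n, k):
    """Number of bits carried by the choice of k active subcarriers out of n."""
    return floor(log2(comb(n, k)))


def build_lookup(n, k):
    """Map each index-bit string to a tuple of active subcarrier positions.

    Only the first 2**p patterns (in lexicographic order) are used, where
    p = index_bits_per_group(n, k); the remaining patterns stay unused.
    Assumes comb(n, k) >= 2 so that at least one index bit is carried.
    """
    p = index_bits_per_group(n, k)
    patterns = list(combinations(range(n), k))[: 2 ** p]
    return {format(i, f"0{p}b"): pattern for i, pattern in enumerate(patterns)}


table = build_lookup(4, 2)
for bits, active in table.items():
    print(bits, "-> active subcarriers", active)
```

With n = 4 and k = 2 there are six possible activation patterns, of which four are used to carry two index bits; the active subcarriers then carry ordinary amplitude-phase symbols on top.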
Abstract: Generating baseline information about the ambient air quality of a region is significant when the area 1) is an active mine site, 2) is proposed to be mined in the future, or 3) is undergoing industrialization at a fast pace. Ambient air quality monitoring (with respect to SPM, RPM, SO2, NOx and CO) was carried out in and around two mining complexes in the western part of Kachchh district in Gujarat to generate the baseline air quality status of the area. This area has two major mine complexes, and various large-scale industrial projects (thermal power plants, cement plants and several ports and jetties) are also in the pipeline. Ambient air sampling was carried out at eight locations within a five km radial distance of two major mine sites, i.e. Panandhro and Mata-na-Madh, with four locations for each mine site. Air Quality Indexing was done for all the locations, since it is the simplest way to predict the ambient air quality status of any region with respect to industrial, residential and rural areas. Of the eight locations studied, the air quality at six locations fell under the fairly clean (Light Air Pollution, AQI 25-50) category, while the rest (rural areas in the region) had relatively better air quality and fell under the clean (Clean Air, AQI 10-25) category.
Abstract: In this paper, we propose a new index-based method to realize IR-style Chinese keyword search with ranking strategies in relational databases. The method creates an index from the related information of tuple words and presents a ranking strategy based on the nature of Chinese words. For a Chinese keyword query, the index is used to quickly match the query search words against the tuple words in the index and to compute similarities between the query and tuples with the ranking strategy; the set of identifiers of candidate tuples is then generated. We then retrieve the top-N results of the query using SQL selection statements and output the ranked answers according to the similarities. The experimental results show that our method is efficient and effective.
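The pipeline described above (index tuple words, match query words against the index, score candidate tuples, return the top N identifiers) can be sketched as follows. The scoring here is a plain term-frequency sum chosen as a placeholder; the paper's ranking strategy for Chinese words is not specified in the abstract. Word segmentation is assumed to have been done already, and the names `build_index` and `top_n` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(tuples):
    """Build an inverted index: word -> {tuple_id: occurrence count}.

    `tuples` maps a tuple identifier to its list of (pre-segmented) words.
    """
    index = defaultdict(dict)
    for tuple_id, words in tuples.items():
        for word in words:
            index[word][tuple_id] = index[word].get(tuple_id, 0) + 1
    return index


def top_n(index, query_words, n):
    """Return up to n (tuple_id, score) pairs, highest score first.

    The score is a placeholder: the summed occurrence counts of the
    query words in each tuple. Ties are broken by tuple identifier.
    """
    scores = defaultdict(int)
    for word in query_words:
        for tuple_id, count in index.get(word, {}).items():
            scores[tuple_id] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical rows; in practice the words would be segmented Chinese terms.
rows = {
    1: ["database", "index"],
    2: ["index", "index", "query"],
    3: ["query"],
}
idx = build_index(rows)
print(top_n(idx, ["index", "query"], 2))
```

The returned identifiers are what a subsequent SQL selection statement (for example, a `WHERE id IN (...)` clause) would use to fetch the full rows in ranked order.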
Abstract: In recent years, a growing amount of mathematical content has become available on the Web. When conventional search engines deal with mathematical expressions, the two-dimensional structure of the expressions is lost, which results in poor math retrieval performance, while retrieval technology specifically designed for mathematical expressions is not yet mature. To address these problems, an improved mathematical expression indexing and matching method is proposed that employs a full-text index method to deal with the two-dimensional structure of mathematical expressions. Firstly, by fully considering the characteristics of LaTeX formulae, a feature representation method for mathematical expressions and a clustering method for feature keywords are put forward. Then, an improved inter-relevant successive trees index model is applied to the construction of the mathematical expression index, in which the clustering algorithm for mathematical expression features is employed to solve the problem of growth in the number of trees when processing a large number of formulae. Finally, matching algorithms for mathematical expressions are given that provide four query modes: exact matching, compatible matching, sub-expression matching and fuzzy matching. In browser/server mode, 110027 formulae were used as experimental samples. The index file size was 29.02 Mb. The average retrieval time was 1.092 seconds. The experimental result shows the effectiveness of the method.
Funding: Supported by the National Key Research and Development Program of China (2021YFA1302100), the National Natural Science Foundation of China (82172861, 32200542), the Young Elite Scientists Sponsorship Program by Guangzhou Association for Science and Technology (QT-2023-045), the Youth Talent Support Program of Guangdong Provincial Association for Science and Technology (SKXRC202313), and the Young Talents Program of Sun Yat-sen University Cancer Center (YTP-SYSUCC-0033).
Abstract: The exponential growth of bioinformatics tools in recent years has made it challenging for scientists to select the most suitable one for their data analysis tasks. Therefore, to help scientists make informed choices, a community-based platform that indexes and rates bioinformatics tools is urgently needed. In this study, we introduce BioTreasury (http://biotreasury.rjmart.cn), an integrated community-based repository that provides an interactive platform for users and developers to share their experiences with various bioinformatics tools. BioTreasury offers a comprehensive collection of well-indexed bioinformatics software, tools, and databases, totaling over 10,000 entries. In the past two years, we have continuously improved and maintained BioTreasury, adding several new features, including structured homepages for every tool and user, a hierarchical categorization of bioinformatics tools, and tool classification using a large language model (LLM). BioTreasury streamlines the tool submission process with intelligent auto-completion. Additionally, BioTreasury provides a wide range of social features, for example enabling users to participate in interactive discussions, rate tools, and build and share tool collections with the public. We believe BioTreasury can be a valuable resource and knowledge-sharing platform for the biomedical community. It empowers researchers to effectively discover and evaluate bioinformatics tools, fostering collaboration and advancing bioinformatics research.
Funding: Supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164002) and the Fundamental Research Foundation of Shenzhen Technology and Innovation Council (No. KCXFZ20201221173613035).
Abstract: A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. The data facilitate research in ecological monitoring, land management, desertification, etc. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding fault-tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. One of the most important techniques for using RS data is geospatial indexing. However, the large data volume makes geospatial indices time-consuming to construct and leverage efficiently. Considering that most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, deploying multiple geospatial indices becomes a natural way to optimise efficacy. Moreover, because of the reliability introduced by high-quality hardware and the infrequently modified nature of RS data, the use of multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS, with data replication enabled for both fault tolerance and geospatial indexing efficiency. Given the fault tolerance provided by the HDFS, RS data are stored inside it in a structured manner for faster geospatial indexing. Additionally, multi-indexing enhances efficiency. The proposed technique naturally sits on top of the HDFS to form a holistic framework without incurring severe overhead or sophisticated system implementation efforts. The MIX-RS framework is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, demonstrating excellent geospatial indexing performance.