Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11801430, 11801200, 61877046, and 61877047).
Abstract: The identification of influential nodes in complex networks is one of the most exciting topics in network science. The latest work successfully compares each node using local connectivity and weak tie theory from a new perspective. We study the structural properties of networks in depth and extend this successful node evaluation from single-scale to multi-scale. In particular, a novel position parameter based on node transmission efficiency is proposed, which mainly depends on the shortest distances from target nodes to high-degree nodes. On this basis, the multi-scale information importance (MSII) method is proposed to better identify crucial nodes by combining the network's local connectivity and global position information. In simulation comparisons, five state-of-the-art algorithms, i.e., the neighbor nodes degree algorithm (NND), betweenness centrality, closeness centrality, Katz centrality, and the k-shell decomposition method, are compared with MSII. The results demonstrate that our method obtains superior performance in terms of robustness and propagation for both real-world and artificial networks.
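The abstract does not give the MSII formula, so the following is only a toy sketch of the general idea it describes: a local connectivity term (here, a node's degree plus its neighbors' degrees) scaled by a position term derived from shortest-path distances to high-degree nodes. The function names, the `1 / (1 + d)` efficiency term, and the choice of a single hub are all assumptions made for illustration, not the authors' definitions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def toy_scores(adj, num_hubs=1):
    """Hypothetical score: local connectivity times closeness to hubs.

    This is NOT the MSII formula; it only illustrates combining a local
    term with shortest distances to high-degree nodes.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    # The highest-degree nodes act as reference "hubs" (ties keep dict order).
    hubs = sorted(adj, key=lambda u: -degree[u])[:num_hubs]
    hub_dist = {h: bfs_distances(adj, h) for h in hubs}

    scores = {}
    for u in adj:
        # Local term: own degree plus the degrees of direct neighbors.
        local = degree[u] + sum(degree[v] for v in adj[u])
        # Position term: mean of 1 / (1 + d) over hubs; unreachable hubs add 0.
        position = sum(
            1.0 / (1 + hub_dist[h][u]) for h in hubs if u in hub_dist[h]
        ) / len(hubs)
        scores[u] = local * position
    return scores


# Small undirected example graph: edges a-b, b-c, b-d, d-e.
adj = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}
scores = toy_scores(adj)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this example the hub `b` ranks first and the peripheral node `e`, which is two hops from the hub, ranks last.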
Abstract: The application of frequency distribution statistics to data provides an objective means to assess the nature of the data distribution and the viability of numerical models that are used to visualize and interpret data. Two commonly used tools are kernel density estimation and the reduced chi-squared statistic used in combination with a weighted mean. Because these tools are widely applicable, we present a Java-based computer application called KDX to facilitate the visualization of data and the use of these numerical tools.
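The two tools named in this abstract have standard textbook forms, which can be sketched as follows. This is not code from KDX (a Java application); the function names, the inverse-variance weighting, and the fixed-bandwidth Gaussian kernel are assumptions chosen for a minimal illustration.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def reduced_chi_squared(values, sigmas):
    """Chi-squared about the weighted mean, divided by n - 1 degrees of freedom."""
    mean, _ = weighted_mean(values, sigmas)
    chi2 = sum(((x - mean) / s) ** 2 for x, s in zip(values, sigmas))
    return chi2 / (len(values) - 1)


def gaussian_kde(values, bandwidth, x):
    """Kernel density estimate at x using a fixed-bandwidth Gaussian kernel."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm


# Made-up measurements with equal uncertainties, for illustration only.
values = [10.0, 12.0, 11.0]
sigmas = [1.0, 1.0, 1.0]

mean, mean_sigma = weighted_mean(values, sigmas)
rcs = reduced_chi_squared(values, sigmas)
density_at_mean = gaussian_kde(values, bandwidth=1.0, x=mean)
print(mean, mean_sigma, rcs, density_at_mean)
```

A reduced chi-squared near 1 suggests that the scatter of the data is consistent with the stated uncertainties; values well above 1 suggest excess scatter, in which case a single weighted mean may not describe the data well.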
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60572084 and 60621062.
Abstract: In a question answering (QA) system, the fundamental problem is how to measure the distance between a question and an answer, and hence how to rank different answers. We demonstrate that such a distance can be precisely and mathematically defined. Not only is such a definition possible, it is provably better than any other feasible definition. Moreover, it can be conveniently and fruitfully applied to construct a QA system. We have built such a system, QUANTA. Extensive experiments are conducted to justify the new theory.
Funding: Supported by the Research Plan Project of National University of Defense Technology under Grant No. JC13-06-01 and the OCRit Project, made possible by the Global Leadership Round in Genomics & Life Sciences Grant (GL2).
Abstract: Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage, and cost factors. Considering personal privacy, language-independent (LI), lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a convenient option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small-footprint SD ASR for real-time applications with limited storage and small vocabularies. These applications include voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. However, traditional DTW has several limitations, such as high computational complexity, constraint-induced coarse approximation, and inaccuracy problems. In this paper, we introduce the merge-weighted dynamic time warping (MWDTW) algorithm. This method defines a template confidence index for measuring the similarity between merged training data and testing data, while following the core DTW process. MWDTW is simple, efficient, and easy to implement. With extensive experiments on three representative SD speech recognition datasets, we demonstrate that our method significantly outperforms DTW, DTW on merged speech data, and the hidden Markov model (HMM), and is also six times faster than DTW overall.
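For orientation, the baseline that MWDTW builds on can be sketched as classic DTW template matching. The abstract does not specify the merging step or the template confidence index, so neither is shown; this is plain textbook DTW, and the one-dimensional feature sequences, the function names, and the tiny template set are assumptions for illustration (a real recognizer would compare frames of feature vectors).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(sample, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical speaker-dependent templates (made-up 1-D feature tracks).
templates = {
    "up": [1, 2, 3, 4],
    "down": [4, 3, 2, 1],
}
# The same rising contour spoken more slowly still matches "up".
sample = [1, 1, 2, 3, 3, 4]
print(recognize(sample, templates))
```

The quadratic table in `dtw_distance` is the computational cost the abstract refers to when it lists high complexity among the limitations of traditional DTW.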
Funding: Supported by the National Key R&D Program of China (Grant No. 2016YFB1000902) and the National Natural Science Foundation of China (Grant No. 61832019).
Abstract: Background: Protein–RNA interaction is ubiquitous in cells and serves as the main mechanism for post-transcriptional regulation. RNA-binding proteins (RBPs) not only control which transcripts are translated, but also determine the speed, location, and concentration of mRNA translation by controlling multiple layers of gene regulation. RBPs interact with RNA in two main modes: base-dominant interaction and backbone-dominant interaction.
Funding: Supported mainly by Canada's IDRC Research Chair in Information Technology Program under Grant No. 104519006; also supported by the National Natural Science Foundation of China under Grant No. 60973104; the National Basic Research 973 Program of China under Grant No. 2007CB311003; NSERC Grant OGP0046506; and the Canada Research Chair Program, MITACS, an NSERC Collaborative Grant, and Ontario's Premier's Discovery Award.
Abstract: Multiword Expressions (MWEs) appear frequently and ungrammatically in natural languages. Identifying MWEs in free texts is a very challenging problem. This paper proposes a knowledge-free, unsupervised, and language-independent Multiword Expression Distance (MED). The new metric is derived from an accepted physical principle, measures the distance from an n-gram to its semantics, and outperforms other state-of-the-art methods on MWEs in two applications: question answering and named entity extraction.