Funding: Supported by the National Natural Science Foundation of China (60473045), the Research Plan of Hebei Province (05213573), and the Research Plan of the Education Office of Hebei Province (2004406).
Abstract: This paper proposes a new approach to classifying Deep Web query interfaces. It extracts features from the form text on each query interface, assisted by a synonym library, and uses a radial basis function neural network (RBFNN) to classify the interfaces. An RBFNN is an effective feed-forward artificial neural network with a simple structure that offers excellent nonlinear approximation, fast convergence, and global convergence. The experiments use the TEL_8 query interface data set from the UIUC online database, which consists of 477 query interfaces in 8 typical domains. The experimental results show that the proposed approach classifies the query interfaces efficiently, with an accuracy of 95.67%.
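The forward pass of a radial basis function network, as used for classification in the abstract above, can be sketched as follows. This is a minimal pure-Python illustration that assumes Gaussian basis functions; the centers, width, and output weights are hypothetical toy values, not the paper's trained parameters or its feature set.

```python
import math
from typing import List, Sequence


def rbf_activations(x: Sequence[float], centers: Sequence[Sequence[float]],
                    width: float) -> List[float]:
    """Gaussian hidden-layer activations: exp(-||x - c||^2 / (2 * width^2))."""
    acts = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        acts.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return acts


def rbf_classify(x: Sequence[float], centers: Sequence[Sequence[float]],
                 width: float, weights: Sequence[Sequence[float]]) -> int:
    """Return the index of the output unit with the largest linear score.

    weights[k][j] is the (hypothetical) weight from hidden unit j to class k.
    """
    hidden = rbf_activations(x, centers, width)
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

With one center per class and identity output weights, a point is assigned to the class whose center is nearest, which is the simplest configuration of such a network.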
Funding: The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work under Project No. g01/n04.
Abstract: Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework in which a system learns from its previous actions in an environment to choose its future actions efficiently. DRL has been used in many application fields, including games, robots, and networks, to create autonomous systems that improve themselves with experience. It is widely acknowledged that DRL is well suited to solving optimization problems in distributed systems in general and network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS), built on a Deep Q-Learning algorithm, is proposed for unstructured P2P networks. The main objective of this approach is to achieve better retrieval effectiveness at a reduced search cost, measured by fewer connected peers, fewer exchanged messages, and less time. The simulation results show a significant improvement in resource searching compared with k-Random Walker and Directed BFS. Here, retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
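The value update that underlies Q-learning can be illustrated with a tabular sketch. Note that this is a simplification: the abstract above describes a Deep Q-Learning approach, which would replace the table with a neural network. The state and action encoding (current peer, neighbor to forward the query to), the reward, and the default `alpha` and `gamma` values are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Iterable[Hashable],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Best estimated value reachable from the next state (0.0 if it has no actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def new_q_table() -> QTable:
    """Create an empty table whose unseen entries default to 0.0."""
    return defaultdict(float)
```

In a routing setting, a peer would call `q_update` after forwarding a query to a neighbor and observing whether the resource was found, then prefer neighbors with higher learned values.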
Funding: Industrial Strategic Technology Development Program, Development of a Cognitive Planning and Learning Model for Mobile Platforms (No. 10035348), funded by MKE (the Ministry of Knowledge Economy), Korea.
Abstract: Users can obtain information through basic web searching and find answers to their questions directly, but the expected answer may not exist. In addition, users do not learn of newly updated information in time. Online social networking services spread quickly and store large amounts of user data, but these data are of low value and may provide unreliable answers to users' questions. In a knowledge question-answering (QA) system, users can obtain a simple answer but cannot expect additional information. In this paper, we design a system that combines the advantages of a knowledge QA system, web searching, and the characteristics of social networking services to provide a social network channel based on queries and answers, without requiring the user's contact network. Through the network channel of this system, a user can obtain real-time answers from the network of users interested in the user's queries, get additional information effectively, and share it with others in the social network channel.
Abstract: Neural networks have attracted immense research interest in the last couple of years because of their wide applications in areas such as data mining, natural language processing, image processing, and information retrieval. Word embedding has been applied by many researchers to information retrieval tasks. In this paper, a word embedding-based skip-gram model is developed for the query expansion task. Vocabulary terms are obtained from the top “k” initially retrieved documents using the pseudo relevance feedback model, and the skip-gram model is then trained on them to find expansion terms for the user query. The mean average precision (MAP) of the model is 0.3176. The proposed model is compared with other existing models. It improves the MAP value by 6.61%, 6.93%, and 9.07% compared with the original query, the BM25 model, and query expansion with the Chi-Square model, respectively. The proposed model also retrieves 84, 25, and 81 additional relevant documents compared with the original query, query expansion with the Chi-Square model, and the BM25 model, respectively, and thus also improves recall. The per-query analysis reveals that the proposed model performs well on 30, 36, and 30 queries compared with the original query, query expansion with the Chi-Square model, and the BM25 model, respectively.
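The abstract above reports its results as mean average precision (MAP). As a reminder of how that metric is computed, here is a minimal sketch using the standard definition; the document identifiers in any call are hypothetical, and this is not the paper's evaluation code.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(
        runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average the per-query AP over (ranking, relevant set) pairs."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Query expansion changes the ranking passed in for each query, so an improvement in MAP reflects relevant documents moving to earlier ranks or newly entering the result list.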
Abstract: As one of the commonly used queries in modern databases, the skyline query has received extensive attention from the database research community. The uncertainty of the data in wireless sensor networks makes the corresponding skyline uncertain and not unique. This paper investigates the Pr-Skyline problem, i.e., how to compute the skyline with the highest existence probability in a computationally efficient and energy-efficient way. We formulate the problem and prove that it is NP-Complete and cannot be approximated in a given expression. However, the proposed algorithm SKY-SEARCH, with pruning techniques, guarantees computational efficiency for relatively large input sizes, while the filter-based distributed optimization strategy significantly reduces the transmission cost and the required storage space of the sensor nodes. Extensive experiments verify the efficiency and scalability of SKY-SEARCH and the distributed optimization strategy.
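For readers unfamiliar with skyline queries, the conventional (certain-data) skyline can be sketched as below. This is only the textbook dominance test, assuming smaller values are better in every dimension; it is not the SKY-SEARCH algorithm, and it ignores the existence probabilities that define the Pr-Skyline problem.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points not dominated by any other point (quadratic-time sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the points (1, 5), (2, 2), (3, 3), and (5, 1), only (3, 3) is dropped, because (2, 2) is at least as good in both dimensions and strictly better in each.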
Funding: This work is supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment).
Abstract: Sensor networks consisting of low-cost, low-power, multifunctional miniature sensor devices play an important role in our daily life. Light and humidity monitoring, seismic and animal activity detection, and environment and habitat monitoring are the most common applications. However, because of the limited power supply, ordinary query methods and algorithms cannot be applied to sensor networks. Queries over sensor networks should be power-aware to guarantee maximum power savings. This work concentrates on minimizing power consumption by avoiding expensive communication with redundant sensor nodes. Much work has been done to reduce the number of participating nodes, but none of it has considered the overlapping minimum bounding rectangles (MBRs) of sensors, which makes it impossible to reach the optimal solution. The proposed OMSI-tree and OMR algorithm solve this problem efficiently by executing a given query only on the sensors involved. Experiments show that the proposed schema and algorithm yield an obvious improvement over TinyDB and other spatial indexes.
Funding: Project (07JJ1010) supported by the Hunan Provincial Natural Science Foundation of China; Projects (2006AA01Z202, 2006AA01Z199) supported by the National High-Tech Research and Development Program of China; Project (7002102) supported by the City University of Hong Kong, Strategic Research Grant (SRG); Project (IRT-0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University; Project (NCET-06-0686) supported by the Program for New Century Excellent Talents in University.
Abstract: HashQuery, a hash-area-based data dissemination protocol, is designed for wireless sensor networks. Using a hash function keyed on time, both mobile sinks and sensors can determine the same hash area. The sensors send information about the events they monitor to the hash area, and the mobile sinks need only query that area instead of flooding the whole network, which saves much energy. In addition, the location of the hash area changes over time to balance energy consumption across the whole network. Theoretical analysis shows that the proposed protocol is energy-efficient, and simulation studies further show that with 5 sources and 5 sinks in the network, it saves at least 50% of the energy consumed by the existing two-tier data dissemination (TTDD) protocol, especially in large-scale wireless sensor networks.
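The idea of a time-keyed hash area can be sketched as follows: any node that knows the current time can derive the same rendezvous cell without communication. The grid layout, the epoch length, and the use of SHA-256 are assumptions chosen for illustration; the abstract does not specify the hash function or how the area is laid out.

```python
import hashlib
from typing import Tuple


def hash_area(t_seconds: float, epoch_len: float = 60.0,
              cols: int = 8, rows: int = 8) -> Tuple[int, int]:
    """Map the current time to a (column, row) grid cell.

    All parameters are hypothetical: time is bucketed into epochs of
    epoch_len seconds, and the field is divided into a cols x rows grid.
    """
    epoch = int(t_seconds // epoch_len)
    digest = hashlib.sha256(str(epoch).encode("ascii")).digest()
    # Reduce the first 8 digest bytes to a cell index in the grid.
    cell = int.from_bytes(digest[:8], "big") % (cols * rows)
    return cell % cols, cell // cols
```

Sensors and sinks that call `hash_area` within the same epoch obtain the same cell, so a sink can query that cell directly instead of flooding; when the epoch changes, the cell is re-derived, which spreads the load over time.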
Funding: This work was supported in part by the National Natural Science Foundation of China (61471101 and U1736205).
Abstract: Quickly identifying the amount of information that can be gained by measuring a network via shortest paths is one of the fundamental problems in network exploration and monitoring. However, the existing methods are time-consuming even for moderate-scale networks. In this paper, we present a method for fast shortest-path cover identification in both exact and approximate scenarios, based on the relationship between the identification and shortest-distance queries. The effectiveness of the proposed method is validated on synthetic and real-world networks. The experimental results show that our method is 105 times faster than the existing methods and can solve shortest-path cover identification in a few seconds for large-scale networks with millions of nodes and edges.
Funding: Supported by the National Natural Science Foundation of China (No. 61472072, 61528202, 61501105, 61472169) and the Foundation of Science Public Welfare of Liaoning Province in China (No. 2015003003).
Abstract: State-of-the-art query techniques in power grid monitoring systems focus on querying historical data, which typically introduces an unwanted lag when the systems try to discover emergency situations. The monitoring data of large-scale smart grids are massive, dynamic, and high-dimensional, so the global query, the method widely adopted for continuous queries in Wireless Sensor Networks (WSN), is unsuitable because of its high energy consumption. The situation worsens as application complexity increases. We propose an energy-efficient query technique for large-scale smart grids based on variable regions. This method can query an arbitrary region based on variable physical windows and optimizes data retrieval paths with a key node selection strategy. According to the characteristics of the sensing data, we introduce an efficient filter into each query subtree to keep non-skyline data from being retrieved. Experimental results show that our method can efficiently return an overview of the situation in any query region. Compared with TAG and ESA, the average query efficiency of our approach is improved by 79% and 46%, respectively, and the total energy consumption of a regional query is decreased by 82% and 50%, respectively.
Abstract: Depicting the degrees of association between two concepts and their relationships is a major task in constructing a multi-relationship fuzzy concept network. This paper points out some drawbacks of the existing methods for calculating degrees of association between concepts and proposes a new method that overcomes them. We also use examples to compare the proposed method with the existing methods for calculating the degrees of association between two concepts in a multi-relationship fuzzy concept network.
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF2013R1A1A1004593, 2013R1A1A1A05012348).
Abstract: A novel technique called the bitmap lattice index (BLI) is proposed, which combines the advantages of a wireless broadcasting environment with a road network. Existing road network approaches are based on the on-demand method, in which a server sends information to each client on request, so the server's workload grows as the number of query requests increases. To solve this problem, we propose the BLI. The BLI denotes an object and a node as 0 and 1 in the Hilbert curve (HC) map. The BLI can identify the position of a node and an object through bit information; it can also reduce the broadcasting frequency of a server by reducing the size of the index, thereby decreasing the access latency and query processing times. Moreover, the BLI is highly effective for data filtering, as it can identify the positions of both an object and a node. In a road network, filtering by Euclidean distance may result in an error. To prevent this, we add another validation procedure. The experiment applies the BLI to the kNN query, and the technique is assessed by a performance evaluation experiment.
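To make the "0 and 1 in the Hilbert curve map" idea above concrete, the sketch below orders grid cells along a Hilbert curve using the well-known coordinate-to-index conversion and attaches one bit per occupied cell (1 for a node, 0 for an object, as the abstract states). The function `bitmap_along_curve` and its output layout are hypothetical; the actual BLI structure is not described in the abstract.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert-curve index of cell (x, y) in an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def bitmap_along_curve(n: int, nodes: Iterable[Cell],
                       objects: Iterable[Cell]) -> List[Tuple[int, int]]:
    """Return (hilbert_index, bit) pairs sorted by index: 1 = node, 0 = object."""
    entries = [(xy2d(n, x, y), 1) for x, y in nodes]
    entries += [(xy2d(n, x, y), 0) for x, y in objects]
    return sorted(entries)
```

Because cells that are adjacent on the curve are also adjacent in the grid, a client reading the bits in curve order can recover approximate positions from the index alone.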
Abstract: In the data retrieval process of a data recommendation system, matching prediction and similarity identification play a major role in the ontology. Several methods exist to improve the retrieval process, raising accuracy and reducing search time. However, in a data recommendation system, searching for the best match for given query data becomes complex, and the accuracy of the query recommendation process suffers. To improve the performance of data validation, this paper proposes a novel data similarity estimation and clustering method to retrieve the relevant data with the best matching in big data processing. An advanced model of the Logarithmic Directionality Texture Pattern (LDTP) method with a Metaheuristic Pattern Searching (MPS) system is used to estimate the similarity between the query data and the data in the entire database. The overall work is implemented for the application of the data recommendation process. The data are all indexed and grouped into clusters to form a paged database structure, which reduces computation time during searching. In addition, with the help of a neural network, the relevance of feature attributes in the database is predicted, and the matching index is sorted to provide the recommended data for given query data. This is achieved with the Distributional Recurrent Neural Network (DRNN), an enhanced neural network model that finds relevance based on the correlation factor of the feature set. The DRNN classifier is trained by estimating the correlation factor of the attributes of the dataset. These are formed into clusters and paged with proper indexing based on the MPS similarity metric parameter. The overall performance of the proposed work is evaluated by varying the size of the training database over 60%, 70%, and 80%. The parameters considered for performance analysis are precision, recall, F1-score, the accuracy of data retrieval, the query recommendation output, and a comparison with other state-of-the-art methods.
Abstract: We investigated the application of Causal Bayesian Networks (CBNs) to large data sets in order to predict user intent via internet search prediction. Sample data are taken from search engine logs (Excite, Altavista, and Alltheweb). These logs are parsed and sorted to create a data structure that is used to build a CBN. This network is used to predict the next term or terms that the user may be about to search for (type). We examined the application of CBNs on very large datasets in comparison with Naive Bayes and Bayes Net classifiers. To simulate our proposed results, we took a small sample of search data logs to predict intentional query typing. Additionally, problems that arise with the use of such a data structure are addressed individually, along with the solutions used and their prediction accuracy and sensitivity.
Abstract: In Delay Tolerant Networks (DTNs), offline users can deliver messages to a destination through the nodes they encounter, using a specific peer-to-peer message routing approach. This meets users' need to deliver messages while they are temporarily unable to connect to the Internet. Therefore, given the characteristics of DTNs, people who are not online can still query location-based information with the help of nearby users of the same service. In this paper, we propose a location-based content search approach. Based on the concept of a three-tier area and hybrid node types, we present four strategies to solve the query problem: Data Replication, Query Replication, Data Reply, and Data Synchronization. In particular, we propose a Message Queue Selection algorithm for message transfer. A priority is associated with every message so that the most "important" one is sent first. In this way, the approach increases the query success ratio and reduces the query delay. Finally, we evaluate our approach and compare it with other routing schemes. The simulation results show that the proposed approach has better query efficiency and shorter delay.
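The priority-first sending rule described above can be sketched with a binary heap. This is only an illustration of "most important message first"; the class name, the numeric priority scale, and the first-in-first-out tie-break are assumptions, and the paper's Message Queue Selection algorithm may rank messages differently.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PriorityMessageQueue:
    """Hypothetical outgoing queue: higher priority leaves first, ties in FIFO order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = itertools.count()  # insertion order, used to break ties

    def push(self, priority: float, message: Any) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), message))

    def pop(self) -> Any:
        """Remove and return the most important pending message."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

During a short contact between two nodes, the sender would call `pop` repeatedly until the contact ends, so the limited transfer window is spent on the highest-priority messages.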
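Abstract: To address the problem that traditional database management systems cannot learn the interactions between predicates well and cannot accurately estimate the cardinality of complex queries, a tree-structured long short-term memory neural network (Tree Long Short Term Memory, TreeLSTM) model is proposed to model queries, and the model is used to estimate the cardinality of new queries. The proposed model takes into account the conjunction and disjunction operations contained in query statements, builds sub-expressions into a tree structure according to the type of operator between predicates, and represents arbitrary logical expressions in a continuous vector space by composing sub-expression vectors. The TreeLSTM model improves the performance and accuracy of cardinality estimation by capturing the sequential dependencies between query predicates. TreeLSTM is compared with histogram-based methods and the learning-based MSCN and TreeRNN methods. The experimental results show that the estimation error of TreeLSTM is 60.41%, 33.33%, and 11.57% lower than that of the histogram, MSCN, and TreeRNN methods, respectively, and that the method significantly improves the performance of the cardinality estimator.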