Abstract: Online discussion forums can effectively engage students in their studies. As the number of messages posted on a forum grows, it becomes harder for instructors to read and respond to them promptly. In this paper, we apply non-negative matrix factorization (NMF) and visualization to cluster message data, providing a summary view of the messages that discloses their deep semantic relationships. In particular, NMF is able to find the underlying issues hidden in the messages that concern most of the students. Visualization is employed to estimate the initial number of clusters and to show the relation communities. Experiments and comparisons on a real dataset are reported to demonstrate the effectiveness of the approaches.
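As a rough illustration of how NMF can group messages, the sketch below factorizes a small message-term count matrix with the standard multiplicative update rules and assigns each message to its strongest factor. The toy matrix `V`, the helper names, the choice of `k=2`, and the use of multiplicative updates are all assumptions made for this example; the abstract does not state which NMF variant or preprocessing the authors used.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so no entry is stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def cluster_labels(W):
    """Assign each row (message) to the factor with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]


# Hypothetical toy data: 4 messages x 4 terms, two obvious topics.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
W, H = nmf(V, k=2)
labels = cluster_labels(W)
print(labels)
```

In this reading, the rows of `H` play the role of the "underlying issues" (weighted term lists), and `k` would be the cluster count that the visualization step is said to estimate.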
Abstract: Querying over XML elements using keyword search is steadily gaining popularity. Traditional similarity measures are widely employed to retrieve XML documents effectively. A number of authors have already proposed similarity-measure methods that take advantage of the structure and content of XML documents. However, these methods do not consider the similarity between the latent semantic information of element texts and that of the keywords in a query. Although many algorithms for XML element search are available, some of them have high computational complexity because they search a huge number of elements. In this paper, we propose a new algorithm that makes use of the semantic similarity between elements instead of between entire XML documents, considering not only the structure and content of an XML document, but also the semantic information of namespaces in elements. We compare our algorithm with three other algorithms by testing on real datasets. The experiments demonstrate that our proposed method is able to improve query accuracy as well as reduce running time.
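To make the idea of scoring individual elements rather than whole documents concrete, here is a minimal sketch that ranks every element of an XML document against a keyword query. It uses plain term-frequency cosine similarity over each element's tag and text, which is only a stand-in baseline: the abstract's latent-semantic and namespace-aware measure is not specified, so the function names, tokenizer, and scoring below are assumptions.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_elements(xml_text, query):
    """Score each element (tag plus its own text) against the query."""
    q = Counter(tokens(query))
    scored = []
    for elem in ET.fromstring(xml_text).iter():
        bag = Counter(tokens(elem.tag + " " + (elem.text or "")))
        scored.append((cosine(q, bag), elem.tag))
    return sorted(scored, key=lambda s: s[0], reverse=True)


# Hypothetical document and query, for illustration only.
doc = ("<lib><book><title>Matrix factorization</title>"
       "<note>cooking recipes</note></book></lib>")
ranked = rank_elements(doc, "matrix factorization")
print(ranked[0])
```

Because each element is scored on its own, a query can surface a single matching `title` without retrieving the entire document around it.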
Funding: National Key Research and Development Plan of MOST of China, Grant/Award Numbers: 2019YFB2205100, 2021ZD0201201; National Natural Science Foundation of China, Grant/Award Number: 92064012; Hubei Engineering Research Center on Microelectronics; Chua Memristor Institute.
Abstract: Similarity search, that is, finding similar items in massive data, is a fundamental computing problem in many fields such as data mining and information retrieval. However, for large-scale, high-dimensional data, it suffers from high computational complexity and requires tremendous computational resources. Here, based on low-power self-selective memristors, we propose for the first time an in-memory search (IMS) system with two innovative designs. First, by exploiting the natural distribution of the devices' resistance, a hardware locality-sensitive hashing encoder is designed to transform real-valued vectors into more efficient binary codes. Second, a compact memristive ternary content-addressable memory is developed to calculate the Hamming distances between the binary codes in parallel. Our IMS system demonstrated a 168× energy-efficiency improvement over all-transistor counterparts in clustering and classification tasks while achieving software-comparable accuracy, thus providing a low-complexity, low-power solution for in-memory data mining applications.
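The two steps described above (hashing real-valued vectors to binary codes, then comparing codes by Hamming distance) have a familiar software analogue, sketched below. It uses random-hyperplane hashing with Gaussian projections generated by a pseudo-random generator; this is an assumption standing in for the memristor resistance distribution, and the sequential `nearest` loop stands in for the parallel content-addressable memory lookup. All names and parameters are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_encode(vec, planes):
    """Pack one sign bit per hyperplane into an integer code."""
    code = 0
    for plane in planes:
        bit = sum(p * x for p, x in zip(plane, vec)) >= 0
        code = (code << 1) | int(bit)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def nearest(query_code, codes):
    """Index of the stored code closest to the query in Hamming distance."""
    return min(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))


# Hypothetical data: a vector, a rescaled copy, and its negation.
planes = make_hyperplanes(dim=4, n_bits=16)
v = [0.5, -1.0, 2.0, 0.1]
code_v = lsh_encode(v, planes)
code_scaled = lsh_encode([2 * x for x in v], planes)
code_neg = lsh_encode([-x for x in v], planes)
print(hamming(code_v, code_scaled), hamming(code_v, code_neg))
```

Vectors pointing in similar directions tend to share most sign bits, so a small Hamming distance between codes serves as a cheap proxy for a small angle between the original vectors.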