Abstract: The user’s intent to seek online information has been an active area of research in user profiling. User profiling considers user characteristics, behaviors, activities, and preferences to sketch user intentions, interests, and motivations. Determining user characteristics can help capture implicit and explicit preferences and intentions for effective user-centric and customized content presentation. The user’s complete online experience in seeking information is a blend of activities such as searching for it, verifying it, and sharing it on social platforms. However, a combination of multiple behaviors has yet to be considered in profiling users. This research takes a novel approach and explores user intent types based on multidimensional online behavior in information acquisition. It examines information search, verification, and dissemination behavior and uses machine learning to identify diverse types of users based on their online engagement. The research proposes a generic user profile template that explains user characteristics based on internet experience and uses it as ground truth for data annotation. User feedback on online behavior and practices is collected through a survey. The participants include both males and females of different ages from different occupational sectors. The collected data are subjected to feature engineering, and the significant features are presented to unsupervised machine learning methods to identify user intent classes or profiles and their characteristics. Different techniques are evaluated, and the K-Means clustering method successfully generates five user groups with different user characteristics, with an average silhouette of 0.36 and a distortion score of 1136. Feature averages are computed to identify the characteristics of each user intent type. The user intent classes are then further generalized to create a user intent template with an Inter-Rater Reliability of 75%. This research successfully extracts different user types based on their preferences in online content, platforms, criteria, and frequency. The study also validates the proposed template on user feedback data through an Inter-Rater Agreement process using an external human rater.
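The two clustering quality figures quoted in this abstract (average silhouette and distortion) can be illustrated with a minimal sketch. It assumes Euclidean distance and reads "distortion" as the sum of squared distances from each point to its cluster centroid; the abstract does not state either choice, and the toy points below are hypothetical, not the survey data.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def distortion(clusters):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(dist(p, c) ** 2 for p in points)
    return total


def average_silhouette(clusters):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                mean(dist(p, q) for q in other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy clusters, already assigned (no K-Means fitting is done here).
toy_clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
toy_distortion = distortion(toy_clusters)
toy_silhouette = average_silhouette(toy_clusters)
print(toy_distortion, toy_silhouette)
```

A silhouette near 1 indicates compact, well-separated groups, so a value such as 0.36 suggests clusters that overlap to some degree; distortion depends on the scale of the features and is mainly useful for comparing different numbers of clusters on the same data.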
Funding: Supported by the National Natural Science Foundation of China (61972300, 61672401, 61373045, and 61902288) and the Pre-Research Project of the “Thirteenth Five-Year Plan” of China (315***10101 and 315**0102).
Abstract: Personalized search utilizes user preferences to optimize search results, and most existing studies obtain user preferences by analyzing user behaviors in search engines that provide click-through data. However, the behavioral data are noisy because users often click some irrelevant documents while looking for the information they need, and the new-user cold-start issue is a serious problem that greatly reduces the performance of personalized search. This paper attempts to utilize online social network data to obtain user preferences that can be used to personalize search results: it mines knowledge of user interests, user influence, and user relationships from online social networks and uses this knowledge to optimize the results returned by search engines. The proposed model is based on a holonic multiagent system that improves the adaptability and scalability of the model. The experimental results show that utilizing online social network data to implement personalized search is feasible and that online social network data are significant for personalized search.
Funding: The work reported in this paper has been supported by the UK-Jiangsu 20-20 World Class University Initiative programme.
Abstract: Online social media networks are gaining attention worldwide, with an increasing number of people relying on them to connect, communicate, and share their daily pertinent event-related information. Event detection is now increasingly leveraging online social networks for highlighting events happening around the world via the Internet of People. In this paper, a novel Event Detection model based on Scoring and Word Embedding (ED-SWE) is proposed for discovering key events from a large volume of data streams of tweets and for generating an event summary using keywords and top-k tweets. The proposed ED-SWE model can distill high-quality tweets, reduce the negative impact of spam, and identify latent events in the data streams automatically. Moreover, a word embedding algorithm is used to learn a real-valued vector representation for a predefined fixed-size vocabulary from a corpus of Twitter data. To further improve the performance of the Expectation-Maximization (EM) iteration algorithm, a novel initialization method based on the authority values of the tweets is also proposed to detect live events efficiently and precisely. Finally, a novel identification method based on the cosine measure is used to automatically evaluate whether a given topic can form a live event. Experiments conducted on a real-world dataset demonstrate that the ED-SWE model exhibits better efficiency and accuracy than several state-of-the-art event detection models.
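The cosine measure mentioned in this abstract can be sketched as follows. This is only an illustration of cosine similarity on bag-of-words vectors; the abstract does not say which vectors ED-SWE compares or how the result is thresholded, so the tokenization, the example tweets, and the function names are all assumptions.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lower-cased term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


# Hypothetical tweets, not taken from the paper's dataset.
tweet_a = bag_of_words("earthquake hits city centre")
tweet_b = bag_of_words("strong earthquake hits city")
tweet_c = bag_of_words("new phone released today")

related = cosine_similarity(tweet_a, tweet_b)
unrelated = cosine_similarity(tweet_a, tweet_c)
print(related, unrelated)
```

The same function applies unchanged to dense word-embedding vectors if they are stored as mappings from dimension index to value; values close to 1 indicate near-identical direction and values close to 0 indicate no shared terms.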
Abstract: With the rapid development of higher education, more and more people are awarded doctoral or master’s degrees, resulting in a considerable increase in doctoral dissertations and master’s theses. The application of information technology makes it possible to digitize those dissertations, which has contributed a lot to the construction of digital libraries. This paper discusses the characteristics of digitized dissertations and the corresponding problems, such as property rights, security, and sharing. It also gives a general introduction to the specific methods adopted by the library of UESTC and its distribution of digitized dissertation resources.
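Abstract: As exchanges between the mainland and the Taiwan region grow closer and more frequent, strengthening exchange and mutual learning in terminology research across the Strait has become especially important. This article gives a detailed introduction to and a comprehensive review of the management structure, historical development, and existing achievements of terminology work in the Taiwan region; the results of cross-Strait cooperation in jointly compiling terminology reference works; the “乐词网” terminology search and online resource platform; and the jointly built “中华语文知识库” and other corpora. It conducts a thematic sampling analysis of terminology-related research from the Taiwan region in the Web of Science (WOS) Core Collection database and visualizes the results with the bibliometric tool VOSviewer, revealing the development trends and hot topics of terminology-related research published by scholars from the Taiwan region in international core journals. The aim is to provide cross-Strait terminology researchers and language enthusiasts with materials and avenues for research and study, to facilitate communication and cooperation between scholars on both sides, to identify directions for future collaborative efforts, and to offer useful reference and support for terminology development and the formulation of science and technology development strategies on both sides.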
Funding: This paper was supported by the National Science Foundation of China (Grant No. 61370170) and Heilongjiang Education Planning Projects (Grant No. 14G116).
Abstract: MOOCs (Massive Open Online Courses) have become increasingly popular all over the world in recent years. However, general search engines such as Google, Baidu, Yahoo, and Bing do not support specialized MOOC course search. The purpose of this demo is to present a vertical search engine designed to retrieve MOOC courses for learners. The demo search engine obtains MOOC web pages with a focused Crawler. The pages are parsed into structured or unstructured data with a modeling-based Parser. The Indexer then builds an index for the data with Lucene. Finally, the extracted MOOC list is produced by Course_ranking and Retrieval. The demo search engine is accessible at http://www.MOOCsoso.com.
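The index-then-retrieve part of the pipeline described in this abstract can be sketched with a toy inverted index. This is not Lucene and not the demo's Indexer or Course_ranking component; it is a plain-Python stand-in that ranks by summed term frequency, and the course records and function names are hypothetical.

```python
from collections import Counter, defaultdict

PUNCTUATION = ".,:;!?()"


def tokenize(text):
    """Lower-cased words with surrounding punctuation stripped."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_index(courses):
    """Inverted index: term -> {course_id: term frequency}."""
    index = defaultdict(dict)
    for course_id, text in courses.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][course_id] = tf
    return index


def search(index, query):
    """Course ids ranked by the summed frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for course_id, tf in index.get(term, {}).items():
            scores[course_id] += tf
    return [course_id for course_id, _ in scores.most_common()]


# Hypothetical parsed course records (the output a parser stage might produce).
courses = {
    "c1": "Machine Learning: an introduction to machine learning",
    "c2": "Introduction to Databases",
    "c3": "Deep Learning for vision",
}
index = build_index(courses)
print(search(index, "machine learning"))
```

A production index such as Lucene adds analyzers, field-aware scoring, and on-disk segment storage; the sketch only shows why an inverted index makes term lookup independent of the number of documents scanned.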