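This paper proposes PRWRTC (personalized recommendations of Web resource based on Topic clustering), a personalized recommendation algorithm for Web resources built on topic clustering. The algorithm first introduces a Web resource clustering algorithm based on the topics of the resources, which improves clustering precision and, in turn, recommendation accuracy. It then introduces an algorithm that captures user preferences in real time from browsing behavior. Finally, to account for the dynamic evolution of user preferences, it adds a notion of timeliness to the algorithm, enabling dynamic recommendation of Web resources. Experiments verify the effectiveness of the algorithm.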
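The abstract does not give the formula PRWRTC uses for timeliness, so the following is only a minimal sketch of the general idea, assuming an exponential half-life decay over browsing events. The function names, the `half_life` parameter, and the event format are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def decayed_preferences(events, now, half_life=7.0):
    """Aggregate browsing events into per-topic preference weights.

    events: iterable of (topic, timestamp_in_days) pairs.
    Assumption (not from the paper): each event's weight halves
    every `half_life` days.
    """
    weights = defaultdict(float)
    for topic, timestamp in events:
        age = now - timestamp
        weights[topic] += 0.5 ** (age / half_life)
    return dict(weights)


def recommend(resource_topics, prefs, k=2):
    """Rank resources by the current weight of their topic cluster.

    resource_topics: mapping of resource id -> topic cluster label.
    """
    ranked = sorted(
        resource_topics,
        key=lambda r: prefs.get(resource_topics[r], 0.0),
        reverse=True,
    )
    return ranked[:k]


# One old "sports" visit and two fresh "news" visits.
events = [("sports", 0), ("news", 7), ("news", 7)]
prefs = decayed_preferences(events, now=7, half_life=7.0)
top = recommend({"a": "sports", "b": "news", "c": "tech"}, prefs, k=2)
```

Because older events contribute less, a topic the user has stopped visiting gradually drops in the ranking without being explicitly removed.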
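To understand the mainstream research and research hotspots of library and information science in high-quality SCI journals over the past five years, and to offer reference points and potential research directions, this study extracts papers and their citation data from top-ranked journals in the 2018 Journal Citation Reports in Web of Science, applies the “DEAN” data-cleaning workflow, and uses CiteSpace to analyze the data and draw visual maps. It analyzes and identifies the state and output of international library and information science research from the perspectives of publishing institutions, authors, and research hotspots, examines representative articles of each cluster, and summarizes the mainstream research hotspots of the field.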
Conceptual clustering is mainly used to address deficient and incomplete domain knowledge. Based on conceptual clustering technology, and targeting the organizational structure and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyzes the superior performance of the clustering algorithm in efficiency and clustering accuracy. Key words: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering. CLC number: TP 311. Foundation item: Supported by the National “863” Program of China (2002AA111010, 2003AA001032). Biography: WANG Yun-hua (1979-), male, Master candidate, research direction: knowledge engineering and data mining.
Document classification is widely applied in many scientific areas and academic environments, using NLP techniques and term extraction algorithms such as CValue, TfIdf, TermEx, GlossEx, Weirdness, and the like. These algorithms, however, are weak at extracting the most important terms when the input text has not been grammatically corrected or contains non-alphabetic, mathematical, or chemical notation, and at cross-domain inference of terms and phrases. In this paper, we propose a novel text-categorization and term-extraction method based on a human expert's choice of categories. Papers serve as the training material for the proposed algorithm. They have already been labeled with predefined, field-specific scientific categories by a human expert, preferably one with extensive experience, research, and survey work in the field. Our approach then extracts the (concept) terms of the labeled papers in each category and assigns all of them to that category. Test papers are categorized by extracting their terms and comparing them with each category's terms. In addition, our approach produces semantically enabled outputs that are useful for many purposes, such as complementing knowledge bases and data sets in the Linked Data cloud and querying them semantically with languages such as SPARQL. The URIs contained in the ontological outputs also make it easy to find the topic or class assigned to a classified paper. The experimental results, which compare LPTC with five well-known term extraction algorithms by measuring precision and recall, show that our approach achieves effective categorization. In other words, LPTC is significantly superior to CValue, TfIdf, TermEx, GlossEx, and Weirdness in the target study. We also conclude that the more papers are used for training, the higher the precision.
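The train-then-compare flow described in this abstract can be illustrated with a small sketch. It is not LPTC itself: the abstract does not specify how terms are extracted or compared, so this sketch assumes a simple frequency-based extractor and a plain term-overlap count, and every function name here is hypothetical.

```python
import re
from collections import Counter


def extract_terms(text, top_n=5):
    """Stand-in term extractor (assumption): the most frequent
    lowercase words longer than three letters."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 3]
    return {term for term, _ in Counter(tokens).most_common(top_n)}


def train(labeled_papers, top_n=5):
    """Assign the extracted terms of each expert-labeled paper
    to that paper's category."""
    category_terms = {}
    for text, category in labeled_papers:
        category_terms.setdefault(category, set()).update(
            extract_terms(text, top_n)
        )
    return category_terms


def categorize(text, category_terms, top_n=5):
    """Pick the category whose term set overlaps most with the
    terms extracted from the test paper."""
    terms = extract_terms(text, top_n)
    return max(category_terms, key=lambda c: len(terms & category_terms[c]))


labeled = [
    ("protein enzyme reaction catalyst protein", "chemistry"),
    ("clustering algorithm classifier training clustering", "computing"),
]
model = train(labeled)
label_a = categorize("a clustering algorithm for documents", model)
label_b = categorize("enzyme reaction rates", model)
```

Adding more labeled papers enlarges each category's term set, which is one plausible reading of why the abstract reports higher precision with more training papers.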
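Using Web of Science as the data source, this study applies co-word analysis, cluster analysis, and strategic diagram analysis to visualize the thematic evolution of international e-learning research literature from 1994 to 2013. The results show that research on collaborative learning and on e-learning evaluation has evolved from peripheral to core research areas; research on e-medical education and on online training is gradually maturing; e-learning systems, learning effectiveness, and e-learning environments are potential hot research topics; and the integration of technology and behavior orientation is the future direction of international e-learning research.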
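Co-word analysis starts from counting how often pairs of keywords appear together in the same paper. The sketch below shows only that counting step, on made-up keyword lists; the sample data and the function name `coword_counts` are illustrative assumptions, not data or code from the study.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count, for every unordered keyword pair, the number of
    papers in which both keywords appear."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keyword lists for three papers.
papers = [
    ["e-learning", "evaluation"],
    ["e-learning", "collaborative learning", "evaluation"],
    ["collaborative learning", "e-learning"],
]
counts = coword_counts(papers)
```

The resulting pair counts form the co-occurrence matrix that cluster analysis and strategic diagrams are then built on.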
Funding: National Natural Science Foundation of China (Grant No. 60475022); Natural Science Foundation of Shanxi Province of China (Grant No. 20041041); Shanxi Province Foundation for Returned Overseas Scholars (No. 2002004).