Abstract
Mining the information in expert homepages is of considerable research value, and deciding how to describe the features of an expert homepage so that its entity content can be identified is the most critical step in the mining process. This paper divides the expert homepage into its main information blocks and introduces the principal methods for identifying them. A total of 2,000 expert homepages were annotated with Dreamweaver; text, visual, and structural features were then used to characterize the basic information, research interests, research projects, and publication information on each homepage, completing the feature construction.
Source
《情报理论与实践》
CSSCI
Peking University Core Journal
2013, Issue 10, pp. 109-113 (5 pages)
Information Studies: Theory & Application
Keywords
expert homepage
information characteristics
information extraction
research method