Abstract
A large volume of free Chinese academic resources is available on the Internet, but practical, dedicated tools for retrieving them are still lacking, so many valuable papers are obscured by other types of information. Aiming at the identification and retrieval of Chinese academic documents on the web, the study surveys and analyzes the characteristics of academic document pages and uses non-academic pages as a reference to verify the reliability of the characteristics found. The results show that academic document pages differ clearly from non-academic pages in keyword frequency, number of links, and proportion of related links; the degree of difference exceeds 75% for each feature, which the study classes as pronounced. These features can therefore distinguish academic document pages from non-academic pages reasonably well, and they provide a basis and theoretical support for the future development of tools that automatically identify academic document pages.
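The three page features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give exact feature definitions, so the sketch assumes that keyword frequency is a raw count of keyword occurrences in the visible text, and that a "related" link is one whose anchor text contains at least one keyword. The names `PageFeatureParser` and `page_features` are hypothetical.

```python
from html.parser import HTMLParser


class PageFeatureParser(HTMLParser):
    """Collects visible text and [href, anchor text] pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments
        self.links = []        # one [href, anchor_text] entry per <a href=...>
        self._in_link = False
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append([href, ""])
                self._in_link = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style content
        self.text_parts.append(data)
        if self._in_link:
            self.links[-1][1] += data


def page_features(html, keywords):
    """Return the three features under the assumptions stated above."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    # Assumed definition: total occurrences of all keywords in visible text.
    keyword_count = sum(text.count(k) for k in keywords)
    total_links = len(parser.links)
    # Assumed definition: a link is "related" if its anchor text has a keyword.
    related = sum(
        1 for _, anchor in parser.links if any(k in anchor for k in keywords)
    )
    ratio = related / total_links if total_links else 0.0
    return {
        "keyword_count": keyword_count,
        "total_links": total_links,
        "related_link_ratio": ratio,
    }
```

A classifier built on such features would still need thresholds or a trained model; the abstract reports only that the features differ markedly between the two page types, not how they are combined.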
Source
Library Tribune (《图书馆论坛》)
CSSCI
Peking University Core Journals (北大核心)
2011, No. 6, pp. 178-185 (8 pages)
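Funding
One of the research outputs of the National Social Science Fund of China project (2010-2012) "Automatic Identification and Retrieval of Chinese Academic Documents on the Web: Research and System Development Based on the Genre, Links, and Image-Text Relevance of Academic Documents" (Project No. 10BTQ049)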
Keywords
web document
academic paper
characteristics of web page
information retrieval