
Approach for mining associative terms in uncategorized English documents set
Original title: 在未分类英文文档集中挖掘相关词的方法
Cited by: 1
Abstract: In areas such as relevance judgment for search engine results, text-to-speech conversion, and speech recognition, accurately analyzing the collocation relationships between words is one of the main research problems. Drawing on the massive amount of information available on the Internet, this paper statistically analyzes a large number of English web pages. Using the occurrence frequency of individual words and the co-occurrence frequency of word pairs, it derives empirical conclusions for judging the degree of relatedness between words in uncategorized web pages. Based on these conclusions, it proposes a method and a formula, grounded in statistical analysis of a document set, for ranking words by degree of relatedness, and implements a prototype of a distributed system for mining English word associations.
Authors: 付仲恺 (Fu Zhongkai), 秦华 (Qin Hua)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 5, pp. 151-153, 163 (4 pages)
Keywords: data mining; web-page classification; association rules; sort algorithm; text representation
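
The abstract describes ranking word relatedness from single-word frequencies and word-pair co-occurrence frequencies, but this record does not give the paper's actual formula. The sketch below is therefore only an illustration of the general idea: it uses pointwise mutual information (PMI) over document-level counts as a stand-in measure, and the function name `rank_associated_terms`, the whitespace tokenization, and the `min_pair_count` parameter are all assumptions rather than anything taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def rank_associated_terms(documents, target, min_pair_count=1):
    """Rank terms by document-level PMI with `target`.

    PMI is a stand-in measure chosen for illustration; it is NOT the
    formula proposed in the paper, which this record does not state.
    """
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing each term pair
    for text in documents:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        terms = sorted(set(text.lower().split()))
        term_df.update(terms)
        # Sorting first keeps each pair in one canonical order.
        pair_df.update(combinations(terms, 2))

    scores = {}
    for (a, b), n_ab in pair_df.items():
        if target not in (a, b) or n_ab < min_pair_count:
            continue
        other = b if a == target else a
        # PMI = log(P(a, b) / (P(a) * P(b))), with each probability
        # estimated as a document count divided by n_docs.
        scores[other] = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [
        "search engine ranking",
        "search engine index",
        "speech recognition engine",
        "search query",
    ]
    for term, score in rank_associated_terms(docs, "search"):
        print(f"{term}\t{score:.3f}")
```

PMI tends to favor words that occur rarely, so on a real web corpus a higher `min_pair_count` and a stop-word filter would likely be needed before the ranking is useful.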

