Abstract
The planning, land, and housing management system is collaborative office software that helps government agencies and enterprises handle approvals and office work efficiently. It is usually deployed on an intranet, which makes it difficult for users to obtain external industry information. Existing systems also suffer from slow information updates, weak aggregation of industry information, and difficulty in screening large volumes of information. This paper uses Web crawler technology to address both the channel through which intranet users obtain external information and its timeliness. Drawing on Internet thinking, it builds a user interest model from user behavior data and ranks content by heat (popularity) value in descending order to handle the user cold-start problem and the large size of the content library. TF-IDF keyword extraction and the cosine similarity algorithm are then used to match user interests with content accurately, ultimately achieving personalized recommendation of information.
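The matching step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tf_idf_vectors`, `cosine`, `recommend`), the `(title, tokens, heat)` article layout, and the smoothed IDF formula are all assumptions made for illustration, and the input is assumed to be already tokenized. A user with no interest terms falls back to the heat-ordered list, which is one plausible reading of the cold-start handling described above.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times a smoothed IDF.
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(profile_terms, articles, top_k=3):
    """Rank articles, given as (title, tokens, heat) tuples, for one user.

    With an empty interest profile (cold start), return the highest-heat
    articles; otherwise rank by cosine similarity, breaking ties by heat.
    """
    if not profile_terms:
        by_heat = sorted(articles, key=lambda a: a[2], reverse=True)
        return [title for title, _, _ in by_heat[:top_k]]
    vectors = tf_idf_vectors([tokens for _, tokens, _ in articles] + [profile_terms])
    user_vec = vectors[-1]
    scored = [
        (cosine(user_vec, vec), heat, title)
        for vec, (title, _, heat) in zip(vectors, articles)
    ]
    scored.sort(reverse=True)
    return [title for _, _, title in scored[:top_k]]
```

Calling `recommend([], articles)` returns titles in descending heat order, while `recommend(["housing", "policy"], articles)` places articles sharing those terms first.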
Authors
SHENG Xiaoyao (盛逍遥)
WU Youbang (吴友邦)
WANG Xiang (王翔)
LI Li (李丽)
Tianjin Binhai New Area PLRG Information Center, Tianjin 300450, China
Source
Tianjin Science & Technology (《天津科技》)
2018, No. 9, pp. 73-76 (4 pages)
Keywords
personalized recommendation of information
planning land and housing management system
web spider
TF-IDF
cosine similarity