Abstract
Power grid enterprises hold large volumes of equipment defect texts, written in Chinese, that contain important reliability information. Mining them manually is inefficient, and the accuracy varies from person to person. Taking transformer defect texts as the object of study and analyzing their characteristics, this paper establishes a defect text mining model for power grids based on a semantic framework. The model addresses two difficulties, namely dividing defect texts into sentence constituents and extracting numerical quantities precisely, and thereby offers a new technique for mining unstructured data in the power grid domain. First, an ontology lexicon is built, and on that basis the defect texts are preprocessed through word segmentation and lexical feature extraction. Next, the power-domain semantic framework and its semantic slots are defined, procedures for slot filling and semantic framework construction are proposed, and the ontology dictionary is completed automatically by merging word strings. Finally, the application of the mining results to reliability statistics is studied. A case study shows that the proposed technique is feasible and effective for the automatic classification and statistics of grid defects.
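As a rough illustration of the pipeline summarized above (lexicon-based segmentation followed by slot filling, with numbers extracted as whole values), a minimal Python sketch might look as follows. It is not the authors' implementation: the lexicon entries, the slot names (`equipment`, `attribute`, `state`, `value`), the sample sentence, and the use of forward maximum matching are all assumptions made for illustration, and the automatic dictionary completion step is not covered.

```python
import re

# Hypothetical mini ontology lexicon: term -> slot name (illustrative, not from the paper).
LEXICON = {
    "主变": "equipment",   # main transformer
    "套管": "equipment",   # bushing
    "油温": "attribute",   # oil temperature
    "偏高": "state",       # higher than normal
    "渗油": "state",       # oil seepage
}
NUMBER = re.compile(r"\d+(?:\.\d+)?")
MAX_LEN = max(len(term) for term in LEXICON)


def segment(text):
    """Forward maximum matching against LEXICON.

    Numbers are kept as single tokens; characters that match nothing
    are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        number = NUMBER.match(text, i)
        if number:
            tokens.append(number.group())
            i = number.end()
            continue
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in LEXICON or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def fill_slots(tokens):
    """Fill each slot with the first token that maps to it."""
    frame = {}
    for token in tokens:
        if NUMBER.fullmatch(token):
            frame.setdefault("value", float(token))
        elif token in LEXICON:
            frame.setdefault(LEXICON[token], token)
    return frame


if __name__ == "__main__":
    # Sample sentence (assumed): "main transformer oil temperature 85 °C, too high"
    tokens = segment("主变油温85℃偏高")
    print(tokens)
    print(fill_slots(tokens))
```

Matching numbers before lexicon terms keeps a reading such as 85 intact as one token, which is the kind of numerical quantity the abstract says is otherwise hard to extract precisely.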
Source
Power System Technology (《电网技术》)
EI
CSCD
PKU Core (北大核心)
2017, No. 2, pp. 637-643 (7 pages)
Keywords
text mining
semantic framework
reliability statistics
defect text