Abstract: In this paper, we propose a knowledge-based rule management system for data cleaning. The system combines features of rule-based systems and rule-based data cleaning frameworks. Its main advantages are threefold. First, it proposes a strong, unified rule form based on a first-order structure, which permits the representation and management of all types of rules and of their quality through a set of characteristics. Second, it increases the quality of rules, which in turn determines the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule-based and knowledge-based systems. Because several research works have shown that data cleaning is driven more by domain knowledge than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data in order to better determine the main components of the proposed system. To illustrate the system, we also present a first experiment with a case study in the health sector, in which we demonstrate how the system helps improve data quality. The autonomy, extensibility and platform independence of the proposed rule management system facilitate its incorporation into any system concerned with data quality management.
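The abstract describes a unified rule form based on a first-order structure, with rule quality captured by some characteristics, but does not spell that form out. As a minimal sketch under stated assumptions, a rule can be read as "for every record r, premise(r) implies conclusion(r)", and support and confidence can serve as example quality characteristics. The `Rule` class, its fields, the choice of support/confidence, and the patient records are hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical unified rule form: for all records r, premise(r) -> conclusion(r)."""
    name: str
    kind: str  # illustrative label, e.g. "domain constraint" or "business rule"
    premise: Callable[[Record], bool]
    conclusion: Callable[[Record], bool]
    # Quality characteristics attached to the rule (assumed: support, confidence).
    quality: Dict[str, float] = field(default_factory=dict)

    def violations(self, records: List[Record]) -> List[Record]:
        # A record violates the rule when the premise holds but the conclusion does not.
        return [r for r in records if self.premise(r) and not self.conclusion(r)]

    def assess(self, records: List[Record]) -> None:
        # Support: share of records the rule applies to.
        # Confidence: share of applicable records that satisfy the conclusion.
        covered = [r for r in records if self.premise(r)]
        satisfied = [r for r in covered if self.conclusion(r)]
        self.quality["support"] = len(covered) / len(records) if records else 0.0
        self.quality["confidence"] = len(satisfied) / len(covered) if covered else 0.0


# Usage on toy (hypothetical) health records.
patients = [
    {"id": 1, "sex": "M", "pregnant": False},
    {"id": 2, "sex": "M", "pregnant": True},   # inconsistent record
    {"id": 3, "sex": "F", "pregnant": True},
    {"id": 4, "sex": "F", "pregnant": False},
]
rule = Rule(
    name="male_not_pregnant",
    kind="domain constraint",
    premise=lambda r: r["sex"] == "M",
    conclusion=lambda r: not r["pregnant"],
)
rule.assess(patients)
print([r["id"] for r in rule.violations(patients)])  # [2]
print(rule.quality)  # {'support': 0.5, 'confidence': 0.5}
```

Keeping the quality characteristics on the rule object itself is one way to let a rule manager rank or retire low-quality rules before they are used for cleaning.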
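Abstract: Targeting the future high-throughput air-ground interconnection scenario formed by 5G and satellite networks, this work aims to provide advance warning of aircraft landing risk. First, based on statistics and models, a landing warning framework is established that relies mainly on multi-source real-time operational data and fuses historical statistics with expert knowledge. Then, to address the lag in computed results found in existing research, cluster analysis of quick access recorder (QAR) data from ARJ21 landings is used to classify pilots' landing operation patterns into four categories; on this basis, a prediction model of pilot landing operation patterns based on decision field theory is constructed, and the landing pattern choices of pilots with different personalities in different scenarios are computed and discussed. Building on these results, and given the complexity and uncertainty of the landing process, a hierarchical belief rule base inference method is proposed that fuses qualitative and quantitative information to achieve dynamic landing risk assessment and warning. Finally, risk inference on the landing processes of the "2020.10.16 Panzhihua off-runway touchdown event" and the "2010.8.2 Yichun air crash" verifies the effectiveness of the warning method; for the Panzhihua event, the warning can be issued up to 13 s in advance.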
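The risk assessment step rests on belief rule base (BRB) inference, but the abstract does not give the hierarchical formulation. The following is a minimal single-layer sketch, assuming one numeric input, three referential values, three risk grades with utilities 0, 0.5 and 1, and the analytical evidential reasoning (ER) algorithm commonly used to aggregate activated belief rules. The function names, inputs, and rule beliefs are hypothetical; in a hierarchical variant, the output of one such BRB would feed the next layer as an input.

```python
import math
from typing import List, Sequence, Tuple


def matching_degrees(x: float, refs: Sequence[float]) -> List[float]:
    """Map a numeric input onto sorted referential values by linear interpolation."""
    alpha = [0.0] * len(refs)
    if x <= refs[0]:
        alpha[0] = 1.0
    elif x >= refs[-1]:
        alpha[-1] = 1.0
    else:
        for i in range(len(refs) - 1):
            if refs[i] <= x <= refs[i + 1]:
                alpha[i + 1] = (x - refs[i]) / (refs[i + 1] - refs[i])
                alpha[i] = 1.0 - alpha[i + 1]
                break
    return alpha


def er_aggregate(weights: Sequence[float], beliefs: Sequence[Sequence[float]]) -> List[float]:
    """Analytical ER combination of rules with activation weights w_k and beliefs beta[k][n]."""
    n_grades = len(beliefs[0])
    # Mass of each weighted rule not assigned to any grade: 1 - w_k * sum_j beta_{j,k}.
    rest = [1.0 - w * sum(b) for w, b in zip(weights, beliefs)]
    prod_rest = math.prod(rest)
    prod_n = [math.prod(w * b[n] + r for w, b, r in zip(weights, beliefs, rest))
              for n in range(n_grades)]
    mu = 1.0 / (sum(prod_n) - (n_grades - 1) * prod_rest)
    denom = 1.0 - mu * math.prod(1.0 - w for w in weights)
    return [mu * (p - prod_rest) / denom for p in prod_n]


def brb_infer(x: float, refs: Sequence[float], rule_beliefs: Sequence[Sequence[float]],
              rule_weights: Sequence[float], utilities: Sequence[float]) -> Tuple[List[float], float]:
    """One-input BRB: activation weights -> ER aggregation -> expected utility (risk score)."""
    alpha = matching_degrees(x, refs)
    raw = [theta * a for theta, a in zip(rule_weights, alpha)]
    total = sum(raw)
    weights = [r / total for r in raw]
    beliefs = er_aggregate(weights, rule_beliefs)
    score = sum(b * u for b, u in zip(beliefs, utilities))
    return beliefs, score


# Usage: rules at refs 0, 5, 10 map fully to grades low, medium, high (hypothetical).
refs = [0.0, 5.0, 10.0]
rule_beliefs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beliefs, score = brb_infer(2.5, refs, rule_beliefs, [1.0, 1.0, 1.0], [0.0, 0.5, 1.0])
print(beliefs, score)  # approximately [0.5, 0.5, 0.0] and 0.25
```

The ER step, rather than a plain weighted average, is what lets incomplete rule beliefs (rows summing to less than 1) carry their uncertainty into the final risk distribution.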