To overcome a limitation of traditional clustering algorithms, which fail to produce meaningful clusters on high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph, in which a hyperedge represents the similarity of the attribute-value distributions of two points. A hypergraph partitioning algorithm is then used to partition the vertices so that the data items in each partition are highly related and the total weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated using the intra-cluster singularity value. Analysis and experimental results demonstrate that the approach is applicable and effective across a wide range of scenarios.
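The partitioning objective above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method. It assumes that Jaccard similarity measures how similar two binary records are, and it builds one weighted hyperedge per similar pair, following the abstract's description. It then finds the balanced two-way split with the smallest cut weight by exhaustive search, which only works for tiny inputs. Practical systems use multilevel hypergraph partitioners instead. The helper names (`jaccard`, `build_hyperedges`, `cut_weight`, `best_bisection`) are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two binary vectors (assumed similarity measure)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def build_hyperedges(rows, threshold=0.0):
    """Create one weighted hyperedge per pair of points whose similarity exceeds threshold."""
    edges = []
    for i, j in combinations(range(len(rows)), 2):
        w = jaccard(rows[i], rows[j])
        if w > threshold:
            edges.append((frozenset((i, j)), w))
    return edges

def cut_weight(edges, labels):
    """Total weight of hyperedges whose vertices fall in more than one part."""
    return sum(w for verts, w in edges if len({labels[v] for v in verts}) > 1)

def best_bisection(n, edges):
    """Exhaustive balanced bisection minimizing cut weight (illustration only, exponential in n)."""
    best, best_labels = float("inf"), None
    for part in combinations(range(n), n // 2):
        labels = [0 if v in part else 1 for v in range(n)]
        c = cut_weight(edges, labels)
        if c < best:
            best, best_labels = c, labels
    return best_labels, best

# Two groups of sparse binary records that share no attributes across groups.
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
labels, cut = best_bisection(len(rows), build_hyperedges(rows))
print(labels, cut)
```

On this toy input, the minimum-cut split separates the first three records from the last three, and no hyperedge is cut.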
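In many KDD (knowledge discovery in databases) applications, such as fraud detection in e-commerce, discovering exceptions or outliers is more valuable than discovering common patterns. Most existing outlier detection methods target numeric attributes, and they can only find outliers without explaining what those outliers mean. This paper proposes a definition of outliers based on a hypergraph model. The definition captures the notion of "locality" and also provides a clear explanation of what each outlier means. The paper also presents the HOT (hypergraph-based outlier test) algorithm, which detects outliers by computing each point's support, membership degree, and size deviation. The algorithm handles both numeric and categorical attributes. Analysis shows that the algorithm effectively discovers outliers in high-dimensional data.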
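The local, support-based idea behind HOT can be illustrated with a simplified sketch. This is an assumption-laden illustration, not the paper's definitions of support, membership degree, or size deviation. It treats a hyperedge as a group of records already known to share something, such as a frequent itemset. Within that group, it counts the support of each value of one categorical attribute. It then flags a record when the support of its value lies far below the mean support, using a z-score. The threshold `theta` and the minimum hyperedge size `min_size` are hypothetical parameters. Because the deviation is computed inside the hyperedge, the flag is "local", and the hyperedge plus the attribute together explain why the record was flagged.

```python
from collections import Counter
from statistics import mean, pstdev

def local_outliers(records, hyperedge, attribute, theta=-1.0, min_size=5):
    """Flag records in one hyperedge whose value of `attribute` is rare within it.

    Simplified illustration: the deviation is the z-score of the value's support
    among the supports of all values of `attribute` inside the hyperedge.
    """
    if len(hyperedge) < min_size:          # ignore hyperedges too small to be reliable
        return []
    support = Counter(records[i][attribute] for i in hyperedge)
    counts = list(support.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:                         # all values equally common: nothing stands out
        return []
    flagged = []
    for i in hyperedge:
        dev = (support[records[i][attribute]] - mu) / sigma
        if dev < theta:
            flagged.append(i)
    return flagged

# A hyperedge of 11 records: "blue" is rare compared with "red" and "green".
records = [{"color": "red"}] * 5 + [{"color": "green"}] * 5 + [{"color": "blue"}]
print(local_outliers(records, list(range(11)), "color"))
```

Numeric attributes would first need to be discretized into categories for this sketch to apply. Discretization is a common choice, but it is an assumption here.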
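Big data products (BDPs) differ from physical products in their raw materials, user requirements, and processing techniques, yet existing research on BDP production systems remains at the conceptual-model stage. To address this gap, the concept of a BDP production line is proposed. Based on the characteristics of production lines, the paper studies the production line's decision elements. It focuses on quality as a key decision element and examines how quality acts in BDP production. Using hypergraph theory, it builds a BDP production system model that embeds quality, a quality transfer function, and a quality aggregation function, and it designs a decision process for the BDP production line. Two decision modes for BDP production lines are proposed: supply-side-stable and demand-side-stable. Case-study results show that the proposed model and decision method can meet users' requirements for BDP quality.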
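The following sketch shows how quality might flow through a hypergraph-based production model. It is a hypothetical illustration of the idea, not the paper's model. Each processing step is treated as a directed hyperedge that maps several input data products to one output product. The quality aggregation function is assumed to be `min`, so an output is limited by its weakest input. Each step's quality transfer function is an arbitrary per-process callable. The names `Process`, `min_aggregate`, and `propagate`, the example products, and all numeric values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass
class Process:
    """A directed hyperedge: several input products -> one output product."""
    inputs: List[str]
    output: str
    transfer: Callable[[float], float]   # hypothetical quality transfer function

def min_aggregate(qualities: Iterable[float]) -> float:
    """Hypothetical quality aggregation: output limited by its weakest input."""
    return min(qualities)

def propagate(raw_quality: Dict[str, float], processes: List[Process],
              aggregate: Callable[[Iterable[float]], float] = min_aggregate) -> Dict[str, float]:
    """Compute the quality of every product; assumes processes are in topological order."""
    quality = dict(raw_quality)
    for p in processes:
        q_in = aggregate([quality[name] for name in p.inputs])
        quality[p.output] = p.transfer(q_in)
    return quality

raw = {"logs": 0.9, "orders": 0.8}      # quality scores of raw data (made-up values)
line = [
    Process(["logs", "orders"], "cleaned", lambda q: min(1.0, q * 1.1)),  # cleaning improves quality
    Process(["cleaned"], "report", lambda q: q * 0.95),                   # summarizing loses some
]
result = propagate(raw, line)
meets_requirement = result["report"] >= 0.8   # hypothetical user quality requirement
print(result["report"], meets_requirement)
```

Swapping in a different aggregation, such as a weighted mean, changes only the `aggregate` argument. This mirrors the separation between the transfer and aggregation functions that the abstract describes.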