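Abstract: Most current Deep Web information extraction methods rely on Web page structure and ignore the semantic information and relationships contained in the page, which leads to unsatisfactory extraction results. To address this problem, a domain-ontology-based post-processing method for Deep Web entity information is proposed. First, data regions and entity regions are identified using the DOM tree node similarity principle and the cosine measure of the VSM (Vector Space Model). Then, a domain ontology is built from the concepts and instances found in the data and entity regions; guided by this ontology, entities are semantically annotated, and the quantified annotation results are incorporated into the computation of the similarity between an entity and the ontology. Finally, a domain-ontology-based entity information extraction algorithm is proposed that obtains, within each entity, the subtree with the greatest similarity to the ontology. Tests on data from weather, book, and shopping websites show that, compared with existing methods, the proposed method improves the F-measure by 3.6% to 4.9%. The method not only reduces the dependence on Web page structure during extraction but also makes full use of the semantic information and relationships in the page, yielding more precise extraction results.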
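The VSM cosine step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each DOM subtree has already been flattened into a list of tag names, and the names `tag_vector`, `cosine_similarity`, and the sample subtrees are hypothetical. Sibling subtrees whose vectors have a cosine close to 1 would be candidates for repeated entity regions.

```python
import math
from collections import Counter


def tag_vector(tags):
    """Build a sparse term-frequency vector from a flattened list of tag names."""
    return Counter(tags)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty subtree is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sibling subtrees, each flattened to its tag sequence.
record_1 = ["div", "img", "a", "span", "span"]
record_2 = ["div", "img", "a", "span", "span"]
banner = ["div", "script", "iframe"]

sim_records = cosine_similarity(tag_vector(record_1), tag_vector(record_2))
sim_banner = cosine_similarity(tag_vector(record_1), tag_vector(banner))
print(sim_records, sim_banner)
```

In this toy input the two record subtrees share the same tag counts and score close to 1, while the banner shares only the `div` tag and scores much lower. The threshold that separates entity regions from other content is not given in the abstract and would have to be chosen empirically.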
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2012AA10A409) and the Scientific and Technological Innovation Project of Qingdao National Laboratory for Marine Science and Technology (No. 2015ASKJ02).
Abstract: The transformer-2 (tra-2) gene plays a key role in the regulatory hierarchy of sexual differentiation in somatic tissues and in the germline of Drosophila melanogaster. In this study, sequences and expression profiles of tra-2 in the Chinese mitten crab Eriocheir sinensis were characterized. Four tra-2 isoforms, designated Estra-2a, Estra-2b, Estra-2c, and Estra-2d, were isolated. They all contained an RNA-recognition motif (RRM) and a linker region, which shared high similarity with other reported tra-2s. Sequence analysis revealed that Estra-2a, Estra-2b, and Estra-2c are encoded by the same genomic locus and are generated by alternative splicing of the pre-mRNA. Compared with the other three isoforms, Estra-2d lacks the RS2 domain. Quantitative real-time PCR showed that all four isoforms were highly expressed in the fertilized egg and in the 2-4 cell and blastula stages compared with larval stages (P<0.01), suggesting their maternal origin in early embryonic developmental stages. Notably, Estra-2a was highly expressed in male somatic tissues, while Estra-2c was significantly more highly expressed in the ovary. These results suggest that Estra-2c is involved in sexual differentiation of the Chinese mitten crab. Our findings provide basic information for further functional studies of the tra-2 gene/protein in this species.