Abstract: First consider the following group of examples: ① 上海的股市已经历了几次大涨大跌…… ② 80年前冰海沉船一女乘客遗书最近冲上海滩。 ③ 省教委宣布68所成人中专校为首批评估合格学校。 In each of these sentences, the underlined portion contains more than three characters. Suppose the characters are, in order, A, B, C, D, E, …; then A and B can form the word AB (or phrase; likewise below), B and C can form the word BC, C and D the word CD, D and E the word DE, and so on. In this way, the words AB and BC, BC and CD, …
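The A/B/C pattern above is what makes a span ambiguous: if both AB and BC are dictionary words, a segmenter must choose between "AB | C" and "A | BC". A minimal sketch of detecting such overlapping ambiguity strings, using a toy lexicon invented for illustration (not the paper's dictionary):

```python
def find_overlapping_ambiguities(sentence, lexicon):
    """Return (index, AB, BC) triples where two adjacent two-character
    dictionary words share a middle character B."""
    hits = []
    for i in range(len(sentence) - 2):
        ab = sentence[i:i + 2]      # candidate word from characters A and B
        bc = sentence[i + 1:i + 3]  # candidate word from characters B and C
        if ab in lexicon and bc in lexicon:
            hits.append((i, ab, bc))
    return hits

# Hypothetical toy lexicon for example ①: "已经" and "经历" overlap on "经",
# so "已经历" is an overlapping ambiguity string.
lexicon = {"已经", "经历"}
print(find_overlapping_ambiguities("股市已经历了几次", lexicon))
# → [(2, '已经', '经历')]
```

Resolving which of the two segmentations is correct for a given context is exactly the classification task the English abstract below addresses.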
Abstract: In order to improve Chinese overlapping ambiguity resolution based on a support vector machine, statistical features are studied for representing the feature vectors. First, four statistical parameters (mutual information, accessor variety, two-character word frequency, and single-character word frequency) are each used on their own to describe the feature vectors. Then, other parameters are added as complementary features to the best-performing parameter to further improve classification performance. Experimental results show that features represented by mutual information, single-character word frequency, and accessor variety achieve an optimum accuracy of 94.39%. Compared with a commonly used word probability model, the accuracy is improved by 6.62%. These comparative results confirm that classification performance can be improved through feature selection and representation.
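Of the four statistical parameters named above, mutual information is the most standard to state concretely. A minimal sketch of computing pointwise mutual information for a character bigram; the corpus counts and corpus size below are hypothetical, invented purely for illustration:

```python
import math

def mutual_information(count_xy, count_x, count_y, n_total):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from
    raw counts over a corpus of n_total character positions."""
    p_xy = count_xy / n_total
    p_x = count_x / n_total
    p_y = count_y / n_total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: the bigram occurs 50 times, its two characters
# occur 200 and 400 times, in a corpus of 1,000,000 positions.
mi = mutual_information(50, 200, 400, 1_000_000)
```

A high PMI value indicates the two characters co-occur far more often than chance, i.e. they are likely to form a word; such per-boundary scores are the kind of quantities that can populate an SVM feature vector.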