Abstract
This paper proposes a sentence segmentation algorithm for classical Chinese based on a context n-gram model, which uses both the preceding and the following context. By collecting contextual information, the algorithm predicts break positions fairly accurately even when data are sparse, so it handles small training corpora well and reduces the impact of data sparsity on segmentation accuracy. In a sentence segmentation experiment on the Analects of Confucius (Lunyu), the algorithm achieved a recall of 81% and a precision of 52%.
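The abstract does not give the model's details, so the following is only a minimal illustrative sketch, not the paper's algorithm. It assumes a much simpler stand-in: each candidate break is scored from the character before it and the character after it, with add-one smoothing to soften data sparsity, and the result is evaluated by precision and recall over break positions. All names (`train`, `break_score`, `segment`, `precision_recall`) and the parameters `alpha` and `threshold` are hypothetical.

```python
from collections import Counter


def train(sentences):
    """Count how often each character occurs, ends a sentence, or starts one."""
    total, ends, starts = Counter(), Counter(), Counter()
    for s in sentences:
        if not s:
            continue
        total.update(s)
        ends[s[-1]] += 1
        starts[s[0]] += 1
    return total, ends, starts


def break_score(model, left, right, alpha=1.0):
    """Smoothed estimate that a break falls between `left` and `right`.

    Assumption: the left and right contexts are treated as independent,
    and add-alpha smoothing keeps unseen characters from scoring zero.
    """
    total, ends, starts = model
    p_end = (ends[left] + alpha) / (total[left] + 2 * alpha)
    p_start = (starts[right] + alpha) / (total[right] + 2 * alpha)
    return p_end * p_start


def segment(model, text, threshold=0.25):
    """Return the set of positions i where a break is predicted before text[i]."""
    return {
        i
        for i in range(1, len(text))
        if break_score(model, text[i - 1], text[i]) > threshold
    }


def precision_recall(predicted, gold):
    """Precision and recall of predicted break positions against gold ones."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    model = train(["子曰", "学而时习之", "不亦说乎"])
    predicted = segment(model, "子曰学而时习之不亦说乎")
    print(sorted(predicted), precision_recall(predicted, {2, 7}))
```

A low threshold in a scheme like this tends to propose many breaks, which raises recall at the cost of precision; that is the same kind of trade-off the reported 81% recall and 52% precision reflect.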
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core Journal
2007, No. 3, pp. 192-193, 196 (3 pages)
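Funding
National Natural Science Foundation of China (Grant No. 60073046)
Specialized Research Fund for the Doctoral Program of Higher Education, SRFDP (Grant No. 20020610007)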