Detecting out-of-vocabulary words is an urgent and difficult task in Chinese word segmentation. To avoid the defects caused by offline training in traditional methods, the paper proposes an improved prediction by partial match (PPM) segmentation algorithm for Chinese words based on extracting local context information. It adds the context information of the test text to the local PPM statistical model to guide the detection of new words. The algorithm focuses on online segmentation and new word detection, performs well in both closed and open tests, and to a certain extent outperforms some well-known Chinese segmentation systems.
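The idea of feeding test-text context into a PPM model can be pictured with a small sketch. It is not the paper's algorithm: the class name `PPMOrder1`, the order-1 context, the choice of escape method C without exclusion, the assumed alphabet size, and the sample strings are all illustrative assumptions. It only shows that once the model has been updated with text from the document being segmented, a previously unseen string costs fewer bits to encode, which is the kind of signal a PPM-based segmenter could use to accept a new word.

```python
import math
from collections import defaultdict


class PPMOrder1:
    """Order-1 character model with PPM-style escapes (method C, no exclusion).

    Hypothetical illustration only; not the model described in the paper.
    """

    def __init__(self, alphabet_size=65536):
        # Assumed alphabet size for the final uniform fallback (order -1).
        self.alphabet_size = alphabet_size
        # bigram[prev][ch] = how often ch followed prev; unigram[ch] = count of ch.
        self.bigram = defaultdict(lambda: defaultdict(int))
        self.unigram = defaultdict(int)

    def update(self, text):
        """Add the characters of `text` to the counts (the online step)."""
        prev = None
        for ch in text:
            self.unigram[ch] += 1
            if prev is not None:
                self.bigram[prev][ch] += 1
            prev = ch

    @staticmethod
    def _prob(counts, ch):
        """Return (symbol_prob, escape_prob); exactly one of them is None."""
        n = sum(counts.values())  # total observations in this context
        d = len(counts)           # distinct symbols seen in this context
        if n == 0:
            return None, 1.0      # empty context: always escape
        if counts.get(ch, 0) > 0:
            return counts[ch] / (n + d), None
        return None, d / (n + d)

    def bits(self, text):
        """Code length of `text` in bits under the current counts (no update)."""
        total = 0.0
        prev = None
        for ch in text:
            contexts = []
            if prev is not None:
                contexts.append(self.bigram.get(prev, {}))
            contexts.append(self.unigram)
            p = 1.0
            for counts in contexts:
                sym, esc = self._prob(counts, ch)
                if sym is not None:
                    p *= sym
                    break
                p *= esc
            else:
                # Escaped from every context: fall back to a uniform guess.
                p *= 1.0 / self.alphabet_size
            total += -math.log2(p)
            prev = ch
        return total


model = PPMOrder1()
model.update("我们在北京大学学习")  # stands in for offline training text
candidate = "奥巴马"                # a string the offline model has never seen
before = model.bits(candidate)
model.update("奥巴马访问北京，奥巴马发表讲话")  # local context from the test text
after = model.bits(candidate)
print(f"bits before: {before:.2f}, bits after: {after:.2f}")
```

A full segmenter would additionally search over candidate segmentations and compare their code lengths; that search is omitted here.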
For China's information industry, the year 2006 is a milestone, full of dreams, regrets, hopes, and confidence. The following incidents might become historic. TD-SCDMA special standard: It is doubted by experts, puffed by various media, boosted by manufacturers, and wondered at by operators. TD-SCDMA is the basic rhythm of 3G technologies in China in 2006, and may be as
Funding: National Natural Science Foundation of China (No. 60903129); National High Technology Research and Development Program of China (No. 2006AA010107, No. 2006AA010108); Foundation of Fujian Province of China (No. 2008F3105)