Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation

Abstract: Word sense disambiguation is used in many areas of natural language processing. One approach to disambiguation is the decision list algorithm, a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck: their effectiveness depends on the size of the tagged training set, which is difficult, time-consuming, and costly to prepare. The method proposed in this article improves the performance of the decision list algorithm when only a small tagged training set is available. It applies a statistical collocation extraction method to a large untagged corpus in order to identify the more important collocations, which are the features used to build the learning hypotheses. Weighting these features improves the efficiency and accuracy of a decision list algorithm trained on a small training corpus.
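The idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the collocation statistic or the weighting formula, so everything specific here is an assumption: pointwise mutual information (PMI) stands in for the statistical collocation measure, rules are scored with a smoothed log-likelihood ratio in the style of a standard decision list, and the combination `llr * (1 + weight)` is a hypothetical choice. The function names and the toy data (built around the ambiguous Persian word "shir", which can mean milk or lion) are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def pmi_weights(untagged_sentences, target):
    """Assumed collocation measure: sentence-level PMI between `target`
    and each co-occurring word, clipped at zero."""
    word_count = Counter()
    pair_count = Counter()
    n = 0
    for sent in untagged_sentences:
        tokens = set(sent)
        n += 1
        word_count.update(tokens)
        if target in tokens:
            for w in tokens - {target}:
                pair_count[w] += 1
    weights = {}
    for w, c in pair_count.items():
        pmi = math.log((c * n) / (word_count[target] * word_count[w]))
        weights[w] = max(pmi, 0.0)
    return weights


def train_decision_list(tagged_examples, weights, alpha=0.1):
    """Build a decision list from (context_tokens, sense) pairs.
    Each rule's smoothed log-likelihood ratio is scaled by the
    collocation weight of its feature (hypothetical combination)."""
    counts = defaultdict(Counter)
    for tokens, sense in tagged_examples:
        for w in set(tokens):
            counts[w][sense] += 1
    rules = []
    for w, by_sense in counts.items():
        best_sense, best_count = by_sense.most_common(1)[0]
        rest = sum(by_sense.values()) - best_count
        llr = math.log((best_count + alpha) / (rest + alpha))
        if llr > 0:  # drop features that do not favour any sense
            rules.append((llr * (1.0 + weights.get(w, 0.0)), w, best_sense))
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules


def classify(rules, context_tokens, default):
    """Apply the first (highest-scoring) rule whose feature is present."""
    present = set(context_tokens)
    for _score, w, sense in rules:
        if w in present:
            return sense
    return default


# Invented toy data: a larger untagged corpus and a tiny tagged set.
untagged = [
    ["shir", "drink", "cold"],
    ["shir", "jungle", "roar"],
    ["shir", "drink", "glass"],
    ["book", "read", "cold"],
    ["jungle", "tree"],
]
tagged = [
    (["drink", "cold"], "milk"),
    (["jungle", "cold"], "lion"),
    (["roar", "jungle"], "lion"),
]

weighted_rules = train_decision_list(tagged, pmi_weights(untagged, "shir"))
unweighted_rules = train_decision_list(tagged, {})
```

In this toy setup the untagged corpus raises the rank of "drink", a word that co-occurs with the target more often than chance, so a context containing both "drink" and "jungle" is resolved differently by the weighted and unweighted lists. Whether the paper combines the scores this way is not stated in the abstract.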

Authors: Noushin Riahi; Fatemeh Sedghi (Computer Engineering Department, Alzahra University, Tehran, Iran)

Affiliation: Computer Engineering Department

Source: Journal of Computer and Communications (in English), 2016, Issue 4, pp. 109-124 (16 pages)

Keywords: Collocation Extraction; Word Sense Disambiguation; Untagged Corpus; Decision List

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

