Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 71971196).
Abstract: Named entity recognition (NER) is essential in many natural language processing (NLP) tasks such as information extraction and document classification. A construction document usually contains critical named entities, and an effective NER method can provide a solid foundation for downstream applications that improve construction management efficiency. This study presents an NER method for Chinese construction documents based on the conditional random field (CRF), comprising a corpus design pipeline and a CRF model. The corpus design pipeline identifies typical NER tasks in construction management, enables word-based tokenization, and controls annotation consistency with a newly designed annotation specification. The CRF model engineers nine transition features and seven classes of state features, covering the impacts of word position, part-of-speech (POS), and word/character states within the context. The F1-measure on a labeled construction data set is 87.9%. Furthermore, as more domain knowledge features are infused, the marginal performance improvement from including POS information decreases, which points to a promising research direction: customizing POS tags to improve NLP performance with limited data.
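As a rough illustration of what state and transition features of this kind can look like, the sketch below builds a feature dictionary for one token from word position, POS, and word/character context, and counts label-to-label transitions. It is not the paper's feature set: the function names, the feature keys, the context window, and the sample sentence with its tags are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); hypothetical input format


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """State features for token i: word, POS, characters, position, context."""
    word, pos = sent[i]
    if i == 0:
        position = "BOS"          # first token of the sentence
    elif i == len(sent) - 1:
        position = "EOS"          # last token of the sentence
    else:
        position = "MID"
    feats = {
        "word": word,
        "pos": pos,
        "first_char": word[0],    # character-level state
        "last_char": word[-1],
        "position": position,
    }
    # Neighbouring words and POS tags inside the context window.
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(sent):
            feats[f"word[{off:+d}]"] = sent[j][0]
            feats[f"pos[{off:+d}]"] = sent[j][1]
    return feats


def transition_counts(label_seqs: List[List[str]]) -> Counter:
    """Count (previous label, current label) pairs, the basis of transition features."""
    counts: Counter = Counter()
    for labels in label_seqs:
        counts.update(zip(labels, labels[1:]))
    return counts


# Hypothetical segmented sentence: "concrete / pouring / completed"
sentence = [("混凝土", "n"), ("浇筑", "v"), ("完成", "v")]
features = token_features(sentence, 1)
transitions = transition_counts([["B-MAT", "O", "O"]])
```

Feature dictionaries of this shape are what CRF toolkits typically consume, one per token; the label names (`B-MAT`, `O`) are placeholders rather than the annotation scheme defined in the study.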
Funding: Supported by the Basic Research Program of the Korea Science & Engineering Foundation (No. R01-2001-000-00467-0).
Abstract: The purpose of this study is to suggest an optimized way of managing and sharing information between standard architectural drawings and construction documents in the Korean architectural industry, in support of an automated code checking system that links STEP and XML. To achieve this purpose, the authors analyzed current research and technical developments on linking STEP and XML and developed a prototype system for sharing information between model-based drawings and XML-based construction documents. Finally, the authors suggest a practical scenario for sharing information through linked STEP and XML, using a test case of automated code checking. The paper examines the possibility of constructing an integrated architectural computing environment through the exchange and sharing of drawing information and external data across the whole building life cycle, from the conceptual design stage to the construction and maintenance stage. Automated code checking through linked STEP and XML could lead to enhanced collaboration, more complete code checking, improved building performance, and reduced construction costs.
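The idea of checking model data against rules held in XML can be sketched in a few lines. Everything here is hypothetical: the XML schema, the rule values, and the list of dictionaries standing in for attributes extracted from a STEP model are invented for illustration and are neither the authors' prototype nor any actual building code.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rule document; element and attribute names are assumptions.
RULES_XML = """
<codeRules>
  <rule id="R1" entity="Corridor" attribute="width" min="1.2"/>
  <rule id="R2" entity="Stair" attribute="riser_height" max="0.18"/>
</codeRules>
"""

# Stand-in for attributes extracted from a STEP-based drawing model.
MODEL = [
    {"entity": "Corridor", "name": "C-01", "width": 1.5},
    {"entity": "Corridor", "name": "C-02", "width": 1.0},
    {"entity": "Stair", "name": "S-01", "riser_height": 0.17},
]


def check(rules_xml, model):
    """Return (rule id, object name) pairs for every violated rule."""
    violations = []
    for rule in ET.fromstring(rules_xml).iter("rule"):
        attr = rule.get("attribute")
        lower, upper = rule.get("min"), rule.get("max")
        for obj in model:
            # Skip objects the rule does not apply to.
            if obj["entity"] != rule.get("entity") or attr not in obj:
                continue
            value = obj[attr]
            too_small = lower is not None and value < float(lower)
            too_large = upper is not None and value > float(upper)
            if too_small or too_large:
                violations.append((rule.get("id"), obj["name"]))
    return violations


violations = check(RULES_XML, MODEL)
```

With the sample data above, only corridor `C-02` falls below the minimum width in rule `R1`, so it is the single violation reported.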