Text alignment is crucial to the accuracy of MT (Machine Translation) systems, some NLP (Natural Language Processing) tools, and any other text processing task that requires bilingual data. This research proposes a language-independent sentence alignment approach based on Polish-to-English experiments (Polish is not a position-sensitive language). The approach was developed on the TED (Translanguage English Database) talks corpus, but can be used for any text domain or language pair. It implements various heuristics for sentence recognition, some of which use synonyms and semantic text structure analysis as additional information. Minimization of data loss was ensured. The solution is compared to other sentence alignment implementations. An improvement in MT system score on text processed with the described tool is also shown.
The state of the art of earthen architecture and vernacular built heritage comprises a complex set of issues that range from fundamental problematic recognition to anthropological and cultural studies and, more recently, to technological and experimental analyses. This paper addresses the development of the field, following the milestones of the international literature and pursuing a reflective-theory approach within a historical framework. It aims to explore the main contributions that have enhanced vernacular heritage and earthen architecture as specific domains, from pioneering public awareness essays to institutional expertise guidelines. Finally, in addition to the literature review process, this paper considers the recent corpus of recommendations from conservation management reference institutions, the updating of the operative problematic of earthen vernacular built heritage, and the relevance of local community involvement in facing increasing challenges.