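Abstract: Structured processing of journal articles has gradually become a consensus in the journal community. Most journal platforms in China process articles according to the newer Journal Article Tag Suite (JATS) standard. However, JATS only recommends values for data attributes and leaves considerable room for custom extension, so actual processing results vary widely and data exchange is very difficult. This paper reviews the history of digital processing and the evolution of the relevant standards in China and abroad, along with the problems found in XML structured data processing in China. It then analyzes the characteristics of the different subsets, such as the Archiving and Interchange Tag Set and the Publishing Tag Set, and proposes a data processing and storage solution that both preserves the complete original information of an article and makes it easy to extract all kinds of structured information. Storage formats that comply with each platform's standard can be generated as needed through subtractive conversion, so that an article is truly processed once and then distributed and disseminated through multiple channels.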
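The idea of subtractive conversion can be illustrated with a minimal sketch. The abstract does not give the paper's actual conversion rules, so everything here is an assumption for illustration: the helper `subtract`, the allow-list `PLATFORM_A_TAGS`, and the sample record are hypothetical. The sketch keeps one full record and derives a slimmer platform-specific copy by removing every element the target platform does not accept.

```python
import copy
import xml.etree.ElementTree as ET


def subtract(element, allowed_tags):
    """Remove, in place, every descendant whose tag is not in allowed_tags.

    Note: removing an element also drops any text that follows it inside
    its parent (its "tail"), so this sketch suits metadata, not mixed content.
    """
    for child in list(element):  # copy the child list: we mutate while iterating
        if child.tag not in allowed_tags:
            element.remove(child)
        else:
            subtract(child, allowed_tags)
    return element


# Hypothetical full record that keeps extra, platform-independent information.
FULL_RECORD = (
    "<article><front><article-meta>"
    "<title-group><article-title>Sample title</article-title></title-group>"
    "<custom-meta-group><custom-meta>"
    "<meta-name>internal-id</meta-name><meta-value>42</meta-value>"
    "</custom-meta></custom-meta-group>"
    "</article-meta></front></article>"
)

# Hypothetical allow-list for one target platform.
PLATFORM_A_TAGS = {"front", "article-meta", "title-group", "article-title"}

full = ET.fromstring(FULL_RECORD)
slim = subtract(copy.deepcopy(full), PLATFORM_A_TAGS)  # the original stays intact

print(ET.tostring(slim, encoding="unicode"))
```

Working on a deep copy keeps the complete record as the single source, so a different allow-list can be applied to the same record for each target platform.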
Abstract: Text labeling is a very important tool for the automatic processing of language. It is used in several applications, such as morphological and syntactic text analysis, indexing, retrieval, deterministic finite-state networks (in which all combinations of words accepted by the grammar are listed), and statistical grammars (e.g., an n-gram model, in which the probabilities of sequences of n words in a specific order are given). In this article, we develop a morphosyntactic labeling system for the Baoule language using hidden Markov models. This will allow us to build a tagged reference corpus and to represent the major grammatical rules of the Baoule language in general. To estimate the parameters of the model, we used a training corpus manually labeled with a set of morphosyntactic labels. We then improve the system by re-estimating the parameters of the model.
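As a rough illustration of how a hidden Markov model tagger can be estimated from a manually labeled corpus, the sketch below counts tag transitions and word emissions, applies add-one smoothing, and decodes with the standard Viterbi algorithm. The abstract does not specify the authors' estimation details or decoder, so the smoothing, the function names `train_hmm` and `viterbi`, and the toy English corpus with its tag set are all assumptions; no Baoule data is used. The re-estimation step mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate smoothed log-probabilities from (word, tag) sentences."""
    tags = sorted({t for s in tagged_sentences for _, t in s})
    vocab = {w for s in tagged_sentences for w, _ in s}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words

    def log_start(t):
        return math.log((start[t] + alpha) / (len(tagged_sentences) + alpha * n_tags))

    def log_trans(p, t):
        return math.log((trans[p][t] + alpha) / (sum(trans[p].values()) + alpha * n_tags))

    def log_emit(t, w):
        return math.log((emit[t][w] + alpha) / (sum(emit[t].values()) + alpha * n_words))

    return tags, log_start, log_trans, log_emit


def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the most probable tag sequence for words under the model."""
    if not words:
        return []
    score = [{t: log_start(t) + log_emit(t, words[0]) for t in tags}]
    back = []
    for w in words[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for t in tags:
            best = max(tags, key=lambda p: prev[p] + log_trans(p, t))
            cur[t] = prev[best] + log_trans(best, t) + log_emit(t, w)
            ptr[t] = best
        score.append(cur)
        back.append(ptr)
    path = [max(tags, key=lambda t: score[-1][t])]
    for ptr in reversed(back):  # follow back-pointers from the last word
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy labeled corpus (hypothetical, English, for illustration only).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
MODEL = train_hmm(CORPUS)
print(viterbi(["the", "cat", "barks"], *MODEL))
```

Working in log space avoids numerical underflow on long sentences, and the smoothing gives unseen words and unseen transitions a small nonzero probability instead of ruling them out.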