An XBRL dimensional data parsing algorithm based on the Map/Reduce parallel programming model

Cited by: 1
Abstract: This paper studies large-scale semi-structured data processing from the perspective of XBRL dimensional data and proposes an XBRL dimensional data parsing algorithm based on the Map/Reduce parallel programming model. Building on Map/Reduce and the StAX stream parsing technique, the algorithm addresses the relatively complex data reference relationships among the XML files of an XBRL financial report by taking the whole report as the minimum processing unit. It first extracts, in parallel, the data contained in the dimensional fact items and then processes the business semantic data, thereby parsing complex XBRL dimensional data. A performance comparison shows that the algorithm has a significant advantage in large-scale XBRL data processing.
Source: Journal of University of Chinese Academy of Sciences (《中国科学院大学学报(中英文)》), 2014, No. 1, pp. 124-129 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61303155).
Keywords: XBRL; semi-structured data processing; big data processing; Map/Reduce; XBRL dimension
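
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea, not the paper's implementation. It assumes a single instance document per report (real reports also span schema and linkbase files), uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for StAX, and uses a thread pool as a stand-in for a Map/Reduce cluster. The names `map_report`, `parse_reports`, and `SAMPLE`, the `ex:` taxonomy, and the choice to group values per (concept, dimensions) key in the reduce step are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

XBRLI = "http://www.xbrl.org/2003/instance"   # XBRL 2.1 instance namespace
XBRLDI = "http://xbrl.org/2006/xbrldi"        # XBRL Dimensions instance namespace

# Hypothetical one-file report; the first fact precedes the context it references.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:ex="http://example.com/taxonomy">
  <ex:Revenue contextRef="c1" unitRef="u1" decimals="0">100</ex:Revenue>
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com">E1</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="ex:RegionAxis">ex:AsiaMember</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2013-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:context id="c2">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com">E1</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2013-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <ex:Revenue contextRef="c2" unitRef="u1" decimals="0">250</ex:Revenue>
</xbrli:xbrl>"""


def map_report(report_xml):
    """Map step: stream one whole report and emit (key, value) pairs
    for facts whose context carries explicit dimension members."""
    contexts = {}  # context id -> sorted tuple of (dimension, member)
    facts = []     # (concept tag, contextRef, raw text)
    stream = io.BytesIO(report_xml.encode("utf-8"))
    for _event, elem in ET.iterparse(stream):  # default: "end" events only
        if elem.tag == f"{{{XBRLI}}}context":
            dims = tuple(sorted(
                (m.get("dimension"), (m.text or "").strip())
                for m in elem.iter(f"{{{XBRLDI}}}explicitMember")))
            contexts[elem.get("id")] = dims
            elem.clear()  # release the subtree once it has been read
        elif elem.get("contextRef") is not None:
            facts.append((elem.tag, elem.get("contextRef"), (elem.text or "").strip()))
            elem.clear()
    # Facts and contexts are joined only after the whole report has been read,
    # which is why the report, not a file fragment, is the unit of work here.
    pairs = []
    for concept, ref, text in facts:
        dims = contexts.get(ref, ())
        if dims:
            pairs.append(((concept, dims), float(text)))
    return pairs


def parse_reports(reports):
    """Run the map step over reports in parallel, then group values by key."""
    grouped = defaultdict(list)
    with ThreadPoolExecutor() as pool:
        for pairs in pool.map(map_report, reports):
            for key, value in pairs:
                grouped[key].append(value)
    return dict(grouped)
```

Calling `parse_reports([SAMPLE, SAMPLE])` groups the one dimensional `Revenue` fact from each report under a single (concept, dimensions) key, while the fact that references the dimension-free context `c2` is skipped. Keeping each report whole inside one map call avoids splitting a fact from the context it refers to across workers.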