Open-source software is gradually being integrated into industrial software, while industry protocols in industrial software are gradually being transferred to open-source community development. Industrial protocol standardization organizations are confronted with numerous, fragmented code PRs (Pull Requests) and informal proposals, and differing workflows lead to increased operating costs. Open-source community maintenance teams need more intelligent software to guide the identification and classification of these issues. To address these problems, this paper proposes a PR review prediction model based on multi-dimensional features. We extract 43 features of PRs and divide them into five dimensions: contributor, reviewer, software project, PR, and social network of developers. The model integrates these five-dimensional features and uses a Random Forest classifier to predict the review results of PRs. In addition, to improve the quality of rejected PRs, we focus on problems raised in the review process and on review comments of similar PRs, and propose a PR revision recommendation model based on a PR review knowledge graph. Entity information and relationships between entities are extracted from the text and code information of PRs, historical review comments, and related issues. PR revisions are then recommended to code contributors by graph-based similarity calculation. The experimental results illustrate that the two models are effective and robust in PR review result prediction and PR revision recommendation.
This paper empirically investigates the relationships between 15 design metrics and the maintainability of 148 Java open source software systems. The results show that size and complexity metrics are strongly related to the maintainability of open source software. However, cohesion and coupling, as currently captured by existing metrics, do not seem to have a significant impact on maintainability. When used together, these metrics can predict system maintainability fairly accurately (mean MREs below 30%).
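The accuracy figure quoted above is a mean MRE. As a minimal sketch, assuming MRE has its usual meaning of magnitude of relative error, the measure can be computed as follows. The function name and the sample scores are invented for illustration and are not taken from the paper.

```python
def mean_mre(actual, predicted):
    """Return the mean magnitude of relative error (MRE) over paired values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # MRE for one system: |actual - predicted| / |actual| (actual must be nonzero)
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical maintainability scores, invented purely for illustration.
actual_scores = [100.0, 200.0, 50.0]
predicted_scores = [90.0, 230.0, 55.0]
score = mean_mre(actual_scores, predicted_scores)
print(f"mean MRE = {score:.3f}")
```

A mean MRE below 0.30 on held-out systems would correspond to the "below 30%" threshold the abstract reports.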
This paper gives a general evaluation of three popular free and open source desktop GIS projects according to selected evaluation criteria. To further the understanding of open source software, the paper also presents a customization example of QGIS with Python and PyQt.
This research describes a quantitative, rapid, and low-cost methodology for debris flow susceptibility evaluation at the basin scale using open-access data and geodatabases. The proposed approach can aid decision makers in land management and territorial planning by first screening for areas with a higher debris flow susceptibility. Five environmental predisposing factors, namely bedrock lithology, fracture network, quaternary deposits, slope inclination, and hydrographic network, were selected as independent parameters, and their mutual interactions were described and quantified using the Rock Engineering System (RES) methodology. For each parameter, specific indexes were proposed, aiming to provide a final synthetic and representative index of debris flow susceptibility at the basin scale. The methodology was tested in four basins located in the Upper Susa Valley (NW Italian Alps), where debris flow events are the predominant natural hazard. The proposed matrix can serve as a useful standardized tool that is universally applicable, since it is independent of the type and characteristics of the basin.
With the rapid development of Internet technology, the volume of data has increased exponentially. As large amounts of data are no longer easy for their owners to manage and secure, big data security and privacy have become a hot issue. One of the most popular research directions for addressing data security and data privacy falls within the scope of big data governance and security. In this paper, we introduce the basic concepts of data governance and security. Then, the state-of-the-art open source frameworks for data governance and security, including Apache Falcon, Apache Atlas, Apache Ranger, Apache Sentry and Kerberos, are detailed and discussed with descriptions of their implementation principles and possible applications.
An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications. Many systems use a fixed point number system. Fixed point numbers often need to be converted to floating point numbers for higher accuracy, dynamic range, fixed-length transmission limitations or end user requirements. A similar conversion system is needed to convert floating point numbers to fixed point numbers because of the advantages that fixed point numbers offer over floating point number systems, such as compact hardware, reduced verification time and reduced design effort. The latest embedded and SoC designs use both number systems together to improve accuracy or reduce required hardware in the same design. The proposed open source design and verification tool converts fixed point numbers to floating point numbers, and floating point numbers to fixed point numbers, using the IEEE-754 floating point number standard. This open source design tool generates HDL code and its test bench, which can be implemented in FPGA and VLSI systems. The design can be compiled and simulated using open source Iverilog/GTKWave and verified using Octave. A high level synthesis tool and GUI are designed using C#. The proposed design tool can increase productivity by reducing the design and verification time, and can reduce the development cost owing to its open source nature. It can be used as a standalone block generator or integrated into current designs to improve range and accuracy and to reduce the development cost. The generated design has been implemented on Xilinx FPGAs.
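The two conversion directions described above can be illustrated with a small software reference model. This is only a sketch under stated assumptions: it is not the tool's generated HDL, and the 16-bit signed two's-complement layout, the saturation behaviour, and all function names are choices made here for illustration.

```python
import struct


def fixed_to_float(raw, frac_bits, total_bits=16):
    """Interpret raw as a signed two's-complement fixed point word."""
    if raw >= 1 << (total_bits - 1):
        raw -= 1 << total_bits  # undo two's-complement wrap for negatives
    return raw / (1 << frac_bits)


def float_to_fixed(value, frac_bits, total_bits=16):
    """Quantize value to a fixed point word, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))  # Python rounds ties to even
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    scaled = max(lowest, min(highest, scaled))
    return scaled & ((1 << total_bits) - 1)  # back to an unsigned bit pattern


def float_to_ieee754_bits(value):
    """Return the IEEE-754 single precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


# Q8.8 example: 0x0180 encodes 1.5 (384 / 256).
word = float_to_fixed(1.5, frac_bits=8)
print(hex(word), fixed_to_float(word, frac_bits=8), hex(float_to_ieee754_bits(1.5)))
```

A model like this is the kind of check the abstract assigns to Octave: the software result serves as a golden value against which simulated hardware output can be compared.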
Open source software (OSS) has become an indispensable part of society, not only for personal use but also for corporate use. Projects in which OSS is developed and operated are called open source projects, and the number of such projects is increasing. On the other hand, because anyone can participate in an open source project, the progress of the project is uncertain due to differences in project members’ skills, development environments, and time zones of activity. Therefore, many users and companies need to understand the development and operation status of open source projects. With that understanding, developers can make careful decisions about upgrading or installing new OSS. In this paper, we focus on maintenance effort estimation for open source projects considering uncertainty. We also evaluate the project quantitatively using Earned Value Management (EVM). Moreover, we examine the appropriateness of the model for predicting maintenance effort expenditures. Finally, we discuss the appropriateness of this EVM method.
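For readers unfamiliar with EVM, the standard indicators can be sketched as follows. The formulas are the textbook EVM definitions; how the paper maps OSS maintenance effort onto planned value, earned value and actual cost is not stated in the abstract, so the class name and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvmSnapshot:
    planned_value: float  # PV: budgeted effort for work scheduled so far
    earned_value: float   # EV: budgeted effort for work actually completed
    actual_cost: float    # AC: effort actually spent so far

    @property
    def cost_variance(self) -> float:
        return self.earned_value - self.actual_cost

    @property
    def schedule_variance(self) -> float:
        return self.earned_value - self.planned_value

    @property
    def cpi(self) -> float:
        """Cost performance index; below 1 means over budget."""
        return self.earned_value / self.actual_cost

    @property
    def spi(self) -> float:
        """Schedule performance index; below 1 means behind schedule."""
        return self.earned_value / self.planned_value

    def estimate_at_completion(self, budget_at_completion: float) -> float:
        """Projected total effort if the current cost efficiency persists."""
        return budget_at_completion / self.cpi


# Hypothetical effort figures (person-hours), invented for illustration.
snapshot = EvmSnapshot(planned_value=100.0, earned_value=80.0, actual_cost=120.0)
print(snapshot.cost_variance, snapshot.schedule_variance, snapshot.spi)
```

In this made-up snapshot both variances are negative, which in EVM terms signals a project that is over budget and behind schedule.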
Advancements in neuroscience research present opportunities and challenges, requiring substantial resources and funding. To address this, we describe here “Poke And Delayed Drink Intertemporal Choice Task (POKE-ADDICT)”, an open-source, versatile, and cost-effective apparatus for intertemporal choice testing in rodents. This allows quantification of delay discounting (DD), a cross-species phenomenon observed in decision making which provides valuable insights into higher-order cognitive functioning. In DD, the subjective value of a delayed reward is reduced as a function of the delay for its receipt. Using our apparatus, we implemented an effective intertemporal choice paradigm for the quantification of DD based on an adjusting delayed amount (ADA) algorithm using mango juice as a reward. Our paradigm requires limited training, a few 3D-printed parts and inexpensive electrical components, including a Raspberry Pi control unit. Furthermore, it is compatible with several in vivo procedures, and the use of nose pokes instead of levers allows for faster task learning. Besides the main application described here, the apparatus can be further extended to implement other behavioral tests and protocols, including standard operant conditioning. In conclusion, we describe a versatile and cost-effective design based on Raspberry Pi that can support research in animal behavior, decision making and, more specifically, delay discounting.
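The statement that subjective value falls as a function of delay can be made concrete with a discounting curve. The abstract does not say which functional form the authors fit, so the hyperbolic model V = A / (1 + kD) used below is an assumption chosen because it is a common form in the delay discounting literature. The parameter values are invented.

```python
def discounted_value(amount, delay, k):
    """Hyperbolic discounting (assumed model): V = amount / (1 + k * delay)."""
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return amount / (1.0 + k * delay)


# Hypothetical reward of 10 units and discount rate k = 0.1 per second.
for delay_s in (0, 5, 10, 20):
    value = discounted_value(amount=10.0, delay=delay_s, k=0.1)
    print(f"delay={delay_s:>2}s  subjective value={value:.2f}")
```

A larger k produces a steeper curve, meaning the delayed reward loses subjective value more quickly.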
The open-source movement profoundly impacts the development of computer education. The current requirements for postgraduate cultivation in Chinese universities mainly include publishing papers, applying for patents, winning awards, and conducting research projects, which demonstrate the capabilities of students when they graduate from university. However, in today’s prevalent open-source culture, these types of assessments are still not comprehensive enough for postgraduate cultivation, especially for professional postgraduate degrees. For this reason, Zhejiang University takes the lead in proposing educational reforms for postgraduate cultivation based on the open-source ecosystem. It has implemented a new “trinity” mechanism (i.e., open-source course, open-source training, and open-source capability evaluation) for graduate training centered on open source, serving as a novel supplement to the traditional methods of postgraduate cultivation. After a year of pilot operation, this new approach has been well received by teachers and students and has achieved good results and positive feedback.
Robust face representation is imperative to highly accurate face recognition. In this work, we propose an open source face recognition method with deep representation named VIPLFaceNet, which is a 10-layer deep convolutional neural network with seven convolutional layers and three fully-connected layers. Compared with the well-known AlexNet, our VIPLFaceNet takes only 20% of the training time and 60% of the testing time, but achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. Our VIPLFaceNet achieves 98.60% mean accuracy on LFW using one single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license. The SDK takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications.
An open source software (OSS) ecosystem refers to an OSS development community composed of many software projects and the developers contributing to these projects. The projects and developers co-evolve in an ecosystem. To sustain the healthy evolution of such OSS ecosystems, it is necessary to attract and retain developers, particularly project leaders and core developers, who have a major impact on the project and the whole team. Therefore, it is important to figure out the factors that influence developers' chance to evolve into project leaders and core developers. To identify such factors, we conducted a case study on the GNOME ecosystem. First, we collected indicators reflecting developers' subjective willingness to contribute to the project and the project environment that they stay in. Second, we calculated such indicators based on the GNOME dataset. Then, we fitted logistic regression models, taking as independent variables the indicators that remained after eliminating the most collinear ones, and taking as the dependent variable the future developer role (core developer or project leader). The results showed that some of these indicators of subjective willingness and project environment (e.g., the total number of projects that a developer joined) significantly influenced the developers' chance to evolve into core developers and project leaders. Under different validation methods, the obtained model performs well at predicting future core developers, with stable prediction performance (F-value of 0.770).
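The reported F-value can be read as follows. This is a minimal sketch, assuming "F-value" denotes the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not spell out. The confusion counts are invented and do not reproduce the paper's 0.770.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the harmonic mean of precision and recall for the positive class."""
    if true_positives == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for the "becomes a core developer" class.
f_value = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(f"F1 = {f_value:.3f}")
```

Because F1 ignores true negatives, it is a natural choice when the positive class (developers who become core developers) is much rarer than the negative class.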
Open source software has become highly popular and is of great importance for most software engineering activities. To facilitate software organization and retrieval, tagging is extensively used in open source communities. However, finding the desired software through tags in communities such as Freecode and ohloh is still challenging because of tag insufficiency. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching tags of open source software. First, we propose a semantic graph to model the semantic correlations between tags and the words in software descriptions. Then, based on the graph, we design an effective algorithm to recommend tags for software. Through comprehensive experiments on large-scale open source software datasets and comparisons with several typical related works, we demonstrate the effectiveness and efficiency of our method in recommending proper tags.
Geoportals have been the primary source of spatial information for researchers in diverse fields. Recent years have seen a growing trend to integrate spatial analysis and geovisual analytics inside Geoportals, so that researchers can use the Geoportal to conduct basic analysis without offline processing. In practice, domain-specific analysis often requires researchers to integrate heterogeneous data sources, leverage new statistical models, or build their own customized models. These tasks are increasingly being tackled with open source tools in programming languages such as Python or R. However, it is unrealistic to incorporate the numerous open source tools into a Geoportal platform for data processing and analysis. This work provides an exploratory effort to bridge Geoportals and open source tools through Python scripting. The Geoportal demonstrated in this work is the Urban and Regional Explorer for China studies. A Python package is provided to manipulate this platform from the local programming environment. The server side of the Geoportal implements a set of service endpoints that allow the package to upload, transform, and process user data and seamlessly integrate them into the existing datasets. A case study illustrates the use of this package to conduct integrated analyses of search engine data and baseline census data. This work attempts a new direction in Geoportal development, which could further promote the transformation of Geoportals into online analytical workbenches.
The development, integration, and distribution of the information and spatial data infrastructure (i.e., Digital Earth; DE) necessary to support the vision and goals of Future Earth (FE) will occur in a distributed fashion, in very diverse technological, institutional, socio-cultural, and economic contexts around the world. This complex context and these ambitious goals require bringing to bear not only the best minds, but also the best science and technologies available. Free and Open Source Software for Geospatial Applications (FOSS4G) offers mature, capable and reliable software to contribute to the creation of this infrastructure. In this paper we point to a selected set of some of the most mature and reliable FOSS4G solutions that can be used to develop the functionality required as part of DE and FE. We provide examples of large-scale, sophisticated, mission-critical applications of each software package to illustrate their power and capabilities in systems where they perform roles or functionality similar to the ones they could perform as part of DE and FE. We provide information and resources to assist readers in carrying out their own assessments to select the best FOSS4G solutions for their particular contexts and system development needs.
Purpose: Data science is the study of the generalizable extraction of knowledge from data. It includes a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing, with the aim of extracting value from data. The purpose of this paper is to provide an overview of open source (OS) data science tools, proposing a classification scheme that can be used to study OS data science software. Design/methodology/approach: The proposed classification scheme is based on general characteristics, project activity, operational characteristics and data mining characteristics. The authors then use the proposed scheme to examine 70 identified open source software tools. From this, the authors provide insight into the current status of OS data science tools and reveal the state-of-the-art tools. Findings: The features of 70 OS tools are recorded based on the criteria of the four groups of characteristics: general characteristics, project activity, operational characteristics and data mining characteristics. Interesting results from the analysis of these features are reported. Originality/value: The contribution of this survey is the development of a new classification scheme for the examination and study of OS data science tools. In parallel, this study provides an overview of existing OS data science tools.
This research paper compares Excel and the R language for data analysis and concludes that R is more suitable for complex data analysis tasks. R’s open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex data analysis tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R a more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not just a language that can handle more data than Excel, and demonstrate that R is essential to the field of data analysis. At the same time, it will also help users and organizations make informed decisions regarding their data analysis needs and software preferences.
Funding (PR review prediction and revision recommendation paper): supported by the National Social Science Fund (NSSF) under Grant No. 22BTQ033.
Funding (design metrics and maintainability study): supported by the National Natural Science Foundation of China (60425206, 60633010), the High Technology Research Project of Jiangsu Province (BG2005032), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (20060286020).
Funding (Zhejiang University open-source postgraduate training paper): supported by the Fundamental Research Funds for the Central Universities (No. 226202200064), the National Natural Science Foundation of China (No. 62202419), the Ningbo Natural Science Foundation (No. 2022J184), and the State Street Zhejiang University Technology Center.
Funding (VIPLFaceNet paper): this work was partially supported by the National Basic Research Program of China (973 Program) (2015CB351802) and the National Natural Science Foundation of China (Grant Nos. 61402443, 61390511, 61379083, 61222211).
Abstract: Robust face representation is imperative to highly accurate face recognition. In this work, we propose an open source face recognition method with deep representation named VIPLFaceNet, which is a 10-layer deep convolutional neural network with seven convolutional layers and three fully-connected layers. Compared with the well-known AlexNet, our VIPLFaceNet takes only 20% of the training time and 60% of the testing time, but achieves a 40% drop in error rate on the real-world face recognition benchmark LFW. Our VIPLFaceNet achieves 98.60% mean accuracy on LFW using one single network. An open-source C++ SDK based on VIPLFaceNet is released under the BSD license. The SDK takes about 150 ms to process one face image in a single thread on an i7 desktop CPU. VIPLFaceNet provides a state-of-the-art starting point for both academic and industrial face recognition applications.
Funding: This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0800400, the National Basic Research 973 Program of China under Grant No. 2014CB340404, the National Natural Science Foundation of China under Grant Nos. 61572371, 61273216, and 61272111, the China Postdoctoral Science Foundation (CPSF) under Grant No. 2015M582272, the Natural Science Foundation of Hubei Province of China under Grant No. 2016CFB158, and the Fundamental Research Funds for the Central Universities of China under Grant No. 2042016kf0033.
Abstract: An open source software (OSS) ecosystem refers to an OSS development community composed of many software projects and developers contributing to these projects. The projects and developers co-evolve in an ecosystem. To keep such OSS ecosystems evolving healthily, there is a need to attract and retain developers, particularly project leaders and core developers who have a major impact on the project and the whole team. Therefore, it is important to figure out the factors that influence developers' chance to evolve into project leaders and core developers. To identify such factors, we conducted a case study on the GNOME ecosystem. First, we collected indicators reflecting developers' subjective willingness to contribute to the project and the project environment that they stay in. Second, we calculated such indicators based on the GNOME dataset. Then, we fitted logistic regression models by taking as independent variables the resulting indicators after eliminating the most collinear ones, and taking as a dependent variable the future developer role (the core developer or project leader). The results showed that some of these indicators of subjective willingness and project environment (e.g., the total number of projects that a developer joined) significantly influenced the developers' chance to evolve into core developers and project leaders. With different validation methods, our obtained model performs well on predicting developmental core developers, resulting in stable prediction performance (F-value of 0.770).
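The abstract reports an F-value without saying how it was computed. Assuming it denotes the standard F1 measure (the harmonic mean of precision and recall), a minimal Python sketch of the calculation follows. The function name and the example counts are hypothetical and are not drawn from the GNOME study.

```python
def f_value(tp: int, fp: int, fn: int) -> float:
    """F1 score from counts of true positives, false positives and false negatives.

    Equivalent to 2 * precision * recall / (precision + recall),
    written in the algebraically identical form 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical confusion counts for a "future core developer" classifier.
    print(f_value(tp=8, fp=2, fn=2))
```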
Abstract: Open source software has become highly popular and is of great importance for most software engineering activities. To facilitate software organization and retrieval, tagging is extensively used in open source communities. However, finding the desired software through tags in communities such as Freecode and ohloh is still challenging because of tag insufficiency. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching tags of open source software. Firstly, we propose a semantic graph to model the semantic correlations between tags and the words in software descriptions. Then, based on the graph, we design an effective algorithm to recommend tags for software. With comprehensive experiments on large-scale open source software datasets, comparing against several typical related works, we demonstrate the effectiveness and efficiency of our method in recommending proper tags.
Abstract: Geoportals have been the primary source of spatial information to researchers in diverse fields. Recent years have seen a growing trend to integrate spatial analysis and geovisual analytics inside Geoportals. Researchers could use the Geoportal to conduct basic analysis without offline processing. In practice, domain-specific analysis often requires researchers to integrate heterogeneous data sources, leverage new statistical models, or build their own customized models. These tasks are increasingly being tackled with open source tools in programming languages such as Python or R. However, it is unrealistic to incorporate the numerous open source tools in a Geoportal platform for data processing and analysis. This work provides an exploratory effort to bridge Geoportals and open source tools through Python scripting. The Geoportal demonstrated in this work is the Urban and Regional Explorer for China studies. A Python package is provided to manipulate this platform in the local programming environment. The server side of the Geoportal implements a set of service endpoints that allows the package to upload, transform, and process user data and seamlessly integrate them into the existing datasets. A case study is provided that illustrates the use of this package to conduct integrated analyses of search engine data and baseline census data. This work attempts a new direction in Geoportal development, which could further promote the transformation of Geoportals into online analytical workbenches.
Abstract: The development, integration, and distribution of the information and spatial data infrastructure (i.e. Digital Earth; DE) necessary to support the vision and goals of Future Earth (FE) will occur in a distributed fashion, in very diverse technological, institutional, socio-cultural, and economic contexts around the world. This complex context and these ambitious goals require bringing to bear not only the best minds, but also the best science and technologies available. Free and Open Source Software for Geospatial Applications (FOSS4G) offers mature, capable and reliable software to contribute to the creation of this infrastructure. In this paper we point to a selected set of some of the most mature and reliable FOSS4G solutions that can be used to develop the functionality required as part of DE and FE. We provide examples of large-scale, sophisticated, mission-critical applications of each software to illustrate their power and capabilities in systems where they perform roles or functionality similar to the ones they could perform as part of DE and FE. We provide information and resources to assist the readers in carrying out their own assessments to select the best FOSS4G solutions for their particular contexts and system development needs.
Funding: The research leading to the results presented in this paper has received funding from the European Union Seventh Framework Programme (FP7-2012-NMP-ICT-FoF) under Grant Agreement No. 314364.
Abstract: Purpose: Data science is the study of the generalizable extraction of knowledge from data. It includes a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing, aiming to extract value from data. The purpose of this paper is to provide an overview of open source (OS) data science tools, proposing a classification scheme that can be used to study OS data science software. Design/methodology/approach: The proposed classification scheme is based on general characteristics, project activity, operational characteristics and data mining characteristics. The authors then use the proposed scheme to examine 70 identified Open Source Software. From this the authors provide insight about the current status of OS data science tools and reveal the state-of-the-art tools. Findings: The features of 70 OS tools are recorded based on the criteria of the four groups of characteristics: general characteristics, project activity, operational characteristics and data mining characteristics. Interesting results came from the analysis of these features and are recorded here. Originality/value: The contribution of this survey is the development of a new classification scheme for examination and study of OS data science tools. In parallel, this study provides an overview of existing OS data science tools.
Abstract: This research paper compares Excel and R language for data analysis and concludes that R language is more suitable for complex data analysis tasks. R language’s open-source nature makes it accessible to everyone, and its powerful data management and analysis tools make it suitable for handling complex data analysis tasks. It is also highly customizable, allowing users to create custom functions and packages to meet their specific needs. Additionally, R language provides high reproducibility, making it easy to replicate and verify research results, and it has excellent collaboration capabilities, enabling multiple users to work on the same project simultaneously. These advantages make R language a more suitable choice for complex data analysis tasks, particularly in scientific research and business applications. The findings of this study will help people understand that R is not just a language that can handle more data than Excel and demonstrate that R is essential to the field of data analysis. At the same time, it will also help users and organizations make informed decisions regarding their data analysis needs and software preferences.