Funding: Financial support provided by the National Key Research and Development Project (2019YFC0214403) and the Chongqing Joint Chinese Medicine Scientific Research Project (2021ZY023984).
Abstract: Owing to their outstanding performance in cheminformatics, machine learning algorithms have been increasingly used to mine molecular properties and biomedical big data. The performance of machine learning models is known to depend critically on the choice of hyper-parameter configuration. However, many studies either search for optimal hyper-parameters with grid search or use arbitrarily selected hyper-parameters, which can easily lead to a suboptimal configuration. In this study, the Hyperopt library, which implements Bayesian optimization, is employed to find optimal hyper-parameters for different machine learning algorithms. Six drug discovery datasets, covering solubility, probe-likeness, hERG, Chagas disease, tuberculosis, and malaria, are used to compare different machine learning algorithms with ECFP6 fingerprints. This contribution aims to evaluate whether Bernoulli Naïve Bayes, logistic linear regression, AdaBoost decision tree, random forest, support vector machine, and deep neural network algorithms with optimized hyper-parameters offer any improvement in testing compared with the reference models, as assessed by an array of metrics including AUC, F1-score, Cohen's kappa, Matthews correlation coefficient, recall, precision, and accuracy. Based on the rank normalized score approach, the Hyperopt models achieve better or comparable performance on 33 out of 36 models across the drug discovery datasets, showing the significant improvement achieved by employing the Hyperopt library. Open-source code for all six machine learning frameworks used with the Hyperopt Python package is provided to make this approach accessible to more scientists who are not familiar with writing code.
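The threshold-based metrics named in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, F1, Cohen's kappa and MCC
    from binary confusion-matrix counts (illustrative helper)."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Kappa and MCC both correct for class imbalance, which is one reason a study of this kind would report them alongside plain accuracy.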
Abstract: Copyright and its international complications have presented a significant barrier to the Universal Digital Library (UDL)'s mission to digitize all the published works of mankind and make them available throughout the world. We discuss the effect of existing copyright treaties and of various proposals, such as compulsory licensing and the public lending right, that would allow access to copyrighted works without requiring the permission of their owners. We argue that these schemes are ineffective for the purposes of the UDL. Instead, making use of the international consensus that copyright does not protect facts, information, or processes, we propose to scan works digitally to extract their intellectual content, generate by machine synthetic works that capture this content, translate the generated works automatically into multiple languages, and distribute them free of copyright restriction.
Abstract: This article introduces several of the more significant service delivery innovations, and the resulting accomplishments, initiated by Dongguan Library (hereafter DGL) in recent years. It is based on the author's case study of DGL's user-centered vision, mission, immediate objectives, and demonstrated service performance, set within an environment of collegial support from its professional peers. After five years of intensive effort on focused professional development and practice, DGL has completed its information service mapping and now delivers information to the entire municipality of Dongguan 24 hours a day, 7 days a week (24/7). This singular feat suggests that any significant development in the scale of a municipal library has to keep pace with the progress of society at large and, in particular, with the changing information demands of its local clientele.
Abstract: This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to determine whether natural language processing-based categorization of abstracts alone is adequate for classification. It categorizes dissertations by both their abstracts and their full text, using the GraphLab Create library from Apple's Turi, to identify whether abstract analysis is an adequate measure of content categorization, which we found it was not. We also compare the dissertation categorizations using IBM's Watson Discovery deep machine learning tool. Our research provides perspectives on the practicality of manually classifying technical documents, and it provides insights into (1) the categories of academic work created by experienced full-time working professionals in a computing doctoral program, (2) the viability and performance of automated categorization based on abstracts versus full-text dissertation analysis, and (3) natural language processing versus human manual text classification.
Funding: The author thanks Tieren Gao, Peer Decker, Alan Savan, and Manfred Wuttig for fruitful discussions. The authors gratefully acknowledge funding support from the National Science Foundation Graduate Research Fellowship Program (DGE 1322106).
Abstract: Ni-Ti-based shape memory alloys (SMAs) have found widespread use in the last 70 years, but improving their functional stability remains a key quest for more robust and advanced applications. Named for their ability to retain their processed shape as a result of a reversible martensitic transformation, SMAs are highly sensitive to compositional variations. Alloying with ternary and quaternary elements to fine-tune the lattice parameters and the thermal hysteresis of an SMA therefore becomes a challenge in materials exploration. Combinatorial materials science allows streamlining of the synthesis process and of data management across multiple characterization techniques. In this study, a Ni-Ti-Cu-V composition-spread thin-film library was synthesized by magnetron co-sputtering on a thermally oxidized Si wafer. Composition-dependent phase transformation temperature and microstructure were investigated using high-throughput wavelength dispersive spectroscopy, synchrotron X-ray diffraction, and temperature-dependent resistance measurements. Of the 177 compositions in the materials library, 32 were observed to show the shape memory effect, of which five had zero or near-zero thermal hysteresis. These compositions provide flexibility in the operating temperature regimes in which they can be used. A phase map for the quaternary system and correlations of functional properties are discussed with respect to the local microstructure and composition of the thin-film library.
Funding: Supported by the National High Performance Computation Foundation (984057).
Abstract: A software component library is an essential part of reuse-based software development. Using a single component library to store and search all kinds of components is shown to be very inefficient. We construct multiple libraries to support software reuse and use PVM as the development environment to emulate a large-scale computer, which is expected to provide efficient distributed storage and parallel search of components and thereby improve software reuse.
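The idea of searching several component libraries in parallel can be illustrated with a small sketch. This is not the paper's PVM implementation: it swaps in a Python thread pool on a single machine for PVM, and the function names, the keyword-matching rule, and the example libraries are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_library(name, components, keyword):
    """Return (library name, component) pairs whose description contains keyword."""
    needle = keyword.lower()
    return [(name, comp) for comp in components if needle in comp.lower()]


def parallel_search(libraries, keyword):
    """Search every library concurrently and merge hits in library order."""
    workers = max(1, len(libraries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_library, name, comps, keyword)
            for name, comps in libraries.items()
        ]
        hits = []
        for future in futures:  # iterate in submission order for stable output
            hits.extend(future.result())
    return hits


# Hypothetical component libraries, for illustration only.
libraries = {
    "gui": ["Button widget", "Sort dialog"],
    "algorithms": ["Quick sort", "Hash map"],
}
results = parallel_search(libraries, "sort")
```

Splitting the store by library keeps each search small and independent, which is what makes distributing the work across workers straightforward.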
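Abstract: Predicting university students' grades from university library big data is of great significance for promoting service innovation in university libraries and the digital transformation of higher education. Addressing the fact that library usage data have rarely been used to build grade prediction models, this article combines academic records from the university registrar's office with library usage data and builds a student grade prediction model using machine learning methods. The experimental results show that courses demanding strong logical thinking have a significant positive correlation with students' grades, and that library usage data (such as book loans and library visits) are clearly positively correlated with grade point average (GPA). The study aims to provide strong support for precision services in university libraries and a useful reference for the digital transformation of higher education.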
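A positive correlation of the kind reported between library usage and GPA is commonly quantified with the Pearson coefficient. The sketch below assumes that measure (the abstract does not name the statistic used), and the loan counts and GPA values are invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-student data: books borrowed per year and GPA.
book_loans = [0, 5, 12, 20, 31]
gpa = [2.4, 2.9, 3.1, 3.4, 3.8]
r = pearson_r(book_loans, gpa)  # a value near +1 indicates a strong positive relation
```

A coefficient of this kind describes association only; it does not by itself show that library use causes higher grades.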