Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2015CB453302); the NSFC-Shandong Joint Fund for Marine Science Research Centre (No. U1606404); and the Aoshan Science and Technology Innovation Project (No. 2015ASKJ02-04).
Abstract: Removal of the length effect in otolith shape analysis for stock identification using length scaling is an important issue; however, few studies have attempted to investigate the effectiveness or weakness of this methodology in application. The aim of this study was to evaluate whether commonly used size scaling methods and normalized elliptic Fourier descriptors (NEFDs) could effectively remove the size effect of fish in stock discrimination. To achieve this goal, length groups from two known geographical stocks of yellow croaker, Larimichthys polyactis, along the Chinese coast (five groups from the Changjiang River estuary of the East China Sea and three groups from the Bohai Sea) were subjected to otolith shape analysis. The results indicated that the variation of otolith shape caused by intra-stock fish length might exceed that due to inter-stock geographical separation, even when otolith shape variables are standardized with length scaling methods. This variation could easily result in misleading stock discrimination through otolith shape analysis. Therefore, conclusions about fish stock structure should be drawn carefully from otolith shape analysis, because the observed discrimination may primarily be due to length effects rather than differences among stocks. The application of multiple methods, such as otolith shape analysis combined with elemental fingerprinting, tagging or genetic analysis, is recommended for stock identification.
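As a concrete illustration of the length-scaling step discussed above, one common allometric correction fits an exponent b from a log-log regression of a shape variable on fish length and rescales every observation to a reference length. The sketch below is a generic version of that idea, not the study's exact procedure; the data are simulated.

```python
# Generic allometric length correction for a single shape variable.
# Illustrative only: function names and the toy data are not from the study.
import math

def fit_allometric_exponent(lengths, shapes):
    """Least-squares slope b of log(shape) on log(length)."""
    xs = [math.log(l) for l in lengths]
    ys = [math.log(s) for s in shapes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def length_scale(lengths, shapes, ref_length):
    """Rescale each shape value to a common reference length."""
    b = fit_allometric_exponent(lengths, shapes)
    return [s * (ref_length / l) ** b for s, l in zip(shapes, lengths)]

# Toy data: shape grows exactly as length^1.8, so scaling should flatten it.
lengths = [10.0, 12.0, 15.0, 20.0]
shapes = [2.0 * l ** 1.8 for l in lengths]
adjusted = length_scale(lengths, shapes, ref_length=15.0)
# every adjusted value is ~2.0 * 15.0**1.8, i.e. the length trend is removed
```

The caution raised in the abstract still applies: when within-stock length ranges differ between sampling groups, a correction of this form can leave residual length structure that masquerades as stock separation.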
Funding: Supported by the National Natural Science Foundation of China, "Research on Cross-sector Competition Effect and Regulatory Policy of Digital Platforms Based on Inter-platform Network Externalities" (Grant No. 72103085).
Abstract: Data is a key asset for digital platforms, and mergers and acquisitions (M&As) are an important way for platform enterprises to acquire it. The types of data obtained from intra-industry and cross-sector M&As differ, as does the extent to which they interact within or between platforms. The impact of such data on corporate market performance is an important question to consider when selecting strategies for digital platform M&As. Based on our research on advertising-driven platforms, we developed a two-stage Hotelling game model for comparing the market performance effects of intra-industry and cross-sector M&As for digital platforms. We carried out an empirical test using relevant data from advertising-driven digital platforms between 2009 and 2021, as well as a case study of Baidu's M&A activities. Our research found that intra-industry M&As driven by "data economies of scale" and cross-sector M&As driven by "data economies of scope" are both beneficial to the market performance of platform enterprises. Intra-industry M&As have a more significant positive effect on market performance because the same types of data are easier to integrate and develop the "network effect of data scale". From a data factor perspective, this paper reveals the economic logic by which different types of M&As influence the market performance of digital platforms, and offers policy recommendations for digital platforms selecting M&A strategies based on data scale, data scope, and the network effect of data.
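For readers unfamiliar with the modeling framework named above, the sketch below solves a plain one-stage symmetric Hotelling duopoly by best-response iteration. It is a toy illustration of the equilibrium logic only; the paper's model is a richer two-stage game with data-driven payoffs, and all parameter values here are made up.

```python
# Toy symmetric Hotelling line: two firms, marginal cost c, linear transport
# cost t. The textbook equilibrium price is c + t for both firms; we recover
# it by iterating the best-response map. Illustration only.
def best_response(p_rival, c, t):
    # Firm i's best reply on the unit Hotelling line with uniform consumers.
    return (p_rival + c + t) / 2

def hotelling_equilibrium(c=1.0, t=2.0, iters=60):
    p1 = p2 = c  # start both prices at marginal cost
    for _ in range(iters):
        # simultaneous update from the previous round's prices
        p1, p2 = best_response(p2, c, t), best_response(p1, c, t)
    return p1, p2

p1, p2 = hotelling_equilibrium()
# both prices converge to c + t = 3.0
```

The iteration halves the distance to the fixed point each round, so 60 rounds are far more than enough for numerical convergence.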
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11305040, 11375071 and 11447203; the Education Department of Guizhou Province Innovation Talent Fund under Grant No. [2015]5508; the Education Department of Guizhou Province Innovation Team Fund under Grant No. [2014]35; the Guizhou Province Science Technology Foundation under Grant No. [2015]2114; and the Guizhou Province Innovation Talent Team Fund under Grant No. [2015]4015.
Abstract: An analytic massive total cross section of photon-proton scattering is derived, which exhibits geometric scaling. This geometric scaling is used to perform a global analysis of the deep inelastic scattering data on the inclusive structure function F2 measured in lepton-hadron scattering experiments at small values of Bjorken x. It is shown that the descriptions of the inclusive structure function F2 and the longitudinal structure function FL are improved with the massive analytic structure function, which may imply that the gluon saturation effect dominates the parton evolution process at HERA. The inclusion of the heavy quarks prevents the divergence of the lepton-hadron cross section, which plays a significant role in the description of the photoproduction region.
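As a reminder of what "geometric scaling" means here, the standard small-x form states that the cross section depends on x and Q² only through a single variable τ. The notation below follows the common saturation-model convention, which may differ in detail from this paper's parametrization:

```latex
% Geometric scaling: one scaling variable tau replaces (x, Q^2).
\sigma^{\gamma^* p}(x, Q^2) = \sigma(\tau), \qquad
\tau = \frac{Q^2}{Q_s^2(x)}, \qquad
Q_s^2(x) = Q_0^2 \left(\frac{x_0}{x}\right)^{\lambda}
```

Here Q_s(x) is the saturation scale, and Q_0, x_0 and λ are fit parameters in the usual treatments.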
Funding: Supported by the National Natural Science Foundation of China (Nos. 60533090 and 60603096); the National Hi-Tech Research and Development Program (863) of China (No. 2006AA010107); the Key Technology R&D Program of China (No. 2006BAH02A13-4); the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652); and the Cultivation Fund of the Key Scientific and Technical Innovation Project of MOE, China (No. 706033).
Abstract: Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, in many cases we want to cluster large-scale data whose similarities are not sparse. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages within subsets of the data first and then merges the results as the initial steps of the iterations; it can effectively reduce the number of iterations of clustering. LAP passes messages between the landmark data points first and then clusters the non-landmark data points; it is a global approximation method to speed up clustering. Experiments were conducted on many datasets, such as random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
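The landmark (LAP) strategy can be sketched in a few lines: run the expensive clustering only on a small landmark subset, then attach every remaining point to the cluster of its most similar landmark. In the sketch below the landmark-clustering step is a stub passed in as a callable; in the paper it would be affinity propagation on the landmark similarity matrix. All names and data are illustrative.

```python
# Two-phase landmark clustering sketch for 1D points.
def landmark_cluster(points, landmark_idx, cluster_landmarks):
    """Cluster landmarks first, then assign each point to the cluster of
    its most similar landmark (similarity = negative squared distance,
    as conventionally used as input to affinity propagation)."""
    landmarks = [points[i] for i in landmark_idx]
    landmark_labels = cluster_landmarks(landmarks)  # e.g. AP on landmarks
    labels = []
    for p in points:
        nearest = max(range(len(landmarks)),
                      key=lambda j: -(p - landmarks[j]) ** 2)
        labels.append(landmark_labels[nearest])
    return labels

# Toy data: two well-separated groups, one landmark in each; the stub
# clusterer simply gives the two landmarks distinct labels.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
labels = landmark_cluster(points, [0, 3], lambda lms: [0, 1])
# -> [0, 0, 0, 1, 1, 1]
```

The speed-up comes from shrinking the dense similarity matrix from n x n to m x m for m landmarks, at the cost of the nearest-landmark approximation in the second phase.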
Funding: Supported by the Crohn's & Colitis Foundation Senior Research Award (No. 902766 to J.S.); the National Institute of Diabetes and Digestive and Kidney Diseases (Nos. R01DK105118-01 and R01DK114126 to J.S.); the United States Department of Defense Congressionally Directed Medical Research Programs (No. BC191198 to J.S.); and VA Merit Award BX-19-00 to J.S.
Abstract: Metabolomics, as a research field and a set of techniques, studies the entire set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine in general. In particular, integration of the microbiome and the metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated, and preprocessing/pretreating and normalizing procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values, detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical data analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
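Two of the pretreatment steps named above are simple enough to show directly: a log transformation (with a small offset so zero intensities stay finite) and autoscaling, i.e. mean-centering followed by unit-variance scaling. These are generic sketches, not the review's recommended pipeline.

```python
# Minimal single-feature versions of two common pretreatment steps.
import math

def log_transform(values, offset=1.0):
    """Log-transform intensities; the offset keeps zeros finite."""
    return [math.log(v + offset) for v in values]

def autoscale(values):
    """Mean-center, then divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

scaled = autoscale([1.0, 2.0, 3.0, 4.0])
# `scaled` has mean 0 and sample variance 1
```

As the review stresses, the right choice among such transforms depends on the hypothesis and the downstream statistical method: autoscaling puts all metabolites on equal footing, which helps multivariate models but also inflates noise-dominated features.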
Abstract: A method is presented in this work that integrates both emerging and mature data sources to estimate operational travel demand at fine spatial and temporal resolutions. By analyzing individuals' mobility patterns revealed from their mobile phones, researchers and practitioners are now equipped to derive the largest trip samples for a region. Because of the ubiquitous use, extensive service coverage and high penetration rates of telecommunications, travel demand can be studied continuously at fine spatial and temporal resolutions. The derived sample (or seed) trip matrices are coupled with surveyed commute flow data and prevalent travel demand modeling techniques to provide estimates of the total regional travel demand in the form of origin-destination (OD) matrices. The methodology has been evaluated in a series of real-world transportation planning studies and demonstrated its potential in application areas such as dynamic traffic assignment modeling, integrated corridor management and online traffic simulations.
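The core data structure of the approach, an OD matrix built from individual trips, can be sketched as a simple aggregation. Zone names and trips below are illustrative; in practice the trips would come from phone-trace processing and be expanded against surveyed commute flows.

```python
# Aggregate individual (origin, destination) trip records into an OD matrix
# for one time slice. Illustrative sketch only.
from collections import defaultdict

def build_od_matrix(trips):
    """trips: iterable of (origin_zone, destination_zone) pairs."""
    od = defaultdict(lambda: defaultdict(int))
    for o, d in trips:
        od[o][d] += 1
    return {o: dict(ds) for o, ds in od.items()}

trips = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
od = build_od_matrix(trips)
# -> {"A": {"B": 2, "C": 1}, "B": {"C": 1}}
```

A seed matrix like this is then typically scaled to total demand (e.g. by iterative proportional fitting against zone-level targets) before feeding assignment models.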
Funding: Supported by the National Natural Science Foundation of China (60603098).
Abstract: Local diversity AdaBoost support vector machine (LDAB-SVM) is proposed for large-scale dataset classification problems. The training dataset is first split into several blocks, and models are built on these blocks. To obtain better performance, AdaBoost is used in building each model. In the boosting iteration step, component learners with higher diversity and accuracy are collected by adjusting the kernel parameters. The local models are then integrated via a voting method. The experimental study shows that LDAB-SVM can deal with large-scale datasets efficiently without reducing the performance of the classifier.
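The final integration step described above, combining per-block models by voting, is easy to sketch. The block learners here are stand-in closures rather than real boosted SVMs, so this shows only the voting mechanics, not LDAB-SVM itself.

```python
# Majority-vote integration of local block models. Stand-in learners only.
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by the most block models for input x."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Three stand-in local models (threshold classifiers) that can disagree.
models = [lambda x: 1 if x > 0 else -1,
          lambda x: 1 if x > 1 else -1,
          lambda x: 1 if x > -1 else -1]
label = majority_vote(models, 0.5)
# -> 1 (two of the three models vote +1)
```

In the paper's setting each entry of `models` would be an AdaBoost ensemble of SVMs trained on one data block, with kernel parameters varied across boosting rounds to encourage diversity.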
Funding: Supported by the National Natural Science Foundation of China (61273290, 61373147); the Xiamen Scientific Plan Project (2014S0048, 3502Z20123037); the Fujian Scientific Plan Project (2013HZ0004-1); and the Fujian Provincial Education Office A-class Project (JA13238).
Abstract: Many websites use verification codes to prevent users from employing machines to automatically register, log in, vote maliciously or spam, but entering verification codes manually places a great burden on enterprises involved in internet marketing. Improving a verification code security system requires an identification method to serve as the corresponding testing system. We propose an anisotropic heat kernel equation group which can generate a heat source scale space during kernel evolution, based on the infinite heat source axiom. We design a multi-step anisotropic verification code identification algorithm whose core procedures are building the anisotropic heat kernel, setting wave energy information parameters, and combing out the verification code characters, and whose peripheral procedures are gray scaling, binarizing, denoising, normalizing, segmenting and identifying; we also give the detailed criteria and parameter sets. Actual tests show that the anisotropic heat kernel identification algorithm can be used on many kinds of verification codes, including text characters, mathematical, Chinese, voice, 3D, programming, video and advertising codes. Its recognition rate is 25% and 50% higher than the neural network and context matching algorithms, respectively, for the Yahoo site; 49% and 60% higher for the Captcha site; 20% and 52% higher for the Baidu site; 60% and 65% higher for the 3DTakers site; and 40% and 51% higher for the MDP site.
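Of the peripheral steps listed in the pipeline, binarization is the simplest to illustrate: map each grayscale pixel to 0 or 1 against a threshold. This is a generic fixed-threshold sketch; the paper's actual criteria and parameters are not reproduced here.

```python
# Fixed-threshold binarization of a grayscale image stored as a 2D list of
# 0-255 intensities. The threshold value 128 is illustrative.
def binarize(gray, threshold=128):
    return [[1 if px >= threshold else 0 for px in row] for row in gray]

result = binarize([[10, 200], [130, 90]])
# -> [[0, 1], [1, 0]]
```

In a real pipeline the threshold is usually chosen adaptively (e.g. from the intensity histogram) rather than fixed, since verification codes deliberately vary contrast and background noise.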
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, 2012AA092303); the Project of Shanghai Science and Technology Innovation (12231203900); the Industrialization Program of the National Development and Reform Commission (2159999); the National Science and Technology Support Program (2013BAD13B01); and the Shanghai Leading Academic Discipline Project.
Abstract: Temporal and spatial scales play important roles in fishery ecology, and an inappropriate spatio-temporal scale may result in large errors in modeling fish distribution. The objective of this study is to evaluate the roles of spatio-temporal scales in habitat suitability modeling, with the western stock of the winter-spring cohort of neon flying squid (Ommastrephes bartramii) in the northwest Pacific Ocean as an example. In this study, fishery-dependent data from the Chinese Mainland Squid Jigging Technical Group and remotely sensed sea surface temperature (SST) during August to October of 2003-2008 were used. We evaluated the differences in a habitat suitability index model resulting from aggregating data at 36 different spatio-temporal scales, combining three latitude scales (0.5°, 1° and 2°), four longitude scales (0.5°, 1°, 2° and 4°), and three temporal scales (week, fortnight and month). The coefficients of variation (CV) of the weekly, biweekly and monthly suitability index (SI) were compared to determine which temporal and spatial scales yield the more precise SI model. This study shows that the optimal temporal and spatial scales with the lowest CV are month, 0.5° latitude and 0.5° longitude for O. bartramii in the northwest Pacific Ocean. A suitability index model developed at the optimal scale can be cost-effective in improving forecasts of fishing grounds and requires no excessive sampling effort. We suggest that the uncertainty associated with the spatial and temporal scales used in data aggregation needs to be considered in habitat suitability modeling.
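The scale-selection criterion used in the study reduces to a simple rule: compute the coefficient of variation (CV = standard deviation / mean) of the suitability-index values for each candidate aggregation scale and keep the scale with the lowest CV. The sketch below uses made-up SI values; only the criterion itself comes from the abstract.

```python
# Pick the spatio-temporal aggregation scale with the lowest CV of the
# suitability index. SI values are illustrative, not from the study.
import math

def cv(values):
    """Coefficient of variation: sample standard deviation over mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd / mean

si_by_scale = {
    ("month", 0.5, 0.5): [0.70, 0.72, 0.71],   # tight -> low CV
    ("week", 1.0, 1.0): [0.40, 0.90, 0.60],    # scattered -> high CV
}
best = min(si_by_scale, key=lambda k: cv(si_by_scale[k]))
# -> ("month", 0.5, 0.5)
```

Keys here are (temporal scale, latitude step in degrees, longitude step in degrees); in the study 36 such combinations were compared.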
Abstract: Bus reliability has long attracted attention and been extensively studied to enhance service quality. However, existing research generally evaluates the reliability of specific routes or stops. To this end, this study explores en-route bus reliability with real-time data at the network scale. Drawing on bus automatic vehicle location and smart card data from Ningbo, China, this study calculates headway-based reliability from the difference between the actual and scheduled headway at each stop. To demonstrate the trend of stop-level reliability along a bus route, reliability is graded and visualized on a map with ridership at each stop, and then weighted by passenger-boarding volume. Route-level reliability is then quantified and mapped, showing that unreliable service largely concentrates in, or extends through, the centre area. With respect to network-level reliability, temporal changes are demonstrated with ridership on weekdays and at the weekend. It is observed that on weekdays the reliability trend is similar to that of ridership, implying a causal relationship between bus travel-time variation and waiting time at stops. Furthermore, a reliability comparison between weekdays in December and October shows the necessity of evaluating periodically and around important events, to avoid negative riding experiences that discourage public transport use. This research provides insights for bus agencies to systematically evaluate service reliability both spatially and temporally, in order to identify and prioritize the routes and stops where the scope for reliability improvement and the expected benefit are greatest.
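The headway-based measure described above can be sketched as a deviation of actual from scheduled headway at each stop, rolled up to the route level with passenger-boarding weights. The exact functional form in the study may differ; the structure and numbers below are illustrative.

```python
# Stop-level headway deviation, aggregated to a passenger-weighted
# route-level figure (minutes). Illustrative sketch only.
def stop_deviation(actual_headway, scheduled_headway):
    """Absolute deviation of actual from scheduled headway at one stop."""
    return abs(actual_headway - scheduled_headway)

def route_reliability(stops):
    """stops: list of (actual, scheduled, boardings) tuples per stop."""
    total_boardings = sum(b for _, _, b in stops)
    weighted = sum(stop_deviation(a, s) * b for a, s, b in stops)
    return weighted / total_boardings

avg_dev = route_reliability([(12.0, 10.0, 50),
                             (10.0, 10.0, 30),
                             (15.0, 10.0, 20)])
# -> 2.0 (passenger-weighted mean of 2, 0 and 5 minutes)
```

Weighting by boardings matches the study's intent: a two-minute deviation at a busy stop degrades more riders' experience than a five-minute deviation at a quiet one.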
Funding: Supported by the National Key Research and Development Program of China (Grant Nos. 2021YFB370-2102 and 2017YFB0701900).
Abstract: A hierarchical clustering algorithm has been applied to identify X-ray diffraction (XRD) patterns from high-throughput characterization of combinatorial materials chips. As data quality is usually correlated with acquisition time, it is important to study hierarchical clustering performance as a function of data quality in order to optimize the efficiency of high-throughput experiments. This work investigated the effects of the signal-to-noise ratio on the performance of hierarchical clustering using 29 distance metrics for XRD patterns from an Fe−Co−Ni ternary combinatorial materials chip. It is found that the clustering accuracies, evaluated by the F1 score, fluctuate only slightly as the signal-to-noise ratio varies from 15.5 to 22.3 dB under the experimental conditions. This suggests that although it may take 40-50 s to collect a visually high-quality diffraction pattern, the measurement time could be reduced to as little as 4 s without substantial loss of phase identification accuracy by hierarchical clustering. Among the 29 distance metrics, Pearson χ² shows the highest mean F1 score of 0.77 and the lowest standard deviation of 0.008. The distance matrices calculated by Pearson χ² are controlled mainly by the XRD peak-shifting characteristics, and are visualized by metric multidimensional scaling.
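For reference, one common form of the Pearson χ² distance between two intensity vectors is the sum of squared differences normalized by the pointwise sums; the exact variant among the 29 metrics used in the study may differ. A minimal sketch:

```python
# One common Pearson chi-squared distance between two diffraction patterns
# (intensity vectors of equal length). The eps guard avoids division by
# zero where both intensities vanish; it is an implementation choice here.
def pearson_chi2(a, b, eps=1e-12):
    return sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))

d_same = pearson_chi2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# -> 0.0 (identical patterns)
d_diff = pearson_chi2([1.0, 0.0, 3.0], [0.0, 1.0, 3.0])
# strictly positive for differing patterns
```

The per-point normalization makes the metric sensitive to relative intensity changes, which is consistent with the study's observation that it responds mainly to peak-shifting characteristics.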