1,643 articles found
Connes' Distance of One-Dimensional Lattices: General Cases
1
Authors: DAI Jian, SONG Xing-Chang. Communications in Theoretical Physics (SCIE, CAS, CSCD), 2001, No. 11, pp. 519-522 (4 pages)
Connes' distance formula is applied to endow three 1D lattices of different topologies with a linear metric, using a generalization of the lattice Dirac operator written down by Dimakis et al. that contains a non-unitary link-variable. The geometric interpretation of this link-variable is lattice spacing and parallel transport.
Keywords: Connes' distance; one-dimensional lattice; Dirac operator; link-variable; lattice spacing; parallel transport
Data processing of small samples based on grey distance information approach (Cited by: 14)
2
Authors: Ke Hongfa, Chen Yongguang & Liu Yi. 1. Coll. of Electronic Science and Engineering, National Univ. of Defense Technology, Changsha 410073, P. R. China; 2. Unit 63880, Luoyang 471003, P. R. China. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2007, No. 2, pp. 281-289 (9 pages)
Processing data from small samples is an important and valuable research problem in electronic equipment testing. Because the probability distribution of a small sample is difficult and complex to determine, it is hard to use traditional probability theory to process the samples and assess their degree of uncertainty. Using grey relational theory and norm theory, this article proposes the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples. The definitions of these two quantities, together with their characteristics and algorithms, are introduced. Related problems are also addressed, including the algorithm for the estimated value, the standard deviation, and the acceptance and rejection criteria for the samples and estimated results. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare different samples. Several examples demonstrate the application of the proposed approach and show that it is feasible and effective without requiring the probability distribution of the small samples.
Keywords: data processing; Grey theory; Norm theory; Small samples; Uncertainty assessments; Grey distance measure; Information whitening ratio
DISTINIT: Data poISoning atTacks detectIon usiNg optImized jaCcard disTance
3
Authors: Maria Sameen, Seong Oun Hwang. Computers, Materials & Continua (SCIE, EI), 2022, No. 12, pp. 4559-4576 (18 pages)
Machine Learning (ML) systems often involve a re-training process to make better predictions and classifications. This re-training process creates a loophole and poses a security threat for ML systems. Adversaries leverage this loophole and design data poisoning attacks against ML systems. Data poisoning attacks are a type of attack in which an adversary manipulates the training dataset to degrade the ML system's performance. Data poisoning attacks are challenging to detect, and even more difficult to respond to, particularly in the Internet of Things (IoT) environment. To address this problem, we proposed DISTINIT, the first proactive data poisoning attack detection framework using distance measures. We found that Jaccard Distance (JD) can be used in DISTINIT (among other distance measures), and we improved the JD to attain an Optimized JD (OJD) with lower time and space complexity. Our security analysis shows that DISTINIT is secure against data poisoning attacks by considering key features of adversarial attacks. We conclude that the proposed OJD-based DISTINIT is effective and efficient against data poisoning attacks where in-time detection is critical for IoT applications with large volumes of streaming data.
Keywords: data poisoning attacks; detection framework; Jaccard distance (JD); optimized Jaccard distance (OJD); security analysis
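For readers unfamiliar with the measure named in this abstract, the plain Jaccard distance can be sketched as below. This is only the textbook set-based definition, not the paper's Optimized JD, whose details the abstract does not give; the function name `jaccard_distance` and the empty-set convention are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Textbook Jaccard distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Under this definition, two sets that share two of four distinct elements are at distance 0.5, and disjoint sets are at distance 1.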
Multi-Attribute Couplings-Based Euclidean and Nominal Distances for Unlabeled Nominal Data
4
Authors: Lei Gu, Furong Zhang, Li Ma. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 5911-5928 (18 pages)
Learning from unlabeled data is a significant challenge that requires handling complicated relationships between nominal values and attributes. Increasingly, recent research on learning value relations within and between attributes has shown significant improvement in tasks such as clustering and outlier detection. However, typical existing work relies on learning pairwise value relations and weakens or overlooks the direct couplings between multiple attributes. This paper thus proposes two novel and flexible multi-attribute couplings-based distance (MCD) metrics, which learn the multi-attribute couplings and their strengths in nominal data based on information theory (self-information, entropy, and mutual information) for measuring both numerical and nominal distances. MCD enables the application of numerical and nominal clustering methods to nominal data and quantifies the influence of involving and filtering multi-attribute couplings on distance learning and clustering performance. Substantial experiments support these conclusions on 15 data sets against seven state-of-the-art distance measures with various feature selection methods for both numerical and nominal clustering.
Keywords: Nominal data; distance metrics; attribute couplings; dissimilarity measures
Determination of Kolmogorov Entropy of Chaotic Attractor Included in One-Dimensional Time Series of Meteorological Data
5
Authors: 严绍瑾, 彭永清, 王建中. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 1991, No. 2, pp. 243-250 (8 pages)
The 1970-1985 day-to-day averaged pressure dataset of Shanghai and the extension method in phase space are used to calculate the correlation dimension D and the second-order Renyi entropy K2, an approximation of the Kolmogorov entropy. The fractional dimension D = 7.7-7.9 and the positive value K2 - 0.1 are obtained. This shows that the attractor for the short-term weather evolution in the monsoon region of China exhibits chaotic motion. The estimate of K2 yields a predictable time scale of about ten days. This result is in agreement with that obtained earlier by the dynamic-statistical approach. The effects of the lag time i on the estimates of D and K2 are investigated. The results show that D and K2 are convergent with respect to i. When the day-to-day averaged pressure series used in this paper is extended into phase space with T = 5, the coordinate components are independent of each other; therefore, the dynamical characteristic quantities of the system are stable and reliable.
Keywords: Determination of Kolmogorov Entropy of Chaotic Attractor Included in one-dimensional Time Series of Meteorological data
pgi-distance: An Efficient Parallel KNN-join Processing Method (Cited by: 3)
6
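Authors: 何洪辉, 王丽珍, 周丽华. 《计算机研究与发展》 (Journal of Computer Research and Development; EI, CSCD, PKU Core), 2007, No. 10, pp. 1774-1781 (8 pages)
KNN-join is a recently proposed operation with wide applications in data mining. Thanks to its "one set at a time" property, data mining tasks such as classification, outlier mining, and clustering become easier to carry out. MuX and Gorder are two algorithms designed specifically for KNN-join. To combine the strengths of both, a new parallel KNN-join processing method, pgi-distance (parallel grid index-distance), is proposed. pgi-distance uses a two-level structure that optimizes I/O and CPU cost simultaneously, and its distance-based index lets it adapt better to changes in data dimensionality and distribution. Because it relies on the B+-tree index that is widely supported by DBMS vendors, pgi-distance is a more practical KNN-join processing method. Tests on synthetic and real datasets also show that pgi-distance is practical and efficient.
Keywords: KNN-join; data mining; classification; distance-based index; B+-tree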
A new fusion approach based on distance of evidences (Cited by: 4)
7
Authors: 陈良洲, 施文康, 邓勇, 朱振福. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2005, No. 5, pp. 476-482 (7 pages)
Based on the framework of evidence theory, data fusion aims at obtaining a single Basic Probability Assignment (BPA) function by combining several belief functions from distinct information sources. Dempster's rule of combination is the most popular combination rule, but it handles conflict between information sources poorly at the normalization step. When faced with highly conflicting information, the classical Dempster-Shafer (D-S) evidence theory can yield counter-intuitive results. This paper presents a modified averaging method that combines conflicting evidence based on the distance of evidences, and also gives the weighted average of the evidence in the system. Numerical examples show that the proposed method realizes the intended modification and provides reasonable results with good convergence efficiency.
Keywords: data fusion; Evidence distance; Conflicting evidence; Evidence credibility; Combination rules
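The counter-intuitive behavior of the classical rule mentioned in this abstract can be reproduced with a short sketch. The code below implements only the classical Dempster rule of combination, not the paper's modified averaging method; the representation of a BPA as a dict from `frozenset` to mass and the name `dempster_combine` are assumptions chosen for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with the classical Dempster rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass of agreeing focal elements goes to their intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass of disjoint focal elements counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalization step: rescale by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}
```

With two sources that assign 0.99 to different hypotheses A and C and only 0.01 each to B, the normalization step gives B a combined mass of essentially 1, which is the kind of result the abstract calls counter-intuitive.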
One-Dimensional Variational Retrieval of Temperature and Humidity Profiles from the FY4A GIIRS (Cited by: 4)
8
Authors: Qiumeng XUE, Li GUAN, Xiaoning SHI. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2022, No. 3, pp. 471-486 (16 pages)
A physical retrieval approach based on the one-dimensional variational (1D-Var) algorithm is applied in this paper to simultaneously retrieve atmospheric temperature and humidity profiles under both clear-sky and partly cloudy conditions from FY-4A GIIRS (geostationary interferometric infrared sounder) observations. Radiosonde observations from upper-air stations in China and level-2 operational products from the Chinese National Satellite Meteorological Center (NSMC) during the periods from December 2019 to January 2020 (winter) and from July 2020 to August 2020 (summer) are used to validate the accuracies of the retrieved temperature and humidity profiles. Comparing the 1D-Var-retrieved profiles to radiosonde data, the accuracy of the temperature retrievals at each vertical level of the troposphere is characterized by a root mean square error (RMSE) within 2 K, except at the bottom level of the atmosphere under clear conditions. The RMSE increases slightly for the higher atmospheric layers, owing to the lack of temperature sounding channels there. Under partly cloudy conditions, the temperature at each vertical level can be obtained, while the level-2 operational products obtain values only at altitudes above the cloud top. In addition, the accuracy of the retrieved temperature profiles is greatly improved compared with the accuracies of the operational products. For the humidity retrievals, the mean RMSEs in the troposphere in winter and summer are both within 2 g kg^(-1). Moreover, the retrievals performed better than the ERA5 reanalysis data between 800 hPa and 300 hPa in both summer and winter in terms of RMSE.
Keywords: temperature and humidity profiles; one-dimensional variational (1D-Var); GIIRS; hyperspectral data
One-dimensional horizontal infiltration experiment for determining permeability coefficient of loamy sand (Cited by: 3)
9
Authors: HU Shunjun, ZHU Hai, CHEN Yongbao. Journal of Arid Land (SCIE, CSCD), 2017, No. 1, pp. 27-37 (11 pages)
Knowledge of soil permeability is essential to evaluate hydrologic characteristics of soil, such as water storage and water movement, and the soil permeability coefficient is an important parameter that reflects soil permeability. In order to confirm the acceptability of the one-dimensional horizontal infiltration method (one-D method) for simultaneously determining both the saturated and unsaturated permeability coefficients of loamy sand, we first measured the cumulative infiltration and the wetting front distance under various infiltration heads through a series of one-dimensional horizontal infiltration experiments, and then analyzed the relationships of the cumulative horizontal infiltration with the wetting front distance and the square root of infiltration time. We finally compared the permeability results from the Gardner model based on the one-D method with the results from two other commonly used methods (i.e., the constant head method and the van Genuchten model) to evaluate the acceptability and applicability of the one-D method. The results showed that there was a robust linear relationship between the cumulative horizontal infiltration and the wetting front distance, suggesting that it is more appropriate to take the soil moisture content after infiltration in the entire wetted zone as the average soil moisture content than as the saturated soil moisture content. The results also showed that there was a robust linear relationship between the cumulative horizontal infiltration and the square root of infiltration time, suggesting that the Philip infiltration formula can better reflect the characteristics of cumulative horizontal infiltration under different infiltration heads.
The following two facts indicate that it is feasible to use the one-D method for simultaneously determining the saturated and unsaturated permeability coefficients of loamy sand. First, the saturated permeability coefficient (prescribed in the Gardner model) of loamy sand obtained from the one-D method agreed well with the value obtained from the constant head method. Second, the relationship of the unsaturated permeability coefficient with soil water suction for loamy sand calculated using the Gardner model based on the one-D method was nearly identical to the same relationship calculated using the van Genuchten model.
Keywords: permeability coefficient; one-dimensional horizontal infiltration; cumulative horizontal infiltration; wetting front distance; Philip infiltration formula; Gardner model
Multiple targets vector miss distance measurement accuracy based on 2-D assignment algorithms (Cited by: 1)
10
Authors: Fang Bingyi, Wu Siliang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, No. 1, pp. 76-80 (5 pages)
An extension of the 2-D assignment approach is proposed for measurement-to-target association to improve the accuracy of vector miss distance measurement for multiple targets. When multiple targets move close together, the measurements cannot be fully resolved because of finite resolution. The proposed method adopts an auction algorithm to compute the feasible measurement-to-target assignment with unresolved measurements, thereby solving this 2-D assignment problem. Computer simulation results demonstrate the effectiveness and feasibility of this method.
Keywords: miss distance; 2-D assignment; auction algorithm; data association
A DTW distance-based seismic waveform clustering method for layers of varying thickness (Cited by: 1)
11
Authors: Hong Zhong, Li Kun-Hong, Su Ming-Jun, Hu Guang-Min, Yang Jun, Gao Gai, Hao Bin. Applied Geophysics (SCIE, CSCD), 2020, No. 2, pp. 171-181, 314 (12 pages)
Seismic waveform clustering is a useful technique for lithologic identification and reservoir characterization. Current seismic waveform clustering algorithms are predominantly based on a fixed time window, which is applicable to layers of stable thickness. When a layer exhibits variable thickness in the seismic response, a fixed time window cannot provide comprehensive geologic information for the target interval. Therefore, we propose a novel waveform clustering workflow based on a variable time window to enable broader applications. The dynamic time warping (DTW) distance is first introduced to effectively measure the similarities between seismic waveforms of various lengths. We develop a DTW distance-based clustering algorithm to extract centroids, and we then determine the class of all seismic traces according to their DTW distances from the centroids. To greatly reduce the computational complexity in seismic data applications, we propose a superpixel-based seismic data thinning approach. We further propose an integrated workflow that can be applied to practical seismic data by incorporating the DTW distance-based clustering and seismic data thinning algorithms. We evaluated the performance by applying the proposed workflow to synthetic seismograms and seismic survey data. Compared with the traditional waveform clustering method, the synthetic seismogram results demonstrate the enhanced capability of the proposed workflow to detect boundaries of different lithologies or lithologic associations with variable thickness. Results from a practical application show that the planar map of seismic waveform clustering obtained by the proposed workflow correlates well with the geological characteristics of wells in terms of reservoir thickness.
Keywords: DTW distance; seismic waveform clustering; variable time window; seismic data thinning
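The DTW distance that this abstract relies on to compare waveforms of different lengths can be sketched with the classic dynamic-programming recurrence. This is a generic textbook version with an absolute-difference local cost, not the authors' clustering workflow or their data-thinning step; the function name `dtw_distance` is an assumption.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance between two sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] also matches y[j-1]; advance in x only
                                 cost[i][j - 1],      # y[j-1] also matches x[i-1]; advance in y only
                                 cost[i - 1][j - 1])  # advance in both sequences
    return cost[n][m]
```

Because one sample may align with several samples of the other sequence, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0 even though their lengths differ, which is the property that makes the measure suitable for variable time windows.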
An Intelligent Graph Edit Distance-Based Approach for Finding Business Process Similarities (Cited by: 1)
12
Authors: Abid Sohail, Ammar Haseeb, Mobashar Rehman, Dhanapal Durai Dominic, Muhammad Arif Butt. Computers, Materials & Continua (SCIE, EI), 2021, No. 12, pp. 3603-3618 (16 pages)
There are numerous application areas for computing the similarity between process models. They include finding similar models in a repository, controlling redundancy of process models, and finding corresponding activities between a pair of process models. The similarity between two process models is computed from the similarity of their labels, structures, and execution behaviors. Several attempts have been made to develop similarity techniques for activity labels as well as their execution behavior. However, a notable problem with process model similarity is that two process models can be similar even if there is a structural variation between them. Yet neither a benchmark dataset for the structural similarity between process models nor an effective technique to compute structural similarity exists. To that end, we have developed a large collection of process models in which structural changes are handcrafted while preserving the semantics of the models. Furthermore, we have used a machine learning-based approach to compute the similarity between a pair of process models having structural and label differences. Finally, we have evaluated the proposed approach using our generated collection of process models.
Keywords: Machine learning; intelligent data management; similarities of process models; structural metrics; dataset; graph edit distance; process matching; artificial intelligence
An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine (Cited by: 2)
13
Authors: Bo Zhu, Xiaona Jing, Lan Qiu, Runbo Li. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 3977-3999 (23 pages)
When building a classification model, the scenario where the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance biases the trained classification model toward the majority class (usually defined as the negative class), which may harm the accuracy on the minority class (usually defined as the positive class) and lead to poor overall performance of the model. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, which is based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). The MSHR measures the separability of each negative sample through its Silhouette value, calculated from the Mahalanobis distance between samples. Based on this value, the so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (over-sampling step), and finally deleted (under-sampling step). This approach replaces pseudo-negative samples with newly generated positive samples one by one to clear up the inter-class overlap on the borderline, without changing the overall scale of the dataset. The FCSSVM is an improved version of the traditional CS-SVM. It considers the influence of both the imbalance in sample numbers and the class distribution on classification, and it finely tunes the class cost weights using the rime-ice (RIME) optimization algorithm, with cross-validation accuracy as the fitness function, to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments is carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced datasets. The experimental results show that the MSHR-FCSSVM method performs better than the comparison methods in most cases, and that both the MSHR and the FCSSVM play significant roles.
Keywords: Imbalanced data classification; Silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM
Distance function selection in several clustering algorithms
14
Author: LU Yu. Journal of Chongqing University (CAS), 2004, No. 1, pp. 47-50 (4 pages)
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results were given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to give shape information about clusters. In practice, it is suggested to use different distance functions separately, compare the clustering results, and pick out the "swing points", since such points may reveal more information to data analysts.
Keywords: distance function; clustering algorithms; K-MEANS; DENDROGRAM; data mining
CTUNING:A REUSE DISTANCE BASED CACHE PERFORMANCE TUNING TOOL
15
Authors: Fu Xiong, Wang Ruchuan. Journal of Electronics (China), 2009, No. 4, pp. 517-524 (8 pages)
Cache performance tuning tools help developers write programs with good locality and make full use of the cache to reduce the impact of the speed gap between processor and memory. This paper introduces the design and implementation of a cache performance tuning tool named CTuning, which employs a source-level instrumentation method to gather program data access information and uses a limited reuse distance model to analyze cache behavior. Experiments on 183.equake improve average performance by more than 6% and show that CTuning is proficient not only in locating cache performance bottlenecks to guide manual code transformation, but also in analyzing cache behavior relationships among variables to direct manual data reorganization.
Keywords: Cache behavior; Source level instrumentation; Reuse distance; Code transformation; data reorganization
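As background for the reuse distance model named in this abstract, the basic notion can be sketched as follows: the reuse distance of an access is the number of distinct other items touched since the previous access to the same item. This is a naive list-based illustration of the general concept only; it is not CTuning's limited reuse distance model, which the abstract does not describe, and the name `reuse_distances` is an assumption.

```python
def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first-time access."""
    stack = []  # LRU stack: most recently used item is at the end
    out = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)
            # Items above `item` in the stack are the distinct items seen since its last use.
            out.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(item)
    return out
```

For the trace `a b c a b`, the second accesses to `a` and `b` each have reuse distance 2. In a fully associative LRU cache, an access hits when its reuse distance is smaller than the cache capacity, which is why the measure is useful for locality analysis.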
Multi-Class Support Vector Machine Classifier Based on Jeffries-Matusita Distance and Directed Acyclic Graph (Cited by: 1)
16
Authors: Miao Zhang, Zhen-Zhou Lai, Dan Li, Yi Shen. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2013, No. 5, pp. 113-118 (6 pages)
Based on the framework of support vector machines (SVM) using the one-against-one (OAO) strategy, a new multi-class kernel method based on a directed acyclic graph (DAG) and probabilistic distance is proposed to raise multi-class classification accuracy. The topology of the DAG is constructed by rearranging the sequence of nodes in the graph. A DAG is equivalent to operating SVMs on a list in a guided order, and the classification performance depends on the sequence of nodes in the graph. The Jeffries-Matusita distance (JMD) is introduced to estimate the separability of each class, and the implementation list is initialized with all classes organized in a certain sequence. To verify the effectiveness of the proposed method, numerical analysis is conducted on UCI data and hyperspectral data. Comparative studies using standard OAO and DAG classification methods are also conducted, and the results illustrate the better performance and higher accuracy of the proposed JMD-DAG method.
Keywords: multi-class classification; support vector machine; directed acyclic graph; Jeffries-Matusita distance; hyperspectral data
The revised distance of supernova remnant G15.4+0.1
17
Authors: Hong-Quan Su, Meng-Fei Zhang, Hui Zhu, Dan Wu. Research in Astronomy and Astrophysics (SCIE, CAS, CSCD), 2017, No. 10, pp. 109-112 (4 pages)
We measure the distance to the supernova remnant G15.4+0.1, which is likely associated with the TeV source HESS J1818-154. We build the neutral hydrogen (HI) absorption and 13CO spectra for supernova remnant G15.4+0.1 by employing data from the Southern Galactic Plane Survey (SGPS) and the HI/OH/Recombination line survey (THOR). The maximum absorption velocity of about 140 km s^(-1) constrains the lower limit of its distance to about 8.0 kpc. Further, the fact that the HI emission feature at about 95 km s^(-1) seems to have no corresponding absorption suggests that G15.4+0.1 likely has an upper distance limit of about 10.5 kpc. The 13CO spectrum for the remnant supports our measurement. The new distance provides revised parameters for its associated pulsar wind nebula and TeV source.
Keywords: ISM: supernova remnants -- methods: data analysis -- stars: distances
Proposal for Design and Application of Business Intelligence as a Decision Support System to the Editorial Sector of Distance Education (DE)
18
Author: Walther Azzolini Junior. Management Studies, 2016, No. 2, pp. 60-79 (20 pages)
In recent years, industrial and service organizations have invested in improvement projects that emphasize increasing the performance of processes for producing manufactured goods and services, applying techniques to optimize production time in order to minimize the restrictive effects of the funds invested in processing or procurement and to reduce losses overall. This paper discusses the impact of applying business intelligence (BI) concepts to the production records of the publishing area of a higher education institution (HEI) that promotes distance education (DE) in Brazil. BI supports management decisions with the goal of minimizing the production time of learning material through more effective control: cubes are used in the form of reports for querying metrics on delivery delays and on the financial values of production tasks, and the processed information is filtered so that managers can view it from various angles and managerial perspectives. The objectives of this paper are to demonstrate the impact of using BI concepts in the processes of an HEI editorial department that develops teaching materials for DE courses, and to identify the financial cost-benefit ratio for the HEI of deploying BI on a free software platform in its publishing department. Courseware publishing departments usually do not have systems with this emphasis on supporting managerial decisions.
Keywords: business intelligence; data warehouse; distance education (DE); free software and open source software; open source Pentaho
A Novel Method for Prediction of Protein Domain Using Distance-Based Maximal Entropy
19
Authors: Shu-xue Zou, Yan-xin Huang, Yan Wang, Chun-guang Zhou. Journal of Bionic Engineering (SCIE, EI, CSCD), 2008, No. 3, pp. 215-223 (9 pages)
Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence. They are then combined into a single predictor using a support vector machine (SVM). More importantly, domain detection is, for the first time, treated as an imbalanced data learning problem. A novel undersampling method based on distance-based maximal entropy in the SVM feature space is proposed. The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in machine learning systems on general imbalanced datasets.
Keywords: protein domain boundary; SVM; imbalanced data learning; distance-based maximal entropy
A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
20
Authors: Yisa Adeniyi Abolade, Yichuan Zhao. Open Journal of Modelling and Simulation, 2024, No. 2, pp. 33-42 (10 pages)
Compositional data, which carry relative information, are a crucial aspect of machine learning and other related fields. They are typically recorded as closed data that sum to a constant, such as 100%. The linear regression model is one of the most widely used statistical techniques for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the present estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques.
The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
Keywords: Compositional data; Linear Regression Model; Least Square Method; Robust Least Square Method; Synthetic data; Aitchison distance; Maximum Likelihood Estimation; Expectation-Maximization Algorithm; k-Nearest Neighbor and Mean imputation
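The Aitchison distance used as an evaluation measure in this abstract can be sketched as the Euclidean distance between centered log-ratio (clr) transforms of two compositions. This is the standard definition, shown under the assumption that all parts are strictly positive; the function names `clr` and `aitchison_distance` are illustrative, and the study's own implementation is not described in the abstract.

```python
import math


def clr(x):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))
```

The distance depends only on the ratios between parts, so rescaling a composition (for example from proportions to percentages) leaves it unchanged. Zero parts would need separate handling before the logarithm is taken.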