
Preliminary Work for Modeling Questionnaire Data

Abstract: The questionnaire method is a common empirical research method. The work that precedes modeling questionnaire data is like the foundation of a building: how solid the foundation is affects the quality of everything built on it. This paper focuses specifically on the work that comes before statistical modeling, with an emphasis on scale evaluation. It covers handling missing values; evaluating the construct validity of scales and the appropriateness of item deletion; testing homogeneity and computing composite reliability when the total score of a multidimensional scale is needed; testing for common method bias and evaluating the discriminant validity of variables; item parceling; and testing independent variables for multicollinearity. Finally, it also touches on the rationale for modeling and the control of extraneous variables.

Questionnaire data have been frequently employed in empirical studies in psychology and in many other behavioral and social science disciplines. This paper discusses preliminary work for modeling questionnaire data, including data processing that might affect the analysis results. First, the initial processing of raw data is introduced, including data checking, missing-value imputation, and normality testing. We then focus on evaluating the questionnaire (scales and items) with a measurement model estimated by confirmatory factor analysis (CFA). The construct validity of a scale is acceptable if the measurement model reflecting the hypothetical construct proposed by theory fits the data with acceptable fit indices (e.g., CFI and TLI > 0.90; RMSEA and SRMR < 0.08). When item-factor relationships are examined, items with low loadings (e.g., below 0.4 in the completely standardized solution) are often deleted. It is then necessary to consider, and explain, why the remaining items are still a representative sample of items for measuring the latent variable. For a typical test, it is reasonable to assume that item measurement errors are uncorrelated; if Cronbach's coefficient α is high enough, test reliability is also acceptable. If the total score of the test is meaningful and used, it is better to report composite reliability with a confidence interval. For a multidimensional test, the total score should be used only when the homogeneity reliability is at least 0.5. For a study with several latent variables, discriminant validity can be examined with a series of CFA models.
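The reliability and screening quantities mentioned above can be sketched in plain Python. This is a minimal sketch, assuming item scores are already cleaned and complete, and that completely standardized loadings come from a CFA fitted elsewhere (the CFA itself is not estimated here). All function names are hypothetical. α uses the standard variance formula, composite reliability uses the standard one-factor omega formula with uncorrelated errors, and the cutoffs are the example values quoted in the abstract. The confidence interval for composite reliability and the homogeneity coefficient are not computed.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns, each holding one
    score per respondent (same respondents, same order, no missing values)."""
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score);
    # population vs. sample variance cancels in the ratio.
    item_var_sum = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


def composite_reliability(loadings, error_variances=None):
    """Composite reliability (omega) of the unit-weighted total score for a
    one-factor model with uncorrelated errors. `loadings` are completely
    standardized loadings; error variances default to 1 - loading**2."""
    if error_variances is None:
        error_variances = [1 - lam ** 2 for lam in loadings]
    s = sum(loadings)
    return s ** 2 / (s ** 2 + sum(error_variances))


def low_loading_items(loadings, cutoff=0.4):
    """Indices of items whose standardized loading is below the cutoff
    (deletion candidates, pending a check that content coverage survives)."""
    return [i for i, lam in enumerate(loadings) if abs(lam) < cutoff]


def fit_acceptable(cfi, tli, rmsea, srmr):
    """Example cutoffs quoted in the abstract: CFI, TLI > 0.90; RMSEA, SRMR < 0.08."""
    return cfi > 0.90 and tli > 0.90 and rmsea < 0.08 and srmr < 0.08


if __name__ == "__main__":
    scores = [[4, 5, 3, 2, 4], [4, 4, 3, 2, 5], [3, 5, 2, 1, 4]]
    print(round(cronbach_alpha(scores), 3))
    print(round(composite_reliability([0.7, 0.7, 0.7, 0.7]), 3))  # 0.794
    print(low_loading_items([0.82, 0.35, 0.61]))  # [1]
    print(fit_acceptable(0.95, 0.93, 0.05, 0.04))  # True
```

Keeping the loadings as an input rather than fitting a CFA keeps the sketch dependency-free; in practice they would come from SEM software.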
The one-factor model fits worst, whereas the separate-factor model, in which each latent variable corresponds to one factor, fits best. Discriminant validity is verified if the separate-factor model fits clearly better than every competing model in the series of CFA models. A method factor is then added to the separate-factor model as a global factor to form a bifactor model; common method bias is not a problem if the bifactor model does not fit clearly better than the separate-factor model. Structural equation models are frequently applied to questionnaire data. The sample size should be large enough: more than 10 times the number of indicators, or 5 times the number of freely estimated parameters. When the sample is not large enough, item parceling is a technique for improving indicator quality and model fit. The prerequisites for parceling are unidimensionality and homogeneity, and parceling is appropriate for analyzing structural models rather than measurement models. If the scale is multidimensional, an internal-consistency approach is recommended, in which the items of the same dimension are parceled into one or three indicators for structural equation modeling. When a multiple regression model is involved, multicollinearity can be detected with the tolerance or the variance inflation factor (VIF = 1/tolerance). Each predictor has its own VIF, and a VIF of 5 (or 10) or above indicates a (serious) multicollinearity problem. Because VIF = 1/(1 − R²), a VIF > 5 (or 10) is equivalent to more than 80% (or 90%) of the predictor's variance being explained by all the other predictors; that is, the coefficient of determination of the regression of that predictor on all the other predictors is larger than .8 (or .9). In a cross-sectional study design, proposing the hypothesis that one variable causes another can be problematic.
This issue can be addressed with domain theory, the literature, or common sense. If X is more fundamental (or more stable, more objective, longer-standing, etc.) than Y, X is much more likely than Y to act as the cause. Controlling variables is necessary for causal inference: it eliminates the spurious effect when X and Y share a common cause, and removes the unanalyzed effect when X is correlated with a covariate that also affects Y. Recently, the replication crisis has attracted attention and discussion. For a given questionnaire data set, different methods of processing the data before modeling and analysis may produce different results. Appropriate data processing helps obtain reasonable results and improves their replicability.

Authors: Wen Zhonglin, Huang Binbin, Tang Dandan

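Affiliations: Center for Studies of Psychological Application / School of Psychology, South China Normal University; Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University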

Source: Journal of Psychological Science (《心理科学》; indexed in CSSCI, CSCD, and the Peking University core journals list), 2018, No. 1, pp. 204-210 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (31771245)

Keywords: questionnaire data; scale; measurement model; reliability; validity

Classification number (CLC): B841 [Philosophy and Religion: Basic Psychology]