Title: | Unidimensional Item Response Theory Modeling |
---|---|
Description: | Fit unidimensional item response theory (IRT) models to a mixture of dichotomous and polytomous data, calibrate online item parameters (i.e., pretest and operational items), estimate examinees' abilities, and examine the IRT model-data fit on item-level in different ways as well as provide useful functions related to IRT analyses such as IRT model-data fit evaluation and differential item functioning analysis. The bring.flexmirt() and write.flexmirt() functions were written by modifying the read.flexmirt() function (Pritikin & Falk (2022) <doi:10.1177/0146621620929431>). The bring.bilog() and bring.parscale() functions were written by modifying the read.bilog() and read.parscale() functions, respectively (Weeks (2010) <doi:10.18637/jss.v035.i12>). The bisection() function was written by modifying the bisection() function (Howard (2017, ISBN:9780367657918)). The code of the inverse test characteristic curve scoring in the est_score() function was written by modifying the irt.eq.tse() function (González (2014) <doi:10.18637/jss.v059.i07>). In est_score() function, the code of weighted likelihood estimation method was written by referring to the Pi(), Ji(), and Ii() functions of the catR package (Magis & Barrada (2017) <doi:10.18637/jss.v076.c01>). |
Authors: | Hwanggyu Lim [aut, cre], Craig S. Wells [ctb], James Howard [ctb], Joshua Pritikin [ctb], Jonathan P Weeks [ctb], Jorge González [ctb], David Magis [ctb] |
Maintainer: | Hwanggyu Lim <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.1 |
Built: | 2024-12-24 04:14:06 UTC |
Source: | https://github.com/cran/irtQ |
Fit unidimensional item response theory (IRT) models to a mixture of dichotomous and polytomous data, calibrate online item parameters (i.e., pretest and operational items), estimate examinees' abilities, and provide useful functions related to unidimensional IRT such as IRT model-data fit evaluation and differential item functioning analysis.
For the item parameter estimation, the marginal maximum likelihood estimation via the expectation-maximization (MMLE-EM) algorithm (Bock & Aitkin, 1981) is used. Also, the fixed item parameter calibration (FIPC) method (Kim, 2006) and the fixed ability parameter calibration (FAPC) method, (Ban, Hanson, Wang, Yi, & Harris, 2001; stocking, 1988), often called Stocking's Method A, are provided. For the ability estimation, several popular scoring methods (e.g., ML, EAP, and MAP) are implemented.
In addition, there are many useful functions related to IRT analyses such as evaluating IRT model-data fit, analyzing differential item functioning (DIF), importing item and/or ability parameters from popular IRT software, running flexMIRT (Cai, 2017) through R, generating simulated data, computing the conditional distribution of observed scores using the Lord-Wingersky recursion formula, computing item and test information functions, computing item and test characteristic curve functions, and plotting item and test characteristic curves and item and test information functions.
Package: | irtQ |
Version: | 0.2.1 |
Date: | 2024-08-23 |
Depends: | R (>= 4.1) |
License: | GPL (>= 2) |
Following five sections describe a) how to implement the online item calibration using FIPC, a) how to implement the online item calibration using Method A, b) two illustrations of the online calibration, and c) IRT Models used in irtQ package.
The fixed item parameter calibration (FIPC) is one of useful online item calibration methods for computerized adaptive testing (CAT) to put the parameter estimates of pretest items on the same scale of operational item parameter estimates without post hoc linking/scaling (Ban, Hanson, Wang, Yi, & Harris, 2001; Chen & Wang, 2016). In FIPC, the operational item parameters are fixed to estimate the characteristic of the underlying latent variable prior distribution when calibrating the pretest items. More specifically, the underlying latent variable prior distribution of the operational items is estimated during the calibration of the pretest items to put the item parameters of the pretest items on the scale of the operational item parameters (Kim, 2006). In the irtQ package, FIPC is implemented with two main steps:
Prepare a response data set and the item metadata of the fixed (or operational) items.
Implement FIPC to estimate the item parameters of pretest items using the est_irt
function.
To run the est_irt
function, it requires two data sets:
Item metadata set (i.e., model, score category, and item parameters. see the desciption of the argument x
in the function est_irt
).
Examinees' response data set for the items. It should be a matrix format where a row and column indicate the examinees and the items, respectively. The order of the columns in the response data set must be exactly the same as the order of rows of the item metadata.
When FIPC is implemented in est_irt
function, the pretest item parameters are estimated by fixing the operational item parameters. To estimate the item
parameters, you need to provide the item metadata in the argument x
and the response data in the argument data
.
It is worthwhile to explain about how to prepare the item metadata set in the argument x
. A specific form of a data frame should be used for
the argument x
. The first column should have item IDs, the second column should contain the number of score categories of the items, and the third
column should include IRT models. The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous items, and "GRM" and "GPCM" for polytomous
items. Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded response model and
(generalized) partial credit model, respectively. From the fourth column, item parameters should be included. For dichotomous items, the fourth, fifth,
and sixth columns represent the item discrimination (or slope), item difficulty, and item guessing parameters, respectively. When "1PLM" or "2PLM" is
specified for any items in the third column, NAs should be inserted for the item guessing parameters. For polytomous items, the item discrimination (or slope)
parameters should be contained in the fourth column and the item threshold (or step) parameters should be included from the fifth to the last columns.
When the number of categories differs between items, the empty cells of item parameters should be filled with NAs. See 'est_irt' for more details about
the item metadata.
Also, you should specify in the argument fipc = TRUE
and a specific FIPC method in the argument fipc.method
. Finally, you should provide
a vector of the location of the items to be fixed in the argument fix.loc
. For more details about implementing FIPC, see the
description of the function est_irt
.
When implementing FIPC, you can estimate both the emprical histogram and the scale of latent variable prior distribution by setting EmpHist = TRUE
.
If EmpHist = FALSE
, the normal prior distribution is used during the item parameter estimation and the scale of the normal prior distribution is
updated during the EM cycle.
The est_item
function requires a vector of the number of score categories for the items in the argument cats
. For example, a dichotomous item has
two score categories. If a single numeric value is specified, that value will be recycled across all items. If NULL and all items are binary items
(i.e., dichotomous items), it assumes that all items have two score categories.
If necessary, you need to specify whether prior distributions of item slope and guessing parameters (only for the IRT 3PL model) are used in the arguments of
use.aprior
and use.gprior
, respectively. If you decide to use the prior distributions, you should specify what distributions will be used for the prior
distributions in the arguments of aprior
and gprior
, respectively. Currently three probability distributions of Beta, Log-normal, and Normal
distributions are available.
In addition, if the response data include missing values, you must indicate the missing value in argument missing
.
Once the est_irt
function has been implemented, you'll get a list of several internal objects such as the item parameter estimates,
standard error of the parameter estimates.
In CAT, the fixed ability parameter calibration (FAPC), often called Stocking's Method A, is the relatively simplest and most straightforward online calibration method, which is the maximum likelihood estimation of the item parameters given the proficiency estimates. In CAT, FAPC can be used to put the parameter estimates of pretest items on the same scale of operational item parameter estimates and recalibrate the operational items to evaluate the parameter drifts of the operational items (Chen & Wang, 2016; Stocking, 1988). Also, FAPC is known to result in accurate, unbiased item parameters calibration when items are randomly rather than adaptively administered to examinees, which occurs most commonly with pretest items (Ban, Hanson, Wang, Yi, & Harris, 2001; Chen & Wang, 2016). Using irtQ package, the FAPC is implemented to calibrate the items with two main steps:
Prepare a data set for the calibration of item parameters (i.e., item response data and ability estimates).
Implement the FAPC to estimate the item parameters using the est_item
function.
To run the est_item
function, it requires two data sets:
Examinees' ability (or proficiency) estimates. It should be in the format of a numeric vector.
response data set for the items. It should be in the format of matrix where a row and column indicate the examinees and the items, respectively. The order of the examinees in the response data set must be exactly the same as that of the examinees' ability estimates.
The est_item
function estimates the pretest item parameters given the proficiency estimates. To estimate the item parameters,
you need to provide the response data in the argument data
and the ability estimates in the argument score
.
Also, you should provide a string vector of the IRT models in the argument model
to indicate what IRT model is used to calibrate each item.
Available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous items, and "GRM" and "GPCM" for polytomous items. "GRM" and "GPCM" represent
the graded response model and (generalized) partial credit model, respectively. Note that "DRM" is considered as "3PLM" in this function. If a single
character of the IRT model is specified, that model will be recycled across all items.
The est_item
function requires a vector of the number of score categories for the items in the argument cats
. For example, a dichotomous item has
two score categories. If a single numeric value is specified, that value will be recycled across all items. If NULL and all items are binary items
(i.e., dichotomous items), it assumes that all items have two score categories.
If necessary, you need to specify whether prior distributions of item slope and guessing parameters (only for the IRT 3PL model) are used in the arguments of
use.aprior
and use.gprior
, respectively. If you decide to use the prior distributions, you should specify what distributions will be used for the prior
distributions in the arguments of aprior
and gprior
, respectively. Currently three probability distributions of Beta, Log-normal, and Normal
distributions are available.
In addition, if the response data include missing values, you must indicate the missing value in argument missing
.
Once the est_item
function has been implemented, you'll get a list of several internal objects such as the item parameter estimates,
standard error of the parameter estimates.
The example code below shows how to implement the online calibration and how to evaluate the IRT model-data fit:
##--------------------------------------------------------------- # Attach the packages library(irtQ) ##---------------------------------------------------------------------------- # 1. The example code below shows how to prepare the data sets and how to # implement the fixed item parameter calibration (FIPC): ##---------------------------------------------------------------------------- ## Step 1: prepare a data set ## In this example, we generated examinees' true proficiency parameters and simulated ## the item response data using the function "simdat". ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df # generate 1,000 examinees' latent abilities from N(0.4, 1.3) set.seed(20) score <- rnorm(1000, mean=0.4, sd=1.3) # simulate the response data sim.dat <- simdat(x=x, theta=score, D=1) ## Step 2: Estimate the item parameters # fit the 3PL model to all dichotmous items, fit the GRM model to all polytomous data, # fix the five 3PL items (1st - 5th items) and three GRM items (53th to 55th items) # also, estimate the empirical histogram of latent variable fix.loc <- c(1:5, 53:55) (mod.fix1 <- est_irt(x=x, data=sim.dat, D=1, use.gprior=TRUE, gprior=list(dist="beta", params=c(5, 16)), EmpHist=TRUE, Etol=1e-3, fipc=TRUE, fipc.method="MEM", fix.loc=fix.loc)) summary(mod.fix1) # plot the estimated empirical histogram of latent variable prior distribution (emphist <- getirt(mod.fix1, what="weights")) plot(emphist$weight ~ emphist$theta, xlab="Theta", ylab="Density") ##---------------------------------------------------------------------------- # 2. The example code below shows how to prepare the data sets and how to estimate # the item parameters using the fixed abilit parameter calibration (FAPC): ##---------------------------------------------------------------------------- ## Step 1: prepare a data set ## In this example, we generated examinees' true proficiency parameters and simulated ## the item response data using the function "simdat". Because, the true ## proficiency parameters are not known in reality, however, the true proficiencies ## would be replaced with the proficiency estimates for the calibration. # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df # modify the item metadata so that some items follow 1PLM, 2PLM and GPCM x[c(1:3, 5), 3] <- "1PLM" x[c(1:3, 5), 4] <- 1 x[c(1:3, 5), 6] <- 0 x[c(4, 8:12), 3] <- "2PLM" x[c(4, 8:12), 6] <- 0 x[54:55, 3] <- "GPCM" # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(500, mean=0, sd=1) # simulate the response data data <- simdat(x=x, theta=score, D=1) ## Step 2: Estimate the item parameters # 1) item parameter estimation: constrain the slope parameters of the 1PLM to be equal (mod1 <- est_item(x, data, score, D=1, fix.a.1pl=FALSE, use.gprior=TRUE, gprior=list(dist="beta", params=c(5, 17)), use.startval=FALSE)) summary(mod1) # 2) item parameter estimation: fix the slope parameters of the 1PLM to 1 (mod2 <- est_item(x, data, score, D=1, fix.a.1pl=TRUE, a.val.1pl=1, use.gprior=TRUE, gprior=list(dist="beta", params=c(5, 17)), use.startval=FALSE)) summary(mod2) # 3) item parameter estimation: fix the guessing parameters of the 3PLM to 0.2 (mod3 <- est_item(x, data, score, D=1, fix.a.1pl=TRUE, fix.g=TRUE, a.val.1pl=1, g.val=.2, use.startval=FALSE)) summary(mod3)
In the irtQ package, both dichotomous and polytomous IRT models are available. For dichotomous items, IRT one-, two-, and three-parameter logistic models (1PLM, 2PLM, and 3PLM) are used. For polytomous items, the graded response model (GRM) and the (generalized) partial credit model (GPCM) are used. Note that the item discrimination (or slope) parameters should be fixed to 1 when the partial credit model is fit to data.
In the following, let be the response of an examinee with latent ability
on an item and suppose that there
are
unique score categories for each polytomous item.
For the IRT 1-3PL models, the probability that an examinee with provides a correct answer for an item is given by,
where is the item discrimination (or slope) parameter,
represents the item difficulty parameter,
refers to the item guessing parameter.
is a scaling factor in IRT models to make the logistic function
as close as possible to the normal ogive function when
. When the 1PLM is used,
is either fixed to a constant
value (e.g.,
) or constrained to have the same value across all 1PLM item data. When the IRT 1PLM or 2PLM is fit to data,
is set to 0.
For the GRM, the probability that an examinee with latent ability responds to score category
(
)
of an item is a given by,
where refers to the category boundary (threshold) function for score category
of an item
and its formula is analogous to that of 2PLM.
is the difficulty (or threshold) parameter for category boundary
of an item. Note that
and
.
For the GPCM, the probability that an examinee with latent ability responds to score category
(
)
of an item is a given by,
where is the difficulty parameter for category boundary
of an item. In other contexts, the difficulty parameter
can also be parameterized as
, where
refers to the location (or overall difficulty) parameter
and
represents a threshold parameter for score category
of an item. In the irtQ package,
difficulty
parameters are necessary when an item has
unique score categories because
. When a partial credit model is fit to data,
is fixed to 1.
Hwanggyu Lim [email protected]
Ames, A. J., & Penfield, R. D. (2015). An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models. Educational Measurement: Issues and Practice, 34(3), 39-48.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques. CRC Press.
Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D., J. (2001) A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191-212.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bock, R.D. (1960), Methods and applications of optimal scaling. Chapel Hill, NC: L.L. Thurstone Psychometric Laboratory.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Psychometrika, 35, 179-198.
Cai, L. (2017). flexMIRT 3.5 Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Cappaert, K. J., Wen, Y., & Chang, Y. F. (2018). Evaluating CAT-adjusted approaches for suspected item parameter drift detection. Measurement: Interdisciplinary Research and Perspectives, 16(4), 226-238.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29.
Chen, P., & Wang, C. (2016). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81(3), 674-701.
González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59, 1-30.
Hambleton, R. K., & Swaminathan, H. (1985) Item response theory: Principles and applications. Boston, MA: Kluwer.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991) Fundamentals of item response theory. Newbury Park, CA: Sage.
Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied psychological measurement, 40(4), 289-301.
Howard, J. P. (2017). Computational methods for numerical analysis with R. New York: Chapman and Hall/CRC.
Kang, T., & Chen, T. T. (2008). Performance of the generalized S-X2 item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391-406.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.
Kolen, M. J. & Brennan, R. L. (2004) Test Equating, Scaling, and Linking (2nd ed.). New York: Springer.
Kolen, M. J. & Tong, Y. (2010). Psychometric properties of IRT proficiency estimates. Educational Measurement: Issues and Practice, 29(3), 8-14.
Laplace, P. S. (1820).Theorie analytique des probabilites (in French). Courcier.
Li, Y. & Lissitz, R. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of educational measurement, 41(2), 85-117.
Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement. doi:10.1111/jedm.12366.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.
Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2023, April). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, IL.
Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement, 58(2), 154-178.
Lord, F. & Wingersky, M. (1984). Comparison of IRT true score and equipercentile observed score equatings. Applied Psychological Measurement, 8(4), 453-461.
Magis, D., & Barrada, J. R. (2017). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software, 76, 1-19.
Magis, D., Yan, D., & Von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
McKinley, R., & Mills, C. (1985). A comparison of several goodness-of-fit statistics. Applied Psychological Measurement, 9, 49-57.
Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society: Series B (Methodological), 51, 127-138.
Muraki, E. & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in medicine, 17(8), 857-872.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50-64.
Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289-298.
Pritikin, J. (2018). rpf: Response Probability Functions. R package version 0.59. https://CRAN.R-project.org/package=rpf.
Pritikin, J. N., & Falk, C. F. (2020). OpenMx: A modular research environment for item response theory method development. Applied Psychological Measurement, 44(7-8), 561-562.
Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21(4), 365-389.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.
Stone, C. A. (2000). Monte Carlo based null distribution for an alternative goodness-of-fit test statistic in IRT models. Journal of educational measurement, 37(1), 58-75.
Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175-186.
Thissen, D. & Wainer, H. (1982). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. (1995). Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses. Applied Psychological Measurement, 19(1), 39-49.
Thissen, D. & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp.73-140). Mahwah, NJ: Lawrence Erlbaum.
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computer adaptive testing: A primer (Chap. 4, pp.65-102). Hillsdale, NJ: Lawrence Erlbaum.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
Weeks, J. P. (2010). plink: An R Package for Linking Mixed-Format Tests Using IRT-Based Methods. Journal of Statistical Software, 35(12), 1-33. URL http://www.jstatsoft.org/v35/i12/.
Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21(1), 22-40.
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.
Woods, C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and Psychological Measurement, 67(1), 73-87.
Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG 3: Multiple-group IRT analysis and test maintenance for binary items [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
This function creates a cbind matrix or rbind matrix using a list containing different length of numeric vectors.
bind.fill(List, type = c("rbind", "cbind"), fill = NA)
bind.fill(List, type = c("rbind", "cbind"), fill = NA)
List |
A list containing different length of numeric vectors |
type |
A character string specifying whether rbind is used or cbind is used. |
fill |
The value used to fill in missing data when aligning datasets. For |
A matrix.
Hwanggyu Lim [email protected]
# sample list score_list <- list(item1=c(0:3), item2=c(0:2), item3=c(0:5), item3=c(0:4)) # examples # 1) create a rbind with the sample score list bind.fill(score_list, type="rbind") # 2) create a cbind with the sample score list bind.fill(score_list, type="cbind") # 3) create a cbind with the sample score list, # and fill missing data with 0s. bind.fill(score_list, type="cbind", fill = 0L)
# sample list score_list <- list(item1=c(0:3), item2=c(0:2), item3=c(0:5), item3=c(0:4)) # examples # 1) create a rbind with the sample score list bind.fill(score_list, type="rbind") # 2) create a cbind with the sample score list bind.fill(score_list, type="cbind") # 3) create a cbind with the sample score list, # and fill missing data with 0s. bind.fill(score_list, type="cbind", fill = 0L)
This function is a modified version of the bisetion
function
in the cmna R package (Howard, 2017) to find a root of the function .funs
with respect to its first argument. Unlike the bisetion
of the cmna,
this bisetion
function accepts additional arguments of the function .fun
.
bisection(.fun, ..., lb, ub, tol = 1e-04, max.it = 100)
bisection(.fun, ..., lb, ub, tol = 1e-04, max.it = 100)
.fun |
A function for which the root is searched. |
... |
Additional arguments to be passed to |
lb |
A lower bound of the interval to be searched. |
ub |
An upper bound of the interval to be searched. |
tol |
The tolerance of error. Default is 1e-4. |
max.it |
The maximum number of iterations. Default is 100. |
The bisection method is a well-known root finding numerical algorithm that works for any continuous
function when the lower (lb
) and upper (ub
) bounds with opposite signs are provided.
This method repeatedly bisects the defined interval by two values with opposite signs until the absolute
difference of two values becomes less than the error tolerance (tol
) or the maximum
number of iterations (max.it
) is reached.
A list with three internal objects. The first object is the root found, the second object is the number of iterations used, and the third object is the approximate accuracy of the root (i.e., absolute difference between the final two values with opposite signs).
Howard, J. P. (2017). Computational methods for numerical analysis with R. New York: Chapman and Hall/CRC.
## example: find a theta corresponding to the probability of ## correct answer using the item response function of 2PLM ## (a = 1, b = 0.2) # set a function of theta find.th <- function(theta, p) { p - drm(theta = theta, a = 1, b = 0.2, D = 1) } # find the theta corresponding to p = 0.2 bisection(.fun = find.th, p = 0.2, lb = -10, ub = 10)$root # find the theta corresponding to p = 0.8 bisection(.fun = find.th, p = 0.8, lb = -10, ub = 10)$root
## example: find a theta corresponding to the probability of ## correct answer using the item response function of 2PLM ## (a = 1, b = 0.2) # set a function of theta find.th <- function(theta, p) { p - drm(theta = theta, a = 1, b = 0.2, D = 1) } # find the theta corresponding to p = 0.2 bisection(.fun = find.th, p = 0.2, lb = -10, ub = 10)$root # find the theta corresponding to p = 0.8 bisection(.fun = find.th, p = 0.8, lb = -10, ub = 10)$root
These functions import item and/or ability parameters from BILOG-MG 3, PARSCALE 4, flexMIRT, and mirt (R package).
bring.flexmirt( file, type = c("par", "sco"), rePar = TRUE, rePar.gpc = TRUE, n.factor = 1 ) bring.bilog(file, type = c("par", "sco")) bring.parscale(file, type = c("par", "sco")) bring.mirt(x)
bring.flexmirt( file, type = c("par", "sco"), rePar = TRUE, rePar.gpc = TRUE, n.factor = 1 ) bring.bilog(file, type = c("par", "sco")) bring.parscale(file, type = c("par", "sco")) bring.mirt(x)
file |
A file name (including a directory) containing the item or ability parameters. |
type |
A character string indicating a type of output file. Available types are "par" for a file containing item parameter estimates and "sco" for a file containing ability parameter estimates. |
rePar |
A logical value. If TRUE and when the IRT dichotomous model (e.g., 3PLM) or GRM is fit to data, the item intercept and logit of item guessing parameters are reparameterized into the item difficulty and item guessing parameters, respectively. Default is TRUE. |
rePar.gpc |
A logical value. If TRUE and when (G)PCM is fit to data, the nominal model parameters in the flexMIRT parameter output file are reparameterized into the (G)PCM slope/difficulty parameters. Default is TRUE. |
n.factor |
A numeric value indicating the number of estimated factors. This argument should be specified
when |
x |
An output object obtained from the function |
The bring.flexmirt
was written by modifying the function read.flexmirt
(Pritikin & Falk, 2020). The functions bring.bilog
and bring.parscale
were written by modifying the functions read.bilog
and read.parscale
(Weeks, 2010), respectively.
The file extensions for item parameter and ability files, respectively, are: ".par" and ".sco" for BILOG-MG and PARSCALE, and "-prm.txt" and "-sco.txt" for flexMIRT. For mirt, the name of the output object is specified by the user.
Although bring.flexmirt
is able to extract multidimensional item and ability parameter estimates,
this package only deals with unidimensional IRT methods.
For polytomous item parameters, bring.flexmirt
and bring.mirt
are able to import
the item parameters of the graded response model and the (generalized) partial credit model.
These functions return a list including several objects. Only for the output of flexMIRT, the results of multiple group analysis can be returned. In that case, each element of the list contains the estimation results for each group.
To illustrate how to import the item parameter estimate files of PARSCALE 4 and flexMIRT
using bring.parscale
and bring.flexmirt
, two item parameter
estimate output files are included in this package.
Among the two output files, one of them is from PARSCALE 4 with a file extension of ".PAR" (i.e., "parscale_sample.PAR") and another one is from flexMIRT with a file extension of "-prm.txt" (i.e., "flexmirt_sample-prm.txt").
For the two item parameter estimate output files, both are mixed-format tests with 55 items consisting of fifty dichotomous items following the IRT 3PL model and five polytomous items with five categories following the graded response model. The examples below show how to import those output files.
Regarding the item parameter files for any IRT software, only the internal object "full_df" in the returned list is
necessary for the IRT linking. The object "full_df" is a data frame containing the item metadata
in a test form (e.g., item parameters, number of categories, models). See info
or simdat
for more details about the item metadata.
Also, when item parameters are estimated using the partial credit or the generalized partial credit model,
item step parameters are returned in the object "full_df". Item step parameters are the overall item difficulty (or location)
parameter subtracted by the difficulty (or threshold) parameter for each category. See irtfit
for more details
about the parameterization of the (generalized) partial credit model.
Hwanggyu Lim [email protected]
Cai, L. (2017). flexMIRT 3.5 Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29.
Weeks, J. P. (2010). plink: An R Package for Linking Mixed-Format Tests Using IRT-Based Methods. Journal of Statistical Software, 35(12), 1-33. URL http://www.jstatsoft.org/v35/i12/.
Pritikin, J. (2018). rpf: Response Probability Functions. R package version 0.59. https://CRAN.R-project.org/package=rpf.
Pritikin, J. N., & Falk, C. F. (2020). OpenMx: A modular research environment for item response theory method development. Applied Psychological Measurement, 44(7-8), 561-562.
Muraki, E. & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG 3: Multiple-group IRT analysis and test maintenance for binary items [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
## example 1 # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item meta data bring.flexmirt(file = flex_sam, "par")$Group1$full_df ## example 2 ## import the ".par" output file from PARSCALE pscale_sam <- system.file("extdata", "parscale_sample.PAR", package = "irtQ") # read item parameters and transform them to item meta data bring.parscale(file = pscale_sam, "par")$full_df
## example 1 # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item meta data bring.flexmirt(file = flex_sam, "par")$Group1$full_df ## example 2 ## import the ".par" output file from PARSCALE pscale_sam <- system.file("extdata", "parscale_sample.PAR", package = "irtQ") # read item parameters and transform them to item meta data bring.parscale(file = pscale_sam, "par")$full_df
This function computes the classification accuracy and consistency indices for complex assessments using the method proposed by Lee (2010). The function can handle both dichotomous and polytomous item response theory (IRT) models.
cac_lee(x, cutscore, theta = NULL, weights = NULL, D = 1, cut.obs = TRUE)
cac_lee(x, cutscore, theta = NULL, weights = NULL, D = 1, cut.obs = TRUE)
x |
A data frame containing the item metadata (e.g., item parameters, number of
score categories, models ...). See |
cutscore |
A numeric vector specifying the cut scores for classification. Cut scores are the points that separate different performance categories (e.g., pass vs. fail, or different grades). |
theta |
A numeric vector of ability estimates. Ability estimates (theta values) are the individual proficiency estimates obtained from the IRT model. The theta parameter is optional and can be NULL. |
weights |
An optional two-column data frame or matrix where the first column is the quadrature points (nodes) and the second column is the corresponding weights. This is typically used in quadrature-based IRT analysis. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
cut.obs |
A logical value. If TRUE, it indicates the cutscores on the observed-summed score metric. If FALSE, it indicates they are on the IRT theta score metric. Default is TRUE. |
The function works by first checking the provided inputs. If both theta and weights are NULL, the function will stop and return an error message. Depending on the provided inputs, the function will then compute the classification accuracy and consistency indices using either the quadrature points and corresponding weights (D method) or individual ability estimates (P method). The function returns a list containing the confusion matrix, marginal and conditional classification accuracy and consistency indices, probabilities of being assigned to each level category, and cut scores.
A list containing the following elements:
confusion: A confusion matrix showing the cross table between true and expected levels.
marginal: A data frame showing the marginal classification accuracy and consistency indices.
conditional: A data frame showing the conditional classification accuracy and consistency indices.
prob.level: A data frame showing the probability of being assigned to each level category.
cutscore: A numeric vector showing the cut scores used in the analysis.
Hwanggyu Lim [email protected]
Lee, W. C. (2010). Classification consistency and accuracy for complex assessments using item response theory. Journal of Educational Measurement, 47(1), 1-17.
gen.weight
, est_score
, cac_rud
## ------------------------------------------------------------------------------ # 1. When the empirical ability distribution of the population is available # (D method) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameter data and transform it to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set the cut scores on the observed-summed score metric cutscore <- c(10, 20, 30, 50) # create the data frame including the quadrature points # and the corresponding weights node <- seq(-4, 4, 0.25) weights <- gen.weight(dist = "norm", mu = 0, sigma = 1, theta = node) # calculate the classification accuracy and consistency cac_1 <- cac_lee(x = x, cutscore = cutscore, weights = weights, D = 1) print(cac_1) ## ------------------------------------------------------------------------------ # 2. When individual ability estimates are available (P method) ## ------------------------------------------------------------------------------ # randomly select the true abilities from N(0, 1) set.seed(12) theta <- rnorm(n = 1000, mean = 0, sd = 1) # simulate the item response data data <- simdat(x = x, theta = theta, D = 1) # estimate the ability parameters using the ML estimation est_th <- est_score( x = x, data = data, D = 1, method = "ML", range = c(-4, 4), se = FALSE )$est.theta # calculate the classification accuracy and consistency cac_2 <- cac_lee(x = x, cutscore = cutscore, theta = est_th, D = 1) print(cac_2) ## ------------------------------------------------------------------------------ # 3. When individual ability estimates are available, # but, the cutscores are on the IRT theta score metric ## ------------------------------------------------------------------------------ # set the theta cut scures cutscore <- c(-2, -0.4, 0.2, 1.0) # calculate the classification accuracy and consistency cac_3 <- cac_lee( x = x, cutscore = cutscore, theta = est_th, D = 1, cut.obs = FALSE ) print(cac_3)
## ------------------------------------------------------------------------------ # 1. When the empirical ability distribution of the population is available # (D method) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameter data and transform it to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set the cut scores on the observed-summed score metric cutscore <- c(10, 20, 30, 50) # create the data frame including the quadrature points # and the corresponding weights node <- seq(-4, 4, 0.25) weights <- gen.weight(dist = "norm", mu = 0, sigma = 1, theta = node) # calculate the classification accuracy and consistency cac_1 <- cac_lee(x = x, cutscore = cutscore, weights = weights, D = 1) print(cac_1) ## ------------------------------------------------------------------------------ # 2. When individual ability estimates are available (P method) ## ------------------------------------------------------------------------------ # randomly select the true abilities from N(0, 1) set.seed(12) theta <- rnorm(n = 1000, mean = 0, sd = 1) # simulate the item response data data <- simdat(x = x, theta = theta, D = 1) # estimate the ability parameters using the ML estimation est_th <- est_score( x = x, data = data, D = 1, method = "ML", range = c(-4, 4), se = FALSE )$est.theta # calculate the classification accuracy and consistency cac_2 <- cac_lee(x = x, cutscore = cutscore, theta = est_th, D = 1) print(cac_2) ## ------------------------------------------------------------------------------ # 3. When individual ability estimates are available, # but, the cutscores are on the IRT theta score metric ## ------------------------------------------------------------------------------ # set the theta cut scures cutscore <- c(-2, -0.4, 0.2, 1.0) # calculate the classification accuracy and consistency cac_3 <- cac_lee( x = x, cutscore = cutscore, theta = est_th, D = 1, cut.obs = FALSE ) print(cac_3)
This function computes the classification accuracy and consistency based on the methods proposed by Rudner (2001, 2005). It can handle both situations where the empirical ability distribution of the population is available and when individual ability estimates are available.
cac_rud(cutscore, theta = NULL, se, weights = NULL)
cac_rud(cutscore, theta = NULL, se, weights = NULL)
cutscore |
A numeric vector specifying the cut scores for classification. Cut scores are the points that separate different performance categories (e.g., pass vs. fail, or different grades). |
theta |
A numeric vector of ability estimates. Ability estimates (theta values) are the individual proficiency estimates obtained from the IRT model. The theta parameter is optional and can be NULL. |
se |
A numeric vector of the same length as theta indicating the standard errors of the ability estimates. |
weights |
An optional two-column data frame or matrix where the first column is the quadrature points (nodes) and the second column is the corresponding weights. This is typically used in quadrature-based IRT analysis. |
The function first checks the provided inputs for correctness. It then computes the probabilities that an examinee with a specific ability is assigned to each level category, and calculates the conditional classification accuracy and consistency for each theta value. Finally, it computes the marginal accuracy and consistency.
A list containing the following elements:
confusion: A confusion matrix showing the cross table between true and expected levels.
marginal: A data frame showing the marginal classification accuracy and consistency indices.
conditional: A data frame showing the conditional classification accuracy and consistency indices.
prob.level: A data frame showing the probability of being assigned to each level category.
cutscore: A numeric vector showing the cut scores used in the analysis.
Hwanggyu Lim [email protected]
Rudner, L. M. (2001). Computing the expected proportions of misclassified examinees. Practical Assessment, Research, and Evaluation, 7(1), 14.
Rudner, L. M. (2005). Expected classification accuracy. Practical Assessment, Research, and Evaluation, 10(1), 13.
gen.weight
, est_score
, cac_lee
## ------------------------------------------------------------------------------ # 1. When the empirical ability distribution of the population is available ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameter data and transform it to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set the cut scores on the theta score metric cutscore <- c(-2, -0.5, 0.8) # create the data frame including the quadrature points # and the corresponding weights node <- seq(-4, 4, 0.25) weights <- gen.weight(dist = "norm", mu = 0, sigma = 1, theta = node) # compute the conditional standard errors across all quadrature points tif <- info(x = x, theta = node, D = 1, tif = TRUE)$tif se <- 1 / sqrt(tif) # calculate the classification accuracy and consistency cac_1 <- cac_rud(cutscore = cutscore, se = se, weights = weights) print(cac_1) ## ------------------------------------------------------------------------------ # 2. When individual ability estimates are available ## ------------------------------------------------------------------------------ # randomly select the true abilities from N(0, 1) set.seed(12) theta <- rnorm(n = 1000, mean = 0, sd = 1) # simulate the item response data data <- simdat(x = x, theta = theta, D = 1) # estimate the ability parameters and standard errors using the ML estimation est_theta <- est_score( x = x, data = data, D = 1, method = "ML", range = c(-4, 4), se = TRUE ) theta_hat <- est_theta$est.theta se <- est_theta$se.theta # calculate the classification accuracy and consistency cac_2 <- cac_rud(cutscore = cutscore, theta = theta_hat, se = se) print(cac_2)
## ------------------------------------------------------------------------------ # 1. When the empirical ability distribution of the population is available ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameter data and transform it to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set the cut scores on the theta score metric cutscore <- c(-2, -0.5, 0.8) # create the data frame including the quadrature points # and the corresponding weights node <- seq(-4, 4, 0.25) weights <- gen.weight(dist = "norm", mu = 0, sigma = 1, theta = node) # compute the conditional standard errors across all quadrature points tif <- info(x = x, theta = node, D = 1, tif = TRUE)$tif se <- 1 / sqrt(tif) # calculate the classification accuracy and consistency cac_1 <- cac_rud(cutscore = cutscore, se = se, weights = weights) print(cac_1) ## ------------------------------------------------------------------------------ # 2. When individual ability estimates are available ## ------------------------------------------------------------------------------ # randomly select the true abilities from N(0, 1) set.seed(12) theta <- rnorm(n = 1000, mean = 0, sd = 1) # simulate the item response data data <- simdat(x = x, theta = theta, D = 1) # estimate the ability parameters and standard errors using the ML estimation est_theta <- est_score( x = x, data = data, D = 1, method = "ML", range = c(-4, 4), se = TRUE ) theta_hat <- est_theta$est.theta se <- est_theta$se.theta # calculate the classification accuracy and consistency cac_2 <- cac_rud(cutscore = cutscore, theta = theta_hat, se = se) print(cac_2)
This function analyzes DIF on an item using CATSIB procedure (Nandakumar & Roussos, 2004), which is a modified version of SIBTEST (Shealy & Stout, 1993). The CATSIB procedure can be applied to a computerized adaptive testing (CAT) environment for differential item functioning (DIF) detection. In CATSIB, examinees are matched on IRT-based ability estimates adjusted by employing a regression correction method (Shealy & Stout, 1993) to reduce a statistical bias of the CATSIB statistic due to impact.
catsib( x = NULL, data, score = NULL, se = NULL, group, focal.name, D = 1, n.bin = c(80, 10), min.binsize = 3, max.del = 0.075, weight.group = c("comb", "foc", "ref"), alpha = 0.05, missing = NA, purify = FALSE, max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
catsib( x = NULL, data, score = NULL, se = NULL, group, focal.name, D = 1, n.bin = c(80, 10), min.binsize = 3, max.del = 0.075, weight.group = c("comb", "foc", "ref"), alpha = 0.05, missing = NA, purify = FALSE, max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...).
|
data |
A matrix containing examinees' response data of the items in the argument |
score |
A vector of examinees' ability estimates. If the abilities are not provided (i.e., |
se |
A vector of the standard errors of the ability estimates. The standard errors should be ordered in accordance with the order of
the ability estimates specified in the |
group |
A numeric or character vector indicating group membership of examinees. The length of vector should be the same with the number of rows in the response data matrix. |
focal.name |
A single numeric or character scalar representing the level associated with the focal group. For instance,
given |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
n.bin |
A vector of two positive integers to set the maximum and minimum numbers of bins (or intervals) on the ability scale. The first and second values indicate the maximum and minimum numbers of the bins, respectively. See below for more detail. |
min.binsize |
A positive integer value to set the minimum size of each bin. To ensure stable statistical estimation, each bin is required
to have a certain number of examinees (e.g, 3), at least, from both reference and focal groups if it was to be included in calculation of |
max.del |
A numerical value to set the maximum permissible proportion of examinees to be deleted from either reference group or focal group when automatically determining the number of bins on the ability scale. Default is 0.075. See below for more detail. |
weight.group |
A single character string to specify a target ability distribution over which the expectation of DIF measure, called |
alpha |
A numeric value to specify significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
purify |
A logical value indicating whether a purification process will be implemented or not. Default is FALSE. See below for more detail. |
max.iter |
A positive integer value specifying the maximum number of iterations for the purification process. Default is 10. |
min.resp |
A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information. |
method |
A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, and "EAP" for the expected a posteriori estimation. Default method is "ML". |
range |
A numeric vector of two components to restrict the range of ability scale for the ML, WL, MLF, and MAP scoring methods. Default is c(-5, 5). |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution.
These two parameters are used to obtain the gaussian quadrature points and the corresponding weights from the normal distribution. Default is
c(0,1). Ignored if |
nquad |
An integer value specifying the number of gaussian quadrature points from the normal prior distribution. Default is 41.
Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained
using the function |
ncore |
The number of logical CPU cores to use. Default is 1. See |
verbose |
A logical value. If TRUE, the progress messages of purification procedure are suppressed. Default is TRUE. |
... |
Additional arguments that will be forwarded to the |
In CATSIB procedure (Nandakumar & Roussos, 2004), because , which is the expected
regressed on
,
is a continuous variable, the range of
is divided into K equal intervals and examinees are classified into one of K intervals
on the basis of their
.Then, any intervals that contain less than three examinees in either reference or focal groups were
excluded from the computation of
, which is a measure of the amount of DIF, to ensure stable statistical estimation. According to
Nandakumar and Roussos (2004), a default minimum size of each bin is set to 3 in
min.binsize
.
To carefully choose the number of intervals (K), the catsib
automatically determines it by gradually decreasing K from a larger to
smaller numbers based the rule used in Nandakumar and Roussos (2004). Specifically, beginning with an arbitrary large number (e.g., 80),
if more than a certain permissible percentage, let's say 7.5%, of examinees in either the reference or focal groups were removed, the catsib
automatically decreases the number of bins by one unit until a total number of examinees in each group reaches to more than or equal to 92.5%.
However, Nandakumar and Roussos (2004) recommended setting the minimum K to 10 to avoid a situation that extremely a few intervals are left,
even if the number of remaining examinees in each group is less than 92.5%. Thus, the maximum and minimum number of bins are set to 80 and 10, respectively,
as default in n.bin
. Also, a default maximum permissible proportion of examinees to be deleted from either reference group or focal group is
set to 0.075 in max.del
.
When it comes to the target ability distribution used to compute , Li and Stout (1996) and Nandakumar and Roussos (2004) used the combined-group
target ability distribution, which is a default option in
weight.group
. See Nandakumar and Roussos (2004) for more detail about the CATSIB method.
Although Nandakumar and Roussos (2004) did not propose a purification procedure for DIF analysis using CATSIB, the catsib
can implement an iterative
purification process in a similar way as in Lim, Choe, and Han (2022). Simply, at each iterative purification, examinees' latent abilities are computed using
purified items and scoring method specified in the method
argument. The iterative purification process stops when no further DIF items are found or
the process reaches a predetermined limit of iteration, which can be specified in the max.iter
argument. See Lim et al. (2022)
for more details about the purification procedure.
Scoring with a limited number of items can result in large standard errors, which may impact the effectiveness of DIF detection within
the CATSIB procedure. The min.resp
argument can be employed to avoid using scores with significant standard errors when calculating
the CATSIB statistic, particularly during the purification process. For instance, if min.resp
is not NULL (e.g., min.resp=5
),
item responses from examinees whose total item responses fall below the specified minimum number are treated as missing values (i.e., NA).
Consequently, their ability estimates become missing values and are not utilized in computing the CATSIB statistic. If min.resp=NULL
,
an examinee's score will be computed as long as there is at least one item response for the examinee.
This function returns a list of four internal objects. The four objects are:
no_purify |
A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
|
purify |
A logical value indicating whether the purification process was used. |
with_purify |
A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
|
alpha |
A significance |
Hwanggyu Lim [email protected]
Li, H. H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement.
Nandakumar, R., & Roussos, L. (2004). Evaluation of the CATSIB DIF procedure in a pretest setting. Journal of Educational and Behavioral Statistics, 29(2), 177-199.
Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58, 159–194.
rdif
, est_item
, info
, simdat
,
shape_df
, gen.weight
, est_score
# call library library("dplyr") ## Uniform DIF detection ############################################### # (1) manipulate true uniform DIF data ############################################### # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # generate four new items to inject uniform DIF difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # manipulate uniform DIF on the four new items by adding constants to b-parameters # for the focal group difpar_foc <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + rep(0.7, 4)) # combine the 4 DIF and 36 non-DIF items for both reference and focal groups # thus, the first four items have uniform DIF par_ref <- rbind(difpar_ref, par_nstd) par_foc <- rbind(difpar_foc, par_nstd) # generate the true thetas set.seed(123) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc <- rnorm(500, 0.0, 1.0) # generate the response data resp_ref <- simdat(par_ref, theta = theta_ref, D = 1) resp_foc <- simdat(par_foc, theta = theta_foc, D = 1) data <- rbind(resp_ref, resp_foc) ############################################### # (2) estimate the item and ability parameters # using the aggregate data ############################################### # estimate the item parameters est_mod <- est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # estimate the ability parameters using ML theta_est <- est_score(x = est_par, data = data, method = "ML") score <- theta_est$est.theta se <- theta_est$se.theta ############################################### # (3) conduct DIF analysis ############################################### # create a vector of group membership indicators # where '1' indicates the focal group group <- c(rep(0, 500), rep(1, 500)) # (a)-1 compute SIBTEST statistic by providing scores, # and without a purification dif_1 <- catsib( x = NULL, data = data, D = 1, score = score, se = se, group = group, focal.name = 1, weight.group = "comb", alpha = 0.05, missing = NA, purify = FALSE ) print(dif_1) # (a)-2 compute SIBTEST statistic by providing scores, # and with a purification dif_2 <- catsib( x = est_par, data = data, D = 1, score = score, se = se, group = group, focal.name = 1, weight.group = "comb", alpha = 0.05, missing = NA, purify = TRUE ) print(dif_2)
# call library library("dplyr") ## Uniform DIF detection ############################################### # (1) manipulate true uniform DIF data ############################################### # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # generate four new items to inject uniform DIF difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # manipulate uniform DIF on the four new items by adding constants to b-parameters # for the focal group difpar_foc <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + rep(0.7, 4)) # combine the 4 DIF and 36 non-DIF items for both reference and focal groups # thus, the first four items have uniform DIF par_ref <- rbind(difpar_ref, par_nstd) par_foc <- rbind(difpar_foc, par_nstd) # generate the true thetas set.seed(123) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc <- rnorm(500, 0.0, 1.0) # generate the response data resp_ref <- simdat(par_ref, theta = theta_ref, D = 1) resp_foc <- simdat(par_foc, theta = theta_foc, D = 1) data <- rbind(resp_ref, resp_foc) ############################################### # (2) estimate the item and ability parameters # using the aggregate data ############################################### # estimate the item parameters est_mod <- est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # estimate the ability parameters using ML theta_est <- est_score(x = est_par, data = data, method = "ML") score <- theta_est$est.theta se <- theta_est$se.theta ############################################### # (3) conduct DIF analysis ############################################### # create a vector of group membership indicators # where '1' indicates the focal group group <- c(rep(0, 500), rep(1, 500)) # (a)-1 compute SIBTEST statistic by providing scores, # and without a purification dif_1 <- catsib( x = NULL, data = data, D = 1, score = score, se = se, group = group, focal.name = 1, weight.group = "comb", alpha = 0.05, missing = NA, purify = FALSE ) print(dif_1) # (a)-2 compute SIBTEST statistic by providing scores, # and with a purification dif_2 <- catsib( x = est_par, data = data, D = 1, score = score, se = se, group = group, focal.name = 1, weight.group = "comb", alpha = 0.05, missing = NA, purify = TRUE ) print(dif_2)
This function calculates the analytical asymptotic variance-covariance matrices (e.g., Li & Lissitz, 2004; Thissen & Wainer, 1982) of item parameter estimates for dichotomous and polytomous IRT Models without examinee's responses to test items, given a set of item parameter estimates and sample size. The square roots of variance terms in the matrices can be used as the asymptotic standard errors of maximum likelihood item parameter estimates.
covirt( x, D = 1, nstd = 1000, pcm.loc = NULL, norm.prior = c(0, 1), nquad = 41, weights = NULL )
covirt( x, D = 1, nstd = 1000, pcm.loc = NULL, norm.prior = c(0, 1), nquad = 41, weights = NULL )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...).
See |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
nstd |
An integer value or a vector of integer values indicating a sample size. When a vector is specified, length of the vector must be
the same as the number of test items in the argument |
pcm.loc |
A vector of integer values indicating the locations of partial credit model (PCM) items. For the PCM items, the variance-covariance matrices are computed only for the item category difficulty parameters. Default is NULL. See below for details. |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution. These two parameters are used to obtain the gaussian quadrature points and the corresponding weights from the normal distribution. Default is c(0,1). |
nquad |
An integer value specifying the number of gaussian quadrature points from the normal prior distribution. Default is 41. |
weights |
A two-column matrix or data frame containing the theta values (in the first column) and the weights (in the second column)
for the prior distribution. The weights and theta values can be easily obtained using the function |
The standard errors obtained from the analytical approach are likely to represent lower bounds for the actual standard errors (Thissen & Wainer, 1982). Therefore, they may be useful for assessing the degree of precision of a set of item parameter estimates when the corresponding standard errors of the estimates are not presented in literature or research reports.
Sometimes item parameters need to be estimated using different sample size. If the item parameters in the argument x
were
calibrated with different number of examinees, a vector of different sample sizes should be specified in the argument nstd
. Suppose
that you want to compute the variance-covariance matrices of five IRT 3PLM items and the five items were calibrated with 500, 600, 1,000, 2,000,
and 700 examinees, respectively. Then, nstd = c(500, 600, 1000, 2000, 700)
must be specified.
Because you can specify only "GPCM" for both the partial credit model (PCM) or the generalized partial credit model (GPCM) in the item metadata,
you must indicate which items are the PCM items through the argument pcm.loc
. This is because the item category difficulty parameters are estimated
from the PCM, meaning that the variance-covariance of item parameter estimates must be computed for the item category difficulty parameters. Suppose
that you want to compute the variance-covariance matrices of five polytomous items and the last two items were calibrated with the PCM. Then,
pcm.loc = c(4, 5)
must be specified.
A list of two internal objects. The first internal object contains a list of the variance-covariance matrices of item parameter estimates. The second internal object contains a list of the standard errors of item parameter estimates.
Hwanggyu Lim [email protected]
Li, Y. & Lissitz, R. (2004). Applications of the analytically derived asymptotic standard errors of item response theory item parameter estimates. Journal of educational measurement, 41(2), 85-117.
Thissen, D. & Wainer, H. (1982). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
irtfit
, info
, simdat
, shape_df
, gen.weight
## the use of a "-prm.txt" file obtained sfrom a flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df[c(1:2, 55), ] # compute the var-covariance matrices with sample size of 2,000 covirt(x, D = 1, nstd = 2000, norm.prior = c(0, 1), nquad = 41)
## the use of a "-prm.txt" file obtained sfrom a flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df[c(1:2, 55), ] # compute the var-covariance matrices with sample size of 2,000 covirt(x, D = 1, nstd = 2000, norm.prior = c(0, 1), nquad = 41)
This function computes the probability of correct answers for multiple items for a given set of theta values using the IRT 1PL, 2PL, and 3PL models.
drm(theta, a, b, g = NULL, D = 1)
drm(theta, a, b, g = NULL, D = 1)
theta |
A vector of ability values. |
a |
A vector of item discrimination (or slope) parameters. |
b |
A vector of item difficulty (or threshold) parameters. |
g |
A vector of item guessing parameters. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
g
does not need to be specified when the response probabilities of
the 1PL and 2PL models are computed.
This function returns a matrix where a row indicates the ability and a column represents the item.
Hwanggyu Lim [email protected]
## when vectors are used for both theta values and item parameters (3PLM) drm(c(-0.1, 0.0, 1.5), a = c(1, 2), b = c(0, 1), g = c(0.2, 0.1), D = 1) ## when vectors are only used for item parameters (2PLM) drm(0.0, a = c(1, 2), b = c(0, 1), D = 1) ## when vectors are only used for theta values (3PLM) drm(c(-0.1, 0.0, 1.5), a = 1, b = 1, g = 0.2, D = 1)
## when vectors are used for both theta values and item parameters (3PLM) drm(c(-0.1, 0.0, 1.5), a = c(1, 2), b = c(0, 1), g = c(0.2, 0.1), D = 1) ## when vectors are only used for item parameters (2PLM) drm(0.0, a = c(1, 2), b = c(0, 1), D = 1) ## when vectors are only used for theta values (3PLM) drm(c(-0.1, 0.0, 1.5), a = 1, b = 1, g = 0.2, D = 1)
This function fits unidimensional item response (IRT) models to a mixture of dichotomous and polytomous data using the marginal maximum likelihood estimation via the expectation-maximization (MMLE-EM) algorithm (Bock & Aitkin, 1981). This function also implements the fixed item parameter calibration (FIPC; Kim, 2006). As Method A (Stocking, 1988), FIPC is one of useful online item calibration methods for computerized adaptive testing (CAT) to put the parameter estimates of pretest items on the same scale of operational item parameter estimates (Ban, Hanson, Wang, Yi, & Harris, 2001). For dichotomous items, IRT one-, two-, and three-parameter logistic models are available. For polytomous items, the graded response model (GRM) and the (generalized) partial credit model (GPCM) are available.
est_irt( x = NULL, data, D = 1, model = NULL, cats = NULL, item.id = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 16)), missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1, EmpHist = FALSE, use.startval = FALSE, Etol = 1e-04, MaxE = 500, control = list(iter.max = 200), fipc = FALSE, fipc.method = "MEM", fix.loc = NULL, fix.id = NULL, se = TRUE, verbose = TRUE )
est_irt( x = NULL, data, D = 1, model = NULL, cats = NULL, item.id = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 16)), missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1, EmpHist = FALSE, use.startval = FALSE, Etol = 1e-04, MaxE = 500, control = list(iter.max = 200), fipc = FALSE, fipc.method = "MEM", fix.loc = NULL, fix.id = NULL, se = TRUE, verbose = TRUE )
x |
A data frame containing the item metadata. This metadata is necessary to obtain the information of
each item (i.e., number of score categories and IRT model) to be calibrated. You can easily create an empty
item metadata using the function |
data |
A matrix containing examinees' response data for the items in the argument |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
model |
A vector of character strings indicating what IRT model is used to calibrate each item. Available IRT models are
"1PLM", "2PLM", "3PLM", and "DRM" for dichotomous items, and "GRM" and "GPCM" for polytomous items. "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. Note that "DRM" is considered as "3PLM" in this function.
If a single character of the IRT model is specified, that model will be recycled across all items. The provided information in the |
cats |
A numeric vector specifying the number of score categories for each item. For example, a dichotomous
item has two score categories. If a single numeric value is specified, that value will be recycled across all items. If |
item.id |
A character vector of item IDs. If NULL, the item IDs are generated automatically. When |
fix.a.1pl |
A logical value. If TRUE, the slope parameters of the 1PLM items are fixed to a specific value specified in the argument
|
fix.a.gpcm |
A logical value. If TRUE, the GPCM items are calibrated with the partial credit model and the slope parameters of
the GPCM items are fixed to a specific value specified in the argument |
fix.g |
A logical value. If TRUE, the guessing parameters of the 3PLM items are fixed to a specific value specified in the argument
|
a.val.1pl |
A numeric value. This value is used to fixed the slope parameters of the 1PLM items. |
a.val.gpcm |
A numeric value. This value is used to fixed the slope parameters of the GPCM items. |
g.val |
A numeric value. This value is used to fixed the guessing parameters of the 3PLM items. |
use.aprior |
A logical value. If TRUE, a prior distribution for the slope parameters is used for the parameter calibration across all items. Default is FALSE. |
use.bprior |
A logical value. If TRUE, a prior distribution for the difficulty (or threshold) parameters is used for the parameter calibration across all items. Default is FALSE. |
use.gprior |
A logical value. If TRUE, a prior distribution for the guessing parameters is used for the parameter calibration across all 3PLM items. Default is TRUE. |
aprior |
A list containing the information of the prior distribution for item slope parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
bprior |
A list containing the information of the prior distribution for item difficulty (or threshold) parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
gprior |
A list containing the information of the prior distribution for item guessing parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
missing |
A value indicating missing values in the response data set. Default is NA. |
Quadrature |
A numeric vector of two components specifying the number of quadrature points (in the first component) and the symmetric minimum and maximum values of these points (in the second component). For example, a vector of c(49, 6) indicates 49 rectangular quadrature points over -6 and 6. The quadrature points are used in the E step of the EM algorithm. Default is c(49, 6). |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. If not NULL, the scale of the latent ability distribution will be will be fixed
to the scale of the provided quadrature points and weights. The weights and quadrature points can be easily obtained
using the function |
group.mean |
A numeric value to set the mean of latent variable prior distribution when |
group.var |
A positive numeric value to set the variance of latent variable prior distribution when |
EmpHist |
A logical value. If TRUE, the empirical histogram of the latent variable prior distribution is simultaneously estimated with the item parameters using Woods's (2007) approach. The items are calibrated against the estimated empirical prior distributions. See below for details. |
use.startval |
A logical value. If TRUE, the item parameters provided in the item metadata (i.e., the argument |
Etol |
A positive numeric value. This value sets the convergence criterion for E steps of the EM algorithm. Default is 1e-4. |
MaxE |
A positive integer value. This value determines the maximum number of the E steps in the EM algorithm. Default is 500. |
control |
A list of control parameters to be passed to the optimization function of |
fipc |
A logical value. If TRUE, FIPC is implemented for item parameter estimation. When |
fipc.method |
A character string specifying the FIPC method. Available methods include "OEM" for "No Prior Weights Updating and One EM Cycle
(NWU-OEM; Wainer & Mislevy, 1990)" and "MEM" for "Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM; Kim, 2006)."
When |
fix.loc |
A vector of positive integer values specifying the locations of the items to be fixed in the item metadata (i.e., |
fix.id |
A vector of character strings specifying IDs of the items to be fixed when the FIPC is implemented (i.e., |
se |
A logical value. If FALSE, the standard errors of the item parameter estimates are not computed. Default is TRUE. |
verbose |
A logical value. If FALSE, all progress messages including the process information on the EM algorithm are suppressed. Default is TRUE. |
A specific form of a data frame should be used for the argument x
. The first column should have item IDs,
the second column should contain unique score category numbers of the items, and the third column should include IRT models being fit to the items.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as
the item location (or overall difficulty) parameter subtracted by the threshold parameter for unique score categories of the item.
Note that when an GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if an GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame with a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
See IRT Models
section in the page of irtQ-package
for more details about the IRT models used in the irtQ package.
An easier way to create a data frame for the argument x
is by using the function shape_df
.
To fit the IRT models to data, the IRT model and the number of score category information for the estimated items must be provided as well as
the item response data. There are two way to provide the IRT model and score category information. The first way is to provide the item metadata
to the argument x
. As explained above, the item metadata can be easily created by the function shape_df
. The second way is
specify the IRT models and the score category information into the arguments of model
and cats
. Thus, if x=NULL
, the specified
information in model
and cats
are used.
To implement FIPC, however, the item metadata must be provided in the argument x
. This is because the item parameters of the fixed items
in the item metadata are used to estimate the characteristic of the underlying latent variable prior distribution when calibrating the rest of freely estimated items.
More specifically, the underlying latent variable prior distribution of the fixed items is estimated during the calibration of the freely estimated items
to put the item parameters of the freely estimated items on the scale of the fixed item parameters (Kim, 2006).
In terms of approaches for FIPC, Kim (2006) described five different methods. Among them, two methods are available in the
function est_irt
. The first method is "NWU-OEM" where uses just one E step in the EM algorithm, involving data from only the fixed items, and
just one M step, involving data from only non-fixed items. This method is suggested by Wainer and Mislevy (1990) in the context of online calibration. This method
can be implemented by setting fipc.method = "OEM"
. The second method is "MWU-MEM" which iteratively updates the latent variable prior distribution and
finds the parameter estimates of the non-fixed items. In this method, the same procedure of NWU-OEM method is applied to the first EM cycle. From the second
EM cycle, both the parameters of non-fixed items and the weights of the prior distribution are concurrently updated. This method can be implemented by
setting fipc.method = "MEM"
. See Kim (2006) for more details.
When fipc = TRUE
, the information of which items are fixed needs to be provided via either fix.loc
or fix.id
. For example, suppose that
five items in which IDs are CMC1, CMC2, CMC3, CMC4, and CMC5 should be fixed and all item IDs are provided in X
or item.id
. Also, the five items are
located in the 1st through 5th rows of the item metadata (i.e., x
). Then the item parameters of the five items can be fixed by setting
fix.loc = c(1, 2, 3, 4, 5)
or fix.id = c("CMC1", "CMC2", "CMC3", "CMC4", "CMC5")
. Note that if both arguments are not NULL, the information
provided into the fix.loc
argument is ignored.
When EmpHist = TRUE
, the empirical histogram (i.e., densities at the quadrature points) of latent variable prior distribution is simultaneously estimated
with the item parameters. If fipc = TRUE
given EmpHist = TRUE
, the scale parameters (e.g., mean and variance) of the empirical prior distribution
are estimated as well. If fipc = FALSE
given EmpHist = TRUE
, the scale parameters of the empirical prior distribution are fixed to the values specified
in the arguments of group.mean
and group.var
. When EmpHist = FALSE
, the normal prior distribution is used during the item parameter estimation.
If fipc = TRUE
given EmpHist = FALSE
, the scale parameters of the normal prior distribution are estimated as well as the item parameters.
If fipc = FALSE
given EmpHist = FALSE
, the scale parameters of the normal prior distribution are fixed to the values specified in the arguments
of group.mean
and group.var
.
This function returns an object of class est_irt
. Within this object, several internal objects are contained such as:
estimates |
A data frame containing both the item parameter estimates and the corresponding standard errors of estimates. |
par.est |
A data frame containing the item parameter estimates. |
se.est |
A data frame containing the standard errors of the item parameter estimates. Note that the standard errors are estimated using the cross-production approximation method (Meilijson, 1989). |
pos.par |
A data frame containing the position number of each item parameter being estimated. The position information is useful when interpreting the variance-covariance matrix of item parameter estimates. |
covariance |
A matrix of variance-covariance matrix of item parameter estimates. |
loglikelihood |
A sum of the log-likelihood values of the observed data set (marginal log-likelihood) across all items in the data set |
aic |
A model fit statistic of Akaike information criterion based on the loglikelihood. |
bic |
A model fit statistic of Bayesian information criterion based on the loglikelihood. |
group.par |
A data frame containing the mean, variance, and standard deviation of latent variable prior distribution. |
weights |
A two-column data frame containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the (updated) latent variable prior distribution. |
posterior.dist |
A matrix of normalized posterior densities for all the response patterns at each of the quadrature points. The row and column indicate each individual's response pattern and the quadrature point, respectively. |
data |
A data.frame of the examinees' response data set. |
scale.D |
A scaling factor in IRT models. |
ncase |
A total number of response patterns. |
nitem |
A total number of items included in the response data. |
Etol |
A convergence criteria for E steps of the EM algorithm. |
MaxE |
The maximum number of E steps in the EM algorithm. |
aprior |
A list containing the information of the prior distribution for item slope parameters. |
gprior |
A list containing the information of the prior distribution for item guessing parameters. |
npar.est |
A total number of the estimated parameters. |
niter |
The number of EM cycles completed. |
maxpar.diff |
A maximum item parameter change when the EM cycles were completed. |
EMtime |
Time (in seconds) spent for the EM cycles. |
SEtime |
Time (in seconds) spent for computing the standard errors of the item parameter estimates. |
TotalTime |
Time (in seconds) spent for total compuatation. |
test.1 |
Status of the first-order test to report if the gradients has vanished sufficiently for the solution to be stable. |
test.2 |
Status of the second-order test to report if the information matrix is positive definite, which is a prerequisite for the solution to be a possible maximum. |
var.note |
A note to report if the variance-covariance matrix of item parameter estimates is obtainable from the information matrix. |
fipc |
A logical value to indicate if FIPC was used. |
fipc.method |
A method used for the FIPC. |
fix.loc |
A vector of integer values specifying the locations of the fixed items when the FIPC was implemented. |
The internal objects can be easily extracted using the function getirt
.
Hwanggyu Lim [email protected]
Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D., J. (2001) A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191-212.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.
Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society: Series B (Methodological), 51, 127-138.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computer adaptive testing: A primer (Chap. 4, pp.65-102). Hillsdale, NJ: Lawrence Erlbaum.
Woods, C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and Psychological Measurement, 67(1), 73-87.
est_item
, irtfit
, info
, simdat
, shape_df
, sx2_fit
,
traceline.est_irt
, getirt
## ---------------------------------------------------------------------- # 1. item parameter estimation for the dichotomous item data (LSAT6) ## ---------------------------------------------------------------------- # fit the 1PL model to LSAT6 data and constrain the slope parameters # to be equal (mod.1pl.c <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = FALSE)) # summary of the estimation summary(mod.1pl.c) # extract the item parameter estimates getirt(mod.1pl.c, what = "par.est") # extract the standard error estimates getirt(mod.1pl.c, what = "se.est") # fit the 1PL model to LSAT6 data and fix the slope parameters to 1.0 (mod.1pl.f <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = TRUE, a.val.1pl = 1)) # summary of the estimation summary(mod.1pl.f) # fit the 2PL model to LSAT6 data (mod.2pl <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2)) # summary of the estimation summary(mod.2pl) # assess the fit of the 2PL model to the LSAT5 data using S-X2 fit statistic (sx2fit.2pl <- sx2_fit(x = mod.2pl)) # compute the item and test information at several theta points theta <- seq(-4, 4, 0.1) (info.2pl <- info(x = mod.2pl, theta = theta)) # draw the test characteristic curve plot (trace.2pl <- traceline(x = mod.2pl, theta = theta)) plot(trace.2pl) # draw the item characteristic curve for the 1st item plot(trace.2pl, item.loc = 1) # fit the 2PL model to LSAT6 data and # estimate the empirical histogram of latent variable prior distribution # also use a less stringent convergence criterion for E-step (mod.2pl.hist <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2, EmpHist = TRUE, Etol = 0.001)) (emphist <- getirt(mod.2pl.hist, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 3PL model to LSAT6 data and use the Beta prior distribution for # the guessing parameters (mod.3pl <- est_irt( data = LSAT6, D = 1, model = "3PLM", cats = 2, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.3pl) # fit the 3PL model to LSAT6 data, but fix the guessing parameters to be 0.2 (mod.3pl.f <- est_irt(data = LSAT6, D = 1, model = "3PLM", cats = 2, fix.g = TRUE, g.val = 0.2)) # summary of the estimation summary(mod.3pl.f) # fit the different dichotomous models to each item of LSAT6 data # fit the constrained 1PL model to the 1st, 2nd, and 3rd items, fit the 2PL model to # the 4th item, and fit the 3PL model to the 5th item with the Beta prior of # the guessing parameter (mod.drm.mix <- est_irt( data = LSAT6, D = 1, model = c("1PLM", "1PLM", "1PLM", "2PLM", "3PLM"), cats = 2, fix.a.1pl = FALSE, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.drm.mix) ## ------------------------------------------------------------------------------ # 2. item parameter estimation for the mixed-item format data (simulation data) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # modify the item metadata so that the 39th and 40th items follow GPCM x[39:40, 3] <- "GPCM" # generate 1,000 examinees' latent abilities from N(0, 1) set.seed(37) score1 <- rnorm(1000, mean = 0, sd = 1) # simulate the response data sim.dat1 <- simdat(x = x, theta = score1, D = 1) # fit the 3PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # and fit the GRM model to the 53th, 54th, 55th items. # use the beta prior distribution for the guessing parameters, use the log-normal # prior distribution for the slope parameters, and use the normal prior distribution # for the difficulty (or threshold) parameters. # also, specify the argument 'x' to provide the IRT model and score category information # for items item.meta <- shape_df(item.id = x$id, cats = x$cats, model = x$model, default.par = TRUE) (mod.mix1 <- est_irt( x = item.meta, data = sim.dat1, D = 1, use.aprior = TRUE, use.bprior = TRUE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0.0, 0.5)), bprior = list(dist = "norm", params = c(0.0, 2.0)), gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.mix1) # estimate examinees' latent scores given the item parameter estimates using the MLE (score.mle <- est_score(x = mod.mix1, method = "ML", range = c(-4, 4), ncore = 2)) # compute the traditional fit statistics (fit.mix1 <- irtfit( x = mod.mix1, score = score.mle$est.theta, group.method = "equal.width", n.width = 10, loc.theta = "middle" )) # residual plots for the first item (dichotomous item) plot( x = fit.mix1, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE ) # residual plots for the last item (polytomous item) plot( x = fit.mix1, item.loc = 55, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE ) # fit the 2PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # and fit the GRM model to the 53th, 54th, 55th items. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix2 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)) )) # summary of the estimation summary(mod.mix2) # fit the 2PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # fit the GRM model to the 53th, 54th, 55th items, and estimate the empirical histogram # of latent variable prior distribution. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix3 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)), EmpHist = TRUE )) (emphist <- getirt(mod.mix3, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 2PL model to all dichotomous items, # fit the PCM model to 39th and 40th items by fixing the slope parameters to 1, # and fit the GRM model to the 53th, 54th, 55th items. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix4 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)), fix.a.gpcm = TRUE, a.val.gpcm = 1 )) # summary of the estimation summary(mod.mix4) ## ------------------------------------------------------------------------------ # 3. fixed item parameter calibration (FIPC) for the mixed-item format data # (simulation data) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # generate 1,000 examinees' latent abilities from N(0.4, 1.3) set.seed(20) score2 <- rnorm(1000, mean = 0.4, sd = 1.3) # simulate the response data sim.dat2 <- simdat(x = x, theta = score2, D = 1) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # fix the five 3PL items (1st - 5th items) and three GRM items (53rd to 55th items) # also, estimate the empirical histogram of latent variable # use the MEM method. fix.loc <- c(1:5, 53:55) (mod.fix1 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix1$group.par) (emphist <- getirt(mod.fix1, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix1) # or the same five items can be fixed by providing their item IDs to the 'fix.id' argument # in this case, set fix.loc = NULL fix.id <- c(x$id[1:5], x$id[53:55]) (mod.fix1 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id )) # summary of the estimation summary(mod.fix1) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # fix the five 3PL items (1st - 5th items) and three GRM items (53rd to 55th items) # at this moment, do estimate the empirical histogram of latent variable. # instead, estimate the scale of normal prior distribution of latent variable # use the MEM method. fix.loc <- c(1:5, 53:55) (mod.fix2 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = FALSE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix2$group.par) (emphist <- getirt(mod.fix2, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # at this moment fix only the five 3PL items (1st - 5th items) # and estimate the empirical histogram of latent variable. # use the OEM method. Thus, only 1 EM cycle is used. fix.loc <- c(1:5) (mod.fix3 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "OEM", fix.loc = fix.loc )) (prior.par <- mod.fix3$group.par) (emphist <- getirt(mod.fix3, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix3) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # at this moment fix all 55 items and estimate only the latent ability distribution # using the MEM method. fix.loc <- c(1:55) (mod.fix4 <- est_irt( x = x, data = sim.dat2, D = 1, EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix4$group.par) (emphist <- getirt(mod.fix4, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix4) # or all 55 items can be fixed by providing their item IDs to the 'fix.id' argument # in this case, set fix.loc = NULL fix.id <- x$id (mod.fix4 <- est_irt( x = x, data = sim.dat2, D = 1, EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id )) # summary of the estimation summary(mod.fix4)
## ---------------------------------------------------------------------- # 1. item parameter estimation for the dichotomous item data (LSAT6) ## ---------------------------------------------------------------------- # fit the 1PL model to LSAT6 data and constrain the slope parameters # to be equal (mod.1pl.c <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = FALSE)) # summary of the estimation summary(mod.1pl.c) # extract the item parameter estimates getirt(mod.1pl.c, what = "par.est") # extract the standard error estimates getirt(mod.1pl.c, what = "se.est") # fit the 1PL model to LSAT6 data and fix the slope parameters to 1.0 (mod.1pl.f <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = TRUE, a.val.1pl = 1)) # summary of the estimation summary(mod.1pl.f) # fit the 2PL model to LSAT6 data (mod.2pl <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2)) # summary of the estimation summary(mod.2pl) # assess the fit of the 2PL model to the LSAT5 data using S-X2 fit statistic (sx2fit.2pl <- sx2_fit(x = mod.2pl)) # compute the item and test information at several theta points theta <- seq(-4, 4, 0.1) (info.2pl <- info(x = mod.2pl, theta = theta)) # draw the test characteristic curve plot (trace.2pl <- traceline(x = mod.2pl, theta = theta)) plot(trace.2pl) # draw the item characteristic curve for the 1st item plot(trace.2pl, item.loc = 1) # fit the 2PL model to LSAT6 data and # estimate the empirical histogram of latent variable prior distribution # also use a less stringent convergence criterion for E-step (mod.2pl.hist <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2, EmpHist = TRUE, Etol = 0.001)) (emphist <- getirt(mod.2pl.hist, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 3PL model to LSAT6 data and use the Beta prior distribution for # the guessing parameters (mod.3pl <- est_irt( data = LSAT6, D = 1, model = "3PLM", cats = 2, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.3pl) # fit the 3PL model to LSAT6 data, but fix the guessing parameters to be 0.2 (mod.3pl.f <- est_irt(data = LSAT6, D = 1, model = "3PLM", cats = 2, fix.g = TRUE, g.val = 0.2)) # summary of the estimation summary(mod.3pl.f) # fit the different dichotomous models to each item of LSAT6 data # fit the constrained 1PL model to the 1st, 2nd, and 3rd items, fit the 2PL model to # the 4th item, and fit the 3PL model to the 5th item with the Beta prior of # the guessing parameter (mod.drm.mix <- est_irt( data = LSAT6, D = 1, model = c("1PLM", "1PLM", "1PLM", "2PLM", "3PLM"), cats = 2, fix.a.1pl = FALSE, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.drm.mix) ## ------------------------------------------------------------------------------ # 2. item parameter estimation for the mixed-item format data (simulation data) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # modify the item metadata so that the 39th and 40th items follow GPCM x[39:40, 3] <- "GPCM" # generate 1,000 examinees' latent abilities from N(0, 1) set.seed(37) score1 <- rnorm(1000, mean = 0, sd = 1) # simulate the response data sim.dat1 <- simdat(x = x, theta = score1, D = 1) # fit the 3PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # and fit the GRM model to the 53th, 54th, 55th items. # use the beta prior distribution for the guessing parameters, use the log-normal # prior distribution for the slope parameters, and use the normal prior distribution # for the difficulty (or threshold) parameters. # also, specify the argument 'x' to provide the IRT model and score category information # for items item.meta <- shape_df(item.id = x$id, cats = x$cats, model = x$model, default.par = TRUE) (mod.mix1 <- est_irt( x = item.meta, data = sim.dat1, D = 1, use.aprior = TRUE, use.bprior = TRUE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0.0, 0.5)), bprior = list(dist = "norm", params = c(0.0, 2.0)), gprior = list(dist = "beta", params = c(5, 16)) )) # summary of the estimation summary(mod.mix1) # estimate examinees' latent scores given the item parameter estimates using the MLE (score.mle <- est_score(x = mod.mix1, method = "ML", range = c(-4, 4), ncore = 2)) # compute the traditional fit statistics (fit.mix1 <- irtfit( x = mod.mix1, score = score.mle$est.theta, group.method = "equal.width", n.width = 10, loc.theta = "middle" )) # residual plots for the first item (dichotomous item) plot( x = fit.mix1, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE ) # residual plots for the last item (polytomous item) plot( x = fit.mix1, item.loc = 55, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE ) # fit the 2PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # and fit the GRM model to the 53th, 54th, 55th items. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix2 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)) )) # summary of the estimation summary(mod.mix2) # fit the 2PL model to all dichotomous items, fit the GPCM model to 39th and 40th items, # fit the GRM model to the 53th, 54th, 55th items, and estimate the empirical histogram # of latent variable prior distribution. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix3 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)), EmpHist = TRUE )) (emphist <- getirt(mod.mix3, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 2PL model to all dichotomous items, # fit the PCM model to 39th and 40th items by fixing the slope parameters to 1, # and fit the GRM model to the 53th, 54th, 55th items. # also, specify the arguments of 'model' and 'cats' to provide the IRT model and # score category information for items (mod.mix4 <- est_irt( data = sim.dat1, D = 1, model = c(rep("2PLM", 38), rep("GPCM", 2), rep("2PLM", 12), rep("GRM", 3)), cats = c(rep(2, 38), rep(5, 2), rep(2, 12), rep(5, 3)), fix.a.gpcm = TRUE, a.val.gpcm = 1 )) # summary of the estimation summary(mod.mix4) ## ------------------------------------------------------------------------------ # 3. fixed item parameter calibration (FIPC) for the mixed-item format data # (simulation data) ## ------------------------------------------------------------------------------ ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # generate 1,000 examinees' latent abilities from N(0.4, 1.3) set.seed(20) score2 <- rnorm(1000, mean = 0.4, sd = 1.3) # simulate the response data sim.dat2 <- simdat(x = x, theta = score2, D = 1) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # fix the five 3PL items (1st - 5th items) and three GRM items (53rd to 55th items) # also, estimate the empirical histogram of latent variable # use the MEM method. fix.loc <- c(1:5, 53:55) (mod.fix1 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix1$group.par) (emphist <- getirt(mod.fix1, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix1) # or the same five items can be fixed by providing their item IDs to the 'fix.id' argument # in this case, set fix.loc = NULL fix.id <- c(x$id[1:5], x$id[53:55]) (mod.fix1 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id )) # summary of the estimation summary(mod.fix1) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # fix the five 3PL items (1st - 5th items) and three GRM items (53rd to 55th items) # at this moment, do estimate the empirical histogram of latent variable. # instead, estimate the scale of normal prior distribution of latent variable # use the MEM method. fix.loc <- c(1:5, 53:55) (mod.fix2 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = FALSE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix2$group.par) (emphist <- getirt(mod.fix2, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # at this moment fix only the five 3PL items (1st - 5th items) # and estimate the empirical histogram of latent variable. # use the OEM method. Thus, only 1 EM cycle is used. fix.loc <- c(1:5) (mod.fix3 <- est_irt( x = x, data = sim.dat2, D = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "OEM", fix.loc = fix.loc )) (prior.par <- mod.fix3$group.par) (emphist <- getirt(mod.fix3, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix3) # fit the 3PL model to all dichotomous items, fit the GRM model to all polytomous data, # at this moment fix all 55 items and estimate only the latent ability distribution # using the MEM method. fix.loc <- c(1:55) (mod.fix4 <- est_irt( x = x, data = sim.dat2, D = 1, EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc )) (prior.par <- mod.fix4$group.par) (emphist <- getirt(mod.fix4, what = "weights")) plot(emphist$weight ~ emphist$theta, type = "h") # summary of the estimation summary(mod.fix4) # or all 55 items can be fixed by providing their item IDs to the 'fix.id' argument # in this case, set fix.loc = NULL fix.id <- x$id (mod.fix4 <- est_irt( x = x, data = sim.dat2, D = 1, EmpHist = TRUE, Etol = 1e-3, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id )) # summary of the estimation summary(mod.fix4)
This function performs the fixed ability parameter calibration (FAPC), often called Method A, which is the maximum likelihood estimation of item parameters given the ability estimates (Baker & Kim, 2004; Ban, Hanson, Wang, Yi, & Harris, 2001; Stocking, 1988). Also, this could be considered as a special type of the joint maximum likelihood estimation where only one cycle of parameter estimation is implemented given the ability estimates (Birnbaum, 1968). FAPC is one of potentially useful online item calibration methods for computerized adaptive testing (CAT) to put the parameter estimates of pretest items on the same scale of operational item parameter estimates and recalibrate the operational items to evaluate the parameter drifts of the operational items (Chen & Wang, 2016; Stocking, 1988).
est_item( x = NULL, data, score, D = 1, model = NULL, cats = NULL, item.id = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 17)), missing = NA, use.startval = FALSE, control = list(eval.max = 500, iter.max = 500), verbose = TRUE )
est_item( x = NULL, data, score, D = 1, model = NULL, cats = NULL, item.id = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 17)), missing = NA, use.startval = FALSE, control = list(eval.max = 500, iter.max = 500), verbose = TRUE )
x |
A data frame containing the item metadata. This metadata is necessary to obtain the information of
each item (i.e., number of score categories and IRT model) to be calibrated. You can easily create an empty
item metadata using the function |
data |
A matrix containing examinees' response data for the items in the argument |
score |
A vector of examinees' ability estimates. Length of the vector must be the same as the number of rows in the response data set. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
model |
A vector of character strings indicating what IRT model is used to calibrate each item. Available IRT models are
"1PLM", "2PLM", "3PLM", and "DRM" for dichotomous items, and "GRM" and "GPCM" for polytomous items. "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. Note that "DRM" is considered as "3PLM" in this function.
If a single character of the IRT model is specified, that model will be recycled across all items. The provided information in the |
cats |
A numeric vector specifying the number of score categories for each item. For example, a dichotomous
item has two score categories. If a single numeric value is specified, that value will be recycled across all items. If |
item.id |
A character vector of item IDs. If NULL, the item IDs are generated automatically. Default is NULL. |
fix.a.1pl |
A logical value. If TRUE, the slope parameters of the 1PLM items are fixed to a specific value specified in the argument
|
fix.a.gpcm |
A logical value. If TRUE, the GPCM items are calibrated with the partial credit model and the slope parameters of
the GPCM items are fixed to a specific value specified in the argument |
fix.g |
A logical value. If TRUE, the guessing parameters of the 3PLM items are fixed to a specific value specified in the argument
|
a.val.1pl |
A numeric value. This value is used to fixed the slope parameters of the 1PLM items. |
a.val.gpcm |
A numeric value. This value is used to fixed the slope parameters of the GPCM items. |
g.val |
A numeric value. This value is used to fixed the guessing parameters of the 3PLM items. |
use.aprior |
A logical value. If TRUE, a prior distribution for the slope parameters is used for the parameter calibration across all items. Default is FALSE. |
use.bprior |
A logical value. If TRUE, a prior distribution for the difficulty (or threshold) parameters is used for the parameter calibration across all items. Default is FALSE. |
use.gprior |
A logical value. If TRUE, a prior distribution for the guessing parameters is used for the parameter calibration across all 3PLM items. Default is TRUE. |
aprior |
A list containing the information of the prior distribution for item slope parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
bprior |
A list containing the information of the prior distribution for item difficulty (or threshold) parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
gprior |
A list containing the information of the prior distribution for item guessing parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
missing |
A value indicating missing values in the response data set. Default is NA. |
use.startval |
A logical value. If TRUE, the item parameters provided in the item metadata (i.e., the argument |
control |
A list of control parameters to be passed to the optimization function of |
verbose |
A logical value. If FALSE, all progress messages are suppressed. Default is TRUE. |
In most cases, the function est_item
will return successfully converged item parameter estimates using
the default internal starting values. However, if there is a convergence problem in the calibration, one possible solution is using
different starting values. When the item parameter values are specified in the item metadata (i.e., the argument x
), those values
can be used as the starting values for the item parameter calibration by setting use.startval = TRUE
.
This function returns an object of class est_item
. Within this object, several internal objects are contained such as:
estimates |
A data frame containing both the item parameter estimates and the corresponding standard errors of estimates. |
par.est |
A data frame containing the item parameter estimates. |
se.est |
A data frame containing the standard errors of the item parameter estimates. Note that the standard errors are estimated using observed information functions. |
pos.par |
A data frame containing the position number of item parameters being estimated. The position information is useful when interpreting the variance-covariance matrix of item parameter estimates. |
covariance |
A matrix of variance-covariance matrix of item parameter estimates. |
loglikelihood |
A sum of the log-likelihood values of the complete data set across all estimated items. |
data |
A data frame of the examinees' response data set. |
score |
A vector of the examinees' ability values used as the fixed effects. |
scale.D |
A scaling factor in IRT models. |
convergence |
A string indicating the convergence status of the item parameter estimation. |
nitem |
A total number of items included in the response data. |
deleted.item |
The items which have no item response data. Those items are excluded from the item parameter estimation. |
npar.est |
A total number of the estimated parameters. |
n.response |
An integer vector indicating the number of item responses for each item used to estimate the item parameters. |
TotalTime |
Time (in seconds) spent for total compuatation. |
The internal objects can be easily extracted using the function getirt
.
Hwanggyu Lim [email protected]
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques. CRC Press.
Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D., J. (2001) A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191-212.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Chen, P., & Wang, C. (2016). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81(3), 674-701.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Rep. 88-28). Princeton, NJ: ETS.
irtfit
, info
, simdat
, shape_df
, sx2_fit
,
traceline.est_item
, getirt
## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # modify the item metadata so that some items follow 1PLM, 2PLM and GPCM x[c(1:3, 5), 3] <- "1PLM" x[c(1:3, 5), 4] <- 1 x[c(1:3, 5), 6] <- 0 x[c(4, 8:12), 3] <- "2PLM" x[c(4, 8:12), 6] <- 0 x[54:55, 3] <- "GPCM" # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # 1) item parameter estimation: constrain the slope parameters of the 1PLM to be equal (mod1 <- est_item(x, data, score, D = 1, fix.a.1pl = FALSE, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 17)), use.startval = FALSE )) summary(mod1) # extract the item parameter estimates getirt(mod1, what = "par.est") # 2) item parameter estimation: fix the slope parameters of the 1PLM to 1 (mod2 <- est_item(x, data, score, D = 1, fix.a.1pl = TRUE, a.val.1pl = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 17)), use.startval = FALSE )) summary(mod2) # extract the standard error estimates getirt(mod2, what = "se.est") # 3) item parameter estimation: fix the guessing parameters of the 3PLM to 0.2 (mod3 <- est_item(x, data, score, D = 1, fix.a.1pl = TRUE, fix.g = TRUE, a.val.1pl = 1, g.val = .2, use.startval = FALSE )) summary(mod3) # extract both item parameter and standard error estimates getirt(mod2, what = "estimates")
## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # modify the item metadata so that some items follow 1PLM, 2PLM and GPCM x[c(1:3, 5), 3] <- "1PLM" x[c(1:3, 5), 4] <- 1 x[c(1:3, 5), 6] <- 0 x[c(4, 8:12), 3] <- "2PLM" x[c(4, 8:12), 6] <- 0 x[54:55, 3] <- "GPCM" # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # 1) item parameter estimation: constrain the slope parameters of the 1PLM to be equal (mod1 <- est_item(x, data, score, D = 1, fix.a.1pl = FALSE, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 17)), use.startval = FALSE )) summary(mod1) # extract the item parameter estimates getirt(mod1, what = "par.est") # 2) item parameter estimation: fix the slope parameters of the 1PLM to 1 (mod2 <- est_item(x, data, score, D = 1, fix.a.1pl = TRUE, a.val.1pl = 1, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 17)), use.startval = FALSE )) summary(mod2) # extract the standard error estimates getirt(mod2, what = "se.est") # 3) item parameter estimation: fix the guessing parameters of the 3PLM to 0.2 (mod3 <- est_item(x, data, score, D = 1, fix.a.1pl = TRUE, fix.g = TRUE, a.val.1pl = 1, g.val = .2, use.startval = FALSE )) summary(mod3) # extract both item parameter and standard error estimates getirt(mod2, what = "estimates")
This function conducts a multiple-group item calibration (Bock & Zimowski, 1997) using the marginal maximum likelihood estimation via the expectation-maximization (MMLE-EM) algorithm (Bock & Aitkin, 1981). This function also implements the multiple-group fixed item parameter calibration (MG-FIPC; e.g., Kim & Kolen, 2016). The MG-FIPC is an extension of the single-group FIPC method (Kim, 2006) to multiple-group data. For dichotomous items, IRT one-, two-, and three-parameter logistic models are available. For polytomous items, the graded response model (GRM) and the (generalized) partial credit model (GPCM) are available.
est_mg( x = NULL, data, group.name = NULL, D = 1, model = NULL, cats = NULL, item.id = NULL, free.group = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 16)), missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1, EmpHist = FALSE, use.startval = FALSE, Etol = 0.001, MaxE = 500, control = list(eval.max = 200, iter.max = 200), fipc = FALSE, fipc.method = "MEM", fix.loc = NULL, fix.id = NULL, se = TRUE, verbose = TRUE )
est_mg( x = NULL, data, group.name = NULL, D = 1, model = NULL, cats = NULL, item.id = NULL, free.group = NULL, fix.a.1pl = FALSE, fix.a.gpcm = FALSE, fix.g = FALSE, a.val.1pl = 1, a.val.gpcm = 1, g.val = 0.2, use.aprior = FALSE, use.bprior = FALSE, use.gprior = TRUE, aprior = list(dist = "lnorm", params = c(0, 0.5)), bprior = list(dist = "norm", params = c(0, 1)), gprior = list(dist = "beta", params = c(5, 16)), missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1, EmpHist = FALSE, use.startval = FALSE, Etol = 0.001, MaxE = 500, control = list(eval.max = 200, iter.max = 200), fipc = FALSE, fipc.method = "MEM", fix.loc = NULL, fix.id = NULL, se = TRUE, verbose = TRUE )
x |
A list containing item metadata across all groups to be analyzed. For example, if five group data are analyzed, the list should include
five internal objects of the item metadata for the five groups. The internal objects of the list should be ordered in accordance with the order of the group names
specified in the |
data |
A list containing item response matrices across all groups to be analyzed. For example, if five group data are analyzed, the list should include
five internal objects of the response data matrices for the five groups. The internal objects of the list should be ordered in accordance with the order of the group names
specified in the |
group.name |
A vector of character strings indicating group names. For example, if five group data are analyzed, |
D |
D A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
model |
A list containing character vectors of the IRT models used to calibrate items across all groups' data. For example,
if five group data are analyzed, the list should include five internal objects of the character vectors for the five groups.
The internal objects of the list should be ordered in accordance with the order of the group names specified in the |
cats |
A list containing numeric vectors specifying the number of score categories of items across all groups' data to be analyzed. For example,
if five group data are analyzed, the list should include five internal objects of the numeric vectors for the five groups. The internal objects of the list
should be ordered in accordance with the order of the group names specified in the |
item.id |
A list containing character vectors of item IDs across all groups' data to be analyzed. For example,
if five group data are analyzed, the list should include five internal objects of the character vectors of item IDs for the five groups.
The internal objects of the list should be ordered in accordance with the order of the group names specified in the |
free.group |
A numeric or character vector indicating groups in which scales (i.e., a mean and standard deviation) of the latent ability
distributions are freely estimated. The scales of other groups not specified in this argument are fixed based on the information provided in
the |
fix.a.1pl |
A logical value. If TRUE, the slope parameters of the 1PLM items are fixed to a specific value specified in the argument
|
fix.a.gpcm |
A logical value. If TRUE, the GPCM items are calibrated with the partial credit model and the slope parameters of
the GPCM items are fixed to a specific value specified in the argument |
fix.g |
A logical value. If TRUE, the guessing parameters of the 3PLM items are fixed to a specific value specified in the argument
|
a.val.1pl |
A numeric value. This value is used to fixed the slope parameters of the 1PLM items. |
a.val.gpcm |
A numeric value. This value is used to fixed the slope parameters of the GPCM items. |
g.val |
A numeric value. This value is used to fixed the guessing parameters of the 3PLM items. |
use.aprior |
A logical value. If TRUE, a prior distribution for the slope parameters is used for the parameter calibration across all items. Default is FALSE. |
use.bprior |
A logical value. If TRUE, a prior distribution for the difficulty (or threshold) parameters is used for the parameter calibration across all items. Default is FALSE. |
use.gprior |
A logical value. If TRUE, a prior distribution for the guessing parameters is used for the parameter calibration across all 3PLM items. Default is TRUE. |
aprior |
A list containing the information of the prior distribution for item slope parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
bprior |
A list containing the information of the prior distribution for item difficulty (or threshold) parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
gprior |
A list containing the information of the prior distribution for item guessing parameters. Three probability distributions
of Beta, Log-normal, and Normal distributions are available. In the list, a character string of the distribution name must be specified
in the first internal argument and a vector of two numeric values for the two parameters of the distribution must be specified in the
second internal argument. Specifically, when Beta distribution is used, "beta" should be specified in the first argument. When Log-normal
distribution is used, "lnorm" should be specified in the first argument. When Normal distribution is used, "norm" should be specified
in the first argument. In terms of the two parameters of the three distributions, see |
missing |
A value indicating missing values in the response data set. Default is NA. |
Quadrature |
A numeric vector of two components specifying the number of quadrature points (in the first component) and the symmetric minimum and maximum values of these points (in the second component). For example, a vector of c(49, 6) indicates 49 rectangular quadrature points over -6 and 6. The quadrature points are used in the E step of the EM algorithm. Default is c(49, 6). |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. If not NULL, the scale of the latent ability distributions of the groups that
are not freely estimated (i.e., the groups not specified in the |
group.mean |
A numeric value to set the mean of latent variable prior distribution when |
group.var |
A positive numeric value to set the variance of latent variable prior distribution when |
EmpHist |
A logical value. If TRUE, the empirical histograms of the latent variable prior distributions across all groups are simultaneously estimated with the item parameters using Woods's (2007) approach. The items are calibrated against the estimated empirical prior distributions. |
use.startval |
A logical value. If TRUE, the item parameters provided in the item metadata (i.e., the argument |
Etol |
A positive numeric value. This value sets the convergence criterion for E steps of the EM algorithm. Default is 1e-3. |
MaxE |
A positive integer value. This value determines the maximum number of the E steps in the EM algorithm. Default is 500. |
control |
A list of control parameters to be passed to the optimization function of |
fipc |
A logical value. If TRUE, the MG-FIPC is implemented for item parameter estimation. When |
fipc.method |
A character string specifying the FIPC method. Available methods include "OEM" for "No Prior Weights Updating and One EM Cycle
(NWU-OEM; Wainer & Mislevy, 1990)" and "MEM" for "Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM; Kim, 2006)."
When |
fix.loc |
A list of positive integer vectors. Each internal integer vector indicates the locations of the items to be fixed in the item metadata
(i.e., |
fix.id |
A vector of character strings specifying IDs of the items to be fixed when the MG-FIPC is implemented (i.e., |
se |
A logical value. If FALSE, the standard errors of the item parameter estimates are not computed. Default is TRUE. |
verbose |
A logical value. If FALSE, all progress messages including the process information on the EM algorithm are suppressed. Default is TRUE. |
The multiple-group (MG) item calibration (Bock & Zimowski, 1996) provides a unified approach to the testing situations or problems involving multiple groups such as nonequivalent groups equating, vertical scaling, and identification of differential item functioning (DIF). In such contexts, typically there exist test takers from different groups responding to the same test form or to the common items shared between different test forms.
The goal of the MG item calibration is to estimate the item parameters and the latent ability distributions of the multiple groups
simultaneously (Bock & Zimowski, 1996). In the irtQ package, the est_mg
function supports the MG item calibration
using the marginal maximum likelihood estimation via the expectation-maximization (MMLE-EM) algorithm (Bock & Aitkin, 1981). In addition,
The function also supports the MG fixed item parameter calibration (MG-FIPC; e.g., Kim & Kolen, 2016) if the parameters of certain items
need to be fixed across multiple test forms,
In MG IRT analyses, it is commonly seen that the test forms of multiple groups share some common (or anchor) items between the groups.
By default the common items that have the same item IDs between different groups are automatically constrained so that they can have the same
item parameter estimates across the groups in the est_mg
function.
Most of the features of the est_mg
function are similar with those of the est_irt
function. The main difference is
that several arguments in the est_mg
functions take an object of a list format which contains several internal objects
corresponding to the groups to be analyzed. Those arguments include x
, data
, model
, cats
, item.id
,
and fix.loc
.
Also, the est_mg
has two new arguments, the group.name
and free.group
. The group.name
is required to
assign a unique group name to each group. The order of internal objects in the lists provided in the x
, data
, model
, cats
,
item.id
, and fix.loc
arguments must match that of the group names supplied to the group.name
argument.
The free.group
argument is necessary to indicate the freed groups in which scales (i.e., a mean and standard deviation) of the latent ability
distributions are estimated. When no item parameters are fixed (i.e., fipc = FALSE
), at least one group should have a fixed scale
(e.g., a mean of 0 and variance of 1) of the latent ability distribution among the multiple groups that shares common items in order to
solve the scale indeterminancy in IRT estimation. By providing which are the freed groups into the free.group
argument, the scales
of the groups specified in the free.group
argument are freely estimated while the scales of all other groups not specified
in the argument are fixed based on the information provided in the group.mean
and group.var
arguments or
the weights
argument.
The situations where the implementation of MG-FIPC is necessary arise when the new latent ability scales from MG test data are linked to the established scale (e.g., the scale of an item bank). In a single run of MG-FIPC method, all the parameters of freed items across multiple test forms and the density of the latent ability distributions for multiple groups can be estimated on the scale of the fixed items (Kim & Kolen, 2016). Suppose that three different test forms, Form 1, Form 2, and Form 3, are administered to three nonequivalent groups, Group1, Group2, and Group3. Form 1 and Form 2 share 12 common items, C1I1 to C1I12, and Form 2 and Form 3 share 10 common items, C2I1 to C2I10, while there is no common item between Form 1 and Form 3. Also, suppose that all unique items of Form 1 came from an item bank, which were already calibrated on the scale of the item bank. In this case, the goal of the MG-FIPC is to estimate the parameters of all the items across the three test forms, except the unique items in Form 1, and the latent ability distributions of the three groups on the same scale of the item bank. To accomplish this task, the unique items in Form 1 need to be fixed during the MG-FIPC to link the current MG test data and the item bank.
The est_mg
function can implement the MG-FIPC by setting fipc = TRUE
. Then, the information of which items are fixed needs
to be supplied via either of fix.loc
or fix.id
. When utilizing the fix.loc
argument, a list of the locations of the items that are
fixed in each group test form should be prepared. For example, suppose that three group data are analyzed. For the first group test form,
the 1st, 3rd, and 5th items are fixed, for the second group test form, the 2nd, 3rd, 4th, and 7th items are fixed, and for the third group test form,
the 1st, 2nd, and 6th items are fixed. Then fix.loc = list(c(1, 3, 5), c(2, 3, 4, 7), c(1, 2, 6))
. Instead of using the fix.loc
, the fix.id
argument can be used by providing a vector of the IDs of the items to be fixed. Again suppose that the three group data are analyzed. For the first group
test form, the three items in which IDs are G1I1, C1I1, are C1I2 are fixed. For the second group test form, the four items in which IDs are C1I1, C1I2, C2I1,
and C2I2 are fixed. For the third group test form, the three items in which IDs are C2I1, C2I2, and G3I1 are fixed. Then there are six unique items to
be fixed across the three groups (i.e., G1I1, C1I1, C1I2, C2I1, C2I2, and G3I1) because C1I1 and C1I2 are common items between the first and the second groups,
and C2I1 and C2I2 are common items between the second and the third groups. Thus, fix.id = c("G1I1", "C1I1", "C1I2", "C2I1", "C2I2", "G3I1")
should be
set. Note that when the information of item IDs supplied in the fix.id
argument overrides the information provided into the fix.loc
argument.
This function returns an object of class est_mg
. Within this object, several internal objects are contained such as:
estimates |
A list containing two internal objects (i.e., overall and group) of the item parameter estimates and the corresponding standard errors of estimates. The first internal object (overall) is a data frame of the item parameter and standard error estimates for the combined data set across all groups. Accordingly, the data frame includes unique items across all groups. The second internal object (group) is a list of group specific data frames containing item parameter and standard error estimates |
par.est |
A list containing two internal objects (i.e., overall and group) of the item parameter estimates. The format of the list is the same with the internal object of 'estimates' |
se.est |
A list containing two internal objects (i.e., overall and group) of the standard errors of item parameter estimates. The format of the list is the same with the internal object of 'estimates'. Note that the standard errors are estimated using the cross-production approximation method (Meilijson, 1989). |
pos.par |
A data frame containing the position number of each item parameter being estimated. This item position data frame was created based on the combined data sets across all groups (see the first internal object of 'estimates'). The position information is useful when interpreting the variance-covariance matrix of item parameter estimates. |
covariance |
A matrix of variance-covariance matrix of item parameter estimates. This matrix was created based on the combined data sets across all groups (see the first internal object of 'estimates') |
loglikelihood |
A list containing two internal objects (i.e., overall and group) of the log-likelihood values of observed data set (marginal log-likelihood). The format of the list is the same with the internal object of 'estimates'. Specifically, the first internal object (overall) contains a sum of the log-likelihood values of the observed data set across all unique items of all groups. The second internal object (group) shows the group specific log-likelihood values. |
aic |
A model fit statistic of Akaike information criterion based on the loglikelihood of all unique items.. |
bic |
A model fit statistic of Bayesian information criterion based on the loglikelihood of all unique items. |
group.par |
A list containing the summary statistics (i.e., a mean, variance, and standard deviation) of latent variable prior distributions across all groups. |
weights |
a list of the two-column data frames containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the (updated) latent variable prior distributions for all groups. |
posterior.dist |
A matrix of normalized posterior densities for all the response patterns at each of the quadrature points. The row and column indicate each individual's response pattern and the quadrature point, respectively. |
data |
A list containing two internal objects (i.e., overall and group) of the examinees' response data sets. The format of the list is the same with the internal object of 'estimates'. |
scale.D |
A scaling factor in IRT models. |
ncase |
A list containing two internal objects (i.e., overall and group) with the total number of response patterns. The format of the list is the same with the internal object of 'estimates'. |
nitem |
A list containing two internal objects (i.e., overall and group) with the total number of items included in the response data set. The format of the list is the same with the internal object of 'estimates'. |
Etol |
A convergence criteria for E steps of the EM algorithm. |
MaxE |
The maximum number of E steps in the EM algorithm. |
aprior |
A list containing the information of the prior distribution for item slope parameters. |
gprior |
A list containing the information of the prior distribution for item guessing parameters. |
npar.est |
A total number of the estimated parameters across all unique items. |
niter |
The number of EM cycles completed. |
maxpar.diff |
A maximum item parameter change when the EM cycles were completed. |
EMtime |
Time (in seconds) spent for the EM cycles. |
SEtime |
Time (in seconds) spent for computing the standard errors of the item parameter estimates. |
TotalTime |
Time (in seconds) spent for total compuatation. |
test.1 |
Status of the first-order test to report if the gradients has vanished sufficiently for the solution to be stable. |
test.2 |
Status of the second-order test to report if the information matrix is positive definite, which is a prerequisite for the solution to be a possible maximum. |
var.note |
A note to report if the variance-covariance matrix of item parameter estimates is obtainable from the information matrix. |
fipc |
A logical value to indicate if FIPC was used. |
fipc.method |
A method used for the FIPC. |
fix.loc |
A list containing two internal objects (i.e., overall and group) with the locations of the fixed items when the FIPC was implemented. The format of the list is the same with the internal object of 'estimates'. |
The internal objects can be easily extracted using the function getirt
.
Hwanggyu Lim [email protected]
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433-448). New York: Springer.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.
Kim, S., & Kolen, M. J. (2016). Multiple group IRT fixed-parameter estimation for maintaining an extablished ability scale. Center for Advanced Studies in Measurement and Assessment Report, 49.
Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society: Series B (Methodological), 51, 127-138.
Woods, C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and Psychological Measurement, 67(1), 73-87.
## ------------------------------------------------------------------------------ # 1. MG-calibration using simMG data # - Details : # (a) constrain the common items between the groups to have # the same item parameters (i.e., items C1I1 - C1I12 between # Groups 1 and 2, and items C2I1 - C2I10 between Groups 2 and 3) # (b) freely estimate the means and variances of ability # distributions except the reference group in which mean # and variance are fixed to 0 and 1, respectively. ## ------------------------------------------------------------------------------ # 1-(1). freely estimates the means and variances of groups 2 and 3 # import the true item metadata of the three groups x <- simMG$item.prm # extract model, score category, and item ID information from # the item metadata for the three groups model <- list(x$Group1$model, x$Group2$model, x$Group3$model) cats <- list(x$Group1$cats, x$Group2$cats, x$Group3$cats) item.id <- list(x$Group1$id, x$Group2$id, x$Group3$id) # import the simulated response data sets of the three groups data <- simMG$res.dat # import the group names of the three groups group.name <- simMG$group.name # set the groups 2 and 3 as the free groups in which scale # of the ability distributions will be freely estimated. # then, the group 1 will be a reference group in which the scale # of the ability distribution will be fixed to the values specified # in the 'group.mean' and 'group.var' arguments free.group <- c(2, 3) # or use 'free.group <- group.name[2:3]' # estimate the IRT parameters: # as long as the common items between any groups have the same item IDs, # their item parameters will be constrained to be the same between # the groups unless the FIPC is not implemented fit.1 <- est_mg( data = data, group.name = group.name, model = model, cats = cats, item.id = item.id, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), group.mean = 0, group.var = 1, EmpHist = TRUE, Etol = 0.001, MaxE = 500 ) # summary of the estimation summary(fit.1) # extract the item parameter estimates getirt(fit.1, what = "par.est") # extract the standard error estimates getirt(fit.1, what = "se.est") # extract the group parameter estimates (i.e., scale parameters) getirt(fit.1, what = "group.par") # extract the posterior latent ability distributions of the groups getirt(fit.1, what = "weights") # 1-(2). or the same parameter estimation can be implemented by # inserting a list of item metadata for the groups into the 'x' # argument. if the item metadata contains the item parameters # which you want to use as starting values for the estimation, # you can set 'use.startval = TRUE'. # also set the groups in which scales of ability distributions # will be freely estimated using the group names free.group <- group.name[2:3] fit.2 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), group.mean = 0, group.var = 1, EmpHist = TRUE, use.startval = TRUE, Etol = 0.001, MaxE = 500 ) # summary of the estimation summary(fit.2) ## ------------------------------------------------------------------------------ # 2. MG-calibration with the FIPC using simMG data # - Details : # (a) fix the parameters of the common items between the groups # (i.e., items C1I1 - C1I12 between Groups 1 and 2, and # items C2I1 - C2I10 between Groups 2 and 3) # (b) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 2-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- 1:3 # or use 'free.group <- group.name' # specify the locations of items to be fixed in each group data # note that for the first group, the common items C1I1 - C1I12 are located # in the 1st - 10th and 11th to 12th rows of the first group's item metadata # for the second group, the common items C1I1 - C1I12 are located # in the 1st - 12th rows of the second group's item metadata # also, for the second group, the common items C2I1 - C2I10 are located # in the 41st - 50th rows of the second group's item metadata # for the third group, the common items C2I1 - C2I10 are located # in the 1st - 10th rows of the third group's item metadata fix.loc <- list( c(1:10, 49:50), c(1:12, 41:50), c(1:10) ) # estimate the IRT parameters using FIPC: # when the FIPC is implemented, a list of the item metadata for all # groups must be provided into the 'x' argument. # for the fixed items, their item parameters must be specified # in the item metadata # for other non-fixed items, any parameter values can be provided # in the item metadata # also, set fipc = TRUE and fipc.method = "MEM" fit.3 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc ) # summary of the estimation summary(fit.3) # extract the item parameter estimates getirt(fit.3, what = "par.est") # extract the standard error estimates getirt(fit.3, what = "se.est") # extract the group parameter estimates (i.e., scale parameters) getirt(fit.3, what = "group.par") # extract the posterior latent ability distributions of the groups getirt(fit.3, what = "weights") # 2-(2). or the FIPC can be implemented by providing which items # will be fixed in a different way using the 'fix.id' argument. # a vector of the fixed item IDs needs to be provided into the # 'fix.id' argument as such. fix.id <- c(paste0("C1I", 1:12), paste0("C2I", 1:10)) fit.4 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.id = fix.id ) # summary of the estimation summary(fit.4) ## ------------------------------------------------------------------------------ # 3. MG-calibration with the FIPC using simMG data # (estimate the group parameters only) # - Details : # (a) fix all item parameters across all three groups # (b) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 3-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- 1:3 # or use 'free.group <- group.name' # specify the locations of all item parameters to be fixed # in each item metadata fix.loc <- list(1:50, 1:50, 1:38) # estimate the group parameters only using FIPC: fit.5 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc ) # summary of the estimation summary(fit.5) # extract the group parameter estimates (i.e., scale parameters) getirt(fit.5, what = "group.par") ## ------------------------------------------------------------------------------ # 4. MG-calibration with the FIPC using simMG data # (fix the unique items of the group 1 only) # - Details : # (a) fix item parameters of unique items in the group 1 only # (b) constrain the common items between the groups to have # the same item parameters (i.e., items C1I1 - C1I12 between # Groups 1 and 2, and items C2I1 - C2I10 between Groups 2 and 3) # (c) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 4-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- group.name # or use 'free.group <- 1:3' # specify the item IDs of the unique items in the group 1 to be fixed # in each item metadata using the 'fix.id' argument or # you can use the 'fix.loc' argument by setting # 'fix.loc = list(11:48, NULL, NULL)' fix.id <- paste0("G1I", 1:38) # estimate the IRT parameters using FIPC: fit.6 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id ) # summary of the estimation summary(fit.6) # extract the group parameter estimates (i.e., scale parameters) getirt(fit.6, what = "group.par")
## ------------------------------------------------------------------------------ # 1. MG-calibration using simMG data # - Details : # (a) constrain the common items between the groups to have # the same item parameters (i.e., items C1I1 - C1I12 between # Groups 1 and 2, and items C2I1 - C2I10 between Groups 2 and 3) # (b) freely estimate the means and variances of ability # distributions except the reference group in which mean # and variance are fixed to 0 and 1, respectively. ## ------------------------------------------------------------------------------ # 1-(1). freely estimates the means and variances of groups 2 and 3 # import the true item metadata of the three groups x <- simMG$item.prm # extract model, score category, and item ID information from # the item metadata for the three groups model <- list(x$Group1$model, x$Group2$model, x$Group3$model) cats <- list(x$Group1$cats, x$Group2$cats, x$Group3$cats) item.id <- list(x$Group1$id, x$Group2$id, x$Group3$id) # import the simulated response data sets of the three groups data <- simMG$res.dat # import the group names of the three groups group.name <- simMG$group.name # set the groups 2 and 3 as the free groups in which scale # of the ability distributions will be freely estimated. # then, the group 1 will be a reference group in which the scale # of the ability distribution will be fixed to the values specified # in the 'group.mean' and 'group.var' arguments free.group <- c(2, 3) # or use 'free.group <- group.name[2:3]' # estimate the IRT parameters: # as long as the common items between any groups have the same item IDs, # their item parameters will be constrained to be the same between # the groups unless the FIPC is not implemented fit.1 <- est_mg( data = data, group.name = group.name, model = model, cats = cats, item.id = item.id, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), group.mean = 0, group.var = 1, EmpHist = TRUE, Etol = 0.001, MaxE = 500 ) # summary of the estimation summary(fit.1) # extract the item parameter estimates getirt(fit.1, what = "par.est") # extract the standard error estimates getirt(fit.1, what = "se.est") # extract the group parameter estimates (i.e., scale parameters) getirt(fit.1, what = "group.par") # extract the posterior latent ability distributions of the groups getirt(fit.1, what = "weights") # 1-(2). or the same parameter estimation can be implemented by # inserting a list of item metadata for the groups into the 'x' # argument. if the item metadata contains the item parameters # which you want to use as starting values for the estimation, # you can set 'use.startval = TRUE'. # also set the groups in which scales of ability distributions # will be freely estimated using the group names free.group <- group.name[2:3] fit.2 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), group.mean = 0, group.var = 1, EmpHist = TRUE, use.startval = TRUE, Etol = 0.001, MaxE = 500 ) # summary of the estimation summary(fit.2) ## ------------------------------------------------------------------------------ # 2. MG-calibration with the FIPC using simMG data # - Details : # (a) fix the parameters of the common items between the groups # (i.e., items C1I1 - C1I12 between Groups 1 and 2, and # items C2I1 - C2I10 between Groups 2 and 3) # (b) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 2-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- 1:3 # or use 'free.group <- group.name' # specify the locations of items to be fixed in each group data # note that for the first group, the common items C1I1 - C1I12 are located # in the 1st - 10th and 11th to 12th rows of the first group's item metadata # for the second group, the common items C1I1 - C1I12 are located # in the 1st - 12th rows of the second group's item metadata # also, for the second group, the common items C2I1 - C2I10 are located # in the 41st - 50th rows of the second group's item metadata # for the third group, the common items C2I1 - C2I10 are located # in the 1st - 10th rows of the third group's item metadata fix.loc <- list( c(1:10, 49:50), c(1:12, 41:50), c(1:10) ) # estimate the IRT parameters using FIPC: # when the FIPC is implemented, a list of the item metadata for all # groups must be provided into the 'x' argument. # for the fixed items, their item parameters must be specified # in the item metadata # for other non-fixed items, any parameter values can be provided # in the item metadata # also, set fipc = TRUE and fipc.method = "MEM" fit.3 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc ) # summary of the estimation summary(fit.3) # extract the item parameter estimates getirt(fit.3, what = "par.est") # extract the standard error estimates getirt(fit.3, what = "se.est") # extract the group parameter estimates (i.e., scale parameters) getirt(fit.3, what = "group.par") # extract the posterior latent ability distributions of the groups getirt(fit.3, what = "weights") # 2-(2). or the FIPC can be implemented by providing which items # will be fixed in a different way using the 'fix.id' argument. # a vector of the fixed item IDs needs to be provided into the # 'fix.id' argument as such. fix.id <- c(paste0("C1I", 1:12), paste0("C2I", 1:10)) fit.4 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.id = fix.id ) # summary of the estimation summary(fit.4) ## ------------------------------------------------------------------------------ # 3. MG-calibration with the FIPC using simMG data # (estimate the group parameters only) # - Details : # (a) fix all item parameters across all three groups # (b) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 3-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- 1:3 # or use 'free.group <- group.name' # specify the locations of all item parameters to be fixed # in each item metadata fix.loc <- list(1:50, 1:50, 1:38) # estimate the group parameters only using FIPC: fit.5 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = fix.loc ) # summary of the estimation summary(fit.5) # extract the group parameter estimates (i.e., scale parameters) getirt(fit.5, what = "group.par") ## ------------------------------------------------------------------------------ # 4. MG-calibration with the FIPC using simMG data # (fix the unique items of the group 1 only) # - Details : # (a) fix item parameters of unique items in the group 1 only # (b) constrain the common items between the groups to have # the same item parameters (i.e., items C1I1 - C1I12 between # Groups 1 and 2, and items C2I1 - C2I10 between Groups 2 and 3) # (c) freely estimate the means and variances of ability # distributions of all three groups ## ------------------------------------------------------------------------------ # 4-(1). freely estimates the means and variances of all three groups # set all three groups as the free groups in which scale # of the ability distributions will be freely estimated. free.group <- group.name # or use 'free.group <- 1:3' # specify the item IDs of the unique items in the group 1 to be fixed # in each item metadata using the 'fix.id' argument or # you can use the 'fix.loc' argument by setting # 'fix.loc = list(11:48, NULL, NULL)' fix.id <- paste0("G1I", 1:38) # estimate the IRT parameters using FIPC: fit.6 <- est_mg( x = x, data = data, group.name = group.name, D = 1, free.group = free.group, use.gprior = TRUE, gprior = list(dist = "beta", params = c(5, 16)), EmpHist = TRUE, Etol = 0.001, MaxE = 500, fipc = TRUE, fipc.method = "MEM", fix.loc = NULL, fix.id = fix.id ) # summary of the estimation summary(fit.6) # extract the group parameter estimates (i.e., scale parameters) getirt(fit.6, what = "group.par")
This function estimates examinees' latent ability parameters. Available scoring methods are maximum likelihood estimation (ML), maximum likelihood estimation with fences (MLF; Han, 2016), weighted likelihood estimation (Warm, 1989), maximum a posteriori estimation (MAP; Hambleton et al., 1991), expected a posteriori estimation (EAP; Bock & Mislevy, 1982), EAP summed scoring (Thissen et al., 1995; Thissen & Orlando, 2001), and inverse test characteristic curve (TCC) scoring (e.g., Kolen & Brennan, 2004; Kolen & Tong, 2010; Stocking, 1996).
est_score(x, ...) ## Default S3 method: est_score( x, data, D = 1, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, fence.a = 3, fence.b = NULL, tol = 1e-04, max.iter = 100, se = TRUE, stval.opt = 1, intpol = TRUE, range.tcc = c(-7, 7), missing = NA, ncore = 1, ... ) ## S3 method for class 'est_irt' est_score( x, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, fence.a = 3, fence.b = NULL, tol = 1e-04, max.iter = 100, se = TRUE, stval.opt = 1, intpol = TRUE, range.tcc = c(-7, 7), missing = NA, ncore = 1, ... )
est_score(x, ...) ## Default S3 method: est_score( x, data, D = 1, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, fence.a = 3, fence.b = NULL, tol = 1e-04, max.iter = 100, se = TRUE, stval.opt = 1, intpol = TRUE, range.tcc = c(-7, 7), missing = NA, ncore = 1, ... ) ## S3 method for class 'est_irt' est_score( x, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, fence.a = 3, fence.b = NULL, tol = 1e-04, max.iter = 100, se = TRUE, stval.opt = 1, intpol = TRUE, range.tcc = c(-7, 7), missing = NA, ncore = 1, ... )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...) or an object of
class |
... |
additional arguments to pass to |
data |
A matrix or vector containing examinees' response data for the items in the argument |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
method |
A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "MLF" for the maximum likelihood estimation with fences, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, "EAP" for the expected a posteriori estimation, "EAP.SUM" for the expected a posteriori summed scoring, and "INV.TCC" for the inverse TCC scoring. Default method is "ML". |
range |
A numeric vector of two components to restrict the range of ability scale for the ML, MLF, WL, and MAP scoring methods. Default is c(-5, 5). |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution.
These two parameters are used to obtain the gaussian quadrature points and the corresponding weights from the normal distribution. Default is
c(0,1). Ignored if |
nquad |
An integer value specifying the number of gaussian quadrature points from the normal prior distribution. Default is 41.
Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained
using the function |
fence.a |
A numeric value specifying the item slope parameter (i.e., a-parameter) for the two imaginary items in MLF. See below for details. Default is 3.0. |
fence.b |
A numeric vector of two components specifying the lower and upper fences of item difficulty parameters (i.e., b-parameters)
for the two imaginary items, respectively, in MLF. When |
tol |
A numeric value of the convergent tolerance for the ML, MLF, WL, MAP, and inverse TCC scoring methods. For the ML, MLF, WL, and MAP, Newton Raphson method is implemented for optimization. For the inverse TCC scoring, the bisection method is used. Default is 1e-4. |
max.iter |
An positive integer value specifying the maximum number of iterations of Newton Raphson method. Default is 100. |
se |
A logical value. If TRUE, the standard errors of ability estimates are computed. However, if |
stval.opt |
An positive integer value specifying the starting value option for the ML, MLF, WL, and MAP scoring methods. Available options are 1 for the brute-force method, 2 for the observed sum score-based method, and 3 for setting to 0. Default is 1. See below for details. |
intpol |
A logical value. If TRUE and |
range.tcc |
A numeric vector of two components to be used as the lower and upper bounds of ability estimates when |
missing |
A value indicating missing values in the response data set. Default is NA. See below for details. |
ncore |
The number of logical CPU cores to use. Default is 1. See below for details. |
For MAP scoring method, only the normal prior distribution is available for the population distribution.
When there are missing data in the response data set, the missing value must be specified in missing
. The missing data are taken into account
when either of ML, MLF, WL, MAP, and EAP is used. When "EAP.SUM" or "INV.TCC" is used, however, any missing responses are replaced with incorrect
responses (i.e., 0s).
In the maximum likelihood estimation with fences (MLF; Han, 2016), two 2PLM imaginary items are necessary. The first imaginary item serves as the lower
fence and its difficulty parameter (i.e., b-parameters) should be lower than any difficulty parameter values in the test form. Likewise, the second
imaginary item serves as the upper fence and its difficulty parameter should be greater than any difficulty parameter values in the test form. Also, the two
imaginary items should have a very high item slope parameter (i.e., a-parameter) value. See Han (2016) for more details. When fence.b = NULL
in MLF,
the function automatically sets the lower and upper fences of item difficulty parameters using the values
in the range
argument.
When "INV.TCC" method is used employing the IRT 3-parameter logistic model (3PLM) in a test, ability estimates for the observed sum scores less than the
sum of item guessing parameters are not attainable. In this case, linear interpolation can be applied by setting intpol = TRUE
.
Let and
be the minimum and maximum ability estimates and
be the ability estimate for
the smallest observed sum score, X, but greater than or equal to the sum of item guessing parameters. When linear interpolation method is used,
the first value of the
range.tcc
is set to . Then, a linear line is constructed between two points of
(x=
, y=0) and (x=
, y=X). Also, the first value of the
range.tcc
is set to , which is
the ability estimates corresponding to the maximum observed sum score. When it comes to the scoring method of "INV.TCC", the standard errors of ability
estimates are computed using an approach suggested by Lim, Davey, and Wells (2020). The code for the inverse TCC scoring was written by modifying
the function
irt.eq.tse
of the SNSequate R package (González, 2014).
In terms of the starting value to be used for ML, MLF, WL, and MAP scoring methods, the brute-force method is used when stval.opt = 1
. With this option,
the log-likelihood values were evaluated at the discrete theta values with increments of 0.1 given range
. The theta node that has the largest
log-likelihood is used as the starting value. when stval.opt = 2
, the starting value is obtained based on the observed sum score. For example,
if the maximum observed sum score (max.score) is 30 for a test and an examinee has an observed sum score of 20 (obs.score), then the starting value
is "log(obs.score / (max.score - obs.score))". For all incorrect response, the starting value is "log(1 / max.score)" and for all correct responses,
it is "log(max.score / 1)".
To speed up the ability estimation for ML, MLF, WL, MAP, and EAP methods, this function applies a parallel process using multiple logical CPU cores.
You can set the number of logical CPU cores by specifying a positive integer value in the argument ncore
. Default value is 1.
Note that the standard errors of ability estimates are computed using the Fisher expected information for ML, MLF, WL, and MAP methods.
To implement WL method, the Pi
, Ji
, and Ii
functions of catR
(Magis & Barrada, 2017) were referred.
When method
is either of "ML", "MLF", "WL", "MAP", or "EAP", a two column data frame including the ability estimates (1st column)
and the standard errors of ability estimates (2nd column) is returned. When method
is "EAP.SUM" or "INV.TCC", a list of two internal
objects are returned. The first object is a three column data frame including the observed sum scores (1st column), the ability estimates (2nd column),
and the standard errors of ability estimates (3rd column). The second object is a score table including the possible raw sum scores
and corresponding ability and standard error estimates.
est_score(default)
: Default method to estimate examinees' latent ability parameters using a data frame x
containing the item metadata.
est_score(est_irt)
: An object created by the function est_irt
.
Hwanggyu Lim [email protected]
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Psychometrika, 35, 179-198.
González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59, 1-30.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991).Fundamentals of item response theory. Newbury Park, CA: Sage.
Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied psychological measurement, 40(4), 289-301.
Howard, J. P. (2017). Computational methods for numerical analysis with R. New York: Chapman and Hall/CRC.
Kolen, M. J. & Brennan, R. L. (2004). Test Equating, Scaling, and Linking (2nd ed.). New York: Springer
Kolen, M. J. & Tong, Y. (2010). Psychometric properties of IRT proficiency estimates. Educational Measurement: Issues and Practice, 29(3), 8-14.
Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement, 58(2), 154-178.
Magis, D., & Barrada, J. R. (2017). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software, 76, 1-19.
Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21(4), 365-389.
Thissen, D. & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp.73-140). Mahwah, NJ: Lawrence Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. (1995). Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses. Applied Psychological Measurement, 19(1), 39-49.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
irtfit
, info
, simdat
, shape_df
, gen.weight
## the use of a "-prm.txt" file obtained from a flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # generate examinees abilities set.seed(12) theta <- rnorm(10) # simulate the item response data data <- simdat(x, theta, D = 1) # estimate the abilities using ML est_score(x, data, D = 1, method = "ML", range = c(-4, 4), se = TRUE) # estimate the abilities using WL est_score(x, data, D = 1, method = "WL", range = c(-4, 4), se = TRUE) # estimate the abilities using MLF with default fences of item difficulty parameters est_score(x, data, D = 1, method = "MLF", fence.a = 3.0, fence.b = NULL, se = TRUE) # estimate the abilities using MLF with different fences of item difficulty parameters est_score(x, data, D = 1, method = "MLF", fence.a = 3.0, fence.b = c(-7, 7), se = TRUE) # estimate the abilities using MAP est_score(x, data, D = 1, method = "MAP", norm.prior = c(0, 1), nquad = 30, se = TRUE) # estimate the abilities using EAP est_score(x, data, D = 1, method = "EAP", norm.prior = c(0, 1), nquad = 30, se = TRUE) # estimate the abilities using EAP summed scoring est_score(x, data, D = 1, method = "EAP.SUM", norm.prior = c(0, 1), nquad = 30) # estimate the abilities using inverse TCC scoring est_score(x, data, D = 1, method = "INV.TCC", intpol = TRUE, range.tcc = c(-7, 7))
## the use of a "-prm.txt" file obtained from a flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # generate examinees abilities set.seed(12) theta <- rnorm(10) # simulate the item response data data <- simdat(x, theta, D = 1) # estimate the abilities using ML est_score(x, data, D = 1, method = "ML", range = c(-4, 4), se = TRUE) # estimate the abilities using WL est_score(x, data, D = 1, method = "WL", range = c(-4, 4), se = TRUE) # estimate the abilities using MLF with default fences of item difficulty parameters est_score(x, data, D = 1, method = "MLF", fence.a = 3.0, fence.b = NULL, se = TRUE) # estimate the abilities using MLF with different fences of item difficulty parameters est_score(x, data, D = 1, method = "MLF", fence.a = 3.0, fence.b = c(-7, 7), se = TRUE) # estimate the abilities using MAP est_score(x, data, D = 1, method = "MAP", norm.prior = c(0, 1), nquad = 30, se = TRUE) # estimate the abilities using EAP est_score(x, data, D = 1, method = "EAP", norm.prior = c(0, 1), nquad = 30, se = TRUE) # estimate the abilities using EAP summed scoring est_score(x, data, D = 1, method = "EAP.SUM", norm.prior = c(0, 1), nquad = 30) # estimate the abilities using inverse TCC scoring est_score(x, data, D = 1, method = "INV.TCC", intpol = TRUE, range.tcc = c(-7, 7))
This function generates a set of weights based on a set of theta values to be used in the functions est_score
and sx2_fit
.
gen.weight(n = 41, dist = "norm", mu = 0, sigma = 1, l = -4, u = 4, theta)
gen.weight(n = 41, dist = "norm", mu = 0, sigma = 1, l = -4, u = 4, theta)
n |
An integer identifying the number of theta (or node) values for which weights are generated. Default is 41. |
dist |
A character string specifying a probability distribution from which the weights are generated. Available distributions are
"norm" for a normal distribution, "unif" for a uniform distribution, and "emp" for an empirical distribution.
When |
mu , sigma
|
A mean and standard deviation of a normal distribution. |
l , u
|
Lower and upper limits of a uniform distribution. |
theta |
A vector of empirical theta (or node) values for which weights are generated. |
When the argument theta
is missing, n weights can be generated from either the normal distribution or the uniform distribution.
Note that if dist = "norm"
, gaussian quadrature points and weights from the normal distribution are generated. See
gauss.quad.prob()
in the statmod package for more details.
When the argument theta
is not missing, the weights corresponding to the provided theta values are generated. Specifically, if
dist = "norm"
, normalized weights from the normal distribution are returned. If dist = "emp"
, every specified theta value has the equal
values of normalized weights.
This function returns a data frame with two columns, where the first column has theta values (nodes) and the second column provides weights.
Hwanggyu Lim [email protected]
## example 1 ## generate 41 gaussian quadrature points and weights of normal distribution gen.weight(n = 41, dist = "norm", mu = 0, sigma = 1) ## example 2 ## generate 41 theta values and weights from the uniform normal distribution, ## given the mininum value of -4 and the maximum value of 4 gen.weight(n = 41, dist = "unif", l = -4, u = 4) ## example 3 ## generate the normalized weights from the standardized normal distribution, ## given a set of theta values theta <- seq(-4, 4, by = 0.1) gen.weight(dist = "norm", mu = 0, sigma = 1, theta = theta) ## example 4 ## generate the same values of normalized weights for the theta values that are ## randomly sampled from the standardized normal distribution theta <- rnorm(100) gen.weight(dist = "emp", theta = theta)
## example 1 ## generate 41 gaussian quadrature points and weights of normal distribution gen.weight(n = 41, dist = "norm", mu = 0, sigma = 1) ## example 2 ## generate 41 theta values and weights from the uniform normal distribution, ## given the mininum value of -4 and the maximum value of 4 gen.weight(n = 41, dist = "unif", l = -4, u = 4) ## example 3 ## generate the normalized weights from the standardized normal distribution, ## given a set of theta values theta <- seq(-4, 4, by = 0.1) gen.weight(dist = "norm", mu = 0, sigma = 1, theta = theta) ## example 4 ## generate the same values of normalized weights for the theta values that are ## randomly sampled from the standardized normal distribution theta <- rnorm(100) gen.weight(dist = "emp", theta = theta)
This method extracts various internal objects from an object of class est_irt
,
est_mg
, or est_item
.
getirt(x, ...) ## S3 method for class 'est_irt' getirt(x, what, ...) ## S3 method for class 'est_mg' getirt(x, what, ...) ## S3 method for class 'est_item' getirt(x, what, ...)
getirt(x, ...) ## S3 method for class 'est_irt' getirt(x, what, ...) ## S3 method for class 'est_mg' getirt(x, what, ...) ## S3 method for class 'est_item' getirt(x, what, ...)
x |
|
... |
Further arguments passed to or from other methods. |
what |
A character string indicating what to extract. |
Objects which can be extracted from the object of class est_irt
include:
A data frame containing both the item parameter estimates and the corresponding standard errors of estimates.
A data frame containing the item parameter estimates.
A data frame containing the standard errors of the item parameter estimates. Note that the standard errors are estimated using the cross-production approximation method (Meilijson, 1989).
A data frame containing the position number of each item parameter being estimated. The position information is useful when interpreting the variance-covariance matrix of item parameter estimates.
A matrix of variance-covariance matrix of item parameter estimates.
A sum of the log-likelihood values of the observed data set (marginal log-likelihood) across all items in the data set.
A model fit statistic of Akaike information criterion based on the loglikelihood.
A model fit statistic of Bayesian information criterion based on the loglikelihood.
A data frame containing the mean, variance, and standard deviation of latent variable prior distribution.
A two-column data frame containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the (updated) latent variable prior distribution.
A matrix of normalized posterior densities for all the response patterns at each of the quadrature points. The row and column indicate each individual's response pattern and the quadrature point, respectively.
A data frame of the examinees' response data set.
A scaling factor in IRT models.
A total number of response patterns.
A total number of items included in the response data.
A convergence criteria for E steps of the EM algorithm.
The maximum number of E steps in the EM algorithm.
A list containing the information of the prior distribution for item slope parameters.
A list containing the information of the prior distribution for item difficulty (or threshold) parameters.
A list containing the information of the prior distribution for item guessing parameters.
A total number of the estimated parameters.
The number of EM cycles completed.
A maximum item parameter change when the EM cycles were completed.
Time (in seconds) spent for the EM cycles.
Time (in seconds) spent for computing the standard errors of the item parameter estimates.
Time (in seconds) spent for total compuatation.
Status of the first-order test to report if the gradients has vanished sufficiently for the solution to be stable.
Status of the second-order test to report if the information matrix is positive definite, which is a prerequisite for the solution to be a possible maximum.
A note to report if the variance-covariance matrix of item parameter estimates is obtainable from the information matrix.
A logical value to indicate if FIPC was used.
A method used for the FIPC.
A vector of integer values specifying the locations of the fixed items when the FIPC was implemented.
Objects which can be extracted from the object of class est_mg
include:
A list containing two internal objects (i.e., overall and group) of the item parameter estimates and the corresponding standard errors of estimates. The first internal object (overall) is a data frame of the item parameter and standard error estimates for the combined data set across all groups. Accordingly, the data frame includes unique items across all groups. The second internal object (group) is a list of group specific data frames containing item parameter and standard error estimates
A list containing two internal objects (i.e., overall and group) of the item parameter estimates. The format of the list is the same with the internal object of 'estimates'
A list containing two internal objects (i.e., overall and group) of the standard errors of item parameter estimates. The format of the list is the same with the internal object of 'estimates'. Note that the standard errors are estimated using the cross-production approximation method (Meilijson, 1989).
A data frame containing the position number of each item parameter being estimated. This item position data frame was created based on the combined data sets across all groups (see the first internal object of 'estimates'). The position information is useful when interpreting the variance-covariance matrix of item parameter estimates.
A matrix of variance-covariance matrix of item parameter estimates. This matrix was created based on the combined data sets across all groups (see the first internal object of 'estimates')
A list containing two internal objects (i.e., overall and group) of the log-likelihood values of observed data set (marginal log-likelihood). The format of the list is the same with the internal object of 'estimates'. Specifically, the first internal object (overall) contains a sum of the log-likelihood values of the observed data set across all unique items of all groups. The second internal object (group) shows the group specific log-likelihood values.
A model fit statistic of Akaike information criterion based on the loglikelihood of all unique items..
A model fit statistic of Bayesian information criterion based on the loglikelihood of all unique items.
A list containing the summary statistics (i.e., a mean, variance, and standard deviation) of latent variable prior distributions across all groups.
a list of the two-column data frames containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the (updated) latent variable prior distributions for all groups.
A matrix of normalized posterior densities for all the response patterns at each of the quadrature points. The row and column indicate each individual's response pattern and the quadrature point, respectively.
A list containing two internal objects (i.e., overall and group) of the examinees' response data sets. The format of the list is the same with the internal object of 'estimates'.
A scaling factor in IRT models.
A list containing two internal objects (i.e., overall and group) with the total number of response patterns. The format of the list is the same with the internal object of 'estimates'.
A list containing two internal objects (i.e., overall and group) with the total number of items included in the response data set. The format of the list is the same with the internal object of 'estimates'.
A convergence criteria for E steps of the EM algorithm.
The maximum number of E steps in the EM algorithm.
A list containing the information of the prior distribution for item slope parameters.
A list containing the information of the prior distribution for item guessing parameters.
A total number of the estimated parameters across all unique items.
The number of EM cycles completed.
A maximum item parameter change when the EM cycles were completed.
Time (in seconds) spent for the EM cycles.
Time (in seconds) spent for computing the standard errors of the item parameter estimates.
Time (in seconds) spent for total compuatation.
Status of the first-order test to report if the gradients has vanished sufficiently for the solution to be stable.
Status of the second-order test to report if the information matrix is positive definite, which is a prerequisite for the solution to be a possible maximum.
A note to report if the variance-covariance matrix of item parameter estimates is obtainable from the information matrix.
A logical value to indicate if FIPC was used.
A method used for the FIPC.
A list containing two internal objects (i.e., overall and group) with the locations of the fixed items when the FIPC was implemented. The format of the list is the same with the internal object of 'estimates'.
Objects which can be extracted from the object of class est_item
include:
A data frame containing both the item parameter estimates and the corresponding standard errors of estimates.
A data frame containing the item parameter estimates.
A data frame containing the standard errors of the item parameter estimates. Note that the standard errors are estimated using observed information functions.
A data frame containing the position number of each item parameter being estimated. The position information is useful when interpreting the variance-covariance matrix of item parameter estimates.
A matrix of variance-covariance matrix of item parameter estimates.
A sum of the log-likelihood values of the complete data set across all estimated items.
A data frame of the examinees' response data set.
A vector of the examinees' ability values used as the fixed effects.
A scaling factor in IRT models.
A string indicating the convergence status of the item parameter estimation.
A total number of items included in the response data.
The items which have no item response data. Those items are excluded from the item parameter estimation.
A total number of the estimated parameters.
An integer vector indicating the number of item responses for each item used to estimate the item parameters.
Time (in seconds) spent for total compuatation.
See est_irt
, est_mg
, and est_item
for more details.
The internal objects extracted from an object of class est_irt
,
est_mg
, or est_item
.
getirt(est_irt)
: An object created by the function est_irt
.
getirt(est_mg)
: An object created by the function est_mg
.
getirt(est_item)
: An object created by the function est_item
.
Hwanggyu Lim [email protected]
# fit the 2PL model to LSAT6 data mod.2pl <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2) # extract the item parameter estimates (est.par <- getirt(mod.2pl, what = "par.est")) # extract the standard error estimates (est.se <- getirt(mod.2pl, what = "se.est")) # extract the variance-covariance matrix of item parameter estimates (cov.mat <- getirt(mod.2pl, what = "covariance"))
# fit the 2PL model to LSAT6 data mod.2pl <- est_irt(data = LSAT6, D = 1, model = "2PLM", cats = 2) # extract the item parameter estimates (est.par <- getirt(mod.2pl, what = "par.est")) # extract the standard error estimates (est.se <- getirt(mod.2pl, what = "se.est")) # extract the variance-covariance matrix of item parameter estimates (cov.mat <- getirt(mod.2pl, what = "covariance"))
This function computes three GRDIF statistics, ,
,
and
, for analyzing differential item functioning (DIF) among multiple groups
(Lim, Zhu, Choe, & Han, 2023). They are specialized to capture uniform DIF, nonuniform DIF, and
mixed DIF, respectively.
grdif(x, ...) ## Default S3 method: grdif( x, data, score = NULL, group, focal.name, D = 1, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_irt' grdif( x, score = NULL, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_item' grdif( x, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
grdif(x, ...) ## Default S3 method: grdif( x, data, score = NULL, group, focal.name, D = 1, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_irt' grdif( x, score = NULL, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_item' grdif( x, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("grdifrs", "grdifr", "grdifs"), max.iter = 10, min.resp = NULL, post.hoc = TRUE, method = "ML", range = c(-4, 4), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
x |
A data frame containing item metadata (e.g., item parameters, number of categories, models, etc.),
an object of class |
... |
Additional arguments that will be forwarded to the |
data |
A matrix containing examinees' response data for items in |
score |
A vector of examinees' ability estimates. If abilities are not provided, |
group |
A numeric or character vector indicating group membership of examinees. The length of the vector should be the same as the number of rows in the response data matrix. |
focal.name |
A character or numeric vector representing levels associated with focal groups.
For instance, consider |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify the significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
purify |
A logical value indicating whether a purification process will be implemented or not. Default is FALSE. |
purify.by |
A character string specifying a GRDIF statistic with which the purification
is implemented. Available statistics are "grdifrs" for |
max.iter |
A positive integer value specifying the maximum number of iterations for the purification process. Default is 10. |
min.resp |
A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information. |
post.hoc |
A logical value indicating whether to conduct a post-hoc RDIF analysis for all possible combinations of paired groups for statistically flagged items. The default is TRUE. See below for more details. |
method |
A character string indicating a scoring method. Available methods are "ML" for maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for maximum a posteriori estimation, and "EAP" for expected a posteriori estimation. The default method is "ML". |
range |
A numeric vector with two components to restrict the ability scale range for ML, WL, EAP, and MAP scoring methods. The default is c(-5, 5). |
norm.prior |
A numeric vector with two components specifying the mean and standard deviation of
the normal prior distribution. These parameters are used to obtain Gaussian quadrature points
and their corresponding weights from the normal distribution. The default is c(0,1). Ignored if |
nquad |
An integer value specifying the number of Gaussian quadrature points from the normal
prior distribution. The default is 41. Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column)
and their corresponding weights (in the second column) for the latent variable prior distribution.
The weights and quadrature points can be obtained using the |
ncore |
The number of logical CPU cores to use. The default is 1. See |
verbose |
A logical value. If TRUE, progress messages for the purification procedure are suppressed. The default is TRUE. |
The GRDIF framework (Lim et al., 2023) is a generalized version of the RDIF detection framework,
designed to assess DIF for multiple groups. The GRDIF framework comprises three statistics: ,
,
and
, which focus on detecting uniform, nonuniform, and mixed DIF, respectively.
Under the null hypothesis that a test contains no DIF items,
,
, and
asymptotically follow the
distributions with G-1, G-1, and 2(G-1) degrees of freedom, respectively,
where G represents the total number of groups being compared. For more information on the GRDIF framework, see Lim et al. (2023).
The grdif
function calculates all three GRDIF statistics: ,
, and
. The current
version of the
grdif
function supports both dichotomous and polytomous item response data. To compute these statistics, the grdif
function requires (1) item parameter estimates obtained from aggregate data, regardless of group membership, (2) examinees' ability estimates
(e.g., MLE), and (3) examinees' item response data. Note that the ability estimates must be computed using the aggregate data-based
item parameter estimates. The item parameter estimates should be provided in the x
argument, the ability estimates in the
score
argument, and the response data in the data
argument. When abilities are not given in the score
argument
(i.e., score = NULL
), the grdif
function estimates examinees' abilities automatically using the scoring method
specified in the method
argument (e.g., method = "ML"
).
The group
argument accepts a vector with numeric or character values, indicating the group membership of examinees.
The vector may include multiple distinct values, where one value represents the reference group and the others represent the focal groups.
The length of the vector should be the same as the number of rows in the response data, with each value indicating the group membership
of each examinee. After specifying the group
, a numeric or character vector should be provided in the focal.name
argument
to define which group values in the group
argument represent the focal groups. The reference group will be the group not included
in the focal.name
vector.
Similar to the original RDIF framework for two-groups comparison, the GRDIF framework can implement an iterative purification process.
When purify = TRUE
, the purification process is executed based on one of the GRDIF statistics specified in the purify.by
argument (e.g., purify.by="grdifrs"
). During each iterative purification, examinees' latent abilities are calculated using purified
items and the scoring method specified in the method
argument. The iterative purification process stops when no additional DIF items
are identified or when the process reaches a predetermined limit of iterations, which can be set in the max.iter
argument.
For more information about the purification procedure, refer to Lim et al. (2022).
Scoring with a limited number of items can result in large standard errors, which may impact the effectiveness of DIF detection within
the GRDIF framework. The min.resp
argument can be employed to avoid using scores with significant standard errors when calculating
the GRDIF statistics, particularly during the purification process. For instance, if min.resp
is not NULL (e.g., min.resp=5
),
item responses from examinees whose total item responses fall below the specified minimum number are treated as missing values (i.e., NA).
Consequently, their ability estimates become missing values and are not utilized in computing the GRDIF statistics. If min.resp=NULL
,
an examinee's score will be computed as long as there is at least one item response for the examinee.
The post.hoc
argument allows you to perform a post-hoc RDIF analysis for all possible combinations of paired groups
for items flagged as statistically significant. For example, consider four groups of examinees: A, B, C, and D. If post.hoc = TRUE
,
the grdif
function will perform a post-hoc RDIF analysis for all possible pairs of groups
(A-B, A-C, A-D, B-C, B-D, and C-D) for each flagged item. This helps to identify which specific pairs of groups
have DIF for each item, providing a more detailed understanding of the DIF patterns in the data. Note that when purification is implemented
(i.e., purify = TRUE
), the post-hoc RDIF analysis is conducted for each flagged item during each single iteration of
the purification process.
This function returns a list of four internal objects. The four objects are:
no_purify |
A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
|
purify |
A logical value indicating whether the purification process was used. |
with_purify |
A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
|
alpha |
A significance |
grdif(default)
: Default method to computes three GRDIF statistics with multiple group data
using a data frame x
containing the item metadata.
grdif(est_irt)
: An object created by the function est_irt
.
grdif(est_item)
: An object created by the function est_item
.
Hwanggyu Lim [email protected]
Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement. doi:10.1111/jedm.12366.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.
Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2023, April). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, IL.
rdif
est_item
, info
, simdat
, shape_df
,
gen.weight
, est_score
# load library library("dplyr") ## Uniform DIF detection for four groups (1R/3F) ######################################################## # (1) Manipulate uniform DIF for all three focal groups ######################################################## # Import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # Select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # Generate four new items where uniform DIF will be manipulated difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = .15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # Manipulate uniform DIF on the four new items by adjusting the b-parameters # for the three focal groups difpar_foc1 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0.7, 0.7, 0, 0)) difpar_foc2 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0, 0, 0.7, 0.7)) difpar_foc3 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(-0.4, -0.4, -0.5, -0.5)) # Combine the 4 DIF and 36 non-DIF item data for both reference and focal groups. # Thus, the first four items have uniform DIF for thee three focal groups par_ref <- rbind(difpar_ref, par_nstd) par_foc1 <- rbind(difpar_foc1, par_nstd) par_foc2 <- rbind(difpar_foc2, par_nstd) par_foc3 <- rbind(difpar_foc3, par_nstd) # Generate the true thetas from the different ability distributions set.seed(128) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc1 <- rnorm(500, -1.0, 1.0) theta_foc2 <- rnorm(500, 1.0, 1.0) theta_foc3 <- rnorm(500, 0.5, 1.0) # Generate the response data resp_ref <- irtQ::simdat(par_ref, theta = theta_ref, D = 1) resp_foc1 <- irtQ::simdat(par_foc1, theta = theta_foc1, D = 1) resp_foc2 <- irtQ::simdat(par_foc2, theta = theta_foc2, D = 1) resp_foc3 <- irtQ::simdat(par_foc3, theta = theta_foc3, D = 1) data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3) ######################################################## # (2) Estimate the item and ability parameters # using the aggregate data ######################################################## # Estimate the item parameters est_mod <- irtQ::est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # Estimate the ability parameters using MLE score <- irtQ::est_score(x = est_par, data = data, method = "ML")$est.theta ######################################################## # (3) Conduct DIF analysis ######################################################## # Create a vector of group membership indicators, # where 1, 2 and 3 indicate the three focal groups group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500)) # (a) Compute GRDIF statistics without purification # and implement the post-hoc two-groups comparison analysis for # the flagged items dif_nopuri <- grdif( x = est_par, data = data, score = score, group = group, focal.name = c(1, 2, 3), D = 1, alpha = 0.05, purify = FALSE, post.hoc = TRUE ) print(dif_nopuri) # Print the post-hoc analysis results for the fagged items print(dif_nopuri$no_purify$post.hoc) # (b) Compute GRDIF statistics with purification # based on \eqn{GRDIF_{R}} and implement the post-hoc # two-groups comparison analysis for flagged items dif_puri_r <- grdif( x = est_par, data = data, score = score, group = group, focal.name = c(1, 2, 3), D = 1, alpha = 0.05, purify = TRUE, purify.by = "grdifr", post.hoc = TRUE ) print(dif_puri_r) # Print the post-hoc analysis results without purification print(dif_puri_r$no_purify$post.hoc) # Print the post-hoc analysis results with purification print(dif_puri_r$with_purify$post.hoc)
# load library library("dplyr") ## Uniform DIF detection for four groups (1R/3F) ######################################################## # (1) Manipulate uniform DIF for all three focal groups ######################################################## # Import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # Select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # Generate four new items where uniform DIF will be manipulated difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = .15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # Manipulate uniform DIF on the four new items by adjusting the b-parameters # for the three focal groups difpar_foc1 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0.7, 0.7, 0, 0)) difpar_foc2 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(0, 0, 0.7, 0.7)) difpar_foc3 <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + c(-0.4, -0.4, -0.5, -0.5)) # Combine the 4 DIF and 36 non-DIF item data for both reference and focal groups. # Thus, the first four items have uniform DIF for thee three focal groups par_ref <- rbind(difpar_ref, par_nstd) par_foc1 <- rbind(difpar_foc1, par_nstd) par_foc2 <- rbind(difpar_foc2, par_nstd) par_foc3 <- rbind(difpar_foc3, par_nstd) # Generate the true thetas from the different ability distributions set.seed(128) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc1 <- rnorm(500, -1.0, 1.0) theta_foc2 <- rnorm(500, 1.0, 1.0) theta_foc3 <- rnorm(500, 0.5, 1.0) # Generate the response data resp_ref <- irtQ::simdat(par_ref, theta = theta_ref, D = 1) resp_foc1 <- irtQ::simdat(par_foc1, theta = theta_foc1, D = 1) resp_foc2 <- irtQ::simdat(par_foc2, theta = theta_foc2, D = 1) resp_foc3 <- irtQ::simdat(par_foc3, theta = theta_foc3, D = 1) data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3) ######################################################## # (2) Estimate the item and ability parameters # using the aggregate data ######################################################## # Estimate the item parameters est_mod <- irtQ::est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # Estimate the ability parameters using MLE score <- irtQ::est_score(x = est_par, data = data, method = "ML")$est.theta ######################################################## # (3) Conduct DIF analysis ######################################################## # Create a vector of group membership indicators, # where 1, 2 and 3 indicate the three focal groups group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500)) # (a) Compute GRDIF statistics without purification # and implement the post-hoc two-groups comparison analysis for # the flagged items dif_nopuri <- grdif( x = est_par, data = data, score = score, group = group, focal.name = c(1, 2, 3), D = 1, alpha = 0.05, purify = FALSE, post.hoc = TRUE ) print(dif_nopuri) # Print the post-hoc analysis results for the fagged items print(dif_nopuri$no_purify$post.hoc) # (b) Compute GRDIF statistics with purification # based on \eqn{GRDIF_{R}} and implement the post-hoc # two-groups comparison analysis for flagged items dif_puri_r <- grdif( x = est_par, data = data, score = score, group = group, focal.name = c(1, 2, 3), D = 1, alpha = 0.05, purify = TRUE, purify.by = "grdifr", post.hoc = TRUE ) print(dif_puri_r) # Print the post-hoc analysis results without purification print(dif_puri_r$no_purify$post.hoc) # Print the post-hoc analysis results with purification print(dif_puri_r$with_purify$post.hoc)
This function computes both item and test information functions (Hambleton et al., 1991) given a set of theta values.
info(x, ...) ## Default S3 method: info(x, theta, D = 1, tif = TRUE, ...) ## S3 method for class 'est_item' info(x, theta, tif = TRUE, ...) ## S3 method for class 'est_irt' info(x, theta, tif = TRUE, ...)
info(x, ...) ## Default S3 method: info(x, theta, D = 1, tif = TRUE, ...) ## S3 method for class 'est_item' info(x, theta, tif = TRUE, ...) ## S3 method for class 'est_irt' info(x, theta, tif = TRUE, ...)
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...), an object of class |
... |
Further arguments passed to or from other methods. |
theta |
A vector of theta values where item and test information values are computed. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
tif |
A logical value. If TRUE, the test information function is computed. Default is TRUE. |
A specific form of a data frame should be used for the argument x
. The first column should have item IDs,
the second column should contain unique score category numbers of the items, and the third column should include IRT models being fit to the items.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as
the item location (or overall difficulty) parameter subtracted by the threshold parameter for unique score categories of the item.
Note that when an GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if an GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame with a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
See IRT Models
section in the page of irtQ-package
for more details about the IRT models used in the irtQ package.
An easier way to create a data frame for the argument x
is by using the function shape_df
.
This function returns an object of class info
. This object contains item and test information values
given the specified theta values.
info(default)
: Default method to compute item and test information functions for
a data frame x
containing the item metadata.
info(est_item)
: An object created by the function est_item
.
info(est_irt)
: An object created by the function est_irt
.
Hwanggyu Lim [email protected]
Hambleton, R. K., & Swaminathan, H. (1985) Item response theory: Principles and applications. Boston, MA: Kluwer.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991) Fundamentals of item response theory. Newbury Park, CA: Sage.
## example 1. ## using the function "shape_df" to create a data frame of test metadata # create a list containing the dichotomous item parameters par.drm <- list( a = c(1.1, 1.2, 0.9, 1.8, 1.4), b = c(0.1, -1.6, -0.2, 1.0, 1.2), g = rep(0.2, 5) ) # create a list containing the polytomous item parameters par.prm <- list( a = c(1.4, 0.6), d = list(c(-1.9, 0.0, 1.2), c(0.4, -1.1, 1.5, 0.2)) ) # create a numeric vector of score categories for the items cats <- c(2, 4, 2, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("DRM", "GRM", "DRM", "DRM", "GPCM", "DRM", "DRM") # create an item metadata set test <- shape_df( par.drm = par.drm, par.prm = par.prm, cats = cats, model = model ) # create a data frame # set theta values theta <- seq(-2, 2, 0.1) # compute item and test information values given the theta values info(x = test, theta = theta, D = 1, tif = TRUE) ## example 2. ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ" ) # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-2, 2, 0.1) # compute item and test information values given the theta values info(x = test_flex, theta = theta, D = 1, tif = TRUE)
## example 1. ## using the function "shape_df" to create a data frame of test metadata # create a list containing the dichotomous item parameters par.drm <- list( a = c(1.1, 1.2, 0.9, 1.8, 1.4), b = c(0.1, -1.6, -0.2, 1.0, 1.2), g = rep(0.2, 5) ) # create a list containing the polytomous item parameters par.prm <- list( a = c(1.4, 0.6), d = list(c(-1.9, 0.0, 1.2), c(0.4, -1.1, 1.5, 0.2)) ) # create a numeric vector of score categories for the items cats <- c(2, 4, 2, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("DRM", "GRM", "DRM", "DRM", "GPCM", "DRM", "DRM") # create an item metadata set test <- shape_df( par.drm = par.drm, par.prm = par.prm, cats = cats, model = model ) # create a data frame # set theta values theta <- seq(-2, 2, 0.1) # compute item and test information values given the theta values info(x = test, theta = theta, D = 1, tif = TRUE) ## example 2. ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ" ) # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-2, 2, 0.1) # compute item and test information values given the theta values info(x = test_flex, theta = theta, D = 1, tif = TRUE)
This function computes traditional IRT item fit statistics (i.e., fit statistic (e.g., Bock, 1960; Yen, 1981),
loglikelihood ratio
fit statistic (
; McKinley & Mills, 1985), and infit and outfit statistics (Ames et al., 2015)) and returns
contingency tables to compute the
and
fit statistics. Note that caution is needed in interpreting the infit and
outfit statistics for non-Rasch models. The saved object of this function, especially the object of contingency tables,
is used in the function of
plot.irtfit
to draw a raw and standardized residual plots (Hambleton et al., 1991).
irtfit(x, ...) ## Default S3 method: irtfit( x, score, data, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, D = 1, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... ) ## S3 method for class 'est_item' irtfit( x, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... ) ## S3 method for class 'est_irt' irtfit( x, score, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... )
irtfit(x, ...) ## Default S3 method: irtfit( x, score, data, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, D = 1, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... ) ## S3 method for class 'est_item' irtfit( x, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... ) ## S3 method for class 'est_irt' irtfit( x, score, group.method = c("equal.width", "equal.freq"), n.width = 10, loc.theta = "average", range.score = NULL, alpha = 0.05, missing = NA, overSR = 2, min.collapse = 1, pcm.loc = NULL, ... )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...), an object of class |
... |
Further arguments passed to or from other methods. |
score |
A vector of examinees' ability estimates. |
data |
A matrix containing examinees' response data for the items in the argument |
group.method |
A character string indicating how to group examinees along the ability scale for computing the |
n.width |
An integer value to specify the number of divided groups along the ability scale. Default is 10. See below for more detail. |
loc.theta |
A character string to indicate the location of ability point at each group (or interval) where the expected probabilities of score categories are calculated using the IRT models. Available locations are "average" for computing the expected probability at the average point of examinees' ability estimates in each group and "middle" for computing the expected probability at the midpoint of each group. Default is "average". |
range.score |
A vector of two numeric values to restrict the range of ability scale. All ability estimates less than
the first value are transformed to the first value. All ability estimates greater than the second value are transformed to the second value.
If NULL, the minimum and maximum values of ability estimates in the argument |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
overSR |
A numeric value to specify a criterion to find ability groups (or intervals) which have standardized residuals greater than the specified value. Default is 2. |
min.collapse |
An integer value to indicate the minimum frequency of cells to be collapsed when computing the |
pcm.loc |
A vector of integer values indicating the locations of partial credit model (PCM) items whose slope parameters are fixed |
A specific form of a data frame should be used for the argument x
. The first column should have item IDs,
the second column should contain unique score category numbers of the items, and the third column should include IRT models being fit to the items.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as
the item location (or overall difficulty) parameter subtracted by the threshold parameter for unique score categories of the item.
Note that when an GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if an GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame with a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
See IRT Models
section in the page of irtQ-package
for more detail about the IRT models used in the irtQ package.
An easier way to create a data frame for the argument x
is by using the function shape_df
.
To calculate the and
fit statistics, two methods are used in the argument
group.method
to divide the ability scale
into several groups. If group.method = "equal.width"
, the examinees are grouped based on equal length of intervals.
If group.method = "equal.freq"
, the examinees are grouped so that all groups have equal frequencies. However, the grouping method
of "equal.freq" does guarantee that every group has the exactly same frequency of examinees. This is because the examinees are divided by
the same size of quantile.
When dividing the ability scale into intervals to compute the and
fit statistics, the intervals should be wide enough not to include
too small number of examinees. On the other hand, the interval should be narrow enough to include homogeneous examinees in terms of ability
(Hambleton et al, 1991). Thus, if you want to divide the ability scale into other than ten groups, you need to specify the number of groups
in the argument
n.width
. Yen (1981) fixed the number of groups to 10, whereas Bock (1960) allowed for any number of groups.
Regarding degrees of freedom (df), the is assumed to be distributed approximately as a chi-square with df equal to
the number of groups less the number of the IRT model parameters (Ames et al., 2015) whereas the
is assumed to be distributed approximately
as a chi-square with df equal to the number of groups (Ames et al., 2015; Muraki & Bock, 2003)
Note that if "DRM" is specified for an item in the item metadata set, the item is considered as "3PLM" to compute degrees of freedom of
the fit statistic.
This function returns an object of class irtfit
. Within this object, several internal objects are contained such as:
fit_stat |
A data frame containing the results of three IRT fit statistics (i.e., |
contingency.fitstat |
A list of contingency tables used to compute the |
contingency.plot |
A list of contingency tables used to draw a raw and standardized residual plots (Hambleton et al., 1991) in the function of
|
individual.info |
A list of data frames including individual residual and variance values. Those information are used to compute infit and outfit statistics. |
item_df |
The item metadata specified in the argument |
ancillary |
A list of ancillary information used in the item fit analysis. |
irtfit(default)
: Default method to compute the traditional IRT item fit statistics for a data frame x
containing the item metadata.
irtfit(est_item)
: An object created by the function est_item
.
irtfit(est_irt)
: An object created by the function est_irt
.
Hwanggyu Lim [email protected]
Ames, A. J., & Penfield, R. D. (2015). An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models. Educational Measurement: Issues and Practice, 34(3), 39-48.
Bock, R.D. (1960), Methods and applications of optimal scaling. Chapel Hill, NC: L.L. Thurstone Psychometric Laboratory.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991).Fundamentals of item response theory. Newbury Park, CA: Sage.
McKinley, R., & Mills, C. (1985). A comparison of several goodness-of-fit statistics. Applied Psychological Measurement, 9, 49-57.
Muraki, E. & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21(1), 22-40.
Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.
plot.irtfit
, shape_df
, est_item
## example 1 ## use the simulated CAT data # find the location of items that have more than 10,000 responses over10000 <- which(colSums(simCAT_MX$res.dat, na.rm = TRUE) > 10000) # select the items that have more than 10,000 responses x <- simCAT_MX$item.prm[over10000, ] # select the response data for the items data <- simCAT_MX$res.dat[, over10000] # select the examinees' abilities score <- simCAT_MX$score # compute fit statistics fit1 <- irtfit( x = x, score = score, data = data, group.method = "equal.width", n.width = 10, loc.theta = "average", range.score = NULL, D = 1, alpha = 0.05, missing = NA, overSR = 2 ) # fit statistics fit1$fit_stat # contingency tables fit1$contingency.fitstat ## example 2 ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[c(1:2, 55), ] # generate examinees' abilities from N(0, 1) set.seed(10) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit2 <- irtfit( x = x, score = score, data = data, group.method = "equal.freq", n.width = 11, loc.theta = "average", range.score = c(-4, 4), D = 1, alpha = 0.05 ) # fit statistics fit2$fit_stat # contingency tables fit2$contingency.fitstat # residual plots for the first item (dichotomous item) plot(x = fit2, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # residual plots for the third item (polytomous item) plot(x = fit2, item.loc = 3, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE)
## example 1 ## use the simulated CAT data # find the location of items that have more than 10,000 responses over10000 <- which(colSums(simCAT_MX$res.dat, na.rm = TRUE) > 10000) # select the items that have more than 10,000 responses x <- simCAT_MX$item.prm[over10000, ] # select the response data for the items data <- simCAT_MX$res.dat[, over10000] # select the examinees' abilities score <- simCAT_MX$score # compute fit statistics fit1 <- irtfit( x = x, score = score, data = data, group.method = "equal.width", n.width = 10, loc.theta = "average", range.score = NULL, D = 1, alpha = 0.05, missing = NA, overSR = 2 ) # fit statistics fit1$fit_stat # contingency tables fit1$contingency.fitstat ## example 2 ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[c(1:2, 55), ] # generate examinees' abilities from N(0, 1) set.seed(10) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit2 <- irtfit( x = x, score = score, data = data, group.method = "equal.freq", n.width = 11, loc.theta = "average", range.score = c(-4, 4), D = 1, alpha = 0.05 ) # fit statistics fit2$fit_stat # contingency tables fit2$contingency.fitstat # residual plots for the first item (dichotomous item) plot(x = fit2, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # residual plots for the third item (polytomous item) plot(x = fit2, item.loc = 3, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE)
This function computes the loglikelihood of ability parameters given the item parameters and response data.
llike_score( x, data, theta, D = 1, method = "ML", norm.prior = c(0, 1), fence.a = 3, fence.b = NULL, missing = NA )
llike_score( x, data, theta, D = 1, method = "ML", norm.prior = c(0, 1), fence.a = 3, fence.b = NULL, missing = NA )
x |
A data frame containing the item metadata (e.g., item parameters, number
of score categories, models). This can be created easily using the |
data |
A matrix representing examinees' response data for the items in |
theta |
A numeric vector of ability parameters for which the loglikelihood values will be computed. |
D |
A scaling factor in IRT models that adjusts the logistic function to approximate the normal ogive function (set to 1.7). The default is 1. |
method |
A character string specifying the scoring method. Options include "ML" for maximum likelihood estimation, "MLF" for maximum likelihood estimation with fences, and "MAP" for maximum a posteriori estimation. The default method is "MLE". |
norm.prior |
A numeric vector of two elements indicating the mean and standard
deviation of the normal prior distribution. These parameters are used to obtain
the Gaussian quadrature points and corresponding weights from the normal distribution.
Default is c(0,1). This parameter is ignored if |
fence.a |
A numeric value defining the item slope parameter (a-parameter) for the two imaginary items in the MLF method. Default is 3.0. |
fence.b |
A numeric vector of two elements specifying the lower and upper fences
of item difficulty parameters (b-parameters) for the two imaginary items in the MLF method.
If |
missing |
A value used to denote missing values in the response data set. Default is NA. |
The function computes the loglikelihood value of the ability parameter given
the item parameters and response data for each item. As an example, to assess the loglikelihoods
of abilities for two examinees who have taken the same test items specified in x
, supply
their item response data matrix with two rows in data
and a vector of ability values
for which loglikelihood needs to be computed in theta
.
A data frame of loglikelihood values. Each row indicates the ability parameter for which the loglikelihood was computed, and each column represents a response pattern.
## Import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # Read item parameters and transform them into item metadata x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df # Generate examinees' abilities from N(0, 1) set.seed(10) score <- rnorm(5, mean=0, sd=1) # Simulate the response data data <- simdat(x=x, theta=score, D=1) # Specify the ability values for which the loglikelihood values will be computed theta <- seq(-3, 3, 0.5) # Compute the loglikelihood values (using the MLE method) llike_score(x=x, data=data, theta=theta, D=1, method="ML")
## Import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # Read item parameters and transform them into item metadata x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df # Generate examinees' abilities from N(0, 1) set.seed(10) score <- rnorm(5, mean=0, sd=1) # Simulate the response data data <- simdat(x=x, theta=score, D=1) # Specify the ability values for which the loglikelihood values will be computed theta <- seq(-3, 3, 0.5) # Compute the loglikelihood values (using the MLE method) llike_score(x=x, data=data, theta=theta, D=1, method="ML")
Well-known LSAT6 dichotomous response data set from Thissen (1982).
LSAT6
LSAT6
This data contains 1,000 dichotomous response patterns of five items obtained from the Law School Admissions Test, section 6.
Hwanggyu Lim [email protected]
Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175-186.
This function computes the conditional distributions of number-correct (or observed) scores given probabilities of category responses to items or given a set of theta values using Lord and Wingersky recursion formula (1984).
lwrc(x = NULL, theta, prob = NULL, cats, D = 1)
lwrc(x = NULL, theta, prob = NULL, cats, D = 1)
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...).
See |
theta |
A vector of theta values where the conditional distribution of observed scores are computed.
The theta values are only required when a data frame is specified in the argument |
prob |
A matrix containing the probability of answering each category of an item. Each row indicates an item and
each column represents each category of the item. When the number of categories differs between items, the empty cells
should be filled with zeros or NA values. If |
cats |
A numeric vector specifying the number of categories for each item. For example, a dichotomous
item has two categories. This information is only required when a probability matrix is specified in the argument
|
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
The Lord and Wingersky recursive algorithm is an efficient way of calculating the compound probabilities of any number-correct scores on a test based on IRT models. This algorithm is particularly useful when computing the IRT model-based observed score distribution for a test.
To compute the conditional distributions of observed scores, either the item metadata set specified in x
or
the probability matrix specified in prob
can be used.
When the prob
argument is provided, this function returns a vector of the probabilities of obtaining every
observed score on a test. When the x
argument is specified, the function returns a matrix of conditional probabilities
across all possible observed scores and theta values.
Hwanggyu Lim [email protected]
Kolen, M. J. & Brennan, R. L. (2004) Test Equating, Scaling, and Linking (2nd ed.). New York: Springer.
Lord, F. & Wingersky, M. (1984). Comparison of IRT true score and equipercentile observed score equatings. Applied Psychological Measurement, 8(4), 453-461.
## example 1: when a matrix of probabilities is used as a data set ## this is an example from Kolen and Brennan (2004, p. 183) # create a matrix of probabilities of getting correct and incorrect answers for three items probs <- matrix(c(.74, .73, .82, .26, .27, .18), nrow = 3, ncol = 2, byrow = FALSE) # create a vector of score categories for the three items cats <- c(2, 2, 2) # compute the conditional distributions of observed scores lwrc(prob = probs, cats = cats) ## example 2: when a matrix of probabilities is used as a data set ## with a mixed-format test # category probabilities for a dichotomous item p1 <- c(0.2, 0.8, 0, 0, 0) # category probabilities for a dichotomous item p2 <- c(0.4, 0.6, NA, NA, NA) # category probabilities for a polytomous item with five categories p3 <- c(0.1, 0.2, 0.2, 0.4, 0.1) # category probabilities for a polytomous item with three categories p4 <- c(0.5, 0.3, 0.2, NA, NA) # rbind the probability vectors p <- rbind(p1, p2, p3, p4) # create a vector of score categories for the four items cats <- c(2, 2, 5, 3) # compute the conditional distributions of observed scores lwrc(prob = p, cats = cats) ## example 3: when a data frame for the item metadata of ## a mixed-format test is used. # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # compute the conditional distributions of observed scores lwrc(x = x, theta = seq(-4, 4, 0.2), D = 1)
## example 1: when a matrix of probabilities is used as a data set ## this is an example from Kolen and Brennan (2004, p. 183) # create a matrix of probabilities of getting correct and incorrect answers for three items probs <- matrix(c(.74, .73, .82, .26, .27, .18), nrow = 3, ncol = 2, byrow = FALSE) # create a vector of score categories for the three items cats <- c(2, 2, 2) # compute the conditional distributions of observed scores lwrc(prob = probs, cats = cats) ## example 2: when a matrix of probabilities is used as a data set ## with a mixed-format test # category probabilities for a dichotomous item p1 <- c(0.2, 0.8, 0, 0, 0) # category probabilities for a dichotomous item p2 <- c(0.4, 0.6, NA, NA, NA) # category probabilities for a polytomous item with five categories p3 <- c(0.1, 0.2, 0.2, 0.4, 0.1) # category probabilities for a polytomous item with three categories p4 <- c(0.5, 0.3, 0.2, NA, NA) # rbind the probability vectors p <- rbind(p1, p2, p3, p4) # create a vector of score categories for the four items cats <- c(2, 2, 5, 3) # compute the conditional distributions of observed scores lwrc(prob = p, cats = cats) ## example 3: when a data frame for the item metadata of ## a mixed-format test is used. # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata x <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # compute the conditional distributions of observed scores lwrc(x = x, theta = seq(-4, 4, 0.2), D = 1)
This function calculates the Pseudo-count statistic
to evaluate item parameter drift, as described by Cappaert et al. (2018) and Stone (2000).
The Pseudo-count
statistic is designed to detect item parameter drift efficiently
without requiring item recalibration, making it especially valuable in computerized adaptive
testing (CAT) environments. This method compares observed and expected response frequencies
across quadrature points, which represent latent ability levels. The expected frequencies are
computed using the posterior distribution of each examinee's ability (Stone, 2000), providing
a robust and sensitive measure of item parameter drift, ensuring the stability and accuracy
of the test over time.
pcd2( x, data, D = 1, missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1 )
pcd2( x, data, D = 1, missing = NA, Quadrature = c(49, 6), weights = NULL, group.mean = 0, group.var = 1 )
x |
A data frame containing the metadata for the item bank, which includes
important information for each item such as the number of score categories and the
IRT model applied. See |
data |
A matrix containing examinees' response data of the items in the
argument |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
missing |
A value indicating missing values in the response data set. Default is NA. |
Quadrature |
A numeric vector of two components specifying the number of quadrature points (in the first component) and the symmetric minimum and maximum values of these points (in the second component). For example, a vector of c(49, 6) indicates 49 rectangular quadrature points over -6 and 6. |
weights |
A two-column matrix or data frame containing the quadrature points
(in the first column) and the corresponding weights (in the second column) of the latent
variable prior distribution. If not NULL, the scale of the latent ability distribution will
be will be fixed to the scale of the provided quadrature points and weights.
The weights and quadrature points can be easily obtained using the function |
group.mean |
A numeric value to set the mean of latent variable prior distribution
when |
group.var |
A positive numeric value to set the variance of latent variable
prior distribution when |
The Pseudo-count values are calculated by summing the
weighted squared differences between the observed and expected frequencies
for each score category across all items. The expected frequencies are determined
using the posterior distribution of each examinee's ability (Stone, 2000).
The Pseudo-count statistic is calculated as:
where and
are the pseudo-counts for the incorrect and correct responses
at each ability level
,
is the expected proportion of correct responses at each ability level
,
calculated using item parameters from the item bank, and
is the total count of examinees
who received each item
The pcd2
function is designed to be flexible and allows for detailed control over
the computation process through its various arguments:
The metadata should include key information such as the number of score categories for each item, the IRT model applied (e.g., "1PLM", "2PLM", "3PLM"), and the item parameters (e.g., discrimination, difficulty, guessing). This data frame is crucial because it defines the structure of the items and the parameters used in the calculation of expected frequencies.
The response matrix should be preprocessed to ensure that missing values are correctly coded (using the 'missing' argument if needed). The matrix is the foundation for calculating both observed and expected frequencies for each item and score category. Ensure that the number of items in this matrix matches the number specified in the 'x' argument.
The quadrature points are used to approximate the latent ability distribution, which is essential for accurately estimating the posterior distribution of abilities. Adjust the number and range of quadrature points based on the precision required and the characteristics of the examinee population.
This argument allows you to provide a custom two-column matrix or data frame where the first column contains quadrature points and the second column contains the corresponding weights. This provides flexibility in defining the latent ability distribution. If 'weights' is set to 'NULL', the function generates a normal prior distribution based on the values provided in 'Quadrature', 'group.mean', and 'group.var'. Use this argument when you have a specific latent ability distribution you wish to apply, such as one derived from empirical data or an alternative theoretical distribution.
These numeric values define the mean and variance of the latent ability distribution when 'weights' is not provided. The default values are 'group.mean = 0' and 'group.var = 1', which assume a standard normal distribution. These values are used to generate quadrature points and weights internally if 'weights' is 'NULL'. Adjust 'group.mean' and 'group.var' to reflect the expected distribution of abilities in your examinee population, particularly if you suspect that the latent trait distribution deviates from normality.
When setting these arguments, consider the specific characteristics of your item
bank, the distribution of abilities in your examinee population, and the computational
precision required. Properly configured, the function provides robust and accurate
Pseudo-count statistics, enabling effective monitoring of item parameter
drift in CAT or other IRT-based testing environments.
A data frame containing the Pseudo-count statistic for each item
in the analysis. The data frame includes the following columns:
id |
The unique identifier for each item, corresponding to the item IDs provided in
the |
pcd2 |
The computed Pseudo-count |
N |
The number of examinees whose responses were used to calculate the Pseudo-count
|
Hwanggyu Lim [email protected]
Cappaert, K. J., Wen, Y., & Chang, Y. F. (2018). Evaluating CAT-adjusted approaches for suspected item parameter drift detection. Measurement: Interdisciplinary Research and Perspectives, 16(4), 226-238.
Stone, C. A. (2000). Monte Carlo based null distribution for an alternative goodness-of-fit test statistic in IRT models. Journal of educational measurement, 37(1), 58-75.
## Compute the pseudo-count D2 statistics for the dichotomous items ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first 30 3PLM item metadata to be examined x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[1:30, 1:6] # generate examinees' abilities from N(0, 1) set.seed(25) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute the pseudo-count D2 statistics ps_d2 <- pcd2(x = x, data = data) print(ps_d2)
## Compute the pseudo-count D2 statistics for the dichotomous items ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first 30 3PLM item metadata to be examined x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[1:30, 1:6] # generate examinees' abilities from N(0, 1) set.seed(25) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute the pseudo-count D2 statistics ps_d2 <- pcd2(x = x, data = data) print(ps_d2)
This method function plots item or test information function given a specified theta values. In addition, it displays the conditional standard errors at a test level.
## S3 method for class 'info' plot( x, item.loc = NULL, overlap = FALSE, csee = FALSE, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.color, line.size = 1, layout.col = 4, strip.size = 12, ... )
## S3 method for class 'info' plot( x, item.loc = NULL, overlap = FALSE, csee = FALSE, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.color, line.size = 1, layout.col = 4, strip.size = 12, ... )
x |
x An object of class |
item.loc |
A vector of numeric values indicating that the item information functions of the nth items (or the location of items in a test form) are plotted. If NULL, the test information function for the total test form is drawn. Default is NULL. |
overlap |
Logical value indicating whether multiple item information functions are plotted in one panel. If FALSE, multiple item information functions are displayed in multiple panels, one for each. |
csee |
Logical value indicating whether the function displays the conditional standard error of estimation (CSEE) at a test level. If FALSE, item/test information function is plotted. Note that the CSEE plot is displayed only at a test level. |
xlab.text , ylab.text
|
A title for the x and y axes. |
main.text |
An overall title for the plot. |
lab.size |
The size of xlab and ylab. Default is 15. |
main.size |
The size of |
axis.size |
The size of labels along the x and y axes. Default is 15. |
line.color |
A character string specifying a color for the line. See http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ for more details about colors used in ggplot2. |
line.size |
The size of lines. Default is 1. |
layout.col |
An integer value indicating the number of columns in the panel when displaying the item information functions of the multiple items. Default is 4. |
strip.size |
The size of facet labels when the item information functions of the multiple items are drawn. |
... |
Further arguments passed from the function |
All of the plots are drawn using the ggplot2 package.
The object of class info
can be obtained from the function info
.
This method function displays the item or test information function plot. When csee = TRUE
,
the conditional standard error is returned at the test level.
Hwanggyu Lim [email protected]
## the use of a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-4, 4, 0.1) # compute item and test information values given the theta values x <- info(x = test_flex, theta = theta, D = 1, tif = TRUE) # draw a plot of the test information function plot(x) # draw a plot of the item information function for the second item plot(x, item.loc = 2) # draw a plot of multiple item information functions across the multiple panels plot(x, item.loc = 1:8, overlap = FALSE) # draw a plot of multiple item information functions in one panel plot(x, item.loc = 1:8, overlap = TRUE) # draw a plot of conditional standard error at a test level plot(x, csee = TRUE)
## the use of a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-4, 4, 0.1) # compute item and test information values given the theta values x <- info(x = test_flex, theta = theta, D = 1, tif = TRUE) # draw a plot of the test information function plot(x) # draw a plot of the item information function for the second item plot(x, item.loc = 2) # draw a plot of multiple item information functions across the multiple panels plot(x, item.loc = 1:8, overlap = FALSE) # draw a plot of multiple item information functions in one panel plot(x, item.loc = 1:8, overlap = TRUE) # draw a plot of conditional standard error at a test level plot(x, csee = TRUE)
This method function provides graphical displays to look at residuals between the observed data and model-based predictions (Hambleton, Swaminathan, & Rogers, 1991). This function gives two residual plots for each score category of an item: (a) the raw residual plot and (b) the standardized residual plot. Note that for dichotomous items the residual plots are drawn only for the score category of 1.
## S3 method for class 'irtfit' plot( x, item.loc = NULL, type = "both", ci.method = c("wald", "wilson", "wilson.cr"), show.table = TRUE, layout.col = 2, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.size = 1, point.size = 2.5, strip.size = 12, ylim.icc = c(0, 1), ylim.sr.adjust = FALSE, ylim.sr = c(-4, 4), ... )
## S3 method for class 'irtfit' plot( x, item.loc = NULL, type = "both", ci.method = c("wald", "wilson", "wilson.cr"), show.table = TRUE, layout.col = 2, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.size = 1, point.size = 2.5, strip.size = 12, ylim.icc = c(0, 1), ylim.sr.adjust = FALSE, ylim.sr = c(-4, 4), ... )
x |
An object of class |
item.loc |
An integer value indicating that the nth item (or the location of the item) is plotted. See below for details. |
type |
A character string indicating what type of residual plot is returned. Available options are "icc" for the raw residual plot, "sr" for the standardized residual plot, and "both" for both of them. Default is "both". |
ci.method |
A character string indicating what method is used to estimate the confidence interval for the raw residual plot. Available options are "wald" for Wald method, "wilson" for Wilson score interval, and "wilson.cr" for Wilson score interval with continuity correction. Default is "wald". See below for details. |
show.table |
A logical value. If TRUE, a contingency table containing the information used to draw the residual
plots for the studied item is returned. This contingency table is the same as one contained in the internal object of |
layout.col |
An integer value indicating the number of columns in the panel when a polytomous item is used. Default is 2. |
xlab.text |
A title for the x axis. If missing, the default string is used. |
ylab.text |
A title for the y axis. If |
main.text |
An overall title for the plot. If |
lab.size |
The size of xlab and ylab. Default is 15. |
main.size |
The size of |
axis.size |
The size of labels along the x and y axes. Default is 15. |
line.size |
The size of lines. Default is 1. |
point.size |
The size of points. Default is 2.5. |
strip.size |
The size of facet labels. Default is 12. |
ylim.icc |
A vector of two numeric values specifying the range of y axis for the raw residual plot. Default is c(0, 1). |
ylim.sr.adjust |
A logical value. If TRUE, the range of y axis for the standardized residual plot is adjusted for each item.
If FALSE, the range of y axis for the standardized residual plot is fixed to the values specified in the argument |
ylim.sr |
A vector of two numeric values specifying the range of y axis for the standardized residual plot. Default is c(-4, 4). |
... |
Further arguments passed from the function |
All of the plots are drawn using the ggplot2 package.
Once the results of the IRT model fit analysis are obtained from the function irtfit
,
an object of class irtfit
can be used to draw the IRT raw residual and standardized residual plots. Especially, the information
contained in an internal object of contingency.plot
are mainly used to draw the residual plots.
Because the residual plots are drawn for an item at a time, you have to indicate which item will be evaluated. For this,
you should specify an integer value, which is the location of the studied item, in the argument item.loc
.
For example, if you want to draw the residual plots for the third item, then item.loc = 3
.
In terms of the raw residual plot, the argument ci.method
is used to select a method to estimate the confidence intervals
among four methods. Those methods are "wald" for the Wald interval, which is based on the normal approximation (Laplace, 1812),
"wilson" for Wilson score interval (Wilson, 1927), and "wilson.cr" for Wilson score interval with continuity correction (Newcombe, 1998).
See https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval for more details about
the binomial proportion confidence intervals. Note that the width of confidence interval is determined by the -level
specified in the argument
alpha
of the function irtfit
.
Regarding the standardized residual plot, any standardized residuals greater than the specified criterion value
in the argument overSR
of the function irtfit
are displayed with circles. Otherwise,
they are displayed with crosses.
This method function displays the IRT raw residual plot, the standard residual plot, or both of the studied item.
when show.table = TRUE
, a contingency table used to draw the residual plots is also returned. See irtfit
for more detail about the contingency table.
Hwanggyu Lim [email protected]
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991).Fundamentals of item response theory. Newbury Park, CA: Sage.
Laplace, P. S. (1820).Theorie analytique des probabilites (in French). Courcier.
Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in medicine, 17(8), 857-872.
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212.
## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[c(1:2, 55), ] # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit <- irtfit( x = x, score = score, data = data, group.method = "equal.freq", n.width = 11, loc.theta = "average", range.score = c(-4, 4), D = 1, alpha = 0.05, overSR = 1.5 ) # residual plots for the first item (dichotomous item) plot(x = fit, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # residual plots for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE) # raw residual plot for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "icc", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # standardized residual plot for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "sr", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE)
## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the first two dichotomous items and last polytomous item x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df[c(1:2, 55), ] # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit <- irtfit( x = x, score = score, data = data, group.method = "equal.freq", n.width = 11, loc.theta = "average", range.score = c(-4, 4), D = 1, alpha = 0.05, overSR = 1.5 ) # residual plots for the first item (dichotomous item) plot(x = fit, item.loc = 1, type = "both", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # residual plots for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "both", ci.method = "wald", show.table = FALSE, ylim.sr.adjust = TRUE) # raw residual plot for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "icc", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE) # standardized residual plot for the third item (polytomous item) plot(x = fit, item.loc = 3, type = "sr", ci.method = "wald", show.table = TRUE, ylim.sr.adjust = TRUE)
This method function plots item or test characteristic curve using the ggplot2 package. The item characteristic (or category) curve (ICC) or item score curve is drawn for an individual item. The test characteristic curve (TCC) is drawn based on a total test form.
## S3 method for class 'traceline' plot( x, item.loc = NULL, score.curve = FALSE, overlap = FALSE, layout.col = 2, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.color, line.size = 1, strip.size = 12, ... )
## S3 method for class 'traceline' plot( x, item.loc = NULL, score.curve = FALSE, overlap = FALSE, layout.col = 2, xlab.text, ylab.text, main.text, lab.size = 15, main.size = 15, axis.size = 15, line.color, line.size = 1, strip.size = 12, ... )
x |
An object of class |
item.loc |
A numeric value indicating that the nth item (or the location of item) is plotted. If NULL, the TCC based on a total test form is drawn. Default is NULL. |
score.curve |
Logical value. If TRUE, item score curve (i.e., a weighted sum of item category probabilities over the item scores) is plotted
in a panel. Otherwise, ICCs for all score categories are plotted in separate panels. For a dichotomous item, the item score curve is the same as
the ICC of score category 1. Ignored when |
overlap |
Logical value indicating whether multiple item score curves are plotted in one panel. If FALSE, the multiple item score curves are displayed with multiple panels, one for each. |
layout.col |
An integer value indicating the number of columns in the panel when displaying ICCs for an item or when displaying multiple item scores with multiple panels. |
xlab.text , ylab.text
|
A title for the x and y axes. |
main.text |
An overall title for the plot. |
lab.size |
The size of xlab and ylab. Default is 15. |
main.size |
The size of |
axis.size |
The size of labels along the x and y axes. Default is 15. |
line.color |
A character string specifying the color for a line. See http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/ for more details about colors used in ggplot2. |
line.size |
The size of lines. Default is 1. |
strip.size |
The size of facet labels when ICCs for an item are plotted. |
... |
Further arguments passed from the function |
All of the plots are drawn using the ggplot2 package.
If item.loc = NULL
, the TCC based on the total test form is plotted. In the argument item.loc
,
a vector of positive integer values should be specified to indicate the nth items among the total test form. For example,
if there are ten items in the test form and the score curves of the 1st, 2nd, and 3rd items should be plotted, then item.loc = 1:3
.
This method function displays ICC or TCC plots of the studied item(s).
Hwanggyu Lim [email protected]
## example ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-3, 3, 0.1) # compute the item category probabilities and item/test # characteristic functions given the theta values x <- traceline(x = test_flex, theta, D = 1) # plot TCC based on the total test form plot(x, item.loc = NULL) # plot ICCs for the first item (dichotomous item) plot(x, item.loc = 1, score.curve = FALSE, layout.col = 2) # plot item score curve for the first item (dichotomous item) plot(x, item.loc = 1, score.curve = TRUE) # plot item score curves for the first six dichotomous items # with multiple panels plot(x, item.loc = 1:6, score.curve = TRUE, overlap = FALSE) # plot item score curve for the first six dichotomous items # in one panel plot(x, item.loc = 1:6, score.curve = TRUE, overlap = TRUE) # plot ICCs for the last item (polytomous item) plot(x, item.loc = 55, score.curve = FALSE, layout.col = 2) # plot item score curve for the last item (polytomous item) plot(x, item.loc = 55, score.curve = TRUE) # plot item score curves for the last three polytomous items # with multiple panels plot(x, item.loc = 53:55, score.curve = TRUE, overlap = FALSE) # plot item score curves for the last three polytomous items # in one panel plot(x, item.loc = 53:55, score.curve = TRUE, overlap = TRUE)
## example ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-3, 3, 0.1) # compute the item category probabilities and item/test # characteristic functions given the theta values x <- traceline(x = test_flex, theta, D = 1) # plot TCC based on the total test form plot(x, item.loc = NULL) # plot ICCs for the first item (dichotomous item) plot(x, item.loc = 1, score.curve = FALSE, layout.col = 2) # plot item score curve for the first item (dichotomous item) plot(x, item.loc = 1, score.curve = TRUE) # plot item score curves for the first six dichotomous items # with multiple panels plot(x, item.loc = 1:6, score.curve = TRUE, overlap = FALSE) # plot item score curve for the first six dichotomous items # in one panel plot(x, item.loc = 1:6, score.curve = TRUE, overlap = TRUE) # plot ICCs for the last item (polytomous item) plot(x, item.loc = 55, score.curve = FALSE, layout.col = 2) # plot item score curve for the last item (polytomous item) plot(x, item.loc = 55, score.curve = TRUE) # plot item score curves for the last three polytomous items # with multiple panels plot(x, item.loc = 53:55, score.curve = TRUE, overlap = FALSE) # plot item score curves for the last three polytomous items # in one panel plot(x, item.loc = 53:55, score.curve = TRUE, overlap = TRUE)
This function computes the probability of selecting a specific category for an item for a given set of theta values using the graded response model and (generalized) partial credit model.
prm(theta, a, d, D = 1, pr.model = c("GRM", "GPCM"))
prm(theta, a, d, D = 1, pr.model = c("GRM", "GPCM"))
theta |
A vector of ability values. |
a |
A numeric value of item discrimination (or slope) parameter. |
d |
A vector of item difficulty (or threshold) parameters. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
pr.model |
A character string indicating the polytomous model being used. Available models are "GRM" for the the graded response model and "GPCM" for the (generalized) partial credit model. |
When the category probabilities are computed for an item with the partial credit model, provide a = 1
for that item.
When model = "GPCM"
, d
should include the item difficulty (or threshold) parameters. In the irtQ package,
the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as the item location (or overall difficulty)
parameter subtracted by the threshold parameter for unique score categories of the item. Note that when an GPCM item has K
unique score categories, K-1 item difficulty parameters are necessary because the item difficulty parameter for the first category
boundary is always 0. For example, if an GPCM item has five score categories, four item difficulty parameters should be specified.
For more details about the parameterization of the (generalized) partial credit model, See IRT Models
section
in the page of irtQ-package
.
This function returns a matrix where a row indicates the ability and a column represents score categories of the item.
Hwanggyu Lim [email protected]
## Category probabilities for an item with four categories ## using a generalized partial credit model prm(theta = c(-0.2, 0, 0.5), a = 1.4, d = c(-0.2, 0, 0.5), D = 1, pr.model = "GPCM") ## Category probabilities for an item with five categories ## using a graded response model prm(theta = c(-0.2, 0, 0.5), a = 1.2, d = c(-0.4, -0.2, 0.4, 1.5), D = 1, pr.model = "GRM")
## Category probabilities for an item with four categories ## using a generalized partial credit model prm(theta = c(-0.2, 0, 0.5), a = 1.4, d = c(-0.2, 0, 0.5), D = 1, pr.model = "GPCM") ## Category probabilities for an item with five categories ## using a graded response model prm(theta = c(-0.2, 0, 0.5), a = 1.2, d = c(-0.4, -0.2, 0.4, 1.5), D = 1, pr.model = "GRM")
This function computes three RDIF statistics (Lim & Choe, In press; Lim, Choe, & Han, 2022),
which are ,
, and
, for each item.
primarily
captures the typical contrast in raw residual pattern between two groups caused by uniform DIF whereas
primarily captures the typical contrast in squared residual pattern between two groups caused
by nonuniform DIF.
can reasonably capture both types of DIF.
rdif(x, ...) ## Default S3 method: rdif( x, data, score = NULL, group, focal.name, D = 1, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_irt' rdif( x, score = NULL, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_item' rdif( x, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
rdif(x, ...) ## Default S3 method: rdif( x, data, score = NULL, group, focal.name, D = 1, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_irt' rdif( x, score = NULL, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... ) ## S3 method for class 'est_item' rdif( x, group, focal.name, alpha = 0.05, missing = NA, purify = FALSE, purify.by = c("rdifrs", "rdifr", "rdifs"), max.iter = 10, min.resp = NULL, method = "ML", range = c(-5, 5), norm.prior = c(0, 1), nquad = 41, weights = NULL, ncore = 1, verbose = TRUE, ... )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...), an object of class |
... |
Additional arguments that will be forwarded to the |
data |
A matrix containing examinees' response data for the items in the argument |
score |
A vector of examinees' ability estimates. If the abilities are not provided, |
group |
A numeric or character vector indicating group membership of examinees. The length of the vector should be the same with the number of rows in the response data matrix. |
focal.name |
A single numeric or character scalar representing the level associated with the focal group. For instance,
given |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
purify |
A logical value indicating whether a purification process will be implemented or not. Default is FALSE. |
purify.by |
A character string specifying a RDIF statistic with which the purification is implemented. Available statistics
are "rdifrs" for |
max.iter |
A positive integer value specifying the maximum number of iterations for the purification process. Default is 10. |
min.resp |
A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information. |
method |
A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, and "EAP" for the expected a posteriori estimation. Default method is "ML". |
range |
A numeric vector of two components to restrict the range of ability scale for the ML, WL, EAP, and MAP scoring methods. Default is c(-5, 5). |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution.
These two parameters are used to obtain the gaussian quadrature points and the corresponding weights from the normal distribution. Default is
c(0,1). Ignored if |
nquad |
An integer value specifying the number of gaussian quadrature points from the normal prior distribution. Default is 41.
Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained
using the function |
ncore |
The number of logical CPU cores to use. Default is 1. See |
verbose |
A logical value. If TRUE, the progress messages of purification procedure are suppressed. Default is TRUE. |
The RDIF framework (Lim et al., 2022) consists of three IRT residual-based statistics: ,
,
and
. Under the null hypothesis that a test contains no DIF items,
and
follow
normal distributions asymptotically.
is a based on a bivariate normal distribution of
and
statistics. Under the null hypothesis of no DIF items, it follows a
distribution asymptotically
with 2 degrees of freedom. See Lim et al. (2022) for more details about RDIF framework.
The rdif
function computes all three RDIF statistics of ,
, and
. The current
version of
rdif
function supports both dichotomous and polytomous item response data. To compute the three statistics, the rdif
function
requires (1) item parameter estimates obtained from aggregate data regardless of group membership, (2) examinees' ability estimates
(e.g., ML), and (3) examinees' item response data. Note that the ability estimates need to be computed using the aggregate data-based
item parameter estimates. The item parameter estimates should be provided in the x
argument, the ability estimates should
be provided in the score
argument, and the response data should be provided in the data
argument. When the abilities
are not given in the score
argument (i.e., score = NULL
), the rdif
function estimates examinees' abilities
automatically using the scoring method specified in the method
argument (e.g., method = "ML"
).
The group
argument accepts a vector of either two distinct numeric or character variables. Between two distinct variable, one is to
represent the reference group and another one is to represent the focal group. The length of the vector should be the same with the number
of rows in the response data and each value in the vector should indicate each examinee of the response data. Once the gruop
is
specified, a single numeric or character value needs to be provided in the focal.name
argument to define which group variable in
the group
argument represents the focal group.
As other DIF detection approaches, an iterative purification process can be implemented for the RDIF framework.
When purify = TRUE
, the purification process is implemented based on one of RDIF statistics specified in the purify.by
argument (e.g, purify.by="rdifrs"
). At each iterative purification, examinees' latent abilities are computed using purified items and
scoring method specified in the method
argument. The iterative purification process stops when no further DIF items are found or
the process reaches a predetermined limit of iteration, which can be specified in the max.iter
argument. See Lim et al. (2022)
for more details about the purification procedure.
Scoring with a limited number of items can result in large standard errors, which may impact the effectiveness of DIF detection within
the RDIF framework. The min.resp
argument can be employed to avoid using scores with significant standard errors when calculating
the RDIF statistics, particularly during the purification process. For instance, if min.resp
is not NULL (e.g., min.resp=5
),
item responses from examinees whose total item responses fall below the specified minimum number are treated as missing values (i.e., NA).
Consequently, their ability estimates become missing values and are not utilized in computing the RDIF statistics. If min.resp=NULL
,
an examinee's score will be computed as long as there is at least one item response for the examinee.
This function returns a list of four internal objects. The four objects are:
no_purify |
A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
|
purify |
A logical value indicating whether the purification process was used. |
with_purify |
A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
|
alpha |
A significance |
rdif(default)
: Default method to computes three RDIF statistics using a data frame x
containing the item metadata.
rdif(est_irt)
: An object created by the function est_irt
.
rdif(est_item)
: An object created by the function est_item
.
Hwanggyu Lim [email protected]
Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement. doi:10.1111/jedm.12366.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.
est_item
, info
, simdat
, shape_df
,
gen.weight
, est_score
# call library library("dplyr") ## Uniform DIF detection ############################################### # (1) manipulate true uniform DIF data ############################################### # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # generate four new items to inject uniform DIF difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # manipulate uniform DIF on the four new items by adding constants to b-parameters # for the focal group difpar_foc <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + rep(0.7, 4)) # combine the 4 DIF and 36 non-DIF items for both reference and focal groups # thus, the first four items have uniform DIF par_ref <- rbind(difpar_ref, par_nstd) par_foc <- rbind(difpar_foc, par_nstd) # generate the true thetas set.seed(123) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc <- rnorm(500, 0.0, 1.0) # generate the response data resp_ref <- simdat(par_ref, theta = theta_ref, D = 1) resp_foc <- simdat(par_foc, theta = theta_foc, D = 1) data <- rbind(resp_ref, resp_foc) ############################################### # (2) estimate the item and ability parameters # using the aggregate data ############################################### # estimate the item parameters est_mod <- est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # estimate the ability parameters using ML score <- est_score(x = est_par, data = data, method = "ML")$est.theta ############################################### # (3) conduct DIF analysis ############################################### # create a vector of group membership indicators # where '1' indicates the focal group group <- c(rep(0, 500), rep(1, 500)) # (a)-1 compute RDIF statistics by providing scores, # and without a purification dif_nopuri_1 <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05 ) print(dif_nopuri_1) # (a)-2 compute RDIF statistics by not providing scores # and without a purification dif_nopuri_2 <- rdif( x = est_par, data = data, score = NULL, group = group, focal.name = 1, D = 1, alpha = 0.05, method = "ML" ) print(dif_nopuri_2) # (b)-1 compute RDIF statistics with a purification # based on RDIF(R) dif_puri_r <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifr" ) print(dif_puri_r) # (b)-2 compute RDIF statistics with a purification # based on RDIF(S) dif_puri_s <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifs" ) print(dif_puri_s) # (b)-3 compute RDIF statistics with a purification # based on RDIF(RS) dif_puri_rs <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifrs" ) print(dif_puri_rs)
# call library library("dplyr") ## Uniform DIF detection ############################################### # (1) manipulate true uniform DIF data ############################################### # import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select 36 of 3PLM items which are non-DIF items par_nstd <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df %>% dplyr::filter(.data$model == "3PLM") %>% dplyr::filter(dplyr::row_number() %in% 1:36) %>% dplyr::select(1:6) par_nstd$id <- paste0("nondif", 1:36) # generate four new items to inject uniform DIF difpar_ref <- shape_df( par.drm = list(a = c(0.8, 1.5, 0.8, 1.5), b = c(0.0, 0.0, -0.5, -0.5), g = 0.15), item.id = paste0("dif", 1:4), cats = 2, model = "3PLM" ) # manipulate uniform DIF on the four new items by adding constants to b-parameters # for the focal group difpar_foc <- difpar_ref %>% dplyr::mutate_at(.vars = "par.2", .funs = function(x) x + rep(0.7, 4)) # combine the 4 DIF and 36 non-DIF items for both reference and focal groups # thus, the first four items have uniform DIF par_ref <- rbind(difpar_ref, par_nstd) par_foc <- rbind(difpar_foc, par_nstd) # generate the true thetas set.seed(123) theta_ref <- rnorm(500, 0.0, 1.0) theta_foc <- rnorm(500, 0.0, 1.0) # generate the response data resp_ref <- simdat(par_ref, theta = theta_ref, D = 1) resp_foc <- simdat(par_foc, theta = theta_foc, D = 1) data <- rbind(resp_ref, resp_foc) ############################################### # (2) estimate the item and ability parameters # using the aggregate data ############################################### # estimate the item parameters est_mod <- est_irt(data = data, D = 1, model = "3PLM") est_par <- est_mod$par.est # estimate the ability parameters using ML score <- est_score(x = est_par, data = data, method = "ML")$est.theta ############################################### # (3) conduct DIF analysis ############################################### # create a vector of group membership indicators # where '1' indicates the focal group group <- c(rep(0, 500), rep(1, 500)) # (a)-1 compute RDIF statistics by providing scores, # and without a purification dif_nopuri_1 <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05 ) print(dif_nopuri_1) # (a)-2 compute RDIF statistics by not providing scores # and without a purification dif_nopuri_2 <- rdif( x = est_par, data = data, score = NULL, group = group, focal.name = 1, D = 1, alpha = 0.05, method = "ML" ) print(dif_nopuri_2) # (b)-1 compute RDIF statistics with a purification # based on RDIF(R) dif_puri_r <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifr" ) print(dif_puri_r) # (b)-2 compute RDIF statistics with a purification # based on RDIF(S) dif_puri_s <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifs" ) print(dif_puri_s) # (b)-3 compute RDIF statistics with a purification # based on RDIF(RS) dif_puri_rs <- rdif( x = est_par, data = data, score = score, group = group, focal.name = 1, D = 1, alpha = 0.05, purify = TRUE, purify.by = "rdifrs" ) print(dif_puri_rs)
This function evaluates the measurement precision and bias in Multistage-adaptive Test (MST) panels using a recursion-based evaluation method introduced by Lim et al. (2020). This function computes conditional biases and standard errors of measurement (CSEMs) across a range of IRT ability levels, facilitating efficient and accurate MST panel assessments without extensive simulations.
reval_mst( x, D = 1, route_map, module, cut_score, theta = seq(-5, 5, 1), intpol = TRUE, range.tcc = c(-7, 7), tol = 1e-04 )
reval_mst( x, D = 1, route_map, module, cut_score, theta = seq(-5, 5, 1), intpol = TRUE, range.tcc = c(-7, 7), tol = 1e-04 )
x |
A data frame containing the metadata for the item bank, which includes
important information for each item such as the number of score categories and the
IRT model applied. This metadata is essential for evaluating the MST panel, with
items selected based on the specifications in the |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
route_map |
A binary square matrix that defines the MST structure, illustrating
transitions between modules and stages. This concept and structure are inspired by
the |
module |
A binary matrix that maps items from the item bank specified in |
cut_score |
A list defining cut scores for routing test takers through MST stages.
Each list element is a vector of cut scores for advancing participants to subsequent
stage modules. In a 1-3-3 MST configuration, for example, |
theta |
A vector of ability levels (theta) at which the MST panel's performance
is assessed. This allows for the evaluation of measurement precision and bias
across a continuum of ability levels. The default range is |
intpol |
A logical value to enable linear interpolation in the inverse
test characteristic curve (TCC) scoring, facilitating ability estimate approximation
for observed sum scores not directly obtainable from the TCC, such as those below
the sum of item guessing parameters. Default is TRUE, applying interpolation
to bridge gaps in the TCC. Refer to |
range.tcc |
A vector to define the range of ability estimates for inverse TCC scoring, expressed as the two numeric values for lower and upper bounds. Default is to c(-7, 7). |
tol |
A numeric value of the convergent tolerance for the inverse TCC scoring. For the inverse TCC scoring, the bisection method is used for optimization. Default is 1e-4. |
The reval_mst
function evaluates an MST panel by
implementing a recursion-based method to assess measurement precision
across IRT ability levels. This approach, detailed in Lim et al. (2020),
enables the computation of conditional biases and CSEMs efficiently,
bypassing the need for extensive simulations traditionally required
for MST evaluation.
The module
argument, used in conjunction with the item bank metadata
x
, systematically organizes items into modules for MST panel evaluation.
Each row of x
corresponds to an item, detailing its characteristics
like score categories and IRT model. The module
matrix, structured
with the same number of rows as x
and columns representing modules,
indicates item assignments with 1s. This precise mapping enables
the reval_mst
function to evaluate the MST panel's
performance by analyzing how items within each module contribute
to measurement precision and bias, reflecting the tailored progression
logic inherent in MST designs.
The route_map
argument is essential for defining the MST's structure
by indicating possible transitions between modules. Similar to the transMatrix
in the mstR package (Magis et al., 2017), route_map
is a binary matrix that outlines
which module transitions are possible within an MST design. Each "1" in the matrix
represents a feasible transition from one module (row) to another (column),
effectively mapping the flow of test takers through the MST based on their
performance. For instance, a "1" at the intersection of row i and
column j indicates the possibility for test takers to progress from
the module corresponding to row i directly to the module denoted
by column j. This structure allows reval_mst
to simulate and evaluate
the dynamic routing of test takers through various stages and modules of the MST panel.
To further detail the cut_score
argument with an illustration:
In a 1-3-3 MST configuration, the list cut_score = list(c(-0.5, 0.5), c(-0.6, 0.6))
operates as a decision guide at each stage. Initially, all test takers start
in the first module. Upon completion, their scores determine their next stage module:
scores below -0.5 route to the first module of the next stage, between -0.5
and 0.5 to the second, and above 0.5 to the third. This pattern allows for
dynamic adaptation, tailoring the test path to individual performance levels.
This function returns a list of seven internal objects. The four objects are:
panel.info |
A list of several sub-objects containing detailed information about the MST panel configuration, including:
|
item.by.mod |
A list where each entry represents a module in the MST panel, detailing the item metadata within that module. Each module's metadata includes item IDs, the number of categories, the IRT model used (model), and the item parameters (e.g., par.1, par.2, par.3). |
item.by.path |
A list containing item metadata arranged according to the paths through the MST structure. This detailed breakdown allows for an analysis of item characteristics along specific MST paths. Each list entry corresponds to a testing stage and path, providing item metadata. This structure facilitates the examination of how items function within the context of each unique path through the MST. |
eq.theta |
Estimated ability levels (
|
cdist.by.mod |
A list where each entry contains the conditional distributions of the observed scores for each module given the ability levels. |
jdist.by.path |
Joint distributions of observed scores for different paths
at each stage in a MST panel. The example below outlines the organization of
|
eval.tb |
A table summarizing the measurement precision of the MST panel. It contains the true
ability levels ( |
Hwanggyu Lim [email protected]
Magis, D., Yan, D., & Von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement, 58(2), 154-178.
## ------------------------------------------------------------------------------ # Evaluation of a 1-3-3 MST panel using simMST data. # This simulation dataset was utilized in Lim et al.'s (2020) simulation study. # Details: # (a) Panel configuration: 1-3-3 MST panel # (b) Test length: 24 items (each module contains 8 items across all stages) # (c) IRT model: 3-parameter logistic model (3PLM) ## ------------------------------------------------------------------------------ # Load the necessary library library(dplyr) library(tidyr) library(ggplot2) # Import item bank metadata x <- simMST$item_bank # Import module information module <- simMST$module # Import routing map route_map <- simMST$route_map # Import cut scores for routing to subsequent modules cut_score <- simMST$cut_score # Import ability levels (theta) for evaluating measurement precision theta <- simMST$theta # Evaluate MST panel using the reval_mst() function eval <- reval_mst(x, D = 1.702, route_map = route_map, module = module, cut_score = cut_score, theta = theta, range.tcc = c(-5, 5) ) # Review evaluation results # The evaluation result table below includes conditional biases and # standard errors of measurement (CSEMs) across ability levels print(eval$eval.tb) # Generate plots for biases and CSEMs p_eval <- eval$eval.tb %>% dplyr::select(theta, bias, csem) %>% tidyr::pivot_longer( cols = c(bias, csem), names_to = "criterion", values_to = "value" ) %>% ggplot2::ggplot(mapping = ggplot2::aes(x = theta, y = value)) + ggplot2::geom_point(mapping = ggplot2::aes(shape = criterion), size = 3) + ggplot2::geom_line( mapping = ggplot2::aes( color = criterion, linetype = criterion ), linewidth = 1.5 ) + ggplot2::labs(x = expression(theta), y = NULL) + ggplot2::theme_classic() + ggplot2::theme_bw() + ggplot2::theme(legend.key.width = unit(1.5, "cm")) print(p_eval)
## ------------------------------------------------------------------------------ # Evaluation of a 1-3-3 MST panel using simMST data. # This simulation dataset was utilized in Lim et al.'s (2020) simulation study. # Details: # (a) Panel configuration: 1-3-3 MST panel # (b) Test length: 24 items (each module contains 8 items across all stages) # (c) IRT model: 3-parameter logistic model (3PLM) ## ------------------------------------------------------------------------------ # Load the necessary library library(dplyr) library(tidyr) library(ggplot2) # Import item bank metadata x <- simMST$item_bank # Import module information module <- simMST$module # Import routing map route_map <- simMST$route_map # Import cut scores for routing to subsequent modules cut_score <- simMST$cut_score # Import ability levels (theta) for evaluating measurement precision theta <- simMST$theta # Evaluate MST panel using the reval_mst() function eval <- reval_mst(x, D = 1.702, route_map = route_map, module = module, cut_score = cut_score, theta = theta, range.tcc = c(-5, 5) ) # Review evaluation results # The evaluation result table below includes conditional biases and # standard errors of measurement (CSEMs) across ability levels print(eval$eval.tb) # Generate plots for biases and CSEMs p_eval <- eval$eval.tb %>% dplyr::select(theta, bias, csem) %>% tidyr::pivot_longer( cols = c(bias, csem), names_to = "criterion", values_to = "value" ) %>% ggplot2::ggplot(mapping = ggplot2::aes(x = theta, y = value)) + ggplot2::geom_point(mapping = ggplot2::aes(shape = criterion), size = 3) + ggplot2::geom_line( mapping = ggplot2::aes( color = criterion, linetype = criterion ), linewidth = 1.5 ) + ggplot2::labs(x = expression(theta), y = NULL) + ggplot2::theme_classic() + ggplot2::theme_bw() + ggplot2::theme(legend.key.width = unit(1.5, "cm")) print(p_eval)
This function implements flexMIRT (Cai, 2017) to run a model specified in the syntax file of flexMIRT (i.e., *.flexmirt) through R. To run this function, flexMIRT software must be installed in advance. This function will be useful especially when conducting a simulation study using flexMIRT.
run_flexmirt(file.syntax, dir.flex = NULL, show.output.on.console = FALSE, ...)
run_flexmirt(file.syntax, dir.flex = NULL, show.output.on.console = FALSE, ...)
file.syntax |
A single string or vector containing the file path(s) of a flexmirt syntax file(s) to be run. An example is “C:/Users/Data/irtmodel.flexmirt". |
dir.flex |
A path of directory where flexMIRT is installed. The path may include a folder name with "flexMIRT" (e.g, flexMIRT3, flexMIRT 3.6). If NULL, a path where flexMIRT is installed will be searched in "C:/Program Files" and it will be used as a default path (e.g., "C:/Program Files/flexMIRT3", "C:/Program Files/flexMIRT 3.6"). |
show.output.on.console |
A logical value to indicate whether to capture the output of the command and show it on the R console.
Default is FALSE. See |
... |
Further arguments passed from the function |
When a path of directory where flexMIRT (with a version < 3.6) is installed is provided
in the argument dir.flex
, the directory must include following six file of
WinFlexMIRT.exe
FlexMIRT_x64.exe
FlexMIRT_x86.exe
vpg.dll
vpg.licensing.client.dll
vpg.licensing.dll
When a path of directory where flexMIRT (with a version >= 3.6) is installed is provided
in the argument dir.flex
, the directory must include following six files of
WinFlexMIRT.exe
vpg.dll
vpg.licensing.client.dll
vpg.licensing.dll
VPGLicenseClientNet.dll
and an additional directory of "Resources" that contains two files which are
flexMIRT_x64_AVX.exe
flexMIRT_x86_AVX.exe
output files of flexMIRT
Hwanggyu Lim [email protected]
Cai, L. (2017). flexMIRT 3.5 Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
# Emxaples below will run when flexMIRT software is installed # in a default path of "C:/Program Files/flexMIRT3". # Otherwise provide a path where flexMIRT software is installed # in the argument 'dir.flex'. ## Not run: # (1) run a single syntax file # import an example of flexMIRT syntax file to run the item parameter estimation of IRT 3PL model file.syntax <- system.file("extdata", "2PLM_example.flexmirt", package = "irtQ") # run flexMIRT to estimate the item parameters of IRT 3PL model run_flexmirt(file.syntax = file.syntax, dir.flex = NULL, show.output = TRUE) # check the output file out.file <- system.file("extdata", "2PLM_example-prm.txt", package = "irtQ") bring.flexmirt(out.file, type = "par") # (2) run multiple syntax files # import two examples of flexMIRT syntax files file.syntax1 <- system.file("extdata", "2PLM_example.flexmirt", package = "irtQ") file.syntax2 <- system.file("extdata", "3PLM_example.flexmirt", package = "irtQ") # run flexMIRT to estimate the item parameters run_flexmirt(file.syntax = c(file.syntax1, file.syntax2), dir.flex = NULL, show.output = FALSE) # check the output file out.file1 <- system.file("extdata", "2PLM_example-prm.txt", package = "irtQ") out.file2 <- system.file("extdata", "3PLM_example-prm.txt", package = "irtQ") bring.flexmirt(out.file1, type = "par") bring.flexmirt(out.file2, type = "par") ## End(Not run)
# Emxaples below will run when flexMIRT software is installed # in a default path of "C:/Program Files/flexMIRT3". # Otherwise provide a path where flexMIRT software is installed # in the argument 'dir.flex'. ## Not run: # (1) run a single syntax file # import an example of flexMIRT syntax file to run the item parameter estimation of IRT 3PL model file.syntax <- system.file("extdata", "2PLM_example.flexmirt", package = "irtQ") # run flexMIRT to estimate the item parameters of IRT 3PL model run_flexmirt(file.syntax = file.syntax, dir.flex = NULL, show.output = TRUE) # check the output file out.file <- system.file("extdata", "2PLM_example-prm.txt", package = "irtQ") bring.flexmirt(out.file, type = "par") # (2) run multiple syntax files # import two examples of flexMIRT syntax files file.syntax1 <- system.file("extdata", "2PLM_example.flexmirt", package = "irtQ") file.syntax2 <- system.file("extdata", "3PLM_example.flexmirt", package = "irtQ") # run flexMIRT to estimate the item parameters run_flexmirt(file.syntax = c(file.syntax1, file.syntax2), dir.flex = NULL, show.output = FALSE) # check the output file out.file1 <- system.file("extdata", "2PLM_example-prm.txt", package = "irtQ") out.file2 <- system.file("extdata", "3PLM_example-prm.txt", package = "irtQ") bring.flexmirt(out.file1, type = "par") bring.flexmirt(out.file2, type = "par") ## End(Not run)
This function creates a data frame which includes item meta (e.g., item parameter, categories, models ...) to be used for the IRT model-data fit analysis as well as other analyses.
shape_df( par.drm = list(a = NULL, b = NULL, g = NULL), par.prm = list(a = NULL, d = NULL), item.id = NULL, cats, model, default.par = FALSE )
shape_df( par.drm = list(a = NULL, b = NULL, g = NULL), par.prm = list(a = NULL, d = NULL), item.id = NULL, cats, model, default.par = FALSE )
par.drm |
A list containing three vectors of dichotomous item parameters. Namely, the item discrimination (a), item difficulty (b), and item guessing parameters. |
par.prm |
A list containing a vector of polytomous item discrimination (or slope) parameters and a list of polytomous item threshold
(or step) parameters. In this list, the argument |
item.id |
A character vector of item IDs. If NULL, an ID is automatically given to each item. |
cats |
A vector containing the number of score categories for items. |
model |
A character vector of IRT models corresponding to items. The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous items, and "GRM" and "GPCM" for polytomous items. Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded response model and (generalized) partial credit model, respectively. |
default.par |
A logical value to create an item meta with default item parameters. If TRUE, the number of score categories
and corresponding IRT models should be specified in the arguments of |
For any item where "1PLM" or "2PLM" is specified in model
, the item guessing parameter will be NA. If model
is
a vector of , the specified model is replicated across all items. As in the function
simdat
, it is important
to clearly specify cats
according to the order of items in the test form when a data frame for a mixed-format test needs to be created.
See simdat
for more details about how to specify cats
.
When specifying item parameters in par.drm
and/or par.prm
, keep the order of item parameter types. For example,
in the par.drm
argument, the first argument a
should contain the slope parameter vector, the second argument b
should contain the difficulty vector, and the third argument g
should contain the guessing parameter vector.
In the par.drm
argument, the first argument a
should contain the slope parameter vector and the second argument d
should contain a list including vectors of item threshold (or step) parameters for polytomous response IRT models. Note that when an item follows
the (generalized) partial credit model, the item step parameters are the overall item difficulty (or location) parameter subtracted by
the difficulty (or threshold) parameter for each category. Thus, the number of step parameters for item with m categories is m-1 because
a step parameter for the first category does not affect the category probabilities.
This function returns a data frame.
Hwanggyu Lim [email protected]
## a mixed-item format test form ## with five dichotomous and two polytomous items # create a list containing the dichotomous item parameters par.drm <- list( a = c(1.1, 1.2, 0.9, 1.8, 1.4), b = c(0.1, -1.6, -0.2, 1.0, 1.2), g = rep(0.2, 5) ) # create a list containing the polytomous item parameters par.prm <- list( a = c(1.4, 0.6), d = list( c(0.0, -1.9, 1.2), c(0.4, -1.1, 1.5, 0.2) ) ) # create a numeric vector of score categories for the items cats <- c(2, 4, 2, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("DRM", "GRM", "DRM", "DRM", "GPCM", "DRM", "DRM") # create an item meta set shape_df(par.drm = par.drm, par.prm = par.prm, cats = cats, model = model) ## an empty item meta with five dichotomous and two polytomous items # create a numeric vector of score categories for the items cats <- c(2, 4, 3, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("1PLM", "GRM", "GRM", "2PLM", "GPCM", "DRM", "3PLM") # create an empty item meta set shape_df(cats = cats, model = model, default.par = TRUE) ## an item meta for a single-item format test form with five dichotomous shape_df(par.drm = par.drm, cats = rep(2, 5), model = "DRM")
## a mixed-item format test form ## with five dichotomous and two polytomous items # create a list containing the dichotomous item parameters par.drm <- list( a = c(1.1, 1.2, 0.9, 1.8, 1.4), b = c(0.1, -1.6, -0.2, 1.0, 1.2), g = rep(0.2, 5) ) # create a list containing the polytomous item parameters par.prm <- list( a = c(1.4, 0.6), d = list( c(0.0, -1.9, 1.2), c(0.4, -1.1, 1.5, 0.2) ) ) # create a numeric vector of score categories for the items cats <- c(2, 4, 2, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("DRM", "GRM", "DRM", "DRM", "GPCM", "DRM", "DRM") # create an item meta set shape_df(par.drm = par.drm, par.prm = par.prm, cats = cats, model = model) ## an empty item meta with five dichotomous and two polytomous items # create a numeric vector of score categories for the items cats <- c(2, 4, 3, 2, 5, 2, 2) # create a character vector of IRT models for the items model <- c("1PLM", "GRM", "GRM", "2PLM", "GPCM", "DRM", "3PLM") # create an empty item meta set shape_df(cats = cats, model = model, default.par = TRUE) ## an item meta for a single-item format test form with five dichotomous shape_df(par.drm = par.drm, cats = rep(2, 5), model = "DRM")
This data set contains an item pool information, response data, and examinee's ability estimates.
simCAT_DC
simCAT_DC
This data includes a list of length three. The first internal object is a data.frame of the item pool consisting of 100 dichotomous items. The item parameters of the first 90 items were generated with the IRT 2PL model and calibrated with the same model. However, the item parameters of the last 10 items were generated with the IRT 3PL model but calibrated with the IRT 2PL model. The second internal object is the response data set including a sparse response data set of 10,000 examinees for the items in the item pool. The third internal object is the examinee's ability estimates for 10,000 examinees.
Hwanggyu Lim [email protected]
This data set contains an item pool information, response data, and examinee's ability estimates.
simCAT_MX
simCAT_MX
This data includes a list of length three. The first internal object is a data.frame of the item pool consisting of 200 dichotomous items and 30 polytomous items. The dichotomous items were calibrated with the IRT 3PL model and the polytomous items were calibrated with the generalized partial credit model. All polytomous items have three score categories (i.e., 0, 1, 2). The second internal object is the response data set including a sparse response data set of 30,000 examinees for the items in the item pool. The third internal object is the examinee's ability estimates for 30,000 examinees.
Hwanggyu Lim [email protected]
This function generates a simulated response data for a single- or a mixed-format test forms. For dichotomous item response data, the IRT 1PL, 2PL, and 3PL models are available. For polytomous item response data, the graded response model, the partial credit model, and the generalized partial credit model are available.
simdat( x = NULL, theta, a.drm, b.drm, g.drm = NULL, a.prm, d.prm, cats, pr.model, D = 1 )
simdat( x = NULL, theta, a.drm, b.drm, g.drm = NULL, a.prm, d.prm, cats, pr.model, D = 1 )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...). This data frame
can be easily obtained using the function |
theta |
A vector of theta values. |
a.drm |
A vector of item discrimination (or slope) parameters for dichotomous response IRT models. |
b.drm |
A vector of item difficulty (or threshold) parameters for dichotomous response IRT models. |
g.drm |
A vector of item guessing parameters for dichotomous IRT models. |
a.prm |
A vector of item discrimination (or slope) parameters for polytomous response IRT models. |
d.prm |
A list containing vectors of item threshold (or step) parameters for polytomous response IRT models. |
cats |
A vector containing the number of score categories for items. |
pr.model |
A vector of character strings specifying the polytomous model with which response data are simulated. For each polytomous model, "GRM" for the graded response model or "GPCM" for the (generalized) partial credit model can be specified. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
There are two ways of generating the simulated response data.
The first way is by using the argument x
to read in a data frame of item metadata. In the data frame, the first column should have item IDs,
the second column should contain unique score category numbers of the items, and the third column should include IRT models being fit to the items.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e, "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as
the item location (or overall difficulty) parameter subtracted by the threshold parameter for unique score categories of the item.
Note that when an GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if an GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame with a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
See IRT Models
section in the page of irtQ-package
for more details about the IRT models used in the irtQ package.
An easier way to create a data frame for the argument x
is by using the function shape_df
.
The second way is by directly specifying item parameters for each item for which response data should be simulated
(i.e., without using a data frame, as shown in the examples that follow). In addition to item parameters,
theta
, cats
, pr.model
, and D
should be specified as well. g.drm
does not need to be specified when only
the 1PL and 2PL models are used for dichotomous item response data. For dichotomous items, 2s should be specified in cats
.
For polytomous items, the number of unique score categories should be specified in cats
. When a response data set is generated with
a mixed-format test, it is important to clearly specify cats
according to the order of items in the test form. Suppose that the response
data of ten examinees are simulated with five items, including three dichotomous items and two polytomous items with three categories.
Also, suppose that the second and the forth items are the polytomous items. Then, cats = c(2, 3, 2, 3, 2)
should be used.
Additionally, among those two polytomous items, if the first and second item response data are simulated from the graded response model
and generalized partial credit model, respectively, then pr.model = c('GRM', 'GPCM')
.
This function returns a vector or a matrix. When a matrix is returned, rows indicate theta values and columns represent items.
Hwanggyu Lim [email protected]
## example 1. ## simulates response data with a mixed-format test. ## for the first two polytomous items, the generalized partial credit model is used ## for the last polytomous item, the graded response model is used # 100 examinees are sampled theta <- rnorm(100) # set item parameters for three dichotomous items with the 3PL model a.drm <- c(1, 1.2, 1.3) b.drm <- c(-1, 0, 1) g.drm <- rep(0.2, 3) # set item parameters for three polytomous item parameters # note that 4, 4, and 5 categories are used for polytomous items a.prm <- c(1.3, 1.2, 1.7) d.prm <- list(c(-1.2, -0.3, 0.4), c(-0.2, 0.5, 1.6), c(-1.7, 0.2, 1.1, 2.0)) # create a numeric vector of score categories for both dichotomous and polytomous item data # this score category vector is used to specify the location of the polytomous items cats <- c(2, 2, 4, 4, 5, 2) # create a character vector of the IRT model for the polytomous items pr.model <- c("GPCM", "GPCM", "GRM") # simulate the response data simdat( theta = theta, a.drm = a.drm, b.drm = b.drm, g.drm = NULL, a.prm = a.prm, d.prm = d.prm, cats = cats, pr.model = pr.model, D = 1 ) ## example 2. ## simulates response data with a single-format test with the 2PL model. # create a numeric vector of score categories for the three 2PL model items cats <- rep(2, 3) # simulate the response data simdat(theta = theta, a.drm = a.drm, b.drm = b.drm, cats = cats, D = 1) ## example 3. ## the use of a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # simulate the response data simdat(x = test_flex, theta = theta, D = 1) # use a data.frame of item meta information
## example 1. ## simulates response data with a mixed-format test. ## for the first two polytomous items, the generalized partial credit model is used ## for the last polytomous item, the graded response model is used # 100 examinees are sampled theta <- rnorm(100) # set item parameters for three dichotomous items with the 3PL model a.drm <- c(1, 1.2, 1.3) b.drm <- c(-1, 0, 1) g.drm <- rep(0.2, 3) # set item parameters for three polytomous item parameters # note that 4, 4, and 5 categories are used for polytomous items a.prm <- c(1.3, 1.2, 1.7) d.prm <- list(c(-1.2, -0.3, 0.4), c(-0.2, 0.5, 1.6), c(-1.7, 0.2, 1.1, 2.0)) # create a numeric vector of score categories for both dichotomous and polytomous item data # this score category vector is used to specify the location of the polytomous items cats <- c(2, 2, 4, 4, 5, 2) # create a character vector of the IRT model for the polytomous items pr.model <- c("GPCM", "GPCM", "GRM") # simulate the response data simdat( theta = theta, a.drm = a.drm, b.drm = b.drm, g.drm = NULL, a.prm = a.prm, d.prm = d.prm, cats = cats, pr.model = pr.model, D = 1 ) ## example 2. ## simulates response data with a single-format test with the 2PL model. # create a numeric vector of score categories for the three 2PL model items cats <- rep(2, 3) # simulate the response data simdat(theta = theta, a.drm = a.drm, b.drm = b.drm, cats = cats, D = 1) ## example 3. ## the use of a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # simulate the response data simdat(x = test_flex, theta = theta, D = 1) # use a data.frame of item meta information
This data set has a list consisting of item metadata, item response data, and group names of three simulated groups.
simMG
simMG
This data set includes a list of three internal objects: (1) a list of item metadata (item.prm) for three groups, (2) a list of item response data (res.dat) for the three groups, and (3) a vector of group names (group.name) for the three groups.
The first internal object (item.prm) contains a list of item metadata of three test forms for the three groups. In terms of test forms, the test forms for the first and second groups have fifty items consisting of forty seven 3PLM items and three GRM items. The test form for the third group has thirty eight items consisting of thirty seven 3PLM items and one GRM item. Among the three forms, the first and second test forms share twelve common items (C1I1 through C1I12) and the second and third test forms share ten common items (C2I1 through c2I10). There is no common item between the first and third forms. The item parameters in the item metadata were used to simulate the item response data sets for the three groups (see the second object of the list).
Regrading the second internal object, all three response data sets were simulated with 2,000 latent abilities randomly sampled
from (Group 1),
(Group 2), and
(Group 3), respectively, using the
true item parameters provided in the item metadata.
The third internal object is a vector of three group names which are "Group1", "Group2", and "Group3".
Hwanggyu Lim [email protected]
This simulated 1-3-3 MST panel data set was used in Lim et al.' (2020) simulation study.
simMST
simMST
This data set includes a list of five internal objects:
item_bank
The item bank metadata, containing item parameters and other information.
module
A binary matrix that maps items from the item bank
to modules within the MST panel. This parameter enables precise
item-to-module assignments for MST configurations, analogous to
the modules
argument in the randomMST
function
of the mstR package (Magis et al., 2017).
route_map
A binary square matrix that defines the MST structure,
illustrating transitions between modules and stages. This concept is inspired
by the transMatrix
argument in the randomMST
function from the mstR package (Magis et al., 2017).
cut_score
A list defining cut scores for routing test takers through MST stages. Each list element is a vector of cut scores for advancing test takers to subsequent stage modules.
theta
A vector of ability levels (theta) at which the MST panel's performance is assessed, allowing for the evaluation of measurement precision across a continuum of ability levels.
This 1-3-3 MST panel consists of 7 modules in total with 3 stages. Each module contains eight items calibrated with the IRT 3 parameter logistic model.
Hwanggyu Lim [email protected]
Magis, D., Yan, D., & Von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer.
Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement, 58(2), 154-178.
This method function summarizes the IRT calibration results of est_irt
or est_item
object.
summary(object, ...) ## S3 method for class 'est_irt' summary(object, ...) ## S3 method for class 'est_mg' summary(object, ...) ## S3 method for class 'est_item' summary(object, ...)
summary(object, ...) ## S3 method for class 'est_irt' summary(object, ...) ## S3 method for class 'est_mg' summary(object, ...) ## S3 method for class 'est_item' summary(object, ...)
object |
|
... |
Further arguments passed to or from other methods. |
The method function returns a list of internal objects extracted from
est_irt
or est_item
object and displays a summary of
the IRT calibration results on the console panel.
summary(est_irt)
: An object created by the function est_irt
.
summary(est_mg)
: An object created by the function est_mg
.
summary(est_item)
: An object created by the function est_item
.
Hwanggyu Lim [email protected]
# fit the 1PL model to LSAT6 data and constrain the slope parameters to be equal fit.1pl <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = FALSE) # summary of the estimation summary(fit.1pl)
# fit the 1PL model to LSAT6 data and constrain the slope parameters to be equal fit.1pl <- est_irt(data = LSAT6, D = 1, model = "1PLM", cats = 2, fix.a.1pl = FALSE) # summary of the estimation summary(fit.1pl)
This function computes (Orlando & Thissen, 2000, 2003) item fit statistic.
sx2_fit(x, ...) ## Default S3 method: sx2_fit( x, data, D = 1, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... ) ## S3 method for class 'est_item' sx2_fit( x, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... ) ## S3 method for class 'est_irt' sx2_fit( x, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... )
sx2_fit(x, ...) ## Default S3 method: sx2_fit( x, data, D = 1, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... ) ## S3 method for class 'est_item' sx2_fit( x, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... ) ## S3 method for class 'est_irt' sx2_fit( x, alpha = 0.05, min.collapse = 1, norm.prior = c(0, 1), nquad = 30, weights, pcm.loc = NULL, ... )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...), an object
of class |
... |
Further arguments passed to or from other methods. |
data |
A matrix containing examinees' response data for the items in the argument |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify significance |
min.collapse |
An integer value to indicate the minimum frequency of cells to be collapsed. Default is 1. See below for details. |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution. These two parameters are used to obtain the gaussian quadrature points and the corresponding weights from the normal distribution. Default is c(0,1). |
nquad |
An integer value specifying the number of gaussian quadrature points from the normal prior distribution. Default is 30. |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained
using the function |
pcm.loc |
A vector of integer values indicating the locations of partial credit model (PCM) items whose slope parameters are fixed to certain values. Default is NULL. |
Often, very small expected frequencies in the contingency tables used to compute fit statistics could
compromise the accuracy of the
approximation for their distribution (Orlando & Thissen, 2000).
To avoid this problem, Orlando and Thissen (2000) used an algorithm of collapsing adjacent test score groups to maintain
a minimum expected category frequency of 1. However, if Orlando and Thissen's cell collapsing approach is applied to polytomous data,
too much information would be lost (Kang & Chen, 2008). Thus, Kang and Chen (2008) collapsed adjacent cells of item score categories
for a specific score group to ensure a minimum expected category frequency of 1. The same collapsing strategies were applied
in the function
sx2_fit
. If a minimum expected category frequency needs to be set to different number, you can specify
the minimum value in the argument min.collapse
.
Note that if "DRM" is specified for an item in the item metadata set, the item is considered as "3PLM" to compute degree of freedom of
the fit statistic.
Also, any missing responses in data
are replaced with incorrect responses (i.e., 0s).
This function returns a list. Within a list, several internal objects are contained such as:
fit_stat |
A data frame containing the results of |
item_df |
The item metadata specified in the argument |
exp_freq |
A list containing the collapsed expected frequency tables for all items. |
obs_freq |
A list containing the collapsed observed frequency tables for all items. |
exp_prob |
A list containing the collapsed expected probability tables for all items. |
obs_prop |
A list containing the collapsed observed proportion tables for all items. |
sx2_fit(default)
: Default method to compute fit statistics for a data frame
x
containing the item metadata.
sx2_fit(est_item)
: An object created by the function est_item
.
sx2_fit(est_irt)
: An object created by the function est_irt
.
Hwanggyu Lim [email protected]
Kang, T., & Chen, T. T. (2008). Performance of the generalized S-X2 item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391-406.
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50-64.
Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289-298.
irtfit
, info
, simdat
, shape_df
, est_item
## example 1: all five polytomous IRT models are GRM ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit1 <- sx2_fit(x = x, data = data, nquad = 30) # fit statistics fit1$fit_stat ## example 2: first 39th and 40th items follows GRM and 53rd, 54th, and 55th items ## follow PCM (thus, the slope parameters are fixed to 1) # replace the model names with GPCM and # assign 1 to the slope parameters for the 53rd, 54th, and 55th items x[53:55, 3] <- "GPCM" x[53:55, 4] <- 1 # generate examinees' abilities from N(0, 1) set.seed(25) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit2 <- sx2_fit(x = x, data = data, nquad = 30, pcm.loc = 53:55) # fit statistics fit2$fit_stat
## example 1: all five polytomous IRT models are GRM ## import the "-prm.txt" output file from flexMIRT flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # select the item metadata x <- bring.flexmirt(file = flex_sam, "par")$Group1$full_df # generate examinees' abilities from N(0, 1) set.seed(23) score <- rnorm(500, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit1 <- sx2_fit(x = x, data = data, nquad = 30) # fit statistics fit1$fit_stat ## example 2: first 39th and 40th items follows GRM and 53rd, 54th, and 55th items ## follow PCM (thus, the slope parameters are fixed to 1) # replace the model names with GPCM and # assign 1 to the slope parameters for the 53rd, 54th, and 55th items x[53:55, 3] <- "GPCM" x[53:55, 4] <- 1 # generate examinees' abilities from N(0, 1) set.seed(25) score <- rnorm(1000, mean = 0, sd = 1) # simulate the response data data <- simdat(x = x, theta = score, D = 1) # compute fit statistics fit2 <- sx2_fit(x = x, data = data, nquad = 30, pcm.loc = 53:55) # fit statistics fit2$fit_stat
This function computes the item category probabilities, item characteristic function, and
test characteristic function given a set of theta values. The returned object of this function can be used
to draw the item or test characteristic curve using the function plot.traceline
.
traceline(x, ...) ## Default S3 method: traceline(x, theta, D = 1, ...) ## S3 method for class 'est_item' traceline(x, theta, ...) ## S3 method for class 'est_irt' traceline(x, theta, ...)
traceline(x, ...) ## Default S3 method: traceline(x, theta, D = 1, ...) ## S3 method for class 'est_item' traceline(x, theta, ...) ## S3 method for class 'est_irt' traceline(x, theta, ...)
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...), an object
of class |
... |
Further arguments passed to or from other methods. |
theta |
A vector of theta values. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
This function returns an object of class traceline
. This object contains a list containing
the item category probabilities, item characteristic function, and test characteristic function.
traceline(default)
: Default method to compute the item category probabilities, item characteristic function, and
test characteristic function for a data frame x
containing the item metadata.
traceline(est_item)
: An object created by the function est_item
.
traceline(est_irt)
: An object created by the function est_irt
.
Hwanggyu Lim [email protected]
## example ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-3, 3, 0.5) # compute the item category probabilities and item/test # characteristic functions given the theta values traceline(x = test_flex, theta, D = 1)
## example ## using a "-prm.txt" file obtained from a flexMIRT # import the "-prm.txt" output file from flexMIRT flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ") # read item parameters and transform them to item metadata test_flex <- bring.flexmirt(file = flex_prm, "par")$Group1$full_df # set theta values theta <- seq(-3, 3, 0.5) # compute the item category probabilities and item/test # characteristic functions given the theta values traceline(x = test_flex, theta, D = 1)
This function writes an output file of "-prm.txt" for flexMIRT (Cai, 2017). The current version of this function
can be used only for the unidimensional IRT models. This function was written by modifying the function read.flexmirt
(Pritikin & Falk, 2020).
write.flexmirt( x, file = NULL, norm.pop = c(0, 1), rePar = TRUE, mgroup = FALSE, group.name = NULL )
write.flexmirt( x, file = NULL, norm.pop = c(0, 1), rePar = TRUE, mgroup = FALSE, group.name = NULL )
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...) for a single group or
a list of the item metadata for multiple groups. See |
file |
The destination file name. |
norm.pop |
A numeric vector of two components specifying a mean and standard deviation of the normal
population ability distribution for a single group or a list of the numeric vectors of length two for multiple groups.
When a list is provided, each internal numeric vector should contain a mean and standard deviation of the ability
distribution of each group (e.g., |
rePar |
A logical value indicating whether the item parameters in the item metadata are the reparameterized item parameters. If TRUE, the item intercepts and logits of item guessing parameters should be included in the item metadata. If FALSE, the item difficulty and item guessing parameters should be included in the item metadata. |
mgroup |
A logical value indicating whether a "-prm.txt" file is created for a single group or multiple groups. Default is FALSE. |
group.name |
A character vector of group names. If NULL, the group names are automatically generated (e.g., Group1). |
A "-prm.txt" file.
Hwanggyu Lim [email protected]
Cai, L. (2017). flexMIRT 3.5 Flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Pritikin, J. N., & Falk, C. F. (2020). OpenMx: A modular research environment for item response theory method development. Applied Psychological Measurement, 44(7-8), 561-562.
## 1. Create "-prm.txt" file for a single group ## using the simulated CAT data # 1-(1) extract the item metadata x <- simCAT_MX$item.prm # 1-(2) set a name of "-prm.txt" file temp_prm <- file.path(tempdir(), "single_group_temp-prm.txt") # 1-(3) write out the "-prm.txt" file write.flexmirt(x, file = temp_prm, norm.pop = c(0, 1), rePar = FALSE) ## 2. Create "-prm.txt" file for multiple groups ## using the simulated three multiple group data # 2-(1) extract the item metadata x <- simMG$item.prm # set a name of "-prm.txt" file temp_prm <- file.path(tempdir(), "mg_group_temp-prm1.txt") # write out the "-prm.txt" file write.flexmirt(x, file = temp_prm, norm.pop = list(c(0, 1), c(0.5, 0.8), c(-0.3, 1.3)), rePar = FALSE, mgroup = TRUE, group.name = c("GR1", "GR2", "GR3") ) # or write out the "-prm.txt" file so that # all groups have the same ability distributions # and the group names are generate autoumatically temp_prm <- file.path(tempdir(), "mg_group_temp-prm2.txt") write.flexmirt(x, file = temp_prm, norm.pop = c(0, 1), rePar = FALSE, mgroup = TRUE, group.name = NULL )
## 1. Create "-prm.txt" file for a single group ## using the simulated CAT data # 1-(1) extract the item metadata x <- simCAT_MX$item.prm # 1-(2) set a name of "-prm.txt" file temp_prm <- file.path(tempdir(), "single_group_temp-prm.txt") # 1-(3) write out the "-prm.txt" file write.flexmirt(x, file = temp_prm, norm.pop = c(0, 1), rePar = FALSE) ## 2. Create "-prm.txt" file for multiple groups ## using the simulated three multiple group data # 2-(1) extract the item metadata x <- simMG$item.prm # set a name of "-prm.txt" file temp_prm <- file.path(tempdir(), "mg_group_temp-prm1.txt") # write out the "-prm.txt" file write.flexmirt(x, file = temp_prm, norm.pop = list(c(0, 1), c(0.5, 0.8), c(-0.3, 1.3)), rePar = FALSE, mgroup = TRUE, group.name = c("GR1", "GR2", "GR3") ) # or write out the "-prm.txt" file so that # all groups have the same ability distributions # and the group names are generate autoumatically temp_prm <- file.path(tempdir(), "mg_group_temp-prm2.txt") write.flexmirt(x, file = temp_prm, norm.pop = c(0, 1), rePar = FALSE, mgroup = TRUE, group.name = NULL )