dcmdata 0.2.0

Several new data sets, including data from the PIE through-year assessment project.
Author: W. Jake Thompson
Published: March 11, 2026
DOI: 10.59350/18f7y-yjv74

We are excited to announce the release of dcmdata 0.2.0. The goal of dcmdata is to provide easy access to data sets that can be used for demonstrating and testing diagnostic classification models (DCMs; also called cognitive diagnostic models [CDMs]).

You can install dcmdata from CRAN with:

install.packages("dcmdata")

This blog post highlights the major changes in this release, which focuses mainly on adding several new data sets. You can see a full list of changes in the release notes.

Pathways for Instructionally Embedded Assessment

The Pathways for Instructionally Embedded Assessment (PIE) project was a proof-of-concept partnership between ATLAS at the University of Kansas and the Missouri Department of Elementary and Secondary Education. The project explored whether through-year instructionally embedded assessments can be used to both inform instructional decisions and support summative reporting requirements.

The PIE system is built on learning maps, which describe different ways for students to acquire knowledge, skills, and understandings around a learning target (Swinburne Romine et al., 2025). For PIE, a small map was built around each of the Missouri learning standards included in the project, along with a three-level learning pathway that summarizes the critical junctures in the learning map that serve as assessment targets. The PIE data included in dcmdata comes from a learning pathway aligned to the 5.RA.A.1b learning standard, a grade 5 mathematics standard covering patterns and relationships. The three levels of the learning pathway are:

  • Level 1: Recognize the order of elements in a repeating pattern.
  • Level 2: Organize two numeric patterns in a table.
  • Level 3: Translate two numeric patterns into ordered pairs.

There are two PIE data sets representing different phases of the project:

  • pie_ft_data and pie_ft_qmatrix contain data from the Spring 2024 field test, where each student responded to items measuring all three levels at a single time point.

  • pie_pilot_data and pie_pilot_qmatrix contain data from the 2024–2025 pilot administration, where assessments were spread across three time points: baseline (before instruction, measuring Level 1 only), midway (measuring Level 2, and re-assessing Level 1 for students who hadn’t yet demonstrated mastery), and end_of_unit (measuring Level 3, and re-assessing Level 2 as needed).

pie_ft_data
#> # A tibble: 172 × 16
#>    student `00592` `14415` `56400` `64967` `06238` `10231` `54596` `96748`
#>    <chr>     <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
#>  1 8978593       1       1       1       1       1       0       1       1
#>  2 5231294       1       1       1       1       1       1       1       1
#>  3 3681220       1       1       1       1       1       1       1       1
#>  4 7763384       1       0       1       1       1       1       1       1
#>  5 1913897       1       1       1       1       1       1       1       1
#>  6 0692477       1       1       1       1       1       1       1       1
#>  7 6961042       1       1       0       1       1       1       1       1
#>  8 4241777       1       1       1       1       1       1       1       1
#>  9 3068583       1       1       1       1       1       1       1       1
#> 10 6607413       1       1       1       1       1       1       1       1
#> # ℹ 162 more rows
#> # ℹ 7 more variables: `97634` <int>, `13080` <int>, `27971` <int>,
#> #   `56741` <int>, `63088` <int>, `81175` <int>, `88063` <int>

pie_ft_qmatrix
#> # A tibble: 15 × 4
#>    task     L1    L2    L3
#>    <chr> <int> <int> <int>
#>  1 00592     1     0     0
#>  2 14415     1     0     0
#>  3 56400     1     0     0
#>  4 64967     1     0     0
#>  5 06238     0     1     0
#>  6 10231     0     1     0
#>  7 54596     0     1     0
#>  8 96748     0     1     0
#>  9 97634     0     1     0
#> 10 13080     0     0     1
#> 11 27971     0     0     1
#> 12 56741     0     0     1
#> 13 63088     0     0     1
#> 14 81175     0     0     1
#> 15 88063     0     0     1
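
The Q-matrix and the response data are linked through the task identifiers, so it can be worth confirming that the two line up before fitting a model. The sketch below is illustrative only: it transcribes the printed pie_ft_qmatrix output and the item column names from the pie_ft_data printout into base R objects (ft_qmatrix and ft_items are names made up for this example, not objects exported by dcmdata), then counts the tasks per level and checks that both objects refer to the same tasks.

```r
# Q-matrix transcribed from the printed pie_ft_qmatrix output above
# (a base R data frame rather than a tibble, so no packages are needed).
ft_qmatrix <- data.frame(
  task = c("00592", "14415", "56400", "64967",
           "06238", "10231", "54596", "96748", "97634",
           "13080", "27971", "56741", "63088", "81175", "88063"),
  L1 = rep(c(1L, 0L, 0L), times = c(4, 5, 6)),
  L2 = rep(c(0L, 1L, 0L), times = c(4, 5, 6)),
  L3 = rep(c(0L, 0L, 1L), times = c(4, 5, 6)),
  stringsAsFactors = FALSE
)

# Number of tasks measuring each level: L1 = 4, L2 = 5, L3 = 6
items_per_level <- colSums(ft_qmatrix[, c("L1", "L2", "L3")])
items_per_level

# Simple structure: every task measures exactly one level
simple_structure <- all(rowSums(ft_qmatrix[, c("L1", "L2", "L3")]) == 1)
simple_structure

# Item columns listed in the pie_ft_data printout (all columns except `student`)
ft_items <- c("00592", "14415", "56400", "64967", "06238", "10231",
              "54596", "96748", "97634", "13080", "27971", "56741",
              "63088", "81175", "88063")

# The Q-matrix rows and the response columns should name the same tasks
same_tasks <- setequal(ft_qmatrix$task, ft_items)
same_tasks
```

With dcmdata installed, the same checks should apply directly to pie_ft_qmatrix and names(pie_ft_data).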

The pilot data includes a time column, which makes it straightforward to filter to a particular assessment window:

pie_pilot_data
#> # A tibble: 2,370 × 19
#>    student time  `00592` `14415` `56400` `33008` `49531` `96568` `06238` `54596`
#>    <chr>   <fct>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 9774034 base…       1       1       1      NA      NA      NA      NA      NA
#>  2 3719616 base…       1       1       1      NA      NA      NA      NA      NA
#>  3 6300607 base…       1       1       0      NA      NA      NA      NA      NA
#>  4 9006170 base…       1       1       1      NA      NA      NA      NA      NA
#>  5 3852370 base…       1       1       1      NA      NA      NA      NA      NA
#>  6 7020068 base…       1       1       1      NA      NA      NA      NA      NA
#>  7 7347730 base…       1       1       1      NA      NA      NA      NA      NA
#>  8 7115840 base…       1       0       0      NA      NA      NA      NA      NA
#>  9 3483273 base…       1       0       0      NA      NA      NA      NA      NA
#> 10 8994582 base…       1       1       1      NA      NA      NA      NA      NA
#> # ℹ 2,360 more rows
#> # ℹ 9 more variables: `96748` <dbl>, `97634` <dbl>, `10231` <dbl>,
#> #   `38641` <dbl>, `97673` <dbl>, `13080` <dbl>, `27971` <dbl>, `56741` <dbl>,
#> #   `63088` <dbl>

pie_pilot_qmatrix
#> # A tibble: 17 × 4
#>    task     L1    L2    L3
#>    <chr> <int> <int> <int>
#>  1 00592     1     0     0
#>  2 14415     1     0     0
#>  3 56400     1     0     0
#>  4 33008     1     0     0
#>  5 49531     1     0     0
#>  6 96568     1     0     0
#>  7 06238     0     1     0
#>  8 54596     0     1     0
#>  9 96748     0     1     0
#> 10 97634     0     1     0
#> 11 10231     0     1     0
#> 12 38641     0     1     0
#> 13 97673     0     1     0
#> 14 13080     0     0     1
#> 15 27971     0     0     1
#> 16 56741     0     0     1
#> 17 63088     0     0     1
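
As a minimal sketch of filtering to a single assessment window, the example below uses a small toy data frame rather than the real data. The students and responses in toy_pilot are hypothetical, only three item columns are included, and the time levels are assumed to be named as in the description above (baseline, midway, end_of_unit).

```r
# Toy stand-in for pie_pilot_data: hypothetical students and responses.
# The `time` levels are an assumption based on the window names in the text.
windows <- c("baseline", "midway", "end_of_unit")
toy_pilot <- data.frame(
  student = c("s01", "s02", "s01", "s02", "s01", "s02"),
  time    = factor(rep(windows, each = 2), levels = windows),
  `00592` = c(1, 0, NA, 1, NA, NA),   # Level 1 task
  `06238` = c(NA, NA, 1, 0, NA, 1),   # Level 2 task
  `13080` = c(NA, NA, NA, NA, 1, 0),  # Level 3 task
  check.names = FALSE                 # keep the numeric task IDs as column names
)

# Keep the rows from one assessment window
midway <- toy_pilot[toy_pilot$time == "midway", ]

# Drop item columns that nobody responded to in that window (all NA)
administered <- colSums(!is.na(midway)) > 0
midway <- midway[, administered, drop = FALSE]
midway
```

Dropping the all-NA columns is a design choice for this sketch: items that were not administered in a window carry no information for a model fit to that window alone.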

Because the learning pathway is designed with a hierarchical structure (i.e., Level 1 skills are prerequisite to Level 2, which in turn are prerequisite to Level 3), the PIE data is especially useful for demonstrating attribute hierarchy models such as the hierarchical DCM (Templin & Bradshaw, 2014).
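
To make the effect of the hierarchy concrete, the sketch below enumerates every possible mastery profile for three binary attributes and keeps only those consistent with a strictly linear hierarchy (Level 1 before Level 2 before Level 3). It assumes the hierarchy is strictly linear and illustrates only how the profile space shrinks, not how a hierarchical DCM is estimated; the object names are made up for this example.

```r
# All 2^3 = 8 possible mastery profiles for three binary attributes
profiles <- expand.grid(L1 = 0:1, L2 = 0:1, L3 = 0:1)

# Linear hierarchy L1 -> L2 -> L3: a level can be mastered only if
# the level before it has also been mastered
allowed <- profiles$L1 >= profiles$L2 & profiles$L2 >= profiles$L3

hier_profiles <- profiles[allowed, ]
rownames(hier_profiles) <- NULL
hier_profiles

# Compact labels for the permitted profiles: "000" "100" "110" "111"
profile_labels <- paste0(hier_profiles$L1, hier_profiles$L2, hier_profiles$L3)
profile_labels
```

Under this assumption, only 4 of the 8 unconstrained profiles remain: mastery of no levels, Level 1 only, Levels 1 and 2, or all three levels.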

Item Response Warehouse

We’ve also added five new data sets from the Item Response Warehouse (Domingue et al., 2025), a repository of publicly available item response data sets. The new data sets span educational measurement, clinical psychology, and early reading assessment, and have been reformatted for easy use with r-dcm packages like measr.

  • Fraction subtraction (fraction_data, fraction_qmatrix): An assessment measuring skills related to fraction subtraction, such as finding a common denominator, borrowing from a whole number, and reducing answers to simplest form. Originally described by Tatsuoka (1990), this data has been widely used in the literature to demonstrate the uses of DCMs.
  • Millon Clinical Multiaxial Inventory-III (MCMI-III; mcmi_data, mcmi_qmatrix): A psychological and clinical assessment measuring four attributes: anxiety disorder, somatoform disorder, thought disorder, and major depression. The data was originally collected by Rossi et al. (2010).
  • Rapid Online Assessment of Reading and Phonological Awareness (ROAR-PA; roarpa_data, roarpa_qmatrix): An online assessment tool designed to support early reading interventions. The data was originally collected by Gijbels et al. (2024).
  • Trends in International Mathematics and Science Study 2003 (TIMSS; timss03_data, timss03_qmatrix): U.S. sample from the 2003 grade 8 mathematics assessment.
  • TIMSS 2007 (timss07_data, timss07_skill_qmatrix, timss07_topic_qmatrix, timss07_domain_qmatrix): U.S. sample from the 2007 grade 4 mathematics assessment. The three Q-matrices reflect different choices about how finely to define the attributes being measured.

Acknowledgments

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grants R305D210045 and R305D240032 to the University of Kansas Center for Research, Inc., ATLAS. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.

Featured photo by Anna Voss on Unsplash.

References

Domingue, B., Braginsky, M., Caffrey-Maffei, L., Gilbert, J. B., Kanopka, K., Kapoor, R., Lee, H., Liu, Y., Nadela, S., Pan, G., Zhang, L., Zhang, S., & Frank, M. C. (2025). An introduction to the Item Response Warehouse (IRW): A resource for enhancing data usage in psychometrics. Behavior Research Methods, 57, Article 276. https://doi.org/10.3758/s13428-025-02796-y
Gijbels, L., Burkhardt, A., Ma, W. A., & Yeatman, J. D. (2024). Rapid online assessment of reading and phonological awareness (ROAR-PA). Scientific Reports, 14, Article 10249. https://doi.org/10.1038/s41598-024-60834-9
Rossi, G., Elklit, A., & Simonsen, E. (2010). Empirical evidence for a four factor framework of personality disorder organization: Multigroup confirmatory factor analysis of the Millon Clinical Multiaxial Inventory-III personality disorder scale across Belgian and Danish data. Journal of Personality Disorders, 24(1), 128–150. https://doi.org/10.1521/pedi.2010.24.1.128
Swinburne Romine, R., Schuster, J., Karvonen, M., Thompson, W. J., Erickson, K., Simmering, V., & Bechard, S. (2025). Learning maps as cognitive models for instruction and assessment. Education Sciences, 15(3), Article 365. https://doi.org/10.3390/educsci15030365
Tatsuoka, K. K. (1990). Toward an integration of item-response theory and cognitive error diagnosis. In N. Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 453–488). Lawrence Erlbaum Associates.
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339. https://doi.org/10.1007/s11336-013-9362-0

Citation

BibTeX citation:
@online{thompson2026,
  author = {Thompson, W. Jake},
  title = {Dcmdata 0.2.0},
  date = {2026-03-11},
  url = {https://r-dcm.org/blog/2026-03-dcmdata-0.2.0/},
  doi = {10.59350/18f7y-yjv74},
  langid = {en}
}
For attribution, please cite this work as:
Thompson, W. J. (2026, March 11). dcmdata 0.2.0. https://doi.org/10.59350/18f7y-yjv74