Development of computer adaptive tests to assess the psychological status of individuals with an eating disorder or type 2 diabetes
BioPsychoSocial Medicine volume 19, Article number: 2 (2025)
Abstract
Background
Individuals with type 2 diabetes and eating disorders must change their eating behaviors, which are often influenced by psychological factors like depression and anxiety. To efficiently assess daily psychological status, the present study aimed to develop computerized adaptive tests (CAT) based on item response theory (IRT).
Methods
Individuals with depression, anxiety disorders, eating disorders, type 2 diabetes, and healthy persons participated in the study. Participants completed six questionnaires, including momentary and most recent one-week depression, anxiety, and positive affect. We selected items meeting the IRT assumptions, applied a graded response model, and conducted CAT simulations.
Results
Across all six questionnaires, the CAT simulations required fewer items than the full item sets and yielded Pearson’s correlation coefficients of 0.95 or higher between the simulated and full item-set mood status estimates. These estimated mood scores demonstrated satisfactory concurrent validity with the Hospital Anxiety and Depression Scale and sufficient discriminant validity between the clinical group and healthy controls.
Conclusion
These findings suggest that these scales offer efficient measurement of the mood status of individuals with an eating disorder or type 2 diabetes.
Introduction
The increase in the prevalence of type 2 diabetes represents a public health concern, with lifestyle modifications like regular diet and exercise habits being crucial for glycemic management [1]. The eating behaviors of people with type 2 diabetes are influenced by psychological factors, including stress and lack of sleep [2, 3]. Eating disorders are also highly prevalent, especially among women [4], with a reported increase in prevalence after the COVID-19 pandemic [5]. Psychological factors, such as depression and anxiety, can cause uncontrollable eating behavior [6,7,8]. Self-monitoring of both physical status and the psychological aspects of daily life is essential [9, 10], and daily mood assessment is important for individuals with type 2 diabetes or eating disorders.
Self-administered questionnaires are widely used to measure mood status, such as depression and anxiety [11, 12]. These scales were developed based on classical test theory, which has several limitations. These include the inability to compare different scales due to item dependency, the difficulty in comparing different groups of examinees due to sample dependency, the requirement to answer all items regardless of severity for scoring, and restrictions against replacing or adding items. Computer adaptive testing (CAT), based on item response theory (IRT), can overcome these limitations. CAT adjusts the item presentation based on the subject’s responses, allowing for highly accurate measurement with fewer items [13, 14].
The U.S. National Institutes of Health developed the Patient-Reported Outcomes Measurement Information System (PROMIS) project to assess patient-reported outcomes, including emotional distress [15]. The PROMIS CATs have been used in various clinical studies and integrated into electronic health records [16]. However, a practical CAT for measuring mood states in Japanese has yet to be established. Specifically, there are currently no CAT-based mood assessment tools developed based on data from persons with eating disorders or type 2 diabetes, and developing such tools may help understand their daily eating behaviors in future studies.
The present study aimed to develop a momentary CAT for real-time monitoring of mood status under daily life conditions and a CAT for mood over the most recent one-week period for use in outpatient consultations for Japanese individuals with eating disorders or type 2 diabetes.
Methods
Participants
The present study recruited adults (≥ 20 years old) with depression (including adjustment disorder), anxiety disorder, or an eating disorder who were attending the Department of Psychosomatic Medicine of the University of Tokyo Hospital and Nozomi Clinic, a psychiatric clinic. Adults with type 2 diabetes attending the Department of Diabetes and Metabolic Diseases of the University of Tokyo Hospital were also recruited. The recruitment was conducted between September 2014 and March 2016. We excluded the following: (1) those who were unable to understand the explanation and consent documents, (2) those who were unable to communicate, and (3) those deemed inappropriate by the physician.
Healthy individuals aged ≥ 20 years who were not regularly visiting a medical institution (confirmed by self-report) were also recruited in June 2014 through the web monitoring site Macromill (Macromill, Inc., Tokyo, Japan). While no other criteria were specified, only individuals who read the explanation documents and provided consent participated in the study.
Data collection
We developed new items to assess momentary depression (74 items), anxiety (27 items), and positive affect (23 items), as well as items to assess recent one-week depression (79 items), anxiety (27 items), and positive affect (23 items). Participants were asked to rate their responses to these items on a 5-point Likert-type scale. For the momentary mood, the options were “not at all applicable,” “not very applicable,” “neutral,” “relatively applicable,” or “very applicable.” For the most recent one-week mood, the options were “none,” “rarely,” “sometimes,” “often,” or “always.” Table 1 shows examples of these items translated from Japanese to English. Participants also completed the Hospital Anxiety and Depression Scale (HADS), a widely used measure of depression and anxiety [17].
Overview of statistical analyses
Following the analytic methods of the PROMIS project and a relevant study on CAT [18, 19], we conducted the following analyses: (1) descriptive statistics, (2) evaluation of the IRT assumptions, (3) fitting of a graded response model (GRM) to the data, (4) evaluation of differential item functioning (DIF), and (5) CAT simulations. Participants with a missing answer to any question were excluded from the IRT analyses. All analyses were performed using R version 4.3.1. Statistical significance was set at P < 0.05.
Descriptive statistics
Items with response categories that no participant selected were excluded from the analysis because their parameters could not be estimated in the IRT analysis. Items with an item-remainder correlation of < 0.3 were also excluded because they failed to meet the criterion for internal consistency.
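The item-remainder screen described above can be illustrated with a minimal sketch. It assumes that "item-remainder correlation" means the Pearson correlation between an item and the sum of all other items; the toy response matrix and the function names are hypothetical and are not taken from the study.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_remainder_correlations(responses):
    """responses: one row per participant, each row a list of item scores."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Remainder score: total score with the item itself left out.
        remainder = [sum(row) - row[j] for row in responses]
        result.append(pearson(item, remainder))
    return result

# Hypothetical toy data: 6 participants x 4 items, scored 1-5.
# The fourth item is deliberately unrelated to the other three.
toy = [
    [1, 2, 1, 3],
    [2, 2, 3, 5],
    [3, 3, 3, 1],
    [4, 4, 5, 4],
    [5, 4, 4, 2],
    [5, 5, 5, 3],
]
correlations = item_remainder_correlations(toy)
flagged = [j for j, r in enumerate(correlations) if r < 0.3]
print(correlations, flagged)
```

In this toy example only the unrelated fourth item falls below the 0.3 cut-off.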
Evaluation of the assumptions of the IRT model
We evaluated the assumptions of the IRT model: unidimensionality, local independence, and monotonicity.
First, we tested unidimensionality by conducting principal component analysis (PCA). We confirmed that the proportion of variance explained by the first component was ≥ 20% and that the ratio of the variance explained by the first component to that explained by the second was ≥ 4. If these criteria were not met, items with low contributions to the first component were excluded.
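As a rough illustration of the two PCA criteria, the sketch below computes the two largest eigenvalues of an item correlation matrix by power iteration with deflation. It assumes the PCA is run on the correlation matrix, so that total variance equals the number of items. The toy matrix and all function names are hypothetical, and a production analysis would use an established PCA routine instead.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(m, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # non-uniform start vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def first_two_eigenvalues(corr):
    lam1, v1 = top_eigenpair(corr)
    n = len(corr)
    # Deflation: subtract the first component, then repeat the power iteration.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(n)]
                for i in range(n)]
    lam2, _ = top_eigenpair(deflated)
    return lam1, lam2

def unidimensionality_check(corr, min_proportion=0.20, min_ratio=4.0):
    lam1, lam2 = first_two_eigenvalues(corr)
    proportion = lam1 / len(corr)  # share of total variance on the first component
    ratio = lam1 / lam2            # first-to-second variance ratio
    return proportion, ratio, proportion >= min_proportion and ratio >= min_ratio

# Hypothetical toy correlation matrix: 4 items, all pairwise correlations 0.7.
corr = [[1.0 if i == j else 0.7 for j in range(4)] for i in range(4)]
proportion, ratio, ok = unidimensionality_check(corr)
print(proportion, ratio, ok)
```

For this equicorrelated toy matrix the eigenvalues are 3.1 and 0.3, so both criteria are met.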
Next, we assessed local independence by conducting a one-factor confirmatory factor analysis, which produced a residual correlation matrix (analyzed using the R package “lavaan,” version 0.6-16). Item pairs with residual correlations > 0.2 were regarded as violating local independence. For each locally dependent pair, we excluded the item with the lower contribution to the first principal component.
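The residual-correlation rule can be sketched as follows. The sketch does not fit a confirmatory factor analysis. It assumes standardized one-factor loadings are already available, so the model-implied correlation between two items is the product of their loadings. The loadings, PCA contributions, and function names are hypothetical values chosen for illustration.

```python
def locally_dependent_pairs(observed, loadings, threshold=0.2):
    """Item pairs whose residual correlation exceeds the threshold."""
    pairs = []
    n = len(loadings)
    for i in range(n):
        for j in range(i + 1, n):
            # Residual = observed correlation minus one-factor implied correlation.
            residual = observed[i][j] - loadings[i] * loadings[j]
            if residual > threshold:
                pairs.append((i, j, residual))
    return pairs

def items_to_drop(pairs, contributions):
    """From each flagged pair, drop the item with the lower PCA contribution."""
    drop = set()
    for i, j, _ in pairs:
        drop.add(i if contributions[i] < contributions[j] else j)
    return sorted(drop)

# Hypothetical standardized loadings for 4 items.
loadings = [0.8, 0.7, 0.6, 0.6]
n = len(loadings)
# Start from the model-implied correlations, then add extra shared variance
# to items 2 and 3 so that this pair violates local independence.
observed = [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]
observed[2][3] += 0.3
observed[3][2] += 0.3

pairs = locally_dependent_pairs(observed, loadings)
# Hypothetical contributions to the first principal component.
dropped = items_to_drop(pairs, contributions=[0.80, 0.70, 0.62, 0.58])
print(pairs, dropped)
```

Here only the pair (2, 3) is flagged, and item 3 is dropped because its assumed contribution is lower.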
Finally, we examined monotonicity by developing a nonparametric IRT model (analyzed using the R package “mokken,” version 3.1.0). Items with scalability coefficients < 0.3 were excluded as violations of monotonicity.
Application of the GRM
We fitted a GRM to the remaining items using the R package “mirt,” version 1.41. Discrimination and difficulty parameters for each item and the latent factor θ (representing the degree of depression, anxiety, or positive affect) for each participant were estimated using maximum a posteriori estimation. Items with any response category that was never the most probable response at any value of θ were excluded. We also examined the fit statistic (S-X²) for each item and excluded items with poor fit at an alpha level of 0.01.
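To make the category rule concrete, the sketch below evaluates graded response model category probabilities for a single item and lists the categories that are never the most probable response. It uses the standard logistic GRM with one discrimination parameter and ordered thresholds. The θ grid from -4 to 4 and the example parameters are assumptions made for illustration and are not values from the study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    a: discrimination; thresholds: increasing difficulty parameters b_1..b_{K-1}.
    Returns K probabilities that sum to 1.
    """
    # Cumulative probabilities P(X >= k), padded with 1 below and 0 above.
    cumulative = ([1.0]
                  + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
                  + [0.0])
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def categories_never_modal(a, thresholds, lo=-4.0, hi=4.0, step=0.01):
    """Categories that are never the most probable response on a theta grid."""
    n_cat = len(thresholds) + 1
    modal = set()
    steps = int(round((hi - lo) / step))
    for s in range(steps + 1):
        probs = grm_category_probs(lo + s * step, a, thresholds)
        modal.add(probs.index(max(probs)))
    return [k for k in range(n_cat) if k not in modal]

# Hypothetical item with well-separated thresholds: every category is modal somewhere.
well_spaced = categories_never_modal(2.0, [-1.5, -0.5, 0.5, 1.5])
# Hypothetical item with crowded thresholds: categories 1 and 3 are never modal.
crowded = categories_never_modal(1.0, [-1.0, -0.9, 0.9, 1.0])
print(well_spaced, crowded)
```

Under the rule described above, the second hypothetical item would be a candidate for exclusion.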
Evaluation of the DIF
We evaluated DIF for sex using the “DIF” function in the R package “mirt,” version 1.41.1. Items showing significant DIF at an alpha level of 0.01 were excluded.
CAT simulations
After the item selection process, Cronbach’s alpha was calculated to assess internal consistency. Then, we redeveloped a GRM and recalculated discrimination and difficulty parameters for each item and the latent factor for each participant (θtrue).
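For reference, Cronbach’s alpha can be computed from item variances and the variance of the total score. The sketch below uses the usual formula with sample variances. The toy responses and the function name are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per participant, each row a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical toy data: 6 participants x 3 items, scored 1-5.
toy = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 4, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```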
We used the resulting parameters and θtrue to perform a CAT simulation using the R package “catIrt,” version 0.5-1. At the simulation’s onset, the estimated latent factor (θest) was set to zero, and a minimum of three items were administered.
Similar to a relevant study [19], we used Bayesian mode estimation to quantify θ, and item selection was based on unweighted Fisher information. The simulation stopped when either (a) the standard error of measurement reached 0.32 or (b) the maximum number of items had been administered.
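The simulation loop described above can be sketched in a few dozen lines. This is not the catIrt implementation. It is a simplified illustration that assumes a standard normal prior, a grid search for the posterior mode, and a standard error computed as 1 / sqrt(test information + 1), where the added 1 is the information contributed by the prior. The item pool, the true θ of 0.5, and all function names are hypothetical.

```python
import math
import random

def cumulative_probs(theta, a, bs):
    """P(X >= k) for k = 0..K under the graded response model."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]

def grm_probs(theta, a, bs):
    cum = cumulative_probs(theta, a, bs)
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def item_information(theta, a, bs):
    """Fisher information of one graded-response item at theta."""
    cum = cumulative_probs(theta, a, bs)
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += dp * dp / p
    return info

GRID = [g / 100.0 for g in range(-400, 401)]  # assumed theta search grid

def map_estimate(administered, responses, items):
    """Posterior mode of theta on the grid, with a standard normal prior."""
    best_theta, best_logpost = 0.0, -math.inf
    for theta in GRID:
        logpost = -0.5 * theta * theta
        for idx, resp in zip(administered, responses):
            a, bs = items[idx]
            logpost += math.log(grm_probs(theta, a, bs)[resp])
        if logpost > best_logpost:
            best_theta, best_logpost = theta, logpost
    return best_theta

def simulate_cat(theta_true, items, rng, min_items=3, max_items=None, se_target=0.32):
    max_items = max_items or len(items)
    administered, responses = [], []
    theta_est, se = 0.0, math.inf  # the estimate starts at zero
    while len(administered) < max_items:
        remaining = [i for i in range(len(items)) if i not in administered]
        # Select the unused item with maximum Fisher information at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta_est, *items[i]))
        a, bs = items[nxt]
        # Simulate the response from the true theta.
        resp = rng.choices(range(len(bs) + 1), weights=grm_probs(theta_true, a, bs))[0]
        administered.append(nxt)
        responses.append(resp)
        theta_est = map_estimate(administered, responses, items)
        test_info = sum(item_information(theta_est, *items[i]) for i in administered)
        se = 1.0 / math.sqrt(test_info + 1.0)  # assumption: prior adds information 1
        if len(administered) >= min_items and se <= se_target:
            break
    return theta_est, se, len(administered)

# Hypothetical pool of 20 five-category items.
rng = random.Random(0)
items = []
for _ in range(20):
    a = rng.uniform(1.5, 3.0)
    center = rng.uniform(-1.0, 1.0)
    items.append((a, [center - 1.2, center - 0.4, center + 0.4, center + 1.2]))

theta_est, se, n_used = simulate_cat(0.5, items, rng)
print(theta_est, se, n_used)
```

Repeating this loop for every participant’s θtrue and correlating the resulting estimates with θtrue gives the kind of accuracy summary reported in the Results.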
To assess simulation accuracy, we calculated the Pearson’s correlation coefficient (PCC) between θest and θtrue.
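The paper does not state how the confidence intervals for the PCC were obtained. One common choice, assumed here, is the Fisher z transformation. The sketch below computes the correlation and an approximate 95% interval for two short hypothetical score vectors.

```python
import math
from statistics import mean

def pearson_with_ci(x, y, z_crit=1.959964):
    """Pearson correlation with an approximate 95% CI (Fisher z transformation)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)                       # Fisher z
    half_width = z_crit / math.sqrt(n - 3)  # standard error of z is 1 / sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical estimated and true scores for five respondents.
theta_est = [1, 2, 3, 4, 5]
theta_true = [2, 4, 5, 4, 5]
r, lower, upper = pearson_with_ci(theta_est, theta_true)
print(r, lower, upper)
```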
Validity of the measurements
To assess concurrent validity, we calculated PCC between θest for positive affect and HADS depression scores, θest for depression and HADS depression scores, and θest for anxiety and HADS anxiety scores.
For discriminant validity, differences between the clinical and healthy groups were tested using t-tests. Specifically, comparisons were made between the healthy group and the depression group for depression scores, between the healthy group and the anxiety group for anxiety scores, and between the healthy group and the combined depression and anxiety group for positive affect scores.
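The text does not specify which t-test variant was used. As an illustration only, the sketch below computes Welch’s unequal-variance t statistic and its approximate degrees of freedom for two hypothetical groups of θ estimates. The p-value is omitted because the Python standard library has no t distribution.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical theta estimates for a clinical group and a healthy group.
clinical = [0.9, 1.2, 0.4, 1.5, 0.8]
healthy = [-0.3, 0.1, -0.6, 0.2, -0.4]
t_stat, df = welch_t(clinical, healthy)
print(t_stat, df)
```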
Results
Study participants
The study included 626 participants, 312 in the clinical group (137 male, 175 female; median age 51 years) and 314 healthy individuals (158 male, 156 female; median age 47 years). Table 2 shows the participants’ characteristics. The mean (S.D.) hemoglobin A1c for participants with type 2 diabetes was 7.10 (0.80).
CAT simulations for momentary depression
Out of the total of 626 participants, 586 completed the questionnaire without any missing values. Of the initial 74 items, 54 were ultimately selected, yielding a Cronbach’s alpha of 0.99 for these items. The CAT simulation, presented in Table 3, achieved a PCC of 0.950 (95% confidence interval [CI], 0.942–0.957) between θest and θtrue using an average of 4.816 items.
The θest was positively correlated with the HADS depression score (Table 4). Additionally, the θest was significantly higher in the depression group than in healthy individuals (Table 5).
CAT simulations for momentary anxiety
Out of the participants, 612 completed the questionnaire without missing values. From the initial 27 items, six were ultimately selected, resulting in a Cronbach’s alpha of 0.94. The results of the CAT simulation are presented in Table 3, achieving a PCC of 0.992 (95% CI, 0.990–0.993) between θest and θtrue using an average of 3.724 items.
The θest was positively correlated with the HADS anxiety score (Table 4). Furthermore, the θest was significantly higher in the anxiety group than in healthy individuals (Table 5).
CAT simulations for momentary positive affect
Out of all participants, 603 completed the questionnaire without missing values. Among the initial 23 items, 18 were selected, resulting in a Cronbach’s alpha of 0.93. The CAT simulation, shown in Table 3, achieved a PCC of 0.979 (95% CI, 0.975–0.982) between θest and θtrue using an average of 6.516 items.
The θest was negatively correlated with the HADS depression score (Table 4). The θest was significantly lower in the depression and anxiety group than in healthy individuals (Table 5).
CAT simulations for most recent one-week depression
Out of all participants, 584 completed the questionnaire without missing values. From the initial 79 items, 57 items were selected, with a Cronbach’s alpha of 0.99. The CAT simulation, shown in Table 3, achieved a PCC of 0.969 (95% CI, 0.964–0.974) between θest and θtrue with an average of 8.038 items.
The θest was positively correlated with the HADS depression score (Table 4). In addition, the θest was significantly higher in the depression group than in healthy participants (Table 5).
CAT simulations for most recent one-week anxiety
Out of all participants, 610 completed the questionnaire without missing values. From the original 26 items, 14 were selected, resulting in a Cronbach’s alpha of 0.97. The CAT simulation, as detailed in Table 3, achieved a PCC of 0.976 (95% CI, 0.972–0.980) between θest and θtrue using an average of 6.330 items.
The θest was positively correlated with the HADS anxiety score (Table 4). Furthermore, the θest was significantly higher in the anxiety group than in healthy participants (Table 5).
CAT simulations for most recent one-week positive affect
Out of all participants, 604 completed the questionnaire without missing values. Among the 23 items, 16 were ultimately selected, yielding a Cronbach’s alpha of 0.95. The CAT simulation achieved a PCC of 0.967 (95% CI, 0.962–0.972) between θest and θtrue using an average of 4.565 items (Table 3).
The θest was negatively correlated with the HADS depression score (Table 4). Additionally, the θest was lower in the depression and anxiety group than in healthy individuals (Table 5).
Discussion
In the present study, we prepared six types of item pools for the CAT based on IRT. After item selection to satisfy the assumptions of IRT, the items exhibited a high Cronbach’s alpha coefficient, indicating adequate internal consistency. The CAT simulations demonstrated a strong correlation between estimated latent factors based on a small number of items and those based on the full item set. Each scale had sufficient concurrent validity with the HADS scores and discriminant validity between the clinical group and healthy controls.
The CAT developed in this study has the potential to improve measurement accuracy and reduce the time needed for assessment. Additionally, automated scoring can reduce the burden on clinicians and provide immediate feedback to respondents. It would be desirable to apply the CAT to monitor changes in psychological status and provide prompt feedback. Notably, the momentary version is promising for detecting psychological status in daily life, which has been a black box in clinical practice. This approach can also evaluate longitudinal relations between psychological status and eating behaviors, potentially enabling clinicians to offer more tailored treatment.
Individuals with depression and anxiety disorders exhibited higher scores for depression and anxiety and lower scores for positive affect than healthy individuals in both the momentary and most recent one-week versions, confirming the scales’ discriminant validity.
The present study has several limitations. First, while the minimum sample size for the GRM is set at 500 cases [20], which this study met, a larger sample size would be preferable. This study did not fully examine the severity of each disease. Incorporating more severe cases may allow more accurate measurement across a broader spectrum of characteristics. Second, the questionnaire was designed to take about 20 min to complete; however, some participants left questions unanswered, possibly due to boredom or a desire to finish quickly, which may have influenced the results. Third, although the momentary version of the CAT aimed to assess psychological status in daily life, the present study conducted a one-time survey without specifying any particular situation. The absence of context regarding the circumstances under which participants responded could have influenced the outcomes. Fourth, this study did not include a detailed assessment of participants’ eating behaviors or daily activities. Future studies should investigate the association between daily mood status and eating behaviors using the developed CAT. Fifth, we did not evaluate the presence of mental disorders in persons with type 2 diabetes. Finally, we did not evaluate the treatment stage or duration of illness for participants with eating disorders.
In conclusion, we successfully developed a momentary version of CAT for real-time monitoring of mood status in daily life and a CAT for assessing the most recent one-week mood status that is suitable for outpatient consultations for individuals with eating disorders or type 2 diabetes.
Abbreviations
- CAT: Computerized adaptive testing
- DIF: Differential item functioning
- GRM: Graded response model
- HADS: Hospital Anxiety and Depression Scale
- IRT: Item response theory
- PCA: Principal component analysis
- PCC: Pearson’s correlation coefficient
- PROMIS: Patient-Reported Outcomes Measurement Information System
References
1. Magkos F, Hjorth MF, Astrup A. Diet and exercise in the prevention and treatment of type 2 diabetes mellitus. Nat Rev Endocrinol. 2020;16(10):545–55.
2. Geiker NRW, Astrup A, Hjorth MF, Sjödin A, Pijls L, Markus CR. Does stress influence sleep patterns, food intake, weight gain, abdominal obesity and weight loss interventions and vice versa? Obes Rev. 2018;19(1):81–97.
3. Fava GA, Cosci F, Sonino N. Current psychosomatic practice. Psychother Psychosom. 2017;86(1):13–30.
4. Treasure J, Duarte TA, Schmidt U. Eating disorders. Lancet. 2020;395(10227):899–911.
5. Kurisu K, Matsuoka M, Sato K, et al. Increased prevalence of eating disorders in Japan since the start of the COVID-19 pandemic. Eat Weight Disord. 2022;27(6):2251–5.
6. Wallis DJ, Hetherington MM. Emotions and eating. Self-reported and experimentally induced changes in food intake under stress. Appetite. 2009;52(2):355–62.
7. Hughes EK, Goldschmidt AB, Labuschagne Z, Loeb KL, Sawyer SM, Le Grange D. Eating disorders with and without comorbid depression and anxiety: similarities and differences in a clinical sample of children and adolescents. Eur Eat Disord Rev. 2013;21(5):386–94.
8. Cardi V, Leppanen J, Treasure J. The effects of negative and positive mood induction on eating behaviour: a meta-analysis of laboratory studies in the healthy population and eating and weight disorders. Neurosci Biobehav Rev. 2015;57:299–309.
9. Davis J, Fischl AH, Beck J, et al. 2022 national standards for diabetes self-management education and support. Diabetes Care. 2022;45(2):484–94.
10. Fairburn C. Cognitive behavior therapy and eating disorders. Guilford Press; 2008.
11. Smarr KL, Keefer AL. Measures of depression and depressive symptoms: Beck Depression Inventory-II (BDI‐II), Center for epidemiologic studies Depression Scale (CES‐D), geriatric Depression Scale (GDS), hospital anxiety and Depression Scale (HADS), and Patient Health Questionna. Arthritis Care Res. 2011;63(Suppl 11):S454–66.
12. Julian LJ. Measures of anxiety: state-trait anxiety inventory (STAI), Beck anxiety inventory (BAI), and hospital anxiety and depression scale‐anxiety (HADS‐A). Arthritis Care Res. 2011;63(Suppl 11):S467–72.
13. Reise SP, Waller NG. Item response theory and clinical measurement. Annu Rev Clin Psychol. 2009;5:27–48.
14. Bjorner JB, Chang CH, Thissen D, Reeve BB. Developing tailored instruments: item banking and computerized adaptive assessment. Qual Life Res. 2007;16(Suppl 1):95–108.
15. Pilkonis PA, Choi SW, Reise SP, et al. Item banks for measuring emotional distress from the patient-reported outcomes measurement information system (PROMIS): depression, anxiety, and anger. Assessment. 2011;18(3):263–83.
16. Wagner LI, Schink J, Bass M, et al. Bringing PROMIS to practice: brief and precise symptom screening in ambulatory cancer care. Cancer. 2015;121(6):927–34.
17. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand. 1983;67(6):361–70.
18. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the patient-reported outcomes Measurement Information System (PROMIS). Med Care. 2007;45(5 Suppl 1):S22–31.
19. Kurisu K, Hashimoto M, Ishizawa T, et al. Development of computer adaptive testing for measuring depression in patients with cancer. Sci Rep. 2022;12(1):8247.
20. Reise SP, Yu J. Parameter recovery in the graded response model using MULTILOG. J Educ Meas. 1990;27:133–44.
Acknowledgements
None.
Funding
This work is supported by JSPS KAKENHI (grant number 25460889).
Author information
Contributions
All the authors contributed to the study design. Takeshi Horie collected data and performed data analysis. Ken Kurisu performed data analysis. Kazuhiro Yoshiuchi supervised the study. All authors participated in interpreting the results and writing the manuscript and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
The study was approved by the ethics committee of the University of Tokyo Hospital (Approval No. 10491) and was conducted in accordance with the ethical standards of the 1964 Helsinki Declaration and its later amendments or comparable ethical standards, and all participants provided informed consent before participating.
Consent for publication
All participants provided informed consent.
Competing interests
The authors have no conflicts of interest relevant to the content of this manuscript.
Data sharing
The datasets analyzed during the current study are not publicly available because the approval of data sharing has not been obtained from the institutional review board but are available from the corresponding authors on reasonable request.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Horie, T., Kurisu, K., Inada, S. et al. Development of computer adaptive tests to assess the psychological status of individuals with an eating disorder or type 2 diabetes. BioPsychoSocial Med 19, 2 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13030-025-00325-z
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13030-025-00325-z