
Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies

Rutjes A;
2006-01-01

Abstract

BACKGROUND: In January 2003, the STAndards for the Reporting of Diagnostic accuracy studies (STARD) were published in a number of journals to improve the quality of reporting in diagnostic accuracy studies. We designed a study to investigate the inter-assessment reproducibility, and the intra- and inter-observer reproducibility, of the items in the STARD statement.

METHODS: Thirty-two diagnostic accuracy studies published in 2000 in medical journals with an impact factor of at least 4 were included. Two reviewers independently evaluated the quality of reporting of these studies using the 25 items of the STARD statement. A consensus evaluation was obtained by discussing and resolving disagreements between the reviewers. Almost two years later, the same studies were evaluated by the same reviewers. For each item, the percentage agreement and Cohen's kappa between the first and second consensus assessments (inter-assessment) were calculated. Intraclass correlation coefficients (ICC) were calculated to evaluate the reliability of the checklist.

RESULTS: The overall inter-assessment agreement for all items of the STARD statement was 85% (Cohen's kappa 0.70) and varied from 63% to 100% for individual items. The largest differences between the two assessments were found for the reporting of the rationale for the reference standard (kappa 0.37), the number of included participants that underwent tests (kappa 0.28), the distribution of the severity of disease (kappa 0.23), a cross-tabulation of the results of the index test by the results of the reference standard (kappa 0.33), and how indeterminate results, missing data and outliers were handled (kappa 0.25). Large differences for these items were also observed within and between reviewers. The inter-assessment reliability of the STARD checklist was satisfactory (ICC = 0.79 [95% CI: 0.62 to 0.89]).

CONCLUSION: Although the overall reproducibility of the quality of reporting of diagnostic accuracy studies using the STARD statement was found to be good, substantial disagreements were found for specific items. These disagreements were caused not so much by differences in interpretation of the items by the reviewers as by difficulties in assessing the reporting of these items due to a lack of clarity within the articles. Including a flow diagram in all reports of diagnostic accuracy studies would be very helpful in reducing confusion between readers and among reviewers.
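The abstract describes per-item agreement in terms of percentage agreement and Cohen's kappa between two consensus assessments. The Python sketch below is purely illustrative and is not the authors' analysis code: the simulated scores, the 0/1 scoring of an item as "reported" or "not reported", and the helper names percentage_agreement and cohens_kappa are assumptions made for the example. The ICC reported for the checklist as a whole would be computed over total checklist scores with a dedicated statistical routine and is not shown here.

# Minimal sketch (assumed example, not the study's actual code) of how the
# per-item agreement statistics described in the abstract could be computed
# for two consensus assessments of the same 32 studies.
import numpy as np

def percentage_agreement(a, b):
    # Proportion of studies on which the two assessments gave the same score.
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a == b)

def cohens_kappa(a, b):
    # Cohen's kappa: observed agreement corrected for chance agreement
    # expected from the marginal rating frequencies of each assessment.
    a, b = np.asarray(a), np.asarray(b)
    categories = np.union1d(a, b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical scores for one STARD item across 32 studies:
# first and second consensus assessment (1 = item reported, 0 = not reported).
rng = np.random.default_rng(0)
first = rng.integers(0, 2, size=32)
second = np.where(rng.random(32) < 0.85, first, 1 - first)  # roughly 85% agreement

print(f"agreement: {percentage_agreement(first, second):.0%}")
print(f"kappa:     {cohens_kappa(first, second):.2f}")

In the study, these two statistics were calculated separately for each of the 25 STARD items, which is why individual items can show low kappa values even when the overall agreement across all items is high.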

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14245/10705
