Studies employing multiple methods of cognitive assessment have often found low convergent validity, especially between endorsement methods (e.g., a checklist or questionnaire) and production methods (e.g., think-aloud or thought-listing procedures). These results raise serious questions about the construct validity of measures that are used extensively in cognitive behavior therapy research and practice. Five possible explanations for the low convergent validity of cognitive assessments are elaborated: (a) unreliability of production measures due to insufficient aggregation, (b) noncomparable stimulus sampling across measures, (c) use of different judges on each measure to make subjective ratings, (d) invalidity of endorsement measures resulting from the nomothetic assumption that the same cognitions apply equally well to everyone, and (e) invalidity resulting from arbitrary decisions about response formats during test construction. Ways of testing these explanations are suggested, and for the last explanation an original study illustrating such a research strategy is presented.