PHQ-9: Patient Health Questionnaire – Depression Scale

Introduction

The Patient Health Questionnaire-9 (PHQ-9) is one of the most widely used depression screening and severity measurement tools in healthcare worldwide. Developed by Kroenke, Spitzer, and Williams in 2001, this 9-item self-report questionnaire corresponds directly to the nine diagnostic criteria for major depressive disorder, which were defined in DSM-IV and retained in DSM-5, making it valuable for both screening and clinical assessment.

The PHQ-9 changed depression detection and management by offering a brief, validated tool that maps directly onto diagnostic criteria and serves as both a screening instrument and a severity measure. Before the PHQ-9, depression assessment often relied on lengthy clinical interviews or instruments that measured symptoms without clear diagnostic relevance.

Depression as a Clinical Syndrome

Major depressive disorder is one of the most common mental health conditions worldwide, affecting approximately 5-7% of adults in any given year. It is characterized by persistent feelings of sadness or loss of interest, along with a constellation of cognitive, behavioral, and physical symptoms that significantly impair functioning.

The PHQ-9 measures depression as defined by the DSM-5, which requires the presence of at least five symptoms during the same two-week period, with at least one symptom being either depressed mood or anhedonia (loss of interest/pleasure). This symptom-based approach allows the PHQ-9 to function not only as a screening tool but also as an aid in diagnostic assessment and treatment monitoring.

Theoretical Foundation

The PHQ-9 is grounded in the DSM-5 diagnostic framework, which emphasizes symptom-based criteria for major depressive disorder. Unlike traditional depression scales that measure general distress or mood symptoms, the PHQ-9 systematically evaluates each of the nine specific criteria required for depression diagnosis.

The nine DSM-5 criteria assessed are:

  • Anhedonia (loss of interest or pleasure)
  • Depressed mood (feeling down or hopeless)
  • Sleep disturbances (insomnia or hypersomnia)
  • Fatigue or loss of energy
  • Appetite changes (increase or decrease)
  • Feelings of worthlessness or excessive guilt
  • Diminished ability to concentrate
  • Psychomotor agitation or retardation
  • Recurrent thoughts of death or suicidal ideation

The two-week timeframe used in the PHQ-9 directly corresponds to the DSM-5 diagnostic requirement, making it a clinically relevant assessment period. The frequency-based response scale (not at all, several days, more than half the days, nearly every day) captures the persistence of symptoms, which is critical for distinguishing clinical depression from transient mood changes.

This alignment with diagnostic criteria makes the PHQ-9 particularly valuable in clinical settings, where it can guide screening decisions, inform diagnostic formulation, and track symptom changes during treatment.

🏥 Clinical Standard: The PHQ-9 is recommended by major medical organizations including the US Preventive Services Task Force, American College of Physicians, and World Health Organization for routine depression screening.

Key Features

Assessment Characteristics

  • 9 items corresponding exactly to DSM-5 depression criteria
  • 2-3 minutes administration time
  • Ages 12 and older, with extensive validation across adolescent, adult, and older-adult groups
  • 4-point frequency scale (0-3) for response options
  • Dual functionality as screening tool and severity measure
  • Public domain – free for all uses worldwide

Depression Dimensions Assessed

  • Anhedonia – Loss of interest or pleasure in doing things
  • Depressed mood – Feeling down, depressed, or hopeless
  • Sleep disturbance – Trouble sleeping or sleeping too much
  • Fatigue – Feeling tired or having little energy
  • Appetite changes – Poor appetite or overeating
  • Guilt/worthlessness – Negative self-evaluation and self-blame
  • Concentration problems – Difficulty focusing on activities
  • Psychomotor changes – Moving/speaking slowly or being restless
  • Suicidal ideation – Thoughts of death or self-harm

Research and Clinical Applications

  • Primary care screening – Standard depression detection in medical settings
  • Mental health assessment – Initial evaluation in psychiatric settings
  • Treatment monitoring – Track symptom changes during therapy
  • Clinical trials – Outcome measure in depression research
  • Healthcare quality – Performance measurement and quality improvement
  • Population health – Community mental health surveillance
  • Collaborative care – Communication tool across care teams

Scoring and Interpretation

Response Format

Participants rate how often they have been bothered by each problem over the last 2 weeks using a 4-point frequency scale:

  • 0 = Not at all
  • 1 = Several days
  • 2 = More than half the days
  • 3 = Nearly every day
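
The response format above is a fixed mapping from labels to scores, so it can be written as a small lookup. This is a minimal sketch; the names `RESPONSE_SCORES` and `score_response` are hypothetical and not part of any official PHQ-9 software.

```python
# Response labels and their 0-3 scores, as listed in the response format.
RESPONSE_SCORES = {
    "Not at all": 0,
    "Several days": 1,
    "More than half the days": 2,
    "Nearly every day": 3,
}


def score_response(label: str) -> int:
    """Convert a response label to its score, ignoring case and outer spaces."""
    normalized = label.strip().lower()
    for option, score in RESPONSE_SCORES.items():
        if option.lower() == normalized:
            return score
    raise ValueError(f"Unrecognized PHQ-9 response: {label!r}")
```

Rejecting unknown labels, rather than defaulting them to 0, keeps data-entry errors from silently lowering a total score.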

Complete PHQ-9 Items

“Over the last 2 weeks, how often have you been bothered by any of the following problems?”

  1. Little interest or pleasure in doing things
  2. Feeling down, depressed, or hopeless
  3. Trouble falling or staying asleep, or sleeping too much
  4. Feeling tired or having little energy
  5. Poor appetite or overeating
  6. Feeling bad about yourself — or that you are a failure or have let yourself or your family down
  7. Trouble concentrating on things, such as reading the newspaper or watching television
  8. Moving or speaking so slowly that other people could have noticed. Or the opposite — being so fidgety or restless that you have been moving around a lot more than usual
  9. Thoughts that you would be better off dead, or of hurting yourself in some way

Functional Impairment Question

After the 9 items, participants answer:

“If you checked off any problems, how difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?”

  • Not difficult at all
  • Somewhat difficult
  • Very difficult
  • Extremely difficult

(This question is not included in total score but provides important clinical context)

Scoring Procedure

  1. Sum all item responses (range: 0-27)
  2. Higher scores indicate greater depression severity
  3. Individual items can be examined for specific symptoms
  4. Item 9 requires special attention regardless of total score
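
The scoring procedure above can be sketched in a few lines. The function name `phq9_total` and the example responses are illustrative assumptions; the input checks (exactly nine items, each scored 0-3) follow from the response format described earlier.

```python
from typing import Sequence


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine item responses into a total score (range 0-27)."""
    if len(item_scores) != 9:
        raise ValueError(f"Expected 9 item scores, got {len(item_scores)}")
    for item_number, score in enumerate(item_scores, start=1):
        if score not in (0, 1, 2, 3):
            raise ValueError(f"Item {item_number} must be scored 0-3, got {score!r}")
    return sum(item_scores)


# Hypothetical responses to items 1-9, in order.
responses = [1, 2, 1, 3, 0, 1, 2, 0, 0]
total = phq9_total(responses)        # 10
item9_endorsed = responses[8] >= 1   # Item 9 is checked separately from the total
```

Item 9 is examined on its own here because the procedure calls for attention to it regardless of the total.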

Depression Severity Classification

Total Score | Severity Level    | Clinical Action
0-4         | None-minimal      | Routine screening; no treatment indicated
5-9         | Mild              | Watchful waiting; repeat PHQ-9 at follow-up
10-14       | Moderate          | Treatment plan: counseling, follow-up, and/or pharmacotherapy
15-19       | Moderately severe | Active treatment with pharmacotherapy and/or psychotherapy
20-27       | Severe            | Immediate initiation of pharmacotherapy and/or psychotherapy
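
The severity bands in this classification map directly onto a lookup by total score. The sketch below assumes a hypothetical function name, `phq9_severity`, and returns only the severity label; the clinical action remains a matter for the clinician.

```python
def phq9_severity(total: int) -> str:
    """Return the severity level for a PHQ-9 total score (0-27)."""
    if not 0 <= total <= 27:
        raise ValueError(f"Total score must be between 0 and 27, got {total}")
    if total <= 4:
        return "None-minimal"
    if total <= 9:
        return "Mild"
    if total <= 14:
        return "Moderate"
    if total <= 19:
        return "Moderately severe"
    return "Severe"
```

For example, `phq9_severity(12)` returns `"Moderate"`.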

Diagnostic Algorithm (Research/Clinical Assessment)

Provisional diagnosis of major depressive disorder requires:

5 or more items scored as 2 (more than half the days) or 3 (nearly every day)

PLUS:

  • Must include Item 1 (anhedonia) OR Item 2 (depressed mood)
  • Item 9 (suicidal ideation) counts if present at any frequency (score ≥1)
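
One way to express this algorithm in code is shown below. It is a sketch under two assumptions: the function name `meets_provisional_mdd` is hypothetical, and the cardinal symptom (Item 1 or Item 2) is taken to need the same score of 2 or higher as the other items. A positive result is provisional only and does not replace a clinical interview.

```python
from typing import Sequence


def meets_provisional_mdd(item_scores: Sequence[int]) -> bool:
    """Apply the PHQ-9 diagnostic algorithm to nine item scores (each 0-3)."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("Expected nine item scores, each between 0 and 3")

    # Items 1-8 count when scored 2 ("more than half the days") or higher.
    symptom_present = [score >= 2 for score in item_scores[:8]]
    # Item 9 counts at any frequency (score of 1 or higher).
    symptom_present.append(item_scores[8] >= 1)

    # Assumption: Item 1 or Item 2 must itself meet the >= 2 threshold.
    has_cardinal_symptom = symptom_present[0] or symptom_present[1]
    return has_cardinal_symptom and sum(symptom_present) >= 5
```

With responses `[2, 2, 2, 2, 0, 0, 0, 0, 1]`, four items reach the threshold and Item 9 is endorsed at "several days", giving five qualifying symptoms including a cardinal one, so the function returns `True`.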

Suicide Risk Assessment

Any positive response to Item 9 requires immediate clinical attention:

  • Score 1 (“several days”): Enhanced monitoring, follow-up assessment, and safety planning
  • Score 2-3 (“more than half the days” or “nearly every day”): Comprehensive suicide risk evaluation required immediately
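
In an automated scoring system, the two tiers above could be encoded as a simple flag on Item 9. The sketch below is illustrative only: `item9_follow_up` is a hypothetical name, the returned strings repeat the wording of this section, and the output is meant to prompt clinical review rather than to assess risk.

```python
def item9_follow_up(item9_score: int) -> str:
    """Map the Item 9 score (0-3) to the follow-up tier described above."""
    if item9_score not in (0, 1, 2, 3):
        raise ValueError(f"Item 9 must be scored 0-3, got {item9_score!r}")
    if item9_score == 0:
        return "No Item 9 endorsement"
    if item9_score == 1:
        return "Enhanced monitoring, follow-up assessment, and safety planning"
    return "Comprehensive suicide risk evaluation required immediately"
```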

Population Norms and Cut-Points

  • Optimal screening cutoff: ≥10 (sensitivity 88%, specificity 88%) (Kroenke et al., 2001)
  • Alternative cutoffs: ≥8 for increased sensitivity; ≥12 for medical populations (Levis et al., 2019)
  • Meaningful change: ≥5 point reduction indicates clinically significant improvement (Löwe et al., 2004)
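
Because the cutoff varies by setting, a screening check is easiest to reason about when the threshold is an explicit parameter. The sketch below uses a hypothetical function, `screens_positive`, with the standard cutoff of 10 as the default.

```python
def screens_positive(total: int, cutoff: int = 10) -> bool:
    """Return True when a PHQ-9 total meets or exceeds the screening cutoff."""
    if not 0 <= total <= 27:
        raise ValueError(f"Total score must be between 0 and 27, got {total}")
    return total >= cutoff


standard = screens_positive(9)                   # False at the default cutoff of 10
more_sensitive = screens_positive(9, cutoff=8)   # True at the lower cutoff
stricter = screens_positive(11, cutoff=12)       # False at the higher cutoff
```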

Research Evidence and Psychometric Properties

Reliability Evidence

  • Internal consistency: α = 0.86 (95% CI [0.85, 0.87]) in meta-analysis of 60 studies with 232,147 participants (Ajele & Idemudia, 2025)
  • Test-retest reliability: r = 0.84 between the self-administered PHQ-9 and a telephone re-administration within 48 hours (Kroenke et al., 2001)
  • Cross-cultural reliability: Consistent internal consistency (α = 0.80-0.89) across 30+ countries and languages (Levis et al., 2019)
  • Age group stability: Reliable across adolescents (α = 0.89), adults (α = 0.86), and elderly populations (α = 0.84) (various studies)
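
The internal consistency figures above are Cronbach's alpha coefficients. As a rough illustration of how such a coefficient is computed from item-level data, the sketch below applies the standard formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical, and the cited studies may have used different software or estimators.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha for a table of responses (rows = people, columns = items)."""
    if len(responses) < 2:
        raise ValueError("Need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("Each respondent needs the same number of items (2 or more)")

    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: four respondents, two items (not real PHQ-9 data).
alpha = cronbach_alpha([[0, 1], [1, 0], [2, 3], [3, 2]])   # 0.75
```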

Diagnostic Accuracy

Primary care and general medical settings:

  • Sensitivity: 88% for major depression at ≥10 cutoff using semi-structured interviews as gold standard (Kroenke et al., 2001)
  • Specificity: 88% for major depression at ≥10 cutoff (Kroenke et al., 2001)
  • Area under ROC curve: 0.95 indicating excellent discriminative ability (Kroenke et al., 2001)
  • Positive predictive value: 32-56% depending on prevalence in population (Levis et al., 2019)
  • Negative predictive value: 95-98% across settings (Levis et al., 2019)
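
Predictive values depend on prevalence because they follow from sensitivity and specificity through Bayes' rule. The sketch below makes that dependence concrete; the function name `predictive_values` and the 10% prevalence in the example are assumptions chosen for illustration, not figures from the cited studies.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screening test at a given prevalence."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("Predictive value is undefined for these inputs")

    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Sensitivity and specificity of 88% at an assumed prevalence of 10%.
ppv, npv = predictive_values(0.88, 0.88, 0.10)   # roughly 0.45 and 0.985
```

At this assumed prevalence, fewer than half of positive screens reflect major depression while nearly all negative screens are correct, which is why a positive PHQ-9 calls for follow-up assessment rather than a diagnosis.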

Meta-analytic evidence:

  • Individual participant data meta-analysis: 58 studies (17,357 participants) confirming ≥10 cutoff optimal for most settings (Levis et al., 2019)
  • Diagnostic algorithm accuracy: 27 validation studies showing sensitivity 64-88% and specificity 72-88% (Manea et al., 2015)

Validity Evidence

Convergent validity:

  • Beck Depression Inventory-II: r = 0.73-0.84 in multiple studies (Kroenke et al., 2001)
  • Hamilton Depression Rating Scale: r = 0.86 with clinician-administered measure (Kroenke et al., 2001)
  • Structured clinical interviews: High agreement with SCID and CIDI diagnoses (Levis et al., 2019)

Discriminant validity:

  • Anxiety measures: r = 0.60-0.65, showing overlap but distinctiveness (Kroenke et al., 2001)
  • Physical health measures: Lower correlations (r = 0.30-0.40) than with mental health measures (Kroenke et al., 2001)

Treatment Sensitivity

  • Reliable change index: ≥5 point change indicates clinically meaningful improvement (Löwe et al., 2004)
  • Effect size detection: Sensitive to small-to-large treatment effects (d = 0.3-0.8) in clinical trials (Löwe et al., 2004)
  • Therapy monitoring: Effectively tracks symptom changes across CBT, IPT, and medication trials (various studies)
  • Remission assessment: Score <5 commonly used as remission criterion in clinical trials (Kroenke et al., 2001)

Cross-Cultural Validation

  • Global validation: Validated in 49 studies across low- and middle-income countries (Carroll & Hook, 2020)
  • Language versions: Available in 80+ languages with consistent psychometric properties (multiple studies)
  • Cultural adaptation: Demonstrated reliability across diverse populations with sensitivity 64-88% depending on population and setting (Carroll & Hook, 2020)
  • Measurement invariance: Consistent factor structure across ethnic and cultural groups (various studies)

Special Populations

Adolescents (12-17 years):

  • Good reliability (α = 0.89) and validity for teen populations (Richardson et al., 2010)
  • Same cutoffs applicable with consideration of developmental context

Older adults (65+):

  • Valid and reliable but may underdetect depression due to somatic symptom overlap (Pocklington et al., 2016)
  • Consider higher cutoffs (≥12) or complementary assessment

Medical populations:

  • Higher cutoffs (≥12) may reduce false positives due to medical symptom overlap (Levis et al., 2019)
  • Remains valid in chronic illness, cancer, cardiac, and pain populations

Pregnant/postpartum women:

  • Adequate psychometric properties but Edinburgh Postnatal Depression Scale may be preferable for postpartum-specific assessment (Matthey et al., 2006)

Clinical Applications and Usage Guidelines

Primary Clinical Applications

  • Routine depression screening in primary care settings (USPSTF Grade B recommendation; the optimal screening interval is not specified)
  • Initial mental health assessment in psychiatric and counseling settings
  • Treatment progress monitoring every 2-4 weeks during active therapy
  • Outcome measurement in healthcare quality improvement programs
  • Collaborative care models for systematic tracking across care teams

Clinical Decision Support by Severity

Minimal depression (0-4):

  • No treatment indicated
  • Continue routine screening at annual visits
  • Provide general health maintenance counseling

Mild depression (5-9):

  • Watchful waiting with repeat PHQ-9 in 2-4 weeks
  • Consider lifestyle interventions (exercise, sleep hygiene, stress management)
  • Brief counseling or support groups
  • Psychoeducation about depression

Moderate depression (10-14):

  • Active treatment planning required
  • Options: psychotherapy alone OR medication alone OR combination
  • Discuss treatment preferences with patient
  • Establish follow-up monitoring schedule

Moderately severe depression (15-19):

  • Active treatment with pharmacotherapy and/or evidence-based psychotherapy
  • More frequent monitoring (every 2-4 weeks initially)
  • Consider referral to mental health specialist
  • Assess suicide risk and safety planning

Severe depression (20-27):

  • Immediate treatment initiation
  • Strong consideration for combination therapy
  • Expedited referral to psychiatry
  • Frequent monitoring (weekly if possible)
  • Comprehensive suicide risk assessment

Suicide Risk Protocol

Any endorsement of Item 9 (score ≥1) requires:

  • Immediate clinical follow-up same day
  • Comprehensive suicide risk assessment
  • Safety planning and lethal means counseling
  • Increased monitoring frequency
  • Consider referral to emergency/crisis services if high risk

Treatment Monitoring Guidelines

Frequency:

  • Administer at baseline, then every 2-4 weeks during active treatment
  • Continue monthly once stabilized
  • Resume frequent monitoring if relapse suspected

Interpreting change:

  • ≥5 point decrease = clinically significant improvement (Löwe et al., 2004)
  • <2 point change = little or no response; consider adjusting treatment
  • Score <5 = treatment target/remission for most patients
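
These rules can be combined into a single interpretation step for a pair of scores. The sketch below is one possible reading, with several assumptions labeled in the comments: the function name `interpret_change` is hypothetical, remission is checked first, and the "partial improvement" and "worsening" labels cover cases the rules above leave undefined.

```python
def interpret_change(baseline: int, follow_up: int) -> str:
    """Classify the change between two PHQ-9 totals (each 0-27)."""
    for label, score in (("baseline", baseline), ("follow_up", follow_up)):
        if not 0 <= score <= 27:
            raise ValueError(f"{label} must be between 0 and 27, got {score}")

    decrease = baseline - follow_up
    if follow_up < 5:
        # Assumption: remission takes priority over the size of the change.
        return "remission"
    if decrease >= 5:
        return "clinically significant improvement"
    if abs(decrease) < 2:
        return "minimal change; consider treatment adjustment"
    if decrease < 0:
        # Assumption: an increase of 2 or more points is labeled worsening.
        return "worsening"
    # Assumption: a 2-4 point decrease is labeled partial improvement.
    return "partial improvement"
```

For example, `interpret_change(18, 12)` returns `"clinically significant improvement"`, while `interpret_change(18, 17)` returns `"minimal change; consider treatment adjustment"`.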

Healthcare System Integration

Electronic health record implementation:

  • Automated scoring and clinical decision support
  • Alert systems for positive suicide screening
  • Progress tracking over time with graphical displays

Quality measurement:

  • HEDIS (Healthcare Effectiveness Data and Information Set) measures
  • Patient-Centered Medical Home recognition criteria
  • Value-based care performance metrics

Research Applications

Clinical trials:

  • Primary or secondary outcome measure for depression treatment studies
  • Change scores or remission rates (<5) as endpoints
  • Responder analysis (≥50% reduction from baseline)
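
The response and remission endpoints above reduce to two simple checks per participant. The sketch below uses hypothetical function names and made-up score pairs; treating a baseline of 0 as an error is an assumption, made because percentage reduction is undefined there.

```python
from typing import Sequence, Tuple


def is_responder(baseline: int, endpoint: int) -> bool:
    """True when the endpoint score is at least 50% below baseline."""
    if baseline <= 0:
        raise ValueError("Percentage reduction is undefined for a baseline of 0")
    return (baseline - endpoint) / baseline >= 0.5


def is_remitted(endpoint: int) -> bool:
    """True when the endpoint score is below 5."""
    return endpoint < 5


def trial_rates(pairs: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (response rate, remission rate) for (baseline, endpoint) pairs."""
    if not pairs:
        raise ValueError("Need at least one participant")
    responders = sum(is_responder(b, e) for b, e in pairs)
    remitted = sum(is_remitted(e) for _, e in pairs)
    return responders / len(pairs), remitted / len(pairs)


# Made-up (baseline, endpoint) totals for four participants.
response_rate, remission_rate = trial_rates([(20, 10), (18, 12), (16, 3), (12, 9)])
# response_rate is 0.5, remission_rate is 0.25
```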

Epidemiological research:

  • Population-based depression prevalence screening
  • Large-scale surveys and surveillance studies
  • Cross-cultural comparative research

Special Considerations

Medical comorbidity:

  • Somatic symptoms may overlap with medical illness (fatigue, sleep, appetite)
  • Consider higher cutoffs (≥12) in medical populations
  • Clinical judgment essential for interpretation

Cultural sensitivity:

  • Depression expression varies across cultures
  • Some populations may somaticize emotional distress
  • Consider cultural context when interpreting results

Age considerations:

  • Adolescents: Same scoring but consider developmental context
  • Elderly: May underreport mood symptoms; assess cognitive factors
  • Cultural factors affecting symptom report across age groups

Limitations and Cautions

  • Not diagnostic alone: Clinical interview required for definitive diagnosis
  • Cannot detect bipolar disorder: Additional screening needed for manic/hypomanic episodes
  • Symptom overlap: Medical illness may inflate scores
  • Self-report limitations: Depends on insight and willingness to disclose
  • Two-week timeframe: May miss episodic or rapidly cycling symptoms

Copyright and Usage Responsibility: Check that you have the proper rights and permissions to use this assessment tool in your research. This may include purchasing appropriate licenses, obtaining permissions from authors/copyright holders, or ensuring your usage falls within fair use guidelines.

The PHQ-9 is in the public domain and freely available for all uses worldwide. Pfizer, which originally held the copyright, released the PHQ-9 “without copyright restriction and at no charge, providing unprecedented access to these valuable and widely used tools.” No permission is required to reproduce, translate, display, or distribute the PHQ-9 for clinical, research, or educational purposes.

Proper Attribution: When using or referencing this scale, cite the original development:

Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613.

Official PHQ Screeners Website

USPSTF Depression Screening Recommendations

APA Depression Practice Guidelines

National Institute of Mental Health – Depression

WHO Depression Resources

References

Primary Development Citation:

  • Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613.

Diagnostic Accuracy Meta-Analyses:

  • Levis, B., Benedetti, A., Thombs, B. D., & DEPRESsion Screening Data (DEPRESSD) Collaboration. (2019). Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis. BMJ, 365, l1476.
  • Manea, L., Gilbody, S., & McMillan, D. (2015). A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. General Hospital Psychiatry, 37(1), 67-75.

Reliability Generalization:

  • Ajele, K. W., & Idemudia, E. S. (2025). Charting the course of depression care: a meta-analysis of reliability generalization of the patient health questionnaire (PHQ-9) as the measure. Discover Mental Health, 5(1), 1-18.

Cross-Cultural Validation:

  • Carroll, H. A., & Hook, K. (2020). Establishing reliability and validity for mental health screening instruments in resource-constrained settings: Systematic review of the PHQ-9 and key recommendations. Journal of Affective Disorders, 262, 434-445.

Treatment Monitoring:

  • Löwe, B., Kroenke, K., Herzog, W., & Gräfe, K. (2004). Measuring depression outcome with a brief self-report instrument: Sensitivity to change of the Patient Health Questionnaire (PHQ-9). Journal of Affective Disorders, 81(1), 61-66.

Clinical Guidelines:

  • Siu, A. L., & US Preventive Services Task Force. (2016). Screening for depression in adults: US Preventive Services Task Force recommendation statement. JAMA, 315(4), 380-387.

Special Populations:

  • Richardson, L. P., McCauley, E., Grossman, D. C., McCarty, C. A., Richards, J., Russo, J. E., Rockhill, C., & Katon, W. (2010). Evaluation of the Patient Health Questionnaire-9 Item for detecting major depression among adolescents. Pediatrics, 126(6), 1117-1123.
  • Pocklington, C., Gilbody, S., Manea, L., & McMillan, D. (2016). The diagnostic accuracy of brief versions of the Geriatric Depression Scale: A systematic review and meta-analysis. International Journal of Geriatric Psychiatry, 31(8), 837-857.