HDRS/HAM-D: Hamilton Depression Rating Scale

Reviewed by: Constantin Rezlescu | Associate Professor | UCL Psychology

TL;DR

  • The Hamilton Depression Rating Scale (HDRS/HAM-D) is a clinician-administered questionnaire developed in 1960 that became the gold standard for measuring depression severity in clinical trials, requiring 15-20 minutes and trained clinical judgment to assess 17-21 items across mood, cognitive, somatic, anxiety, and behavioral domains.
  • The scale demonstrates adequate inter-rater reliability (ICC >0.85 for total scores) and good convergent validity with other depression measures (r=0.72-0.73), but has notable psychometric limitations including variable internal consistency (α=0.46-0.92), inconsistent factor structure across samples, and overemphasis on somatic symptoms that may inflate scores in medically ill populations.
  • Despite psychometric concerns, the HDRS remains the most widely used primary outcome measure in pharmaceutical depression trials due to regulatory acceptance (FDA standard), established treatment response criteria (≥50% reduction or score ≤7 for remission), and decades of accumulated evidence, though structured versions like SIGH-D improve standardization and reliability.

Introduction

The Hamilton Depression Rating Scale (HDRS), also known as the Hamilton Depression Scale (HAM-D), is a multiple-item clinician-administered questionnaire developed by English psychiatrist Max Hamilton in 1960. For over four decades, it served as the “gold standard” for measuring depression severity in clinical trials and pharmaceutical research, fundamentally shaping how depression treatment efficacy is evaluated in medical research.

Historical Impact and Clinical Legacy: Hamilton originally developed this scale to enable psychiatrists to chart changes in diagnosed patients through particular treatment regimes, converting qualitative clinical judgments into quantitative data. The scale's dominance grew gradually out of a consensus among psychiatrists conducting clinical trials for depression, particularly trials of psychopharmaceuticals from the 1960s onward.

Paradigm-Shifting Assessment Approach

The HDRS marked a major shift in depression assessment by providing one of the first systematic, clinician-administered tools for measuring treatment response. Unlike self-report measures, the HDRS relies on trained clinical judgment and structured interview techniques, which made it particularly attractive for pharmaceutical research, where standardized clinician ratings are valued.

Theoretical Foundation

The HDRS was developed before modern diagnostic systems like the DSM-5, emerging from Hamilton’s clinical experience and the prevailing theoretical understanding of depression in the late 1950s. Rather than mapping onto specific diagnostic criteria, the scale reflects a broad clinical assessment approach that emphasizes observable signs and symptoms across multiple domains.

The HDRS evaluates depression through several core clinical domains:

  • Mood and affect – Depressed mood, feelings of guilt, suicidal ideation
  • Cognitive and psychomotor functioning – Agitation, retardation, insight
  • Somatic symptoms – Insomnia patterns, appetite and weight changes, loss of energy
  • Anxiety components – Psychological anxiety, somatic anxiety, hypochondriasis
  • Behavioral functioning – Work and activities, sexual interest, general somatic symptoms

The clinician-administered format was intentional: Hamilton believed that trained observers could detect subtle changes in symptom severity more reliably than patients could report them, particularly for symptoms such as psychomotor change, which patients may lack insight into or be unable to judge accurately.

🏥 Clinical Trials Standard: The HDRS remains the most widely used primary outcome measure in depression clinical trials and is accepted by regulatory agencies, including the FDA, as an outcome measure in antidepressant approval studies.

Key Features

Assessment Characteristics

  • 17-21 items depending on version (HDRS-17 most common)
  • 15-20 minutes administration time by trained clinician
  • Adult populations (originally designed for diagnosed depressed patients)
  • 3- or 5-point rating scales (0-2 or 0-4), depending on the item
  • Clinician-administered requiring structured interview skills

Clinical Assessment Domains

  • Mood evaluation – Core depressive symptoms and guilt assessment
  • Sleep disturbances – Multiple insomnia patterns and sleep quality
  • Somatic symptoms – Physical manifestations and energy levels
  • Anxiety components – Both psychological and somatic anxiety features
  • Behavioral functioning – Work capacity and social engagement
  • Suicidal ideation – Risk assessment and safety evaluation

Research and Clinical Applications

  • Pharmaceutical trials – Primary endpoint in antidepressant efficacy studies
  • Clinical research – Gold standard comparator for new depression measures
  • Treatment monitoring – Before-and-after assessment in clinical settings
  • Regulatory approval – FDA standard for drug development programs
  • International research – Standardized measure across global clinical trials

Scoring and Interpretation

Response Format

Each item is scored by the clinician based on structured interview information, with items using either 3-point scales (0-2) or 5-point scales (0-4) depending on the specific symptom being assessed. The clinician rates severity based on patient responses and clinical observation during the assessment period.
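As a minimal sketch of how these item ranges combine into a total, the Python below stores the maximum rating for each HDRS-17 item and sums a validated set of clinician ratings. The assignment of 0-4 versus 0-2 ranges to specific item numbers is an assumption based on the standard HDRS-17 form, not something listed in this article; check it against the scoring sheet your study uses. `ITEM_MAX` and `hdrs17_total` are hypothetical names.

```python
# Hypothetical helper: maximum rating per HDRS-17 item, keyed by item number.
# ASSUMPTION: standard HDRS-17 ranges; verify against the form you use.
ITEM_MAX: dict[int, int] = {
    1: 4, 2: 4, 3: 4,   # depressed mood, guilt, suicide
    4: 2, 5: 2, 6: 2,   # insomnia early, middle, late
    7: 4, 8: 4, 9: 4,   # work and activities, retardation, agitation
    10: 4, 11: 4,       # anxiety psychic, anxiety somatic
    12: 2, 13: 2,       # gastrointestinal and general somatic symptoms
    14: 2,              # genital symptoms
    15: 4,              # hypochondriasis
    16: 2, 17: 2,       # loss of weight, insight
}


def hdrs17_total(ratings: dict[int, int]) -> int:
    """Return the HDRS-17 total after checking every item is present and in range."""
    missing = sorted(set(ITEM_MAX) - set(ratings))
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item, score in ratings.items():
        if item not in ITEM_MAX:
            raise ValueError(f"unknown item number: {item}")
        if not 0 <= score <= ITEM_MAX[item]:
            raise ValueError(f"item {item} must be rated 0-{ITEM_MAX[item]}, got {score}")
    return sum(ratings.values())


if __name__ == "__main__":
    # The largest possible total is the sum of the item maxima.
    print(sum(ITEM_MAX.values()))  # 52
```

A rating of 3 on item 4 (early insomnia), for example, raises `ValueError`, since that item only runs from 0 to 2.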

Sample HDRS Item Structure

Item 1: Depressed Mood (5-point scale)

  • 0: Absent
  • 1: These feeling states indicated only on questioning
  • 2: These feeling states spontaneously reported verbally
  • 3: Communicates feeling states non-verbally through facial expression, posture, voice, tendency to weep
  • 4: Patient reports virtually only these feeling states in spontaneous verbal and non-verbal communication

Item 3: Suicide (5-point scale)

  • 0: Absent
  • 1: Feels life is not worth living
  • 2: Wishes he were dead or any thoughts of possible death to self
  • 3: Suicidal ideas or gesture
  • 4: Attempts at suicide (any serious attempt rates 4)

Item 4: Insomnia Early (3-point scale)

  • 0: No difficulty falling asleep
  • 1: Complains of occasional difficulty falling asleep, i.e., more than 1/2 hour
  • 2: Complains of nightly difficulty falling asleep

HDRS-17 Severity Interpretation

| Total score | Severity level | Clinical interpretation |
|---|---|---|
| 0-7 | Normal/Remission | No depression or successful treatment response |
| 8-13 | Mild depression | Mild symptoms requiring monitoring |
| 14-18 | Moderate depression | Clinically significant depression warranting treatment |
| 19-22 | Severe depression | Substantial depression requiring active intervention |
| ≥23 | Very severe depression | Severe depression requiring intensive treatment |
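The bands in this table can be expressed as a simple lookup. The sketch below is hypothetical (`hdrs17_severity` does not come from any published package) and hard-codes the cut-offs listed here; other publications use different cut-offs, so treat the boundaries as a study-level choice rather than a fixed property of the scale.

```python
# Inclusive upper bound of each severity band, as listed in the table above.
SEVERITY_BANDS: list[tuple[int, str]] = [
    (7, "Normal/Remission"),
    (13, "Mild depression"),
    (18, "Moderate depression"),
    (22, "Severe depression"),
    (52, "Very severe depression"),  # 23 and above, capped at the HDRS-17 maximum
]


def hdrs17_severity(total: int) -> str:
    """Map an HDRS-17 total score to its severity label."""
    if not 0 <= total <= 52:
        raise ValueError(f"HDRS-17 totals run from 0 to 52, got {total}")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise AssertionError("unreachable: the bands cover 0-52")


if __name__ == "__main__":
    for score in (5, 13, 14, 22, 23):
        print(score, hdrs17_severity(score))
```

Keeping the bands in a list makes it easy to swap in an alternative set of cut-offs without touching the lookup logic.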

Clinical Thresholds and Research Standards

  • Entry criteria: Score ≥20 typically required for clinical trial participation
  • Treatment response: ≥50% reduction from baseline score
  • Remission: Score ≤7 indicating minimal residual symptoms
  • Clinical significance: ≥11-point absolute decrease indicates meaningful improvement
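These thresholds combine into a per-patient classification. The sketch below is a hypothetical helper, not a regulatory definition: it applies the ≥20 entry, ≥50% response, ≤7 remission, and ≥11-point change thresholds exactly as stated here, and individual trial protocols may define them differently. The 50% test is done in integer arithmetic so that a reduction of exactly half is never lost to floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HdrsOutcome:
    response: bool                # at least 50% reduction from baseline
    remission: bool               # endpoint total of 7 or less
    clinically_significant: bool  # absolute decrease of 11 points or more


def meets_entry(baseline: int, threshold: int = 20) -> bool:
    """Hypothetical entry check; 20 is the typical cut-off cited in the text."""
    return baseline >= threshold


def classify_outcome(baseline: int, endpoint: int) -> HdrsOutcome:
    """Classify an HDRS-17 change from baseline to endpoint."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a percentage reduction")
    decrease = baseline - endpoint
    return HdrsOutcome(
        # decrease / baseline >= 0.5, rewritten to stay in integers
        response=2 * decrease >= baseline,
        remission=endpoint <= 7,
        clinically_significant=decrease >= 11,
    )


if __name__ == "__main__":
    print(meets_entry(24))           # True
    print(classify_outcome(22, 12))  # about a 45% reduction: no response
    print(classify_outcome(24, 6))   # response and remission
```

Note that response and remission are independent: a patient starting at 30 who ends at 14 meets the response criterion without reaching remission.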

Research Evidence and Psychometric Properties

Reliability Evidence

Internal consistency:

  • Cronbach's α range: 0.46-0.92 across a review of 70 studies, with 8 of 12 studies reporting α ≤0.76, indicating variable and often inadequate internal consistency (Bagby et al., 2004)
  • Pooled internal consistency: α = 0.79 in meta-analysis across 49 years, with substantial heterogeneity across studies (Morriss et al., 2011)
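Cronbach's α compares the sum of the item variances with the variance of the total score: for k items, α = k/(k − 1) × (1 − sum of item variances / variance of total scores). A minimal Python sketch follows; `cronbach_alpha` is a hypothetical helper, and the toy ratings are invented for illustration and are not drawn from any study cited here.

```python
from statistics import variance


def cronbach_alpha(ratings: list[list[int]]) -> float:
    """Cronbach's alpha for a patients-by-items matrix of ratings.

    ratings[p][i] is patient p's rating on item i. Sample variances are used
    throughout; the ratio is unchanged with population variances as long as
    item and total variances use the same definition.
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two patients")
    item_var_sum = sum(variance(column) for column in zip(*ratings))
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Four hypothetical patients rated on three items (toy data only).
    toy = [[2, 1, 2], [3, 3, 2], [1, 1, 0], [4, 3, 3]]
    print(round(cronbach_alpha(toy), 2))  # 0.94
```

In a real HDRS-17 analysis each row would hold all 17 item ratings for one patient.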

Inter-rater reliability:

  • Total score ICC: Generally exceeding 0.85 for total scores in most studies (Bagby et al., 2004)
  • Item-level agreement: Poor for individual items, particularly those requiring subjective clinical judgment (Bagby et al., 2004)
  • Structured interview reliability: SIGH-D shows improved inter-rater reliability compared to unstructured administration (Williams, 1988)
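ICC values like these summarize how closely different raters' total scores agree. As an illustration only, the sketch below computes the one-way random-effects ICC(1,1), chosen because it needs nothing beyond the standard library; the studies cited here may report other ICC forms (for example two-way models), which can give different values on the same data. `icc_1_1` and the toy scores are hypothetical.

```python
from statistics import mean


def icc_1_1(scores: list[list[float]]) -> float:
    """One-way random-effects, single-rater ICC(1,1).

    scores[s][r] is rater r's total score for subject s; every subject must be
    rated by the same number of raters (k >= 2).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")
    grand = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]
    # Between-subjects and within-subjects mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Two hypothetical raters scoring the same five patients (toy data only).
    toy = [[24, 22], [18, 19], [10, 11], [30, 28], [15, 15]]
    print(round(icc_1_1(toy), 3))
```

Because total scores spread widely across patients, small rater disagreements still yield a high ICC, which is one reason total-score ICCs can look strong even when item-level agreement is poor.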

Test-retest reliability:

  • Short-term stability: Adequate for total scores over brief intervals, although retest estimates mix genuine symptom change with measurement error (Morriss et al., 2011)
  • Item-level stability: Poor for individual items over time (Bagby et al., 2004)

Validity Evidence

Convergent validity:

  • Correlation with BDI and MADRS: r = 0.72-0.73 with other established depression measures (Bagby et al., 2004)
  • Agreement with clinical ratings: Good correspondence with clinician assessments of depression severity (Bagby et al., 2004)

Discriminant validity:

  • Depression vs. non-depression: Adequate ability to distinguish depressed from non-depressed patients (Bagby et al., 2004)
  • Overlap with anxiety: Moderate correlations with anxiety measures, particularly on anxiety-specific items (Bagby et al., 2004)

Predictive validity:

  • Treatment outcomes: Good prediction of treatment response in clinical trials, with HDRS-defined remission predicting functional outcomes (Bagby et al., 2004)

Content validity concerns:

  • Sleep symptom overemphasis: Criticized for including 3 insomnia items while underrepresenting cognitive symptoms (Bagby et al., 2004)
  • Item discrimination: Some items (e.g., insight, genital symptoms) show poor discrimination and contribute minimally to total score (Bagby et al., 2004)

Factor Structure

Multiple factor solutions reported:

  • Unidimensional model: Original single-factor model not well-supported by empirical data (Bagby et al., 2004)
  • Multi-factor solutions: Most consistent findings support 2-3 factor solutions including anxiety/somatization, cognitive, and core depressive symptom factors (Bagby et al., 2004)
  • Cross-sample variability: Factor structure varies across samples and cultures (Bagby et al., 2004)
  • Construct clarity: Questions remain about whether HDRS measures a single construct or multiple dimensions (Bagby et al., 2004)

Treatment Sensitivity

  • Medication trial sensitivity: Sensitive to change in antidepressant medication trials with effect sizes for medication vs. placebo typically d = 0.3-0.5 (Bagby et al., 2004)
  • Clinically meaningful change: Can detect clinically meaningful improvements over 6-12 week treatment periods (Morriss et al., 2011)
  • Response criteria: Standardized definition of response (≥50% reduction) widely used in clinical trials (Bagby et al., 2004)
  • Remission criteria: Score ≤7 established as remission criterion in depression research (Bagby et al., 2004)
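The drug-placebo effect sizes quoted here are standardized mean differences. The sketch below computes Cohen's d with a pooled standard deviation from per-patient HDRS change scores (baseline minus endpoint, so larger values mean more improvement). Using change scores and this pooled-SD formula are assumptions for illustration; individual analyses may define the effect size differently. `cohens_d` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(drug_change: list[float], placebo_change: list[float]) -> float:
    """Cohen's d for two independent groups using the pooled sample SD."""
    n1, n2 = len(drug_change), len(placebo_change)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two patients")
    pooled_var = (
        (n1 - 1) * variance(drug_change) + (n2 - 1) * variance(placebo_change)
    ) / (n1 + n2 - 2)
    return (mean(drug_change) - mean(placebo_change)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical HDRS-17 improvements (baseline minus endpoint), toy data only.
    drug = [12, 9, 14, 7, 11, 10]
    placebo = [8, 10, 6, 9, 7, 11]
    print(round(cohens_d(drug, placebo), 2))
```

A positive d means the drug group improved more than the placebo group; swapping the arguments flips the sign.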

Psychometric Limitations

Known psychometric concerns identified in literature:

  • Variable internal consistency: Inconsistent reliability across studies and populations, with many studies showing α < 0.80 (Bagby et al., 2004)
  • Uneven item quality: Some items contributing minimally to scale discrimination and overall performance (Bagby et al., 2004)
  • Factor structure inconsistency: Lack of stable factor structure across different samples (Bagby et al., 2004)
  • Somatic symptom emphasis: Overemphasis on somatic symptoms may inflate scores in medically ill populations (Bagby et al., 2004)
  • Standardization challenges: Some items require clinical inferences that are difficult to standardize across raters (Williams, 1988)

Despite these limitations, the scale continues to be used due to historical precedent, regulatory acceptance, and accumulated evidence base (Healy, 2013).

Usage Guidelines and Applications

Primary Research Applications

  • Pharmaceutical clinical trials – Primary efficacy endpoint in antidepressant development programs
  • Regulatory submissions – FDA standard for demonstrating drug efficacy in depression
  • Academic research – Established comparator for validating new depression assessment tools
  • International studies – Standardized measure enabling cross-cultural clinical research
  • Treatment protocols – Baseline and outcome assessment in structured clinical interventions

Clinical Practice Applications

  • Specialist settings – Depression severity assessment in psychiatric outpatient clinics
  • Hospital settings – Standardized assessment for inpatient depression treatment monitoring
  • Research clinics – Patient selection and treatment response tracking
  • Training programs – Educational tool for clinical interviewing skill development
  • Quality assurance – Standardized outcome measurement in clinical quality programs

Administration Requirements

Clinician Training

  • Structured training: Requires formal training in HDRS administration and scoring
  • Interview skills: Clinical interviewing experience essential for reliable administration
  • Scoring guidelines: Detailed item-by-item scoring rules critical for consistency
  • Ongoing calibration: Regular rater training sessions to prevent drift over time

Assessment Protocol

  • Clinical interview: 15-20 minute structured clinical interview required
  • Time frame: Assessment covers previous week’s symptoms
  • Documentation: Detailed notes supporting each item score recommended
  • Quality control: Double-rating procedures for research applications

Research Design Considerations

  • Sample selection: Typically requires HDRS ≥20 for clinical trial entry
  • Power calculations: Well-established effect sizes available for study planning
  • Statistical analysis: Established conventions for response and remission definitions
  • Regulatory compliance: Meets international standards for depression clinical trials
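Established effect sizes can feed a standard sample-size approximation. The sketch below uses the normal-approximation formula for comparing two means, n per arm = 2 × (z(1 − α/2) + z(power))² / d², with `statistics.NormalDist` (Python 3.8+). It ignores dropout, covariate adjustment, and the small-sample t correction, so it is a planning sketch only; `n_per_arm` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    rounded up. No allowance for dropout.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


if __name__ == "__main__":
    for d in (0.3, 0.4, 0.5):
        print(d, n_per_arm(d))  # 175, 99 and 63 patients per arm
```

Across the d = 0.3-0.5 range typical of antidepressant trials, the required sample roughly triples from the optimistic to the pessimistic end, which is why the assumed effect size matters so much at the planning stage.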

Clinical Decision Support

  • Baseline assessment: Establish severity and symptom profile before treatment
  • Treatment monitoring: Regular assessment to track therapeutic response
  • Dose optimization: Objective data supporting medication adjustment decisions
  • Endpoint determination: Clear criteria for treatment success or modification

Training and Implementation

Rater Certification

  • Initial training: Comprehensive didactic and practical training program
  • Reliability testing: Achieve specified inter-rater reliability standards
  • Ongoing monitoring: Regular reliability assessments throughout study conduct
  • Refresher training: Periodic retraining to maintain scoring consistency

Quality Assurance

  • Double rating: Independent assessments for quality control
  • Consensus procedures: Systematic resolution of rating discrepancies
  • Data monitoring: Statistical detection of rater drift or bias
  • Protocol adherence: Standardized procedures for consistent administration
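Double rating and consensus procedures need an explicit rule for when two raters disagree enough to trigger review. The sketch below is one hypothetical rule, not a published standard: it flags any item where the two ratings differ by more than `item_tol` points and flags the pair overall when total scores differ by more than `total_tol`. Both tolerances are placeholders that a study would set in its own rater manual.

```python
def flag_discrepancies(
    rater_a: dict[int, int],
    rater_b: dict[int, int],
    item_tol: int = 1,
    total_tol: int = 3,
) -> tuple[list[int], bool]:
    """Return (items needing consensus review, whether the totals diverge).

    Tolerances are hypothetical defaults, not values from any guideline.
    """
    if rater_a.keys() != rater_b.keys():
        raise ValueError("both raters must score the same items")
    items = sorted(i for i in rater_a if abs(rater_a[i] - rater_b[i]) > item_tol)
    total_gap = abs(sum(rater_a.values()) - sum(rater_b.values()))
    return items, total_gap > total_tol


if __name__ == "__main__":
    # Only a few items shown for brevity; a real check would use all 17.
    a = {1: 3, 2: 2, 3: 1, 4: 2, 7: 3}
    b = {1: 1, 2: 2, 3: 1, 4: 1, 7: 3}
    print(flag_discrepancies(a, b))  # ([1], False)
```

Logging which items are flagged over time also gives a simple basis for the statistical drift monitoring mentioned above.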

Limitations and Considerations

  • Training requirements – Extensive clinician training needed for reliable administration
  • Time intensive – 15-20 minute administration limits practical clinical use
  • Somatic emphasis – May overestimate depression severity in medically ill patients
  • Item limitations – Some items show poor psychometric properties requiring careful interpretation

Alternative Versions and Modifications

Shortened Versions

  • HDRS-7: Seven-item version for increased efficiency while maintaining validity
  • Maier subscale: Six core depression items extracted from HDRS-17
  • Bech subscale: Six-item severity scale focused on core symptoms
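Subscales such as these are scored by summing a subset of the HDRS-17 item ratings. The sketch below assumes the Bech HAM-D6 item set as it is commonly reported (items 1, 2, 7, 8, 10 and 13: depressed mood, guilt, work and activities, retardation, psychic anxiety, general somatic symptoms); this item list is not given in the article and should be checked against the original subscale publication before use. Other subscales, such as Maier's, can be scored by passing a different item tuple.

```python
# ASSUMPTION: Bech HAM-D6 item numbers as commonly reported; verify before use.
BECH_HAMD6_ITEMS: tuple[int, ...] = (1, 2, 7, 8, 10, 13)


def subscale_score(
    ratings: dict[int, int], items: tuple[int, ...] = BECH_HAMD6_ITEMS
) -> int:
    """Sum the ratings for the given HDRS-17 item numbers."""
    missing = [i for i in items if i not in ratings]
    if missing:
        raise ValueError(f"ratings missing items {missing}")
    return sum(ratings[i] for i in items)


if __name__ == "__main__":
    # Hypothetical full HDRS-17 ratings for one patient (toy data only).
    ratings = {1: 3, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 3, 8: 1, 9: 1,
               10: 2, 11: 1, 12: 1, 13: 2, 14: 1, 15: 0, 16: 0, 17: 0}
    print(subscale_score(ratings))  # 13
```

Scoring subscales from the full HDRS-17 record, rather than administering them separately, lets a study report both the total and the core-symptom subscale from a single interview.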

Structured Versions

  • SIGH-D: Structured Interview Guide improving reliability and standardization
  • SIGH-SAD: Extended version including atypical depression symptoms
  • GRID-HAMD: Enhanced version with improved anchor points and training materials

Copyright and Usage Responsibility: Check that you have the proper rights and permissions to use this assessment tool in your research. This may include purchasing appropriate licenses, obtaining permissions from authors/copyright holders, or ensuring your usage falls within fair use guidelines.

The Hamilton Depression Rating Scale was developed by Max Hamilton and first published in 1960. The scale appears to be in the public domain based on its widespread use in clinical research and lack of licensing restrictions. However, users should verify copyright status with legal counsel for commercial applications.

Proper Attribution: When using or referencing this scale, cite the original development:

  • Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56-62.

Structured versions: The Structured Interview Guide for the Hamilton Depression Rating Scale (SIGH-D) and other modifications (such as GRID-HAMD) may have separate copyright considerations. Users should verify licensing requirements for specific structured versions.

Training materials: Various training programs and certification courses are available through academic institutions, pharmaceutical companies, and professional organizations. Commercial training materials may require separate licensing agreements.

References

Original Development:

  • Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23, 56-62.

Comprehensive Psychometric Review:

  • Bagby, R. M., Ryder, A. G., Schuller, D. R., and Marshall, M. B. (2004). The Hamilton Depression Rating Scale: Has the gold standard become a lead weight? American Journal of Psychiatry, 161(12), 2163-2177.

Reliability Meta-Analysis:

  • Morriss, R., et al. (2011). Reliability of the Hamilton Rating Scale for Depression: A meta-analysis over a period of 49 years. Journal of Psychiatric Research, 45(9), 1131-1137.

Structured Interview Development:

  • Williams, J. B. W. (1988). A structured interview guide for the Hamilton Depression Rating Scale. Archives of General Psychiatry, 45(8), 742-747.

Historical Analysis:

  • Healy, D. (2013). The Hamilton Rating Scale for Depression: The making of a “gold standard” and the unmaking of a chronic illness, 1960–1980. Chronic Illness, 9(3), 202-219.

Frequently Asked Questions

What does the HDRS/HAM-D measure?

The Hamilton Depression Rating Scale (HDRS/HAM-D) measures depression severity through clinician assessment across multiple domains including mood, cognitive functioning, somatic symptoms, anxiety components, and behavioral functioning. It evaluates 17-21 items depending on the version, providing a quantitative measure of depression severity primarily used in clinical trials and pharmaceutical research.

How long does the HDRS/HAM-D take to complete?

The HDRS/HAM-D takes approximately 15-20 minutes to administer. It requires a structured clinical interview conducted by a trained clinician who rates symptom severity based on patient responses and clinical observations during the assessment period covering the previous week's symptoms.

Is the HDRS/HAM-D free to use?

The original HDRS/HAM-D appears to be in the public domain based on widespread use without licensing restrictions. However, structured versions like SIGH-D and GRID-HAMD may have separate copyright considerations. Users should verify copyright status with legal counsel for commercial applications and cite the original 1960 Hamilton publication when using the scale.

How is the HDRS/HAM-D scored?

The HDRS-17 is scored by summing ratings across 17 items, with individual items using either 3-point (0-2) or 5-point (0-4) scales. Total scores range from 0-52, with severity interpretations: 0-7 (normal/remission), 8-13 (mild), 14-18 (moderate), 19-22 (severe), and ≥23 (very severe depression). Treatment response is defined as ≥50% reduction from baseline.

What's the difference between HDRS/HAM-D and BDI?

The HDRS/HAM-D is clinician-administered requiring trained clinical judgment and structured interview, while the Beck Depression Inventory (BDI) is a self-report questionnaire. The HDRS emphasizes observable signs and somatic symptoms, whereas the BDI focuses on subjective cognitive and emotional experiences. They correlate moderately (r=0.72-0.73) but serve different purposes—HDRS for clinical trials, BDI for screening and self-monitoring.

How reliable is the HDRS/HAM-D?

The HDRS/HAM-D shows variable reliability. Inter-rater reliability for total scores generally exceeds ICC=0.85, but internal consistency varies widely (α=0.46-0.92, pooled α=0.79). Individual item reliability is poor, and factor structure is inconsistent across studies. Despite psychometric limitations, it remains widely used due to historical precedent, regulatory acceptance, and extensive evidence base in pharmaceutical research.