BFI-S: Big Five Inventory-Short Form (15 items)

Reviewed by: Constantin Rezlescu | Associate Professor | UCL Psychology

TL;DR

The BFI-S is a short form of the Big Five Inventory, developed by Gerlitz and Schupp (2005) for the German Socio-Economic Panel (SOEP) so that the five broad personality dimensions could be measured in large surveys where questionnaire time is scarce.
Independent evaluation by Hahn, Gottschling, and Spinath (2012) found a replicable five-factor structure, good convergent and discriminant validity against the much longer NEO-PI-R, and substantial stability over an 18-month interval.
Internal consistencies are modest, which is expected when a very short scale must cover a broad construct; short-scale methodology treats this as a design trade-off to be matched to the assessment setting rather than a defect (Ziegler et al., 2014).
The BFI-S suits survey, longitudinal, and large-scale population research in which personality is one variable among many; it should not be used for individual clinical assessment or when facet-level detail is required.

At a Glance

Items	15 (3 per Big Five dimension)
Administration time	Under 5 minutes (Hahn et al., 2012)
Response format	7-point scale, 1 = does not apply to me at all, 7 = applies to me perfectly; the English version is anchored 1 = strongly disagree, 7 = strongly agree (Lang et al., 2011)
Scores	Five dimension scores (Extraversion, Agreeableness, Conscientiousness, Neuroticism, Openness to Experience), each the mean of its 3 items, range 1-7; four items reverse-keyed
Suitable ages	16 and older
License	Free for research and educational use; no licensing fees
Original citation	Gerlitz & Schupp (2005), DIW Research Notes 4

Introduction

The Big Five Inventory-Short Form (BFI-S) is a 15-item abbreviated version of the Big Five Inventory, designed to provide efficient assessment of the five major personality dimensions. It was developed by Gerlitz and Schupp (2005) for the German Socio-Economic Panel (SOEP) from the original 44-item BFI, and its psychometric properties have been evaluated independently by Hahn, Gottschling and Spinath (2012). The short form retains acceptable psychometric properties for research use while cutting administration time to under 5 minutes (Hahn et al., 2012), making it useful where a brief personality assessment is needed but ultra-brief measures sacrifice too much reliability.

The BFI-S addresses a common dilemma in personality research: comprehensive personality inventories like the NEO-PI-3 (240 items) provide detailed, reliable assessment but are impractical for many research applications, while ultra-brief measures like the TIPI (10 items) are quick but suffer from lower reliability. The BFI-S occupies the middle ground with three items per dimension.

Understanding Brief Big Five Assessment

Brief personality measures trade off comprehensiveness against efficiency. Ziegler, Kemper and Kruyen (2014) argue that the right question is not whether a short scale is good or bad, but whether its psychometric profile matches the assessment setting. The BFI-S sits between ultra-brief and full-length inventories:

Compared to ultra-brief measures (TIPI, 10 items):

Three items per dimension rather than two (50% more items per dimension)
Adequate for group-level research analyses

Compared to comprehensive measures (NEO-PI-3, 240 items):

Roughly 94% fewer items (15 vs 240)
Minimal participant burden and fatigue
Domain-level Big Five coverage suitable for large panel studies where longer inventories cannot be fielded (Hahn et al., 2012)
Practical for repeated measurement designs

Compared to medium-length measures (NEO-FFI, 60 items):

75% fewer items (15 vs 60)
Better suited for large-scale surveys and panel studies
Feasible where time constraints rule out the NEO-PI-R or NEO-FFI (Hahn et al., 2012)

This positioning makes the BFI-S particularly valuable for longitudinal research requiring multiple personality assessments and for large-scale surveys where personality is one of several measured constructs.

Theoretical Foundation

The BFI-S is based on the Big Five model of personality, which organizes personality traits into five broad dimensions: Extraversion, Agreeableness, Conscientiousness, Neuroticism (the opposite pole of Emotional Stability), and Openness to Experience. This model emerged from decades of lexical research and is a widely used framework for describing personality.

Item selection: Gerlitz and Schupp (2005) selected and constructed the 15 items from the 44-item BFI for use in the German Socio-Economic Panel, and that source documents the procedure (Hahn et al., 2012). In an independent German sample, all 15 items loaded substantially on their intended factor (mean loading .74) with low secondary loadings (Hahn et al., 2012).

Hierarchical trait structure: with 3 items per dimension, the BFI-S measures personality at the broad domain level rather than attempting to assess specific facets within each dimension. This is an appropriate design choice—3 items cannot adequately measure 6 facets per domain (as in the NEO-PI-3), but they can assess the overarching personality dimension. The BFI-S thus provides domain-level personality description suitable for research examining broad personality effects, personality as a control variable, or personality profiles at the group level. It should not be used when detailed facet-level assessment is required or when individual clinical assessment is needed.

📊 Key insight: The BFI-S trades facet-level detail for brevity. Whether that trade-off is acceptable depends on the assessment setting rather than on the scale being inherently good or bad (cf. Ziegler et al., 2014).

Key Features

Assessment Characteristics

15 items total (3 items per Big Five dimension)
Under 5 minutes administration time (Hahn et al., 2012)
7-point rating scale (1 = does not apply to me at all, 7 = applies to me perfectly)
Ages 16 and older
Domain-level assessment of broad personality dimensions
Free to use for research and educational purposes

Big Five Dimensions Assessed

Extraversion – Sociability, assertiveness, energy level
Agreeableness – Cooperation, compassion, trust
Conscientiousness – Organization, reliability, achievement orientation
Neuroticism – Anxiety and negative affect (vs. emotional stability)
Openness to Experience – Intellectual curiosity, creativity, aesthetic appreciation

Versions & Adaptations

German original, developed for the German Socio-Economic Panel and fielded there from 2005 (Gerlitz & Schupp, 2005)
English version, evaluated across survey methods by Lang et al. (2011); the BFI-S was also fielded in the British Household Panel Survey in 2005 (Hahn et al., 2012)
No other published forms or adaptations of the BFI-S are documented in the sources cited here

Research Applications

Survey research where personality is important but not primary focus
Longitudinal studies requiring repeated personality assessments
Large-scale population research with efficiency requirements
Organizational research on workplace personality and behavior
Educational research examining personality and academic outcomes

Scoring and Interpretation

Response Format

Participants rate each statement on a 7-point scale anchored at the endpoints: 1 = does not apply to me at all, 7 = applies to me perfectly (Hahn et al., 2012). The English version used by Lang et al. (2011) anchors the same 7-point scale from 1 (strongly disagree) to 7 (strongly agree).

Complete BFI-S Items

Instructions (SOEP instruction, translated): “Here are a number of characteristics that a person may have… Please answer using the following scale. 1 means ‘does not apply to me at all’, 7 means ‘applies to me perfectly’.”

“I see myself as someone who…”

Extraversion (3 items):

…is talkative
…is reserved (R)
…is outgoing, sociable

Agreeableness (3 items):

…is sometimes rude to others (R)
…has a forgiving nature
…is considerate and kind to almost everyone

Conscientiousness (3 items):

…does a thorough job
…tends to be lazy (R)
…does things efficiently

Neuroticism (3 items):

…worries a lot
…gets nervous easily
…is relaxed, handles stress well (R)

Openness to Experience (3 items):

…is original, comes up with new ideas
…values artistic, aesthetic experiences
…has an active imagination

Scoring Procedure

Reverse score items 2, 4, 8, and 12 (the items marked R): reversed score = 8 − original score. Item numbers follow the order in which the items are listed above (1-3 Extraversion, 4-6 Agreeableness, 7-9 Conscientiousness, 10-12 Neuroticism, 13-15 Openness).
Compute each dimension score as the mean of its 3 items:
Extraversion = (item 1 + item 2 reversed + item 3) ÷ 3
Agreeableness = (item 4 reversed + item 5 + item 6) ÷ 3
Conscientiousness = (item 7 + item 8 reversed + item 9) ÷ 3
Neuroticism = (item 10 + item 11 + item 12 reversed) ÷ 3
Openness = (item 13 + item 14 + item 15) ÷ 3
Each dimension score ranges from 1.0 to 7.0, with higher scores indicating a stronger presence of the trait.
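
The scoring procedure can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring script: the function name `score_bfi_s`, the constants, and the example responses are assumptions made here, and the item positions assume the 15 items are administered in the order listed above.

```python
from statistics import mean

# Item positions (1-15) follow the order of the item list above.
DIMENSIONS = {
    "Extraversion": (1, 2, 3),
    "Agreeableness": (4, 5, 6),
    "Conscientiousness": (7, 8, 9),
    "Neuroticism": (10, 11, 12),
    "Openness": (13, 14, 15),
}
REVERSED = {2, 4, 8, 12}  # the items marked (R)


def score_bfi_s(responses):
    """Return the five dimension means for one respondent.

    `responses` is a sequence of 15 integers (1-7) in item order.
    """
    if len(responses) != 15:
        raise ValueError("expected 15 responses")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("responses must be integers from 1 to 7")
    # Reverse-key items 2, 4, 8 and 12: reversed score = 8 - original score.
    keyed = {
        item: (8 - r if item in REVERSED else r)
        for item, r in enumerate(responses, start=1)
    }
    return {
        name: mean(keyed[item] for item in items)
        for name, items in DIMENSIONS.items()
    }


# Hypothetical respondent, for illustration only.
example = [6, 2, 7, 3, 5, 6, 6, 2, 5, 4, 3, 5, 5, 4, 6]
scores = score_bfi_s(example)
for dimension, value in scores.items():
    print(f"{dimension}: {value:.2f}")
```

If your survey platform stores items in a different order, adjust `DIMENSIONS` and `REVERSED` to match before scoring.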

Score Interpretation

Scale Range: 1.0 – 7.0 for each dimension

No published cutoffs or interpretation bands exist for the BFI-S, so interpret scores relative to your own sample. Dimension-level norms are not available either: Hahn et al. (2012) report only item-level descriptive statistics.

Interpretation Guidelines

Appropriate uses:

Domain-level personality description for research
Personality profiles at group level
Control variables in multivariate research
Preliminary screening for comprehensive assessment

Interpretation considerations:

Focus on broad trait categories rather than specific facets
Consider measurement error with only 3 items per dimension
Use for group-level analyses more than individual assessment (Ziegler et al., 2014)
Supplement with detailed measures when clinical precision needed

Research Evidence and Psychometric Properties

Reliability Evidence

Internal consistency (Cronbach’s alpha; German validation sample, N = 598 adults; Hahn et al., 2012):

Extraversion: α = .76 (Hahn et al., 2012)
Neuroticism: α = .66 (Hahn et al., 2012)
Conscientiousness: α = .60 (Hahn et al., 2012)
Openness: α = .58 (Hahn et al., 2012)
Agreeableness: α = .44 (Hahn et al., 2012)
These modest values are expected for 3-item scales assessing broad constructs, where high inter-item correlations are not achievable; test-retest reliability and construct-representation criteria are more informative than alpha for short scales (Ziegler et al., 2014)
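
For readers who want to compute alpha in their own sample, the standard formula is α = k/(k − 1) × (1 − Σ item variances / variance of the sum score), where k is the number of items. The sketch below applies it to one 3-item dimension. The function name `cronbach_alpha` and the data are hypothetical, and the rows are assumed to hold already reverse-keyed item scores.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    `item_scores` is a list of respondents, each a list of k keyed item scores.
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical keyed responses of five people to the three Extraversion items.
rows = [
    [6, 6, 7],
    [3, 2, 4],
    [5, 5, 5],
    [2, 3, 2],
    [7, 6, 6],
]
alpha = cronbach_alpha(rows)
print(f"alpha = {alpha:.2f}")
```

Five respondents are far too few for a stable estimate; the toy data only show the mechanics.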

Test-retest stability (18-month interval; subsample N = 321; Hahn et al., 2012):

Extraversion .80, Neuroticism .74, Openness .72, Conscientiousness .67, Agreeableness .57; range .57-.80, mean .70 (Hahn et al., 2012)
Hahn et al. (2012) compare this average stability of .70 with the average retest stability of .80 reported for NEO-PI-R domains and facets
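
Test-retest stability is conventionally the Pearson correlation between the same respondents' dimension scores at two waves. The sketch below assumes that convention (the exact estimator is not stated in this article), uses hypothetical two-wave data, and also checks the mean of the five coefficients listed above. The helper `pearson_r` is written out by hand so the block has no third-party dependencies.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical Extraversion scores for the same six people at two waves.
wave1 = [6.33, 3.00, 5.00, 2.33, 6.00, 4.67]
wave2 = [6.00, 3.33, 4.67, 3.00, 6.33, 4.00]
retest_r = pearson_r(wave1, wave2)

# 18-month stabilities as listed above (Hahn et al., 2012).
reported = {
    "Extraversion": 0.80,
    "Neuroticism": 0.74,
    "Openness": 0.72,
    "Conscientiousness": 0.67,
    "Agreeableness": 0.57,
}
mean_stability = round(mean(reported.values()), 2)
print(f"toy retest r = {retest_r:.2f}, mean reported stability = {mean_stability:.2f}")
```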

Validity Evidence

Convergent validity with the NEO-PI-R (Hahn et al., 2012):

Uncorrected convergent correlations averaged .60, highest for Extraversion (.70) and lowest for Agreeableness (.50)
Corrected for attenuation, convergent correlations ranged from .75 (Agreeableness) to .86 (Extraversion and Neuroticism), averaging .82
Each BFI-S scale correlated significantly with all six facets of its corresponding NEO-PI-R domain, with the strongest and broadest coverage for Neuroticism, Extraversion and Conscientiousness (Hahn et al., 2012)
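
The corrected values rest on the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two measures' reliabilities. The sketch below shows the arithmetic only. The NEO-PI-R reliability of .90 is a hypothetical placeholder, and this article does not state which reliability estimates Hahn et al. (2012) used, so the result is not a reproduction of their figures.

```python
from math import sqrt


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(rel_x * rel_y)


r_observed = 0.70   # uncorrected Extraversion convergence, from the text above
alpha_bfi_s = 0.76  # BFI-S Extraversion alpha (Hahn et al., 2012)
alpha_neo = 0.90    # HYPOTHETICAL NEO-PI-R reliability, for illustration only

r_corrected = disattenuate(r_observed, alpha_bfi_s, alpha_neo)
print(f"corrected r = {r_corrected:.2f}")
```

Because alpha is low for several BFI-S scales, the correction can inflate correlations considerably; report the uncorrected value alongside any corrected one.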

Factor structure:

Five-factor structure replicated: exploratory factor analysis extracted five factors explaining 62% of the total variance (Hahn et al., 2012)
Clean loadings: all items loaded substantially on their target scale (mean loading .74) with low secondary loadings (average .12, highest .32) (Hahn et al., 2012)

Discriminant validity:

BFI-S dimension intercorrelations ranged from .00 to .31 (mean absolute .12), substantially lower than those among the NEO-PI-R domains, indicating good discriminant validity (Hahn et al., 2012)

Criterion Validity

Psychological well-being: BFI-S scales correlate with life satisfaction in the expected directions—Neuroticism -.23, Extraversion .18, Conscientiousness .12 (all p < .01)—mirroring the NEO-PI-R pattern (-.43, .30, .25) and the meta-analytic findings of Steel et al. (2008) (Hahn et al., 2012)
Coping style: BFI-S scales showed the same pattern of associations with the Coping Inventory for Stressful Situations as the NEO-PI-R (e.g., Neuroticism with emotion-oriented coping; corrected r = .72), though somewhat weaker (Hahn et al., 2012)
Academic performance: across the broader Big Five literature, Conscientiousness predicts academic performance (Vedel, 2014); this is a general Big Five finding, not a BFI-S-specific comparison

Independent Validation

German validation: in 598 German adults, the BFI-S showed acceptable internal consistency, replicated five-factor structure, and convergent and discriminant validity against the NEO-PI-R (Hahn et al., 2012)
Large-scale panel use: the BFI-S was fielded in the German SOEP (from 2005) and the British Household Panel Survey (2005); strong correlations between the BFI-S and the full BFI scales have been reported in international internet data (Hahn et al., 2012)

Comparative Performance

vs. TIPI (10 items):

Three items per dimension rather than two

vs. NEO-FFI (60 items):

75% fewer items (15 vs 60); the NEO-FFI is too long to field in large panel studies, whereas the BFI-S is not (Hahn et al., 2012)

vs. Original BFI (44 items):

66% fewer items (15 vs 44); strong correlations between the BFI-S and the full BFI scales have been reported (Hahn et al., 2012)
Acceptable trade-off for research requiring efficiency
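
The item-reduction percentages quoted in this article follow directly from the item counts. A quick check, using a helper name (`percent_fewer`) introduced here purely for illustration:

```python
def percent_fewer(short_items, long_items):
    """Percentage reduction in item count, rounded to the nearest whole percent."""
    return round(100 * (1 - short_items / long_items))


BFI_S_ITEMS = 15
longer_inventories = {"NEO-PI-3": 240, "NEO-FFI": 60, "BFI": 44}

reductions = {
    name: percent_fewer(BFI_S_ITEMS, n_items)
    for name, n_items in longer_inventories.items()
}
for name, pct in reductions.items():
    print(f"BFI-S vs {name}: {pct}% fewer items")
```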

Usage Guidelines and Applications

Optimal Research Applications

When BFI-S is most appropriate:

Survey research where personality is secondary or control variable
Longitudinal studies requiring efficient repeated personality measurement
Large-scale population studies
Organizational and educational research on personality
Online studies where participant retention is a concern

Research Design Considerations

Sample size planning (generic guidance, not BFI-S-specific):

Larger samples help offset the lower reliability of short scales relative to longer measures
Structural equation modeling using the three items per dimension as manifest indicators can control for measurement error, which is particularly useful in longitudinal designs (Hahn et al., 2012)

Statistical considerations:

Report internal consistency for your specific sample
Consider measurement error in power analyses
Use appropriate statistical corrections for attenuation when possible
Focus on effect sizes and patterns rather than precise point estimates
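
To see how measurement error feeds into sample size planning, the sketch below attenuates a hypothetical true correlation by the reliabilities of both measures and then applies the common Fisher z approximation for the sample size needed to detect a correlation. The true correlation of .30 and the criterion reliability of .85 are assumptions, the function names are illustrative, and treating alpha as the reliability is itself a simplification.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given the reliabilities of both measures."""
    return true_r * sqrt(rel_x * rel_y)


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-sided test), via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


true_r = 0.30                # HYPOTHETICAL true correlation
alpha_conscientious = 0.60   # BFI-S Conscientiousness alpha (Hahn et al., 2012)
criterion_reliability = 0.85 # HYPOTHETICAL reliability of the outcome measure

observed_r = attenuated_r(true_r, alpha_conscientious, criterion_reliability)
n_ideal = n_for_correlation(true_r)
n_realistic = n_for_correlation(observed_r)
print(f"expected observed r = {observed_r:.2f}")
print(f"N ignoring measurement error: {n_ideal}; N allowing for it: {n_realistic}")
```

Under these assumptions the required sample roughly doubles, which is the practical meaning of "larger samples help offset lower reliability".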

Validation approaches:

Consider parallel administration with a longer Big Five measure in a subsample
Validate criterion relationships in your specific research context
Report correlations with relevant external criteria

Administration Guidelines

Best practices:

Provide clear instructions emphasizing honest self-reflection
Ensure a distraction-free environment; completion takes under 5 minutes
Collect relevant demographic variables so that scores can be compared across subgroups within your sample (no published norms exist)
Consider counterbalancing when used with other measures

Multiple assessment contexts:

Suitable for repeated measurement in longitudinal designs, with substantial 18-month stability (Hahn et al., 2012)
Less appropriate for detecting short-term state changes

When NOT to Use BFI-S

Inappropriate applications:

Individual clinical assessment or diagnosis
High-stakes decision-making (hiring, clinical placement, etc.); short scales are generally not recommended for individual decision-making (Ziegler et al., 2014)
When detailed facet-level personality information is needed
Situations requiring comprehensive personality profiling
Assessment in contexts where measurement precision is critical

Usage Recommendations

Reporting guidelines:

Always report internal consistency for your sample
Cite both the development source (Gerlitz & Schupp, 2005) and relevant validation studies (Hahn et al., 2012; Lang et al., 2011)
Acknowledge brevity trade-offs in the limitations section
Report both raw correlations and effect sizes

Combination strategies:

Supplement with detailed measures in subsamples for validation
Use for initial screening before comprehensive assessment
Combine with other brief measures for broader construct coverage

Limitations and Cautions

Domain-level only: cannot assess specific facets within each Big Five dimension (Hahn et al., 2012)
Modest reliability: lower than comprehensive measures, particularly for Agreeableness (α = .44) and Openness (α = .58) (Hahn et al., 2012)
Reduced precision: not suitable for individual clinical assessment
Content limitations: 3 items cannot capture the full breadth of each personality domain; the Openness and Agreeableness scales cover a narrower slice than the NEO-PI-R (Hahn et al., 2012)
Change sensitivity: less sensitive to short-term personality changes than longer measures

Legal and Copyright Information

Copyright and Usage Responsibility: Check that you have the proper rights and permissions to use this assessment tool in your research. This may include purchasing appropriate licenses, obtaining permissions from authors/copyright holders, or ensuring your usage falls within fair use guidelines.

The BFI-S is freely available for research and educational purposes. It was developed from the Big Five Inventory, which is itself free to use for research, and it carries no licensing fees for non-commercial academic research.

Proper Attribution: When using or referencing this scale, cite the development source and the psychometric evaluations:

Gerlitz, J.-Y., & Schupp, J. (2005). Zur Erhebung der Big-Five-basierten Persönlichkeitsmerkmale im SOEP (DIW Research Notes 4). Berlin: DIW.

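Lang, F. R., John, D., Lüdtke, O., Schupp, J., & Wagner, G. G. (2011). Short assessment of the Big Five: Robust across survey methods except telephone interviewing. Behavior Research Methods, 43(2), 548-567.

Hahn, E., Gottschling, J., & Spinath, F. M. (2012). Short measurements of personality – Validity and reliability of the GSOEP Big Five Inventory (BFI-S). Journal of Research in Personality, 46(3), 355-359.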

External Links and Resources

Big Five Personality Traits – Wikipedia

International Personality Item Pool

References

Development:

Gerlitz, J.-Y., & Schupp, J. (2005). Zur Erhebung der Big-Five-basierten Persönlichkeitsmerkmale im SOEP. DIW Research Notes 4. Berlin: DIW. (no DOI)

Psychometric Evaluation and Validation:

Lang, F. R., John, D., Lüdtke, O., Schupp, J., & Wagner, G. G. (2011). Short assessment of the Big Five: Robust across survey methods except telephone interviewing. Behavior Research Methods, 43(2), 548-567. https://doi.org/10.3758/s13428-011-0066-z
Hahn, E., Gottschling, J., & Spinath, F. M. (2012). Short measurements of personality – Validity and reliability of the GSOEP Big Five Inventory (BFI-S). Journal of Research in Personality, 46(3), 355-359. https://doi.org/10.1016/j.jrp.2012.03.008

Short-Scale Methodology:

Ziegler, M., Kemper, C. J., & Kruyen, P. (2014). Short scales – Five misunderstandings and ways to overcome them. Journal of Individual Differences, 35(4), 185-189. https://doi.org/10.1027/1614-0001/a000148

Related Big Five Research:

Steel, P., Schmidt, J., & Shultz, J. (2008). Refining the relationship between personality and subjective well-being. Psychological Bulletin, 134(1), 138-161. https://doi.org/10.1037/0033-2909.134.1.138
Vedel, A. (2014). The Big Five and tertiary academic performance: A systematic review and meta-analysis. Personality and Individual Differences, 71, 66-76. https://doi.org/10.1016/j.paid.2014.07.011

Related Assessments: TIPI: Ten-Item Personality Inventory; NEO-PI-3: Big Five Personality Inventory

Frequently Asked Questions

What does the BFI-S measure?

The BFI-S measures the five major personality dimensions: Extraversion (sociability, energy), Agreeableness (cooperation, compassion), Conscientiousness (organization, reliability), Neuroticism (anxiety and negative affect, as opposed to emotional stability), and Openness to Experience (intellectual curiosity, creativity). It provides domain-level assessment of broad personality traits rather than specific facets.

Who developed the BFI-S, and how long does it take?

The BFI-S was developed by Gerlitz and Schupp (2005) for the German Socio-Economic Panel (SOEP), drawing its 15 items from the 44-item Big Five Inventory. It takes under 5 minutes to complete (Hahn et al., 2012), which is why it can be fielded in large panel studies that cannot accommodate longer inventories.

Is the BFI-S free to use?

Yes, the BFI-S is freely available for research and educational purposes without licensing fees. It was developed from the Big Five Inventory, which is itself free to use for research. When using the measure in publications, cite Gerlitz and Schupp (2005) for the instrument's development and Lang et al. (2011) and Hahn et al. (2012) for its psychometric evaluation.

How is the BFI-S scored?

Reverse score items 2, 4, 8, and 12 (8 minus original score), then calculate each dimension score by averaging its 3 items. Scores range from 1.0 to 7.0, with higher scores indicating stronger trait presence. No published interpretation bands or dimension-level norms exist for the BFI-S, so interpret scores relative to your sample.

How reliable is the BFI-S?

Internal consistency is modest, with Cronbach's alpha ranging from .44 (Agreeableness) to .76 (Extraversion) in a German validation sample (Hahn et al., 2012), as expected for 3-item scales measuring broad constructs. Test-retest stability over 18 months ranged from .57 to .80 (mean .70). BFI-S scales converge with the NEO-PI-R (attenuation-corrected r = .75-.86, average .82) and show good discriminant validity.

How does the BFI-S compare with the TIPI and the NEO-FFI?

Against the 10-item TIPI, the BFI-S offers three items per dimension rather than two. Against the 60-item NEO-FFI, it has 75% fewer items. Like the NEO-FFI, it yields domain-level scores only; facet-level assessment requires a longer inventory such as the NEO-PI-R. The BFI-S is, however, short enough to field in large panel studies where the NEO-FFI and NEO-PI-R are impractical (Hahn et al., 2012).

Last Updated:

July 17, 2026