Business Insight  |  September 15, 2025

Takeaways from the NBER working paper on ChatGPT use (Sept 2025)

The NBER working paper “How People Use ChatGPT” (September 2025) offers an unprecedented look at how ChatGPT is actually used around the world.

Citation: Chatterji, A., Cunningham, T., Deming, D. J., Hitzig, Z., Ong, C., Shan, C. Y., & Wadman, K. (2025). How people use ChatGPT (NBER Working Paper No. 34255). National Bureau of Economic Research. https://doi.org/10.3386/w34255

Here are the key takeaways, followed by an overview of the paper's methodology:

Takeaways

Scale and Growth

ChatGPT achieved extraordinary adoption: By July 2025, it reached 700 million weekly active users (10% of the global adult population) and processed 2.5 billion daily messages. This represents the fastest technology diffusion in history, surpassing even the internet's early adoption rates.

Non-Work Usage Dominates

Personal use grew faster than work use: Non-work messages increased from 53% to 73% between June 2024 and June 2025. This challenges the dominant narrative that AI's primary economic impact comes through workplace productivity.

Consumer surplus is massive: The authors cite Collis and Brynjolfsson's estimate that US users would need $98 in compensation to give up generative AI for a month, implying roughly $97 billion in annual consumer surplus.
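
As a rough sanity check, the short calculation below back-solves the user base implied by those two figures. The $98 monthly willingness-to-accept and the $97 billion annual total come from the cited estimate; the implied user count is a derived, illustrative number rather than something reported here.

    # Back-of-envelope check on the cited consumer-surplus figures (illustrative only)
    monthly_wta = 98          # dollars a US user would need to forgo generative AI for a month
    annual_surplus = 97e9     # cited annual US consumer surplus, in dollars

    implied_users = annual_surplus / (monthly_wta * 12)
    print(f"Implied US user base: about {implied_users / 1e6:.0f} million")  # roughly 82 million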

Three Primary Use Cases

Nearly 80% of usage falls into three categories:

  1. Practical Guidance (29%): Personalized advice such as custom workout plans and tutoring (10% of all messages are educational)
  2. Seeking Information (24%): Factual queries similar to web search
  3. Writing (24%): Most common work task, but two-thirds involves editing existing text rather than creating new content

Programming is surprisingly small: Only 4.2% of ChatGPT messages involve coding, contradicting assumptions about AI's primary technical applications.

Decision Support vs Task Automation

"Asking" dominates over "Doing": 49% of messages seek advice/information ("Asking") versus 40% requesting task completion ("Doing"). Among work messages, Asking messages received higher quality ratings and grew faster.

Knowledge work applications: 81% of work-related messages involve "obtaining/interpreting information" or "making decisions and solving problems" - suggesting ChatGPT functions more as a research assistant than task automator.

Demographic Patterns

Gender gap closed dramatically: From 80% male users in early 2023 to 52% female users by June 2025, representing a complete reversal of the early gender disparity.

Youth dominance: 46% of messages come from users under 26, though this concentration has slightly decreased over time.

Global expansion in lower-income countries: Adoption grew fastest in countries with $10,000-40,000 GDP per capita, indicating AI access is democratizing globally.

Education and occupation matter for work use: Users with graduate degrees send work-related messages 48% of the time, versus 37% for those without a bachelor's degree. Professional occupations show 50-57% work usage, versus 40% for non-professional roles.

Quality and Satisfaction

User satisfaction improved over time: "Good" interactions became 4x more common than "Bad" by July 2025, up from 3x in late 2024.

Writing tasks dominate professional use: 40% of work messages involve writing, with management/business users reaching 52%. This reflects writing as a universal white-collar skill.

Economic Implications

Home production impact equals workplace impact: The dominance of non-work usage suggests AI's economic value extends far beyond measured workplace productivity into unmeasured household efficiency and welfare.

Decision-making enhancement in knowledge work: The prevalence of information-seeking and advisory use cases indicates ChatGPT primarily augments human judgment rather than replacing human tasks, particularly valuable in knowledge-intensive occupations where better decisions drive productivity.

These findings fundamentally reframe how we should think about AI's economic impact - less about job displacement and automation, more about enhanced decision-making and consumer welfare across both work and personal contexts.

Methodology

The researchers used a privacy-preserving methodology to analyze ChatGPT usage data. Here's how they did it:

Three Primary Datasets

1. Growth Dataset

  • Source: All consumer ChatGPT plans (Free, Plus, Pro) from November 2022 to September 2025

  • Content: Daily message counts, user metadata (country, age in 5-7 year buckets, subscription type)

  • Exclusions: Business/Enterprise users, under-18 users, opted-out users

2. Classified Messages Dataset

  • Sample size: ~1.1 million randomly selected conversations from May 2024-June 2025

  • Method: One message per conversation, weighted to reflect total daily volume

  • Additional subset: 130,000 users with up to 6 messages each for detailed analysis

3. Employment Dataset

  • Source: External vendor with publicly available employment data

  • Size: ~130,000 users matched to occupation/education categories

  • Restrictions: Aggregated data only, minimum 100 users per category

Privacy-Preserving Classification Pipeline

The critical innovation: No human researcher ever saw actual user messages. Instead, they used automated LLM classifiers:

Step 1: PII Removal

  • Messages first processed through a “Privacy Filter” tool to remove personally identifiable information

Step 2: Automated Classification

  • Five different LLM-based classifiers applied to each message:

    • Work vs. Non-work
    • Asking/Doing/Expressing intent
    • Conversation topics (24 categories)
    • O*NET work activities (332 categories)
    • Interaction quality

Step 3: Context Inclusion

  • Classifiers considered up to 10 previous messages for context
  • Messages truncated to 5,000 characters maximum
  • Used GPT-5-mini for most classifications, GPT-5 for interaction quality (a minimal classifier sketch follows)
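
For illustration, here is a minimal sketch of what one such classifier call might look like, assuming the OpenAI Python SDK. The prompt wording, function name, and two-way label set are hypothetical; only the model choice, the 10-message context window, and the 5,000-character cap follow the paper's description.

    # Illustrative work/non-work classifier sketch (not the paper's actual prompt or code).
    # Assumes the OpenAI Python SDK with an API key set in the environment.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You will see up to 10 prior messages for context, then one target message. "
        "Classify the target message as 'work' or 'non_work'. Reply with only the label."
    )

    def classify_work(message: str, context: list[str]) -> str:
        context = context[-10:]        # at most 10 previous messages for context
        message = message[:5000]       # truncate to 5,000 characters
        transcript = "\n".join(f"[context] {m[:5000]}" for m in context)

        response = client.chat.completions.create(
            model="gpt-5-mini",        # model named in the paper; any capable model would do
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"{transcript}\n[target] {message}"},
            ],
        )
        return response.choices[0].message.content.strip()

In the actual pipeline, four more classifiers of this kind (intent, topic, O*NET activity, and interaction quality) run over the same PII-scrubbed text.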

Data Clean Room for Employment Analysis

Secure separation: Employment data held by external vendor, never directly accessed by researchers

Query restrictions:

  • Only pre-approved aggregate queries allowed
  • Minimum 100 users required for any reported category
  • Committee approval required for each analysis

Privacy controls: Individual records never visible, only statistical summaries above the threshold (a minimal sketch of this rule follows)
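
A minimal sketch of that disclosure rule, assuming a hypothetical pandas table of per-user occupation labels and work-message flags (the column names are illustrative, not the vendor's actual schema):

    # Illustrative aggregate query with a 100-user disclosure threshold (hypothetical schema)
    import pandas as pd

    MIN_USERS = 100

    def work_share_by_occupation(df: pd.DataFrame) -> pd.DataFrame:
        """Share of work-related messages per occupation, with small groups suppressed."""
        grouped = df.groupby("occupation").agg(
            n_users=("user_id", "nunique"),
            work_share=("is_work_message", "mean"),
        )
        # Categories with fewer than 100 distinct users are never released.
        return grouped[grouped["n_users"] >= MIN_USERS].reset_index()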

Validation Methodology

Human validation: Researchers validated the classifiers using the WildChat dataset (publicly available third-party chatbot conversations)

Agreement metrics:

  • Work classification: 83% model-human agreement (κ=0.83)
  • Intent classification: 74% agreement (κ=0.74)
  • Topic classification: 56% agreement (κ=0.56)

Quality checks: Cross-referenced automated sentiment analysis with actual user thumbs-up/down feedback on 60,000 interactions (a brief sketch of the agreement metrics follows)
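
For reference, the agreement rate and Cohen's kappa reported above can be computed as in this small sketch; the labels are toy data, not the WildChat annotations used in the paper.

    # Toy example of the model-vs-human agreement metrics (illustrative labels only)
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    human = ["work", "non_work", "work", "work", "non_work", "non_work"]
    model = ["work", "non_work", "work", "non_work", "non_work", "non_work"]

    agreement = accuracy_score(human, model)   # raw share of matching labels
    kappa = cohen_kappa_score(human, model)    # agreement corrected for chance

    print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")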

Sampling and Weighting

Representative sampling: Messages weighted to reflect actual daily volume patterns, not just random selection
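
A minimal sketch of that kind of volume-based weighting, assuming a hypothetical table of sampled messages and a separate table of total daily message counts (column names are illustrative):

    # Illustrative volume-based weighting of sampled messages (hypothetical schema)
    import pandas as pd

    def add_volume_weights(sample: pd.DataFrame, daily_totals: pd.DataFrame) -> pd.DataFrame:
        """Weight each sampled message so weighted counts match true daily volume."""
        sampled_per_day = sample.groupby("date").size().rename("n_sampled").reset_index()
        merged = sample.merge(sampled_per_day, on="date").merge(daily_totals, on="date")
        # Each sampled message stands in for total_messages / n_sampled messages that day
        merged["weight"] = merged["total_messages"] / merged["n_sampled"]
        return merged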

Exclusion criteria consistently applied:

  • Users who opted out of training data sharing
  • Self-reported under-18 users
  • Deleted conversations and banned accounts
  • Logged-out users (for consistency, though available from March 2025)

Methodological Strengths and Limitations

Strengths:

  • Unprecedented scale and access to actual usage data

  • Strong privacy protections exceeding academic standards

  • Multiple validation approaches

  • Global, representative sample

Limitations:

  • Consumer plans only (excludes enterprise users who might have different patterns)

  • Classification accuracy varies by task

  • External employment matching limited to subset of users

  • Potential biases in automated classification, though validated against human judgment

This methodology represents a significant advance in studying digital behavior while maintaining privacy. The automated classification approach could become a template for analyzing sensitive user data in other contexts.