Business Insight  |  September 15, 2025

Takeaways from the NBER working paper on ChatGPT use (Sept 2025)

The NBER working paper “How People Use ChatGPT” (September 2025) offers an unprecedented look at how ChatGPT is actually used around the world.

Citation: Chatterji, A., Cunningham, T., Deming, D. J., Hitzig, Z., Ong, C., Shan, C. Y., & Wadman, K. (2025). How people use ChatGPT (NBER Working Paper No. 34255). National Bureau of Economic Research. https://doi.org/10.3386/w34255

Here are the key takeaways, followed by an overview of the paper's methodology:

Takeaways

Scale and Growth

ChatGPT achieved extraordinary adoption: By July 2025, it reached 700 million weekly active users (10% of the global adult population) and processed 2.5 billion daily messages. This represents the fastest technology diffusion in history, surpassing even the internet's early adoption rates.

Non-Work Usage Dominates

Personal use grew faster than work use: Non-work messages increased from 53% to 73% between June 2024 and June 2025. This challenges the dominant narrative that AI's primary economic impact comes through workplace productivity.

Consumer surplus is massive: The authors cite Collis and Brynjolfsson's estimate that US users would need $98 in compensation to give up generative AI for a month, implying roughly $97 billion in annual consumer surplus.
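
As a rough sanity check, the short calculation below back-solves the user base implied by those two figures. The $98 monthly willingness-to-accept and the $97 billion annual total come from the cited estimate; the implied user count is a derived, illustrative number rather than something reported here.

    # Back-of-envelope check on the cited consumer-surplus figures (illustrative only)
    monthly_wta = 98          # dollars a US user would need to forgo generative AI for a month
    annual_surplus = 97e9     # cited annual US consumer surplus, in dollars

    implied_users = annual_surplus / (monthly_wta * 12)
    print(f"Implied US user base: about {implied_users / 1e6:.0f} million")  # roughly 82 million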

Three Primary Use Cases

Nearly 80% of usage falls into three categories:

  1. Practical Guidance (29%): Personalized advice such as custom workout plans and tutoring (10% of all messages are educational)
  2. Seeking Information (24%): Factual queries similar to web search
  3. Writing (24%): Most common work task, but two-thirds involves editing existing text rather than creating new content

Programming is surprisingly small: Only 4.2% of ChatGPT messages involve coding, contradicting assumptions about AI's primary technical applications.

Decision Support vs Task Automation

"Asking" dominates over "Doing": 49% of messages seek advice/information ("Asking") versus 40% requesting task completion ("Doing"). Among work messages, Asking messages received higher quality ratings and grew faster.

Knowledge work applications: 81% of work-related messages involve "obtaining/interpreting information" or "making decisions and solving problems" - suggesting ChatGPT functions more as a research assistant than task automator.

Demographic Patterns

Gender gap closed dramatically: From 80% male users in early 2023 to 52% female users by June 2025, representing a complete reversal of the early gender disparity.

Youth dominance: 46% of messages come from users under 26, though this concentration has slightly decreased over time.

Global expansion in lower-income countries: Adoption grew fastest in countries with $10,000-40,000 GDP per capita, indicating AI access is democratizing globally.

Education and occupation matter for work use: Users with graduate degrees send work-related messages 48% of the time, versus 37% for those without a bachelor's degree. Professional occupations show 50-57% work usage, versus 40% for non-professional roles.

Quality and Satisfaction

User satisfaction improved over time: "Good" interactions became 4x more common than "Bad" by July 2025, up from 3x in late 2024.

Writing tasks dominate professional use: 40% of work messages involve writing, with management/business users reaching 52%. This reflects writing as a universal white-collar skill.

Economic Implications

Home production impact equals workplace impact: The dominance of non-work usage suggests AI's economic value extends far beyond measured workplace productivity into unmeasured household efficiency and welfare.

Decision-making enhancement in knowledge work: The prevalence of information-seeking and advisory use cases indicates ChatGPT primarily augments human judgment rather than replacing human tasks, particularly valuable in knowledge-intensive occupations where better decisions drive productivity.

These findings fundamentally reframe how we should think about AI's economic impact - less about job displacement and automation, more about enhanced decision-making and consumer welfare across both work and personal contexts.

Methodology

The researchers used a privacy-preserving methodology to analyze ChatGPT usage data. Here's how they did it:

Three Primary Datasets

1. Growth Dataset

  • Source: All consumer ChatGPT plans (Free, Plus, Pro) from November 2022 to September 2025

  • Content: Daily message counts, user metadata (country, age in 5-7 year buckets, subscription type)

  • Exclusions: Business/Enterprise users, under-18 users, opted-out users

2. Classified Messages Dataset

  • Sample size: ~1.1 million randomly selected conversations from May 2024-June 2025

  • Method: One message per conversation, weighted to reflect total daily volume

  • Additional subset: 130,000 users with up to 6 messages each for detailed analysis

3. Employment Dataset

  • Source: External vendor with publicly available employment data

  • Size: ~130,000 users matched to occupation/education categories

  • Restrictions: Aggregated data only, minimum 100 users per category

Privacy-Preserving Classification Pipeline

The critical innovation: No human researcher ever saw actual user messages. Instead, they used automated LLM classifiers:

Step 1: PII Removal

  • Messages first processed through a “Privacy Filter” tool to remove personally identifiable information

Step 2: Automated Classification

  • Five different LLM-based classifiers applied to each message:

    • Work vs. Non-work
    • Asking/Doing/Expressing intent
    • Conversation topics (24 categories)
    • O*NET work activities (332 categories)
    • Interaction quality

Step 3: Context Inclusion

  • Classifiers considered up to 10 previous messages for context
  • Messages truncated to 5,000 characters maximum
  • Used GPT-5-mini for most classifications, GPT-5 for interaction quality (a minimal classifier sketch follows)
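
For illustration, here is a minimal sketch of what one such classifier call might look like, assuming the OpenAI Python SDK. The prompt wording, function name, and two-way label set are hypothetical; only the model choice, the 10-message context window, and the 5,000-character cap follow the paper's description.

    # Illustrative work/non-work classifier sketch (not the paper's actual prompt or code).
    # Assumes the OpenAI Python SDK with an API key set in the environment.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You will see up to 10 prior messages for context, then one target message. "
        "Classify the target message as 'work' or 'non_work'. Reply with only the label."
    )

    def classify_work(message: str, context: list[str]) -> str:
        context = context[-10:]        # at most 10 previous messages for context
        message = message[:5000]       # truncate to 5,000 characters
        transcript = "\n".join(f"[context] {m[:5000]}" for m in context)

        response = client.chat.completions.create(
            model="gpt-5-mini",        # model named in the paper; any capable model would do
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"{transcript}\n[target] {message}"},
            ],
        )
        return response.choices[0].message.content.strip()

In the actual pipeline, four more classifiers of this kind (intent, topic, O*NET activity, and interaction quality) run over the same PII-scrubbed text.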

Data Clean Room for Employment Analysis

Secure separation: Employment data held by external vendor, never directly accessed by researchers

Query restrictions:

  • Only pre-approved aggregate queries allowed
  • Minimum 100 users required for any reported category
  • Committee approval required for each analysis

Privacy controls: Individual records never visible, only statistical summaries above the threshold (a minimal sketch of this rule follows)
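
A minimal sketch of that disclosure rule, assuming a hypothetical pandas table of per-user occupation labels and work-message flags (the column names are illustrative, not the vendor's actual schema):

    # Illustrative aggregate query with a 100-user disclosure threshold (hypothetical schema)
    import pandas as pd

    MIN_USERS = 100

    def work_share_by_occupation(df: pd.DataFrame) -> pd.DataFrame:
        """Share of work-related messages per occupation, with small groups suppressed."""
        grouped = df.groupby("occupation").agg(
            n_users=("user_id", "nunique"),
            work_share=("is_work_message", "mean"),
        )
        # Categories with fewer than 100 distinct users are never released.
        return grouped[grouped["n_users"] >= MIN_USERS].reset_index()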

Validation Methodology

Human validation: Researchers validated the classifiers using the WildChat dataset (publicly available third-party chatbot conversations)

Agreement metrics:

  • Work classification: 83% model-human agreement (κ=0.83)
  • Intent classification: 74% agreement (κ=0.74)
  • Topic classification: 56% agreement (κ=0.56)

Quality checks: Cross-referenced automated sentiment analysis with actual user thumbs-up/down feedback on 60,000 interactions (a brief sketch of the agreement metrics follows)
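
For reference, the agreement rate and Cohen's kappa reported above can be computed as in this small sketch; the labels are toy data, not the WildChat annotations used in the paper.

    # Toy example of the model-vs-human agreement metrics (illustrative labels only)
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    human = ["work", "non_work", "work", "work", "non_work", "non_work"]
    model = ["work", "non_work", "work", "non_work", "non_work", "non_work"]

    agreement = accuracy_score(human, model)   # raw share of matching labels
    kappa = cohen_kappa_score(human, model)    # agreement corrected for chance

    print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")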

Sampling and Weighting

Representative sampling: Messages weighted to reflect actual daily volume patterns, not just random selection
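
A minimal sketch of that kind of volume-based weighting, assuming a hypothetical table of sampled messages and a separate table of total daily message counts (column names are illustrative):

    # Illustrative volume-based weighting of sampled messages (hypothetical schema)
    import pandas as pd

    def add_volume_weights(sample: pd.DataFrame, daily_totals: pd.DataFrame) -> pd.DataFrame:
        """Weight each sampled message so weighted counts match true daily volume."""
        sampled_per_day = sample.groupby("date").size().rename("n_sampled").reset_index()
        merged = sample.merge(sampled_per_day, on="date").merge(daily_totals, on="date")
        # Each sampled message stands in for total_messages / n_sampled messages that day
        merged["weight"] = merged["total_messages"] / merged["n_sampled"]
        return merged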

Exclusion criteria consistently applied:

  • Users who opted out of training data sharing
  • Self-reported under-18 users
  • Deleted conversations and banned accounts
  • Logged-out users (for consistency, though available from March 2025)

Methodological Strengths and Limitations

Strengths:

  • Unprecedented scale and access to actual usage data

  • Strong privacy protections exceeding academic standards

  • Multiple validation approaches

  • Global, representative sample

Limitations:

  • Consumer plans only (excludes enterprise users who might have different patterns)

  • Classification accuracy varies by task

  • External employment matching limited to subset of users

  • Potential biases in automated classification, though validated against human judgment

This methodology represents a significant advance in studying digital behavior while maintaining privacy. The automated classification approach could become a template for analyzing sensitive user data in other contexts.