---
title: "Working with AI: Measuring the Applicability of Generative AI to Occupations"
type: source
tags: [ai-adoption, labor-economics, onet, microsoft, copilot, applicability-score, user-goal-ai-action]
authors: [Tomlinson, Kiran; Jaffe, Sonia; Wang, Will; Counts, Scott; Suri, Siddharth]
year: 2025
venue: "arXiv:2507.07935 (cs.AI); Microsoft Research"
kind: paper
raw_path: "raw/AI Capabilities & Adoption/Working with AI (2025).pdf"
sources: []
key_claims:
  - "Analyses 200,000 anonymised Bing Copilot conversations; classifies each into O*NET Intermediate Work Activities (IWAs) for both user goal and AI action separately."
  - "Introduces user-goal vs. AI-action split: user goal = what the human is trying to do; AI action = what the AI performs in service; distinguishes assistance from delegation."
  - "User goal and AI action IWA sets are disjoint in 40% of conversations; AI performs on average 2 IWAs per 1 IWA matched to the user goal."
  - "Top user goals cluster in four categories: learning, communicating, teaching/explaining, writing — overwhelmingly information work."
  - "AI plays a service role (Provide, Explain, Teach, Assist, Respond are the top AI-action verbs)."
  - "AI applicability score per occupation combines coverage × completion × scope; top occupations include Interpreters and Translators (0.492), Historians (0.462), Writers and Authors (0.454), Sales Reps of Services (0.449), CNC Tool Programmers (0.419)."
  - "Bottom occupations are physical/manual: pile driver operators, dredge operators, foundry mold makers, roofers."
  - "By SOC major group: Computer & Mathematical and Sales & Related lead (0.29 score); Healthcare Support and Farming lowest."
  - "Image generation and data analysis have notably lower completion × scope than text-based assistance tasks."
  - "Methodology contrasts with Handa et al. 2025 by classifying each conversation into all matching IWAs rather than a single O*NET task, enabling cross-occupation capability aggregation."
created: 2026-04-20
updated: 2026-04-20
---

# Working with AI: Measuring the Applicability of Generative AI to Occupations

## Summary
Tomlinson, Jaffe, Wang, Counts, and Suri (Microsoft Research, 2025) construct an empirical **AI applicability score per occupation** based on 200,000 anonymised Bing Copilot conversations, mapped to the O*NET Intermediate Work Activity (IWA) taxonomy. The study is the Microsoft-side complement to Anthropic's Economic Index paper ([[sources/2025-handa-which-economic-tasks-ai]]).

**Method.** An LLM-based pipeline classifies each conversation into *all* applicable O*NET IWAs along two axes simultaneously:
- **User goal** — what the human is trying to accomplish.
- **AI action** — what the AI does in service of that goal.

This two-sided classification is the paper's distinctive methodological move. Example: a user trying to learn how to print a document has a user-goal IWA of *operate office equipment*; the AI-action IWA is *train others to use equipment*. The user-goal and AI-action IWA sets are fully disjoint in 40% of conversations, and in 96% the IWAs unique to one side outnumber the IWAs shared by both sides. In other words, the AI often performs a different activity from the one the user is trying to carry out.
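The two overlap statistics above can be made concrete with a small sketch. It assumes each conversation has already been labelled with a set of user-goal IWAs and a set of AI-action IWAs; the function name `overlap_stats` and the toy labels (other than the printing example) are hypothetical and are not taken from the paper's pipeline.

```python
from typing import Dict, Iterable, Set, Tuple


def overlap_stats(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Dict[str, float]:
    """Share of conversations whose two IWA sets are disjoint, and share
    where one-sided IWAs outnumber the IWAs common to both sides."""
    total = disjoint = mostly_unique = 0
    for goal_iwas, action_iwas in pairs:
        total += 1
        common = goal_iwas & action_iwas      # IWAs on both sides
        one_sided = goal_iwas ^ action_iwas   # IWAs on exactly one side
        if not common:
            disjoint += 1
        if len(one_sided) > len(common):
            mostly_unique += 1
    if total == 0:
        raise ValueError("no conversations supplied")
    return {
        "disjoint_share": disjoint / total,
        "mostly_unique_share": mostly_unique / total,
    }


# Toy data: only the first pair mirrors the note's printing example;
# the remaining labels are invented for illustration.
toy_conversations = [
    ({"operate office equipment"}, {"train others to use equipment"}),
    ({"research topics", "write material"}, {"write material"}),
    ({"gather information"},
     {"gather information", "explain technical details", "provide information"}),
    ({"write material"}, {"write material"}),
]

stats = overlap_stats(toy_conversations)
print(stats)
```

On real data these two shares would correspond to the 40% and 96% figures, provided the labelling matches the paper's.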

**Success metrics.** Three measures per conversation: completion (an LLM judgement, validated against thumbs up/down feedback), explicit user thumbs feedback, and scope (a 6-point ordinal scale for how broadly the AI can assist with or perform an IWA). The AI applicability score for an occupation combines coverage × completion × scope and is computed separately for the user-goal side and the AI-action side.
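A minimal sketch of how a coverage × completion × scope score might be assembled for one occupation. Everything specific here is an assumption rather than the paper's formula: coverage is treated as a boolean flag per IWA, completion and scope as fractions in [0, 1], IWAs are weighted by a hypothetical per-occupation importance weight, and the two sides are combined by a simple average.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class IwaStats:
    covered: bool      # assumed: IWA appears often enough in the usage data
    completion: float  # assumed: share of conversations judged completed
    scope: float       # assumed: share with sufficiently broad scope


def side_score(weights: Mapping[str, float], stats: Mapping[str, IwaStats]) -> float:
    """Weighted sum of completion * scope over the covered IWAs of one occupation."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("occupation has no positively weighted IWAs")
    score = 0.0
    for iwa, weight in weights.items():
        s = stats.get(iwa)
        if s is None or not s.covered:
            continue  # uncovered IWAs contribute nothing
        score += (weight / total_weight) * s.completion * s.scope
    return score


def applicability(weights, user_goal_stats, ai_action_stats) -> float:
    """Assumed combination rule: mean of the user-goal and AI-action side scores."""
    return 0.5 * (side_score(weights, user_goal_stats)
                  + side_score(weights, ai_action_stats))


# Hypothetical occupation with three IWAs and made-up statistics.
weights = {"write material": 0.5, "research topics": 0.3, "operate equipment": 0.2}
user_goal_stats = {
    "write material": IwaStats(True, 0.8, 0.5),
    "research topics": IwaStats(True, 0.9, 0.5),
    "operate equipment": IwaStats(False, 0.0, 0.0),
}
ai_action_stats = {
    "write material": IwaStats(True, 0.8, 0.25),
    "research topics": IwaStats(True, 1.0, 0.5),
}

print(round(applicability(weights, user_goal_stats, ai_action_stats), 4))
```

The multiplicative form means an IWA only raises an occupation's score when it is covered, usually completed, and broad in scope at the same time; a zero in any factor removes its contribution.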

**Findings.**
- Most frequent work activities are *information work* — creation, processing, and communication of information. GWAs most over-represented vs. total US workforce: Getting Information, Updating and Using Knowledge, Communicate with External People, Interpreting Information for Others.
- Highest-scoring occupations: interpreters/translators (0.492), historians (0.462), writers/authors (0.454), sales reps of services (0.449), CNC tool programmers (0.419), customer service reps (0.408). Lowest: pile-driver operators, dredge operators, foundry mold makers, roofers (all ~0).
- Major-group ranking: Computer & Mathematical (0.29), Sales & Related (0.29), Office & Admin Support (0.26), Community & Social Service (0.25); physical trades and healthcare support at the bottom.
- Image generation and data analysis have notably lower completion × scope than text-based information tasks.
- Assisted vs. performed: some activities appear far more often as user goals than as AI actions, meaning the AI mostly assists (purchase goods, execute financial transactions, perform athletic activities); others appear far more often as AI actions, meaning the AI performs them itself (train others, coach others, teach subjects). Useful for predicting which occupations may *shift focus* (AI takes over teaching/training subtasks) vs. which will be primarily *assisted*.

**Contribution relative to Handa et al.** Handa et al. map each conversation to one O*NET task (occupation-specific). Tomlinson et al. map each conversation to all relevant IWAs (cross-occupation). This enables occupation-applicability scoring even when no user explicitly comes from that occupation — the AI's demonstrated capability transfers across occupations that share IWAs.
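The cross-occupation transfer described above can be sketched as a set intersection between the IWAs observed in conversations and the IWAs each occupation is composed of. The occupation-to-IWA mapping below is invented for illustration and is not O*NET data; `occupations_reached` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Set


def occupations_reached(
    occupation_iwas: Dict[str, Set[str]],
    conversations: Iterable[Set[str]],
) -> Dict[str, List[str]]:
    """Map each occupation to the observed IWAs it shares with the usage data."""
    observed: Set[str] = set()
    for iwas in conversations:
        observed |= iwas  # every matching IWA counts, not just one task
    return {
        occupation: sorted(iwas & observed)
        for occupation, iwas in occupation_iwas.items()
        if iwas & observed
    }


# Hypothetical occupation profiles (not real O*NET assignments).
occupation_iwas = {
    "Technical Writers": {"write material", "explain technical details"},
    "Customer Service Representatives": {
        "explain technical details",
        "respond to customer inquiries",
    },
    "Roofers": {"install roofing materials"},
}

# Two conversations, each labelled with all matching IWAs.
conversations = [
    {"explain technical details"},
    {"write material", "explain technical details"},
]

reached = occupations_reached(occupation_iwas, conversations)
print(reached)
```

In this toy case a single conversation about explaining technical details credits both occupations that contain that IWA, even though neither user needs to belong to either occupation; an occupation with no shared IWAs is not reached at all.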

## Connections
- Microsoft-platform complement to Anthropic-platform [[sources/2025-handa-which-economic-tasks-ai]]; together they anchor [[concepts/ai-adoption]] at the bottom-up usage level.
- Enterprise-leader-side complement to [[sources/2025-korst-wharton-gen-ai-enterprise-adoption]].
- The dominance of information work and the weaker results on image generation and data analysis contextualise the methodology of [[sources/2024-xu-the-agent-company-benchmark]] and the productivity surprise in [[sources/2025-becker-metr-ai-developer-productivity]].
- User-goal / AI-action split is relevant to [[concepts/agentic-bpm]] agent-role framing and [[concepts/agent-process-observability]] trajectory analysis.
- New entities: [[entities/kiran-tomlinson]], [[entities/sonia-jaffe]].