---
title: "Working with AI: Measuring the Applicability of Generative AI to Occupations"
type: source
tags: [ai-adoption, labor-economics, onet, microsoft, copilot, applicability-score, user-goal-ai-action]
authors: [Tomlinson, Kiran; Jaffe, Sonia; Wang, Will; Counts, Scott; Suri, Siddharth]
year: 2025
venue: "arXiv:2507.07935 (cs.AI); Microsoft Research"
kind: paper
raw_path: "raw/AI Capabilities & Adoption/Working with AI (2025).pdf"
sources: []
key_claims:
  - "Analyses 200,000 anonymised Bing Copilot conversations; classifies each into O*NET Intermediate Work Activities (IWAs) for both user goal and AI action separately."
  - "Introduces user-goal vs. AI-action split: user goal = what the human is trying to do; AI action = what the AI performs in service; distinguishes assistance from delegation."
  - "User goal and AI action IWA sets are disjoint in 40% of conversations; AI performs on average 2 IWAs per 1 IWA matched to the user goal."
  - "Top user goals cluster in four categories: learning, communicating, teaching/explaining, writing (overwhelmingly information work)."
  - "AI plays a service role (Provide, Explain, Teach, Assist, Respond are the top AI-action verbs)."
  - "AI applicability score per occupation combines coverage × completion × scope; top occupations include Interpreters and Translators (0.492), Historians (0.462), Writers and Authors (0.454), Sales Reps of Services (0.449), CNC Tool Programmers (0.419)."
  - "Bottom occupations are physical/manual: pile driver operators, dredge operators, foundry mold makers, roofers."
  - "By SOC major group: Computer & Mathematical and Sales & Related lead (0.29 score); Healthcare Support and Farming lowest."
  - "Image generation and data analysis have notably lower completion × scope than text-based assistance tasks."
  - "Methodology contrasts with Handa et al. 2025 by classifying each conversation into all matching IWAs rather than a single O*NET task, enabling cross-occupation capability aggregation."
created: 2026-04-20
updated: 2026-04-20
---

# Working with AI: Measuring the Applicability of Generative AI to Occupations

## Summary

Tomlinson, Jaffe, Wang, Counts, and Suri (Microsoft Research, 2025) construct an empirical **AI applicability score per occupation** based on 200,000 anonymised Bing Copilot conversations, mapped to the O*NET Intermediate Work Activity (IWA) taxonomy. The study is the Microsoft-side complement to Anthropic's Economic Index paper ([[sources/2025-handa-which-economic-tasks-ai]]).

**Method.** An LLM-based pipeline classifies each conversation into *all* applicable O*NET IWAs along two axes simultaneously:

- **User goal**: what the human is trying to accomplish.
- **AI action**: what the AI does in service of that goal.

This two-sided classification is the paper's distinctive methodological move. Example: a user trying to learn how to print a document has a user-goal IWA of *operate office equipment*; the AI-action IWA is *train others to use equipment*. The user-goal and AI-action IWA sets are fully disjoint in 40% of conversations, and in 96% of conversations more IWAs are unique to one side than shared between the two. The AI therefore often performs a different activity from the one the user is trying to do.

**Success metrics.** Three measures per conversation: completion (LLM judgement, validated against thumbs up/down), user thumbs feedback, and scope (a 6-point ordinal scale for how much of the work in an IWA the AI can assist with or perform). The AI applicability score aggregates coverage × completion × scope and is computed separately for the user-goal side and the AI-action side.

**Findings.**

- The most frequent work activities are *information work*: creation, processing, and communication of information. GWAs most over-represented vs. total US workforce: Getting Information, Updating and Using Knowledge, Communicate with External People, Interpreting Information for Others.
- Highest-scoring occupations: interpreters/translators (0.492), historians (0.462), writers/authors (0.454), sales reps of services (0.449), CNC tool programmers (0.419), customer service reps (0.408). Lowest: pile-driver operators, dredge operators, foundry mold makers, roofers (all ~0).
- Major-group ranking: Computer & Mathematical (0.29), Sales & Related (0.29), Office & Admin Support (0.26), Community & Social Service (0.25); physical trades and healthcare support at the bottom.
- Image generation and data analysis have notably lower completion × scope than text-based information tasks.
- Distinctive patterns: over-assisted activities (purchase goods, execute financial transactions, perform athletic activities) vs. over-performed activities (train others, coach others, teach subjects). Useful for predicting which occupations may *shift focus* (AI takes over teaching/training subtasks) vs. which will be primarily *assisted*.

**Contribution relative to Handa et al.** Handa et al. map each conversation to one O*NET task (occupation-specific). Tomlinson et al. map each conversation to all relevant IWAs (cross-occupation). This enables occupation-applicability scoring even when no user explicitly comes from that occupation: the AI's demonstrated capability transfers across occupations that share IWAs.

## Connections

- Microsoft-platform complement to Anthropic-platform [[sources/2025-handa-which-economic-tasks-ai]]; together they anchor [[concepts/ai-adoption]] at the bottom-up usage level.
- Enterprise-leader-side complement to [[sources/2025-korst-wharton-gen-ai-enterprise-adoption]].
- Info-work dominance + image/data-analysis weakness contextualises the methodology of [[sources/2024-xu-the-agent-company-benchmark]] and the productivity surprise in [[sources/2025-becker-metr-ai-developer-productivity]].
- User-goal / AI-action split is relevant to [[concepts/agentic-bpm]] agent-role framing and [[concepts/agent-process-observability]] trajectory analysis.
- New entities: [[entities/kiran-tomlinson]], [[entities/sonia-jaffe]].
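
The coverage × completion × scope aggregation noted in the Summary can be illustrated with a minimal sketch. It assumes each occupation is described by importance weights over its IWAs and that each IWA has one coverage, completion, and scope value in [0, 1] for a given side (user goal or AI action). The names `IwaStats` and `applicability_score`, the weighting scheme, and the numbers in the usage note are hypothetical; the paper's exact aggregation (thresholds, how the two sides are combined) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IwaStats:
    """Per-IWA measurements for one side (hypothetical container)."""
    coverage: float    # assumed: 0..1, whether/how much the IWA appears in usage
    completion: float  # assumed: 0..1, share of conversations judged completed
    scope: float       # assumed: 0..1, share of the IWA's work the AI handles


def applicability_score(
    iwa_weights: dict[str, float],
    stats: dict[str, IwaStats],
) -> float:
    """Weighted mean of coverage * completion * scope over an occupation's IWAs.

    IWAs with no usage statistics contribute zero, which is how physical
    occupations would end up near 0 under this assumed scheme.
    """
    total_weight = sum(iwa_weights.values())
    if total_weight <= 0:
        raise ValueError("occupation must have positive total IWA weight")

    score = 0.0
    for iwa, weight in iwa_weights.items():
        s = stats.get(iwa)
        if s is None:
            continue  # IWA never observed in conversations
        score += (weight / total_weight) * s.coverage * s.completion * s.scope
    return score
```

For instance, an occupation with two equally weighted IWAs, one unobserved and one with coverage 1.0, completion 0.5, and scope 0.5, scores 0.5 × (1.0 × 0.5 × 0.5) = 0.125 under these assumptions. Running the function once with user-goal statistics and once with AI-action statistics gives the two per-side scores.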