Which tasks to give AI? A four-question decision model

Eighty-eight percent of organizations now use AI in at least one business function, up from 78 percent a year earlier, according to McKinsey's State of AI 2025. What that number hides is how many of them automated the wrong work. A support agent who uses ChatGPT to summarize invoices saves half an hour a day. The same agent who lets the model answer a complaint risks reputational damage that takes months to undo. The difference isn't the tool. It's the choice of which work you give away and which you keep.

Why do most companies pick the wrong tasks?

Most companies start with the task that sounds coolest, not the one that pays off most. The sensible logic looks simple: begin with whatever eats the most time. In practice the visible, exciting task wins the budget, even though it usually needs human judgment, while the dull, repetitive one is the safe bet for AI.

Think of a restaurant that builds an AI chatbot for reservations while the waiters spend an hour every night typing allergy notes into the point-of-sale system by hand. The chatbot is visible to guests. The data entry isn't. But the data entry is repeatable, checkable, and low-risk. The chatbot has to handle nuance, emotion, and the expectations of a parent who just flagged a child's nut allergy.

Four questions that decide whether a task belongs with AI

Repeatability, verifiability, error cost, and context are the four dimensions you can score any task on. Researchers published a framework for intelligent AI delegation on arXiv in early 2026 that grades tasks across five dimensions. Four of them are immediately useful inside a working business.

Question 1: is the task repeatable and predictable?

AI performs best on tasks that follow a fixed pattern: a weekly summary of sales reports, sorting incoming email, turning receipts into bookkeeping entries. The more often you run a task the same way, the better the fit.

Here's a rule of thumb. If you can explain the task to a new colleague in three steps, AI can probably do it too. If you have to rethink the approach every single time, it's too unpredictable.

Question 2: can you check the result in two minutes?

This is the question most companies skip. AI produces output fast, but if checking it takes longer than the work itself, there is no net saving. You verify a meeting summary in thirty seconds: are the action points right? Verifying a legal review of a supplier contract takes an hour, even when AI wrote it in ten seconds.

The delegation framework calls this verifiability. With code, you can run tests. With a short email reply, you skim it. With strategic advice, checking costs almost as much as the work.

Question 3: what does a mistake cost?

Send an internal weekly update with a typo? Annoying, not a disaster. Send a client a quote with the wrong figure? That can cost you the deal.

The severity of an error decides how much human oversight you need, and therefore whether delegating to AI is worth it at all. The EU AI Act formalizes the same idea in risk tiers that run from minimal and limited risk up to high-risk and prohibited uses. For your daily decision the translation is simpler: if a mistake takes more than a working day to fix, keep the task with a person or build a tight review step around it.

Question 4: how much context do you have to supply?

Some tasks are context-free: convert this PDF to a spreadsheet. Others need company knowledge that is written down nowhere: the relationship with one specific client, the unwritten deal with a supplier, the politics of the shop floor.

Here's an analogy. Handing that work to AI is like asking a temp to run a tense conversation with your oldest client. The temp is missing ten years of shared history. So is the model.

The more undocumented context a task needs, the worse the fit for AI. The fix is not "document everything", because that is a project in itself. It is being honest about which tasks rely on knowledge that only lives in people's heads.

The decision matrix at a glance

The four questions, turned into a scoring grid you can fill in for any task.

Question	Scores well = fit for AI	Scores poorly = keep human
Repeatable?	Fixed steps, daily or weekly	Different every time, needs improvisation
Verifiable?	Checking takes under two minutes	Verification costs as much as the work
Error cost?	A mistake is fixed within an hour	A mistake costs a client or real money
Context?	All the information fits in a briefing	Needs undocumented company knowledge

Does a task score well on three or four questions? Start there. Does it score badly on all four? Don't use AI for it, however impressive the demo looked.
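
If you would rather keep the grid in a script than in a spreadsheet, here is a minimal Python sketch. The class and function names are made up for illustration, and so are the scores for the three sample tasks. One rule is an assumption on our part: the grid says what to do with three or four good scores and with none, so the sketch treats one or two as "split the task or add a review step".

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One row of the scoring grid; True means the task scores well."""
    name: str
    repeatable: bool           # fixed steps, daily or weekly
    verifiable: bool           # checking takes under two minutes
    low_error_cost: bool       # a mistake is fixed within an hour
    context_in_briefing: bool  # all the information fits in a briefing

    def score(self) -> int:
        """Count how many of the four questions the task scores well on."""
        return sum([self.repeatable, self.verifiable,
                    self.low_error_cost, self.context_in_briefing])


def recommend(task: Task) -> str:
    """Turn the score into the advice given in the text."""
    points = task.score()
    if points >= 3:
        return "start here"
    if points == 0:
        return "keep human"
    # Assumption: the article gives no rule for one or two points.
    return "split the task or add a review step"


# Illustrative scores only; fill in your own judgment per task.
tasks = [
    Task("Processing invoices", True, True, True, True),
    Task("Handling complaints from major clients", False, False, False, False),
    Task("Advisory reports", True, False, False, True),
]

for t in tasks:
    print(f"{t.name}: {t.score()}/4 -> {recommend(t)}")
```

The booleans are deliberately blunt. A yes or no per question forces the honest answer the grid is after, and you can always move to a finer scale later.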

Which tasks almost always score well?

Summarizing email, processing invoices, and generating meeting notes are the tasks most often automated successfully. Adoption surveys keep pointing to the same pattern, and you can track the wider numbers in our generative AI statistics. The tasks that work are strikingly boring.

Summarizing and sorting email
Drafting replies to standard customer messages, with a template plus a human check
Processing invoices and creating bookkeeping entries
Generating meeting notes with action points
Writing job ads from a role profile
Competitive research: summarizing and comparing product pages

Wait, job ads? They make the list because they score well on all four questions. You write them regularly, you check the result in two minutes, a typo is not fatal, and the role profile holds all the context you need. Worth noting: if you point AI at the selection of candidates, Annex III of the EU AI Act classifies that as high-risk, which pulls in deployer duties like human oversight and transparency to applicants. Writing the ad copy itself carries none of that.

Which tasks should stay human?

Tasks that require judgment about people stay human work. They almost always share one trait: the context does not fit in a document. It lives in relationships, history, and social dynamics.

Preparing performance reviews (knowledge of someone's personal situation)
Handling complaints from major clients (years of relationship history)
Making strategic calls on incomplete data
Working through conflict inside a team
Negotiating with suppliers

AI can feed you facts here. A client's revenue history, the benchmark for a salary band, a summary of earlier correspondence. But the judgment itself takes empathy, nuance, and the willingness to own a decision. You don't delegate that.

A 2026 Deloitte study on the state of AI in the enterprise found that the most successful organizations are not the ones that automate the most. They are the ones that redesign their workflows so human strengths and AI capabilities reinforce each other. Less time spent preparing documents and doing basic research, more time spent framing the right questions and interpreting the results.

What if checking takes longer than the work?

Split the task: let AI do the groundwork and let the human do the judgment. This is the trap many companies fall into after the first month. The thrill of fast output fades the moment the review takes just as long.

A common example from professional services: a firm had AI write advisory reports. The output came back impressively fast. But the partner then spent two hours per report fact-checking it, about the same time as writing it from scratch. Net saving: zero.

The answer is not to drop AI. It is to split the task differently. Let AI gather and organize the data, which is repeatable and verifiable. Let the partner write the analysis and the conclusions, which take judgment and context. That is the decision model in action.
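
The arithmetic behind that split fits in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the two-hour figures come from the example above, and the minutes for the split version are invented purely to show the calculation.

```python
def net_saving_minutes(manual: int, ai_run: int,
                       review: int, human_part: int = 0) -> int:
    """Minutes saved per task once review and remaining human work are counted."""
    return manual - (ai_run + review + human_part)


# From the example above: two hours to write by hand, two hours to fact-check.
# ai_run is taken as 0 because the draft arrives almost instantly.
whole_report = net_saving_minutes(manual=120, ai_run=0, review=120)

# Hypothetical split: 15 minutes checking the AI-gathered data,
# 60 minutes for the partner to write the analysis and conclusions.
split_report = net_saving_minutes(manual=120, ai_run=0, review=15, human_part=60)

print(f"AI writes the whole report: {whole_report} minutes saved")
print(f"AI does the groundwork only: {split_report} minutes saved")
```

With these numbers the whole-report approach saves nothing and the split saves 45 minutes per report. Your own figures will differ, which is the reason to measure the review time before you scale anything up.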

The same goes for any AI output you send to clients. The check itself must not become the bottleneck. If it does, you automated the wrong piece of work.

What can you do with this on Monday?

Take three tasks you are thinking of automating and score them on the four questions. You will find the answers are surprisingly clear, and that the tasks you like most are often not the ones that score highest.

Start with the most boring task that scores well on all four. Not the coolest. Not the most visible. The most boring. It delivers time savings you can measure at very little risk, and that is the foundation you build on later.

One more thing. Before you automate anything, pair the decision model with a simple return-on-investment check: does the task actually save time once you count the review? And if your team isn't aligned yet on how to use AI, agree on a shared way of working before you start handing over individual tasks.
