Why many “AI-based” automation prototypes will never be launched


By Maxim Kolosovskiy

Automation robot
Image source: 123RF

Mind the limitations and capabilities of AI from the very beginning, and beware of prototyping an “AI-based” product with a human behind a curtain. Otherwise, there might be no place for AI in such a product in the end.

Sticking to a human behind a curtain (a.k.a. Wizard of Oz technique)

Putting too much on AI’s shoulders may actually kick AI out of the product.

Many AI-based products set automation goals so ambitious that they would require years or even decades of AI algorithm development, collection of precise and extensive training data, and all the related work. Obviously, not every team can wait that long to see the first results. Thus, teams start with Wizard of Oz prototyping: a product that is supposed to be controlled by AI is actually controlled by a human. However, due to the huge gap between product expectations and AI’s real capabilities, such a product may end up relying on ‘a human behind a curtain’ forever. That is sad news for AI and automation enthusiasts, because they may end up creating infrastructure and/or tools to facilitate the work of these humans instead of genuinely teaching a machine to be smart. Below I suggest what we could do to avoid that.

Famous examples of a human behind a curtain

Mechanical Turk
The Mechanical Turk was a fake chess-playing machine constructed in the late 18th century. A human chess master hidden inside operated the machine, and it won most of its demonstration games over 84 years. Source of the image.
Wizard of Oz
The Wizard of Oz technique is a way of prototyping an AI product where the product pretends to be autonomous but is actually operated by an unseen human being. Screenshot from The Wizard of Oz (1939 film).

Don’t overpromise what your product can do

Naturally, a product that can do more for a user is far more appealing. As noted above, though, such a product may not live up to its promises, or it will have to rely on human labor forever, which normally rules out applying any AI techniques, e.g. ML. Then, the tech stack will be all about two things:

– tools to speed up manual work (as the manual work is the bottleneck — the most expensive and slow part of the pipeline);

– infrastructure to monitor whether additional manual work is needed, in particular, when the system receives a new unseen input (as unseen inputs or changes in the outer world are inevitable and a system that cannot really learn would not adapt to new inputs automatically).

Be honest about what a product can do and what it cannot. Set clear expectations.

In particular, be careful with positioning a product as human-like. Users may expect the product to have all possible human capabilities, which is unlikely to be true. Have at least a rough understanding of how your AI model will ‘mine knowledge’ (preferably from massive training data). Don’t treat AI as a magic box that will somehow figure out what to do.

Start small

In order to help users in a selected domain, a team could and should start small, namely, by generating suggestions in specific parts of a user flow instead of pretending to replace a human in the whole user flow. This comprises two changes:

  • By a ‘suggestion’, I mean something that doesn’t pretend to be 100% precise, but nevertheless provides a decent value for a user.
  • When automating just a part of a user flow, a team can be more focused and output better results for that specific part.

Thus, a team can build a product that provides value for a user and is not forced to involve a human in the loop. Moreover, this strategy allows cherry-picking the cases where AI thrives and deferring the hard edge cases. Likely, the Pareto principle applies: the easy cases would cover 80% of all cases and would take just 20% of the time needed to support all cases. Given that a genuine AI can be scaled up, even a generous exclusion of the hard cases will still produce a lot of valuable knowledge.
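The idea of a ‘suggestion’ that covers the easy cases and defers the hard ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Suggestion`, `suggest`, `coverage` and `toy_model`, as well as the 0.8 confidence threshold, are hypothetical and not taken from any real product. The sketch assumes the model reports a confidence score between 0 and 1 alongside each prediction.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model interface: takes an input, returns (prediction, confidence).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Suggestion:
    value: str
    confidence: float  # in [0, 1], as reported by the model


def suggest(item: str, model: Model, min_confidence: float = 0.8) -> Optional[Suggestion]:
    """Offer a suggestion only when the model is confident enough.

    Returning None means "no suggestion": the user proceeds as they would
    without the feature, and no hidden human fills the gap.
    """
    value, confidence = model(item)
    if confidence < min_confidence:
        return None  # hard case: deferred, not faked
    return Suggestion(value, confidence)


def coverage(items: Sequence[str], model: Model, min_confidence: float = 0.8) -> float:
    """Fraction of inputs for which a suggestion is offered."""
    if not items:
        return 0.0
    offered = sum(suggest(item, model, min_confidence) is not None for item in items)
    return offered / len(items)


def toy_model(item: str) -> Tuple[str, float]:
    # Stand-in for a real model (assumption): confident only on short inputs.
    return item.upper(), (0.95 if len(item) <= 5 else 0.4)


if __name__ == "__main__":
    inputs = ["a", "bb", "cccccccc", "dd", "eeeeeeeeee"]
    print(suggest("abc", toy_model))
    print(coverage(inputs, toy_model))
```

Tracking `coverage` over time gives a simple measure of how much of the flow is handled autonomously, and the threshold can be relaxed as the model improves.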

The approach has an analogy in levels of driving automation.

levels of autonomous driving
Levels of driving automation. Source

A team can start by implementing cruise control. It is just a ‘suggestion’ that a driver can easily overrule: it doesn’t have to be 100% perfect because the driver can correct it. Nevertheless, the ‘suggestion’ can be good enough to really help the driver.

Applying the Wizard of Oz technique could mean hiding a human in the vehicle and pretending that it is a prototype of an autonomous vehicle. That could be super impressive, but such a prototype is basically pointless and doesn’t move us forward to a real autonomous vehicle.

Assess an AI product by its ability to produce valuable knowledge without human tweaks and/or monitoring. Thus, a small component that offers just limited assistance but works genuinely autonomously is more valuable than a component that mimics wide assistance, but requires more monitoring from a human.

Start with a small, but genuinely autonomous product.

Continue by automating other parts of the flow

Later on, another part of a user flow can be automated without a human in the loop. And one more part…

And also the accuracy of the existing parts can be polished in parallel… Eventually, when all these parts work quite well, they can be combined to assist a user with the whole flow.

Automation without a human in the loop enables a product to work without your close supervision. Meanwhile you can automate the next parts.

Less automation might be a better strategic decision for the start

Offering a ‘humble suggestion’ instead of ‘I will simply do all the work for you’ is not only about starting small. It is more about having a special UI that explains to the user how and why suggestions were inferred, makes the user feel in control, and lets the user fix a mistake if needed.

Below is a comparison of the two approaches from various perspectives.

automation approaches
Less vs more automation approaches from various perspectives, e.g. scalability and living up to expectations. Image by author. Source of the emojis.

The first approach (less automation) could be referred to as Intelligent Augmentation (IA): unlike Artificial Intelligence (AI), it doesn’t aim to perform tasks like a human, but augments a human with the kind of assistance that a computer can do much better than a human (e.g. remembering and searching in masses of data, doing tons of mechanical computations instantly).

The approach may complicate pitching your idea 

It is usually hard to sell the value of such incremental changes. Real automation (i.e. without a human behind the curtain) of only a part of a user flow will still take time. The value of automating just one step will look small compared to a prototype that hides a human behind the curtain and therefore can perform the whole flow from end to end. Thus, it might be hard to convince a business to be patient and proceed gradually.

Slow and steady wins the race.

Below is what you can do to overcome these challenges:

  • Have a list of milestones where each milestone has its self-contained value and therefore the team can stop or suspend the project after any milestone (for example, if the further milestones don’t seem so promising anymore).
  • Create a rough first version of a product to demonstrate that your idea seems viable.
  • Don’t prototype a product by hiding a human behind a curtain, but prove that an (AI) algorithm under the hood could really produce some useful intelligence.
  • Prototyping how a product would look and feel to a user (i.e. showing what value a product will bring) could be important too, but the most challenging part is to assess whether you can really create an algorithm that will be able to provide that value. For example, demonstrating a UI that would answer any arbitrary question clarifies almost nothing: such a program would obviously be useful. The real question is whether there is an algorithm that could give a good answer to an arbitrary question.
  • Prototyping a rough first version may need a substantial amount of time, for which you may need to get permission first, for which it would be good to have the first version… It turns into a chicken-and-egg problem, and there is no clear answer on how to handle such a situation. Consider spending a reasonable amount of time to roughly verify whether a given idea would work. Ask for forgiveness, not permission.
  • Early prototyping is supposed to eliminate a substantial fraction of the uncertainty that is typical for data-related projects: though there is always a chance that hidden pitfalls could stop or significantly delay a project, the first test of a hypothesis in the wild normally helps a lot.

How to position an AI-based suggestions product?

Imagine you need to build a feature for Medium.com to pick the next good read for a user. 

Build a clear mental model of how your AI product works. That would help users to use an AI-based product. A user should be able to understand why AI provides a given suggestion (e.g. ‘selected based on your reading history’), and how trustworthy a suggestion is (e.g. ‘you have read N articles from this topic’, ‘N people you follow read this article’). A product should not be a magic ball that tells a user what to do. Instead, a product should be like a secretary who has done some tedious and/or boring work for their boss (a user) and provides a summary, which the ‘boss’ uses to make the final decision.

Give the user controls that affect the AI model (e.g. ‘show more articles like this’). This is how a user may comprehend an AI model more naturally.

Naturally, any user would prefer to solve a problem by pressing a single button that accepts the best suggestion. As mentioned above, AI may not always be able to infer a 100%-accurate suggestion: some mistakes are inevitable even in problems that have been researched for decades, e.g. handwritten digit recognition. Therefore, provide paths forward from a mistake (e.g. ‘not interested in this topic/author/article’), and return control to the user when needed (e.g. ‘customize your interests’, including those that were learned automatically by an AI model).
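The three ideas above (explain the suggestion, let the user steer the model, and offer a path forward from a mistake) can be illustrated with a deliberately naive Python sketch. Everything here is hypothetical: the classes `Article`, `UserProfile` and `Recommendation`, the `recommend` function, and the scoring rule (topic read count plus explicit boosts) are assumptions made for illustration, not how Medium or any real recommender works.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Set


@dataclass
class Article:
    title: str
    topic: str


@dataclass
class UserProfile:
    read_counts: Dict[str, int] = field(default_factory=dict)  # topic -> articles read
    boosts: Dict[str, int] = field(default_factory=dict)       # topic -> explicit "more like this"
    muted_topics: Set[str] = field(default_factory=set)

    def show_more_like(self, topic: str) -> None:
        """User control: 'show more articles like this'."""
        self.boosts[topic] = self.boosts.get(topic, 0) + 1

    def not_interested(self, topic: str) -> None:
        """Path forward from a mistake: 'not interested in this topic'."""
        self.muted_topics.add(topic)


@dataclass
class Recommendation:
    article: Article
    reason: str  # shown to the user so the suggestion is explainable


def recommend(profile: UserProfile, candidates: Sequence[Article]) -> Optional[Recommendation]:
    """Pick the candidate whose topic the user has engaged with most.

    Returns None when there is no evidence for any candidate, rather than
    guessing without a reason it could show to the user.
    """
    best, best_score = None, 0
    for article in candidates:
        if article.topic in profile.muted_topics:
            continue
        score = profile.read_counts.get(article.topic, 0) + profile.boosts.get(article.topic, 0)
        if score > best_score:
            best, best_score = article, score
    if best is None:
        return None
    reads = profile.read_counts.get(best.topic, 0)
    return Recommendation(best, f"You have read {reads} articles on {best.topic}")


if __name__ == "__main__":
    profile = UserProfile(read_counts={"ai": 3, "chess": 5})
    candidates = [Article("A", "ai"), Article("B", "chess"), Article("C", "cooking")]
    print(recommend(profile, candidates))
    profile.not_interested("chess")  # the user corrects the model
    print(recommend(profile, candidates))
```

The point of the sketch is the shape of the interface, not the scoring: every suggestion carries a reason, and the user's feedback changes what is suggested next.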


Less is more.

Minimizing the manual work needed to support automation creates a solid foundation for a product in the long term: you move gradually and tolerate imperfections in the product’s intelligence, but independence from expensive human labor lets you scale up.

Recommended reading

  1. B. Dickson. Augmented Mind: Why we need a different perspective on AI (2019).
  2. B. Dickson. The Wizard of Oz: How bad AI marketing created human bots (2018).
  3. Google PAIR team. People + AI Guidebook (2019).
  4. G. Gürsun. Best practices for deploying AI within large organizations (2023).
  5. M. Kolosovskiy. Many ML projects fail because of this misunderstanding about ML (2022).
  6. M. Kolosovskiy. A factor that may make your AI/ML application great (2022).

About the author


Maxim Kolosovskiy, SWE & Automation Enthusiast @ Google | PhD | ACM ICPC medalist. Maxim, with his colleagues from Google Password Manager, works on how Chrome understands web forms, and thereby helps users sign in faster and stay safe online. Maxim writes about AI on Medium. (The opinions stated here are Maxim’s own, not necessarily those of his employer.)
