The practical guide to building useful vertical AI agents by putting humans in the loop

Dec 30, 2024

Are you building an AI agent right now?

Or maybe you want to build one?

This is a step-by-step guide to building useful AI agents. It puts humans in the loop to quickly get to a working prototype.

Let’s get straight into it.

0- Pick the problem: scratch your own itches.

If you want to build a very good AI agent, it’s going to be tough if you don’t solve a problem you personally encountered.

When building traditional software, engineers can deliver good business value even if they don’t care too much about the business side of things.

For agentic systems, I think it’s very different. We’re dealing with a system that can serve high-level requests from the end user. You need to understand the context of this end user, the Job To Be Done, the incentives of this user, etc.

So pick something you do daily, ideally multiple times a day. It’s something you don’t necessarily enjoy doing, but you do it because it enables you to do something else you care about.

I am going to pick lead generation as an example.

1- Gather real-life user requests

To do that, ask people around you:


Notes:

You can see I have picked a clear interface and a clear deliverable.
The user sends a text and receives a list of 20 LinkedIn URLs.

And why only 20? Because I am trying to optimize for the number of briefs. I want to fulfill as fast as possible. The more volume I produce, the more opportunities I have to improve.

So ideally I would only give back 1 person. But 1 person would be completely useless for the end user.

So how about 100? Well that’s better for the end user. But what’s the smallest number that is still useful for the end user?

10 is ok, but maybe too little? How about 20? hmm… 20 feels decent.

So I went for 20.

Conclusion: because you understand the problem you’re solving, you’re able to orient the deliverable to make it easy and repeatable, while remaining useful for the end user.
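To make the "clear interface, clear deliverable" idea concrete, here is a minimal sketch of a check you could run on a deliverable before sending it back. The names `Request`, `TARGET_LEADS` and `deliverable_problems` are hypothetical, and I am assuming the deliverable is a list of personal profile URLs (containing `linkedin.com/in/`).

```python
from dataclasses import dataclass, field

TARGET_LEADS = 20  # the batch size picked in this article


@dataclass
class Request:
    brief: str  # free text sent by the user
    urls: list = field(default_factory=list)  # the deliverable


def deliverable_problems(request: Request) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(request.urls) != TARGET_LEADS:
        problems.append(
            f"expected {TARGET_LEADS} urls, got {len(request.urls)}"
        )
    for url in request.urls:
        # Assumption: a lead is a personal LinkedIn profile url.
        if "linkedin.com/in/" not in url:
            problems.append(f"not a LinkedIn profile url: {url}")
    if len(set(request.urls)) != len(request.urls):
        problems.append("duplicate urls in the deliverable")
    return problems
```

The point is not the code itself: a deliverable this simple can be checked in a few lines, which is what makes it easy to repeat.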

2- Fulfill the request manually. Produce the expected deliverable.

Do the freaking work. Ship the stuff, using whatever tool you want.

3- Outsource the fulfillment and control the quality

Once you get a grasp of what to do, go on Upwork and outsource the fulfillment. Why? Because it’s not the most important part of the process. You will understand why.

In my example, I asked an Upwork freelancer to put the URLs in a Google Sheet.

Once you outsource the work, you will see that some things will naturally emerge.

The “brief clarification” step

Ideally you’re able to do anything the user wants. But in practice it’s better to do something slightly different than what the user wants.

Because some requests lead to a disproportionate amount of work compared to the value they provide. I call this “dumb work”.

For example, let’s say someone asks: “founders of companies of 5+ employees in Chicago”.

On LinkedIn, the only ranges you’re given for the number of employees are 1-10, 11-50, etc.

And LinkedIn is usually going to be the simplest source to check the number of employees.

So here is a clarifying step that avoids dumb work:
”””
Hey, customer!

Would you mind if instead I gave you this: founders of companies of 11+ employees in Chicago
”””

Without this step, you’re going to spend a lot of your time working on things your customer doesn’t value.

Side note: Why do we pick 11+ instead of 1+? Because every company with 11+ employees also satisfies the 5+ requirement, whereas the 1-10 range includes companies with fewer than 5 employees.
These very basic “common sense rules” are going to become your bread and butter as an agent builder.

It’s incredibly boring, right? That’s why you’d better care about the problem you solve. We’re not here to have fun working on complex systems. However, you will see that your whole agentic workflow will become a beautiful machine. And hopefully you will learn to love this machine.
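The 5+ to 11+ rule from the Chicago example can be sketched in a few lines. Only the 1-10 and 11-50 ranges come from the example above; the other bucket bounds and the function names are assumptions for illustration.

```python
from typing import Optional

# Lower bounds of the company-size ranges. Only 1 and 11 come from the
# article; the remaining values are assumed for illustration.
BUCKET_LOWER_BOUNDS = [1, 11, 51, 201, 501]


def clarify_min_employees(requested_min: int) -> int:
    """Round a requested minimum up to the nearest range boundary.

    Rounding up keeps every result inside what the user asked for.
    """
    for bound in BUCKET_LOWER_BOUNDS:
        if bound >= requested_min:
            return bound
    raise ValueError(f"no range starts at or above {requested_min}")


def clarification_message(role: str, requested_min: int, city: str) -> Optional[str]:
    """Return a message for the customer, or None if no change is needed."""
    proposed = clarify_min_employees(requested_min)
    if proposed == requested_min:
        return None
    return (
        "Hey, customer!\n\n"
        "Would you mind if instead I gave you this: "
        f"{role} of companies of {proposed}+ employees in {city}"
    )


print(clarification_message("founders", 5, "Chicago"))
```

Rounding up rather than down is the whole trick: the proposed brief is always a subset of the original one, so the customer never receives a lead they did not ask for.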

The validation step

The Upwork freelancer fulfilling the request will make a lot of mistakes. Which is great. This allows you to nail your quality process. You will have to write rules to validate the work is done correctly.

These rules are also called evals. You might end up creating formulas and small automations to make them work.

For example, you might make an API call to a data provider like leadmagic.io to fetch the information of the company and the person associated with the LinkedIn URL. Then use this data to manually check that the lead given to you matches the criteria in the brief. And eventually you will be able to automate this check.
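Once the check is automated, it could look like the sketch below. The `lead` dictionary stands for whatever your data provider returns after you reshape it; its field names (`title`, `employee_count`, `city`) are hypothetical and are not the schema of leadmagic.io or any other provider. `Criteria` and `check_lead` are also made-up names.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    title_keyword: str  # e.g. "founder"
    min_employees: int  # e.g. 11
    city: str           # e.g. "Chicago"


def check_lead(lead: dict, criteria: Criteria) -> list:
    """Return the names of the criteria the lead fails (empty means valid)."""
    failures = []
    # Hypothetical field names: adapt them to your provider's response.
    if criteria.title_keyword.lower() not in lead.get("title", "").lower():
        failures.append("title")
    if lead.get("employee_count", 0) < criteria.min_employees:
        failures.append("employee_count")
    if lead.get("city", "").lower() != criteria.city.lower():
        failures.append("city")
    return failures


criteria = Criteria(title_keyword="founder", min_employees=11, city="Chicago")
lead = {"title": "Co-Founder & CEO", "employee_count": 25, "city": "Chicago"}
print(check_lead(lead, criteria))  # an empty list means the lead is valid
```

Returning the list of failed criteria, instead of a plain True/False, makes it easier to tell a freelancer what went wrong.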

Side note: “What’s the point of making these API calls? Aren’t we doing the work of the freelancer?”

And this is where understanding the Job To Be Done is important. Lead generation always involves a generation step, where potential candidates are generated. This is what the freelancer gives you. And this is actually really valuable. From there, it’s our job to label a candidate as valid or invalid, given the initial brief from the end user.

“Wait, the freelancer can just send a bunch of invalid links and make money then?” No, and this is where the validation step is important.

One common practice is to have 2 freelancers do the work and identify a good baseline for the following metric:
[valid candidates / total candidates].

If you see a freelancer giving you too much garbage compared to other freelancers, simply stop working with him/her.
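Here is a minimal sketch of that comparison. Using the best observed ratio as the baseline and a tolerance of 0.5 are my assumptions, not a rule; `valid_ratio` and `flag_underperformers` are hypothetical names.

```python
def valid_ratio(valid: int, total: int) -> float:
    """valid candidates / total candidates, with 0.0 for an empty batch."""
    return valid / total if total else 0.0


def flag_underperformers(results: dict, tolerance: float = 0.5) -> list:
    """Return freelancers whose ratio is far below the best observed one.

    `results` maps a freelancer name to (valid candidates, total candidates).
    Assumption: the baseline is the best ratio seen so far, and anyone
    below `tolerance` times that baseline gets flagged.
    """
    if not results:
        return []
    ratios = {name: valid_ratio(v, t) for name, (v, t) in results.items()}
    baseline = max(ratios.values())
    return sorted(name for name, r in ratios.items() if r < baseline * tolerance)


results = {"freelancer_a": (16, 20), "freelancer_b": (5, 20)}
print(flag_underperformers(results))  # → ['freelancer_b']
```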

Conclusion: The beauty of outsourcing fulfillment is that some steps will naturally emerge to manage the fulfillment. These steps are more important than the fulfillment itself, because they will be used to build a benchmark that selects the best workers.

And these steps will have to be supported by states, which is step 4.

4- Create states that describe the status of the user request being fulfilled.

Here are some examples:

Side note:

1- I love writing “Waiting for XYZ” as a state, because it makes it clear what the next step is and who the blocker is.

2- Let the states emerge. Don’t force the states. Supervise the work manually, and you will see that you will start getting lost. You will need to create a state that describes exactly where a request stands. You know you’ve built the right states once you can easily switch between different user requests and instantly know the current status.

3- You’re pretty much building a workflow engine here ;)
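To show what "pretty much a workflow engine" means, here is a minimal sketch of states and allowed transitions. The state names below are hypothetical examples in the “Waiting for XYZ” style; your own states should emerge from supervising real requests.

```python
from enum import Enum


class State(Enum):
    # Hypothetical states: each one names the next step and the blocker.
    WAITING_FOR_CLARIFICATION = "Waiting for brief clarification from the user"
    WAITING_FOR_WORKER = "Waiting for fulfillment by the worker"
    WAITING_FOR_VALIDATION = "Waiting for validation of the deliverable"
    DELIVERED = "Delivered to the user"


# Which states a request is allowed to move to from each state.
ALLOWED = {
    State.WAITING_FOR_CLARIFICATION: {State.WAITING_FOR_WORKER},
    State.WAITING_FOR_WORKER: {State.WAITING_FOR_VALIDATION},
    # Failed validation sends the request back to the worker.
    State.WAITING_FOR_VALIDATION: {State.WAITING_FOR_WORKER, State.DELIVERED},
    State.DELIVERED: set(),
}


def advance(current: State, target: State) -> State:
    """Move a request to `target`, refusing transitions that make no sense."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


state = State.WAITING_FOR_WORKER
state = advance(state, State.WAITING_FOR_VALIDATION)
print(state.value)
```

Writing the transitions down explicitly is optional at first. A status column in a spreadsheet is enough until you start getting lost.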

5- Partially outsource quality and run shadowing sessions

Once the workflow is starting to stabilize, outsource the work you’re doing.

To achieve this, hire someone and have them shadow you. This means having them go on a call with you, where you’re speaking out loud and describing your thought process to supervise the work.

Once you’ve done these shadowing sessions, shadow the freelancer. Now you’re watching them do the work and monitor quality, and they have to explain their thought process.

6- Apply Elon Musk’s five-step design process.

Congrats, you reached a point where you can technically remove yourself from the process.

I call this role the “super user role”. The super user validates some of the work performed by the supervisor (it “validates the validation”). And the super user also improves the process constantly.

At this stage there are 3 roles:

the super user (you)
the supervisor
the worker (person fulfilling the job)

But in practice you can’t really remove yourself from the process, because everything will slowly deteriorate.

Fortunately, you can now monitor the “manufacturing line” and improve it.

To do this, follow Elon Musk’s 5-step process.

1- Question every requirement
2- Delete any part of the process you can
3- Simplify and optimize
4- Accelerate cycle time
5- Automate (this is where we add LLMs)

More info here:

https://modelthinkers.com/mental-model/musks-5-step-design-process

This would probably require an article of its own.

Here are some things I recommend at this stage:

0- Keep playing the role of the supervisor every day, or you will lose touch with the work being done.

1- Until the fulfillment (the worker) becomes a huge bottleneck, work on everything else. Remember you can always find a better worker, and you can select them based on how well they score on the benchmarks.

The process of selecting the worker is also part of your agentic system. So make sure you nail it. Fundamentally, there isn’t much difference between a benchmark and a system that allows you to select the right Upwork freelancer.

2- Ask the workers how the briefs could be easier to fulfill.

Sometimes, by speaking to the people fulfilling the briefs, you discover that you could drastically decrease the time it takes to fulfill a request without really affecting the user’s satisfaction (eliminate dumb requirements).

You want to have this information and use it to simplify the requirements.

3- Never stop the manufacturing line, even if it means creating fake orders.

Yes that sounds weird, but stopping the manufacturing line is very costly because you break the flow.

So create fake requests if you have to.

4- Once you’re at the stage where you’re automating the work performed by the worker, try to fulfill all the 3 roles during the same day (super user, supervisor and worker). This is really where the magic happens.

No one else will be able to find the heuristics and intuition you develop by working on these 3 at the same time. Yes, it’s hard, but it’s really worth it. You will immediately identify optimizations that would take weeks of process mining for other companies to find. Ideally your whole team is able to act as both workers and supervisors, eventually. And one day, one person will become the super user (at this point you pretty much built a business).

Side note:

The last stage is really what makes me think that vertical agents will be built by people passionate about their craft. For these people, improvement becomes the reason they work. They want to make the company succeed just so that they can keep enjoying working on their craft. This type of work culture is very hard to duplicate. And also very hard to beat.

Traditional software focuses on checking boxes. Agents deliver relevancy through deliverables. And there is no limit to relevancy.

One expert living and breathing for his/her craft can deliver more value and compete with organizations of thousands of people. This becomes even more true if software engineering keeps getting commoditized.

Conclusion

By putting humans in the loop, you can quickly reach step 6.

In step 6, you have a “manufacturing line” you can optimize.

This is an incredible shortcut because at this stage you have a process but no automation. Which means you can remove anything you want.

Automations are costly and can end up being useless. Worse, sometimes automations become the de facto way of doing things and you end up arguing with engineers stuck in the past.

Step 6 is the miracle step, where you get to shape the process exactly like you want it, before automating it.

Maybe you will discover you spend most of your time talking to the workers. Maybe the supervisor is overwhelmed because checking quality manually is a tough game.

At this stage you can build Google Sheets formulas and little scripts to make the supervisor move faster.

From there, the work is never done. Every day is an opportunity to improve. The best vertical agents will come from cultures prioritizing putting every team member in the loop to a degree that looks absolutely insane for organizations today and that will become the norm 10 years from now.

Brought to you by the agen.cy team

If you enjoyed this content, go to agen.cy to learn how we can help you!