How to automate Accounts Payable using LLM-Powered Multi Agent Systems



Introduction

In today’s fast-paced business landscape, organizations are increasingly turning to AI-driven solutions to automate repetitive processes and enhance efficiency. Accounts Payable (AP) automation, a critical area in financial management, is no exception. Traditional automation methods often fall short when dealing with complex, dynamic tasks requiring contextual understanding.

This is where Large Language Model (LLM)-powered multi-agent systems step in, combining the power of AI with specialized task allocation to deliver scalable, adaptive, and human-like solutions.

In this blog, we’ll:

  • Learn the core components and benefits of multi-agent designs for automating workflows.
  • Walk through the components of an AP system.
  • Code a multi-agent system that automates the AP process.

By the end of this blog, you’ll understand how to code your own AP agent for your own invoice use case. But before we jump ahead, let’s look at what LLM-based AI agents are and cover some basics of multi-agent systems.

AI Agents

Agents are systems or entities that perform tasks autonomously or semi-autonomously, often by interacting with their environment or other systems. They are designed to sense, reason, and act in a way that achieves a specific goal or set of goals.

LLM-powered AI agents use large language models as their core to understand, reason about, and generate text. They excel at understanding context, adapting to diverse data, and handling complex tasks. They’re scalable and efficient, which makes them suitable for automating repetitive workflows such as AP. However, LLMs cannot handle everything on their own. Because agents can be arbitrarily complex, the system also needs additional components such as input/output sanity checks, memory, and other specialized tools. This is where Multi-Agent Systems (MAS) come into the picture: they orchestrate and distribute tasks among specialized, single-purpose agents and tools to improve developer experience, efficiency, and accuracy.

Multi-Agent Systems (MAS): Leveraging Collaboration for Complex Tasks

A Multi-Agent System (MAS) works like a team of specialists, each with a specific role, collaborating toward a common goal. Powered by LLMs, agents refine their outputs in real time; for instance, one writes code while another reviews it. This teamwork boosts accuracy and reduces biases by enabling cross-checks.

Benefits of Multi-Agent Designs

Here are some advantages of using MAS that cannot be easily replicated with other patterns:

  • Separation of Concerns: Agents focus on specific tasks, enhancing effectiveness and delivering specialized results.
  • Modularity: MAS simplifies complex problems into manageable tasks, allowing easy troubleshooting and optimization.
  • Diversity of Perspectives: Various agents provide distinct insights, improving output quality and reducing bias.
  • Reusability: Developed agents can be reconfigured for different applications, creating a flexible ecosystem.

Let’s now look at the architecture and the components that form the building blocks of a multi-agent system.

Core Components of Multi-Agent Systems

The architecture of an MAS consists of several critical components that ensure the agents work cohesively. Below are the key components that make up an MAS:

  1. Agents: Each agent has a specific role, goal, and set of instructions. They work independently, leveraging LLMs for understanding, decision-making, and task execution.
  2. Connections: These pathways let agents share information and stay aligned, ensuring smooth collaboration with minimal delays.
  3. Orchestration: This manages how agents interact—whether sequentially, hierarchically, or bidirectionally—to optimize workflows and keep tasks on track.
  4. Human Interaction: Humans often oversee MAS, stepping in to validate results or make decisions in tricky situations, adding an extra layer of safety and quality.
  5. Tools and Resources: Agents use tools like databases for validation or APIs to access external data, boosting their efficiency and capabilities.
  6. LLM: The LLM acts as the system’s core, powering agents with advanced comprehension and tailored outputs based on their roles.

Below you can see how all the components are interconnected:

Core components of a Multi Agent System.
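
To make these components concrete, here is a minimal, framework-free sketch in plain Python. It is only an illustration under assumptions: the `Agent` dataclass, the `fake_llm` stand-in, and `run_sequential` are hypothetical names, not part of any library, and a real system would replace `fake_llm` with a call to an actual model.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-in for an LLM call: it only echoes its prompt so the
# sketch runs offline. A real system would call a model API here.
def fake_llm(prompt: str) -> str:
    return f"[response to: {prompt}]"


@dataclass
class Agent:
    role: str                  # what the agent is responsible for
    goal: str                  # instruction included in every prompt
    llm: Callable[[str], str]  # the shared "LLM" component

    def run(self, message: str) -> str:
        return self.llm(f"{self.role} | {self.goal} | {message}")


def run_sequential(agents: List[Agent], initial_input: str) -> List[str]:
    """Orchestration: feed each agent's output to the next one (the connection)."""
    outputs = []
    message = initial_input
    for agent in agents:
        message = agent.run(message)
        outputs.append(message)
    return outputs


pipeline = [
    Agent("Extractor", "Pull fields from the invoice", fake_llm),
    Agent("Validator", "Check the extracted fields", fake_llm),
]
outputs = run_sequential(pipeline, "invoice #123")
for step in outputs:
    print(step)
```

Frameworks add tools, memory, and human review hooks on top of this basic loop, but the sequential hand-off is the same idea.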

Several frameworks make it easier to write the code for and set up multi-agent systems. Let’s discuss a few of them.


Frameworks for Building Multi-Agent Systems with LLMs

To effectively manage and deploy MAS, several frameworks have emerged, each with its own approach to orchestrating LLM-powered agents. The table below compares three popular frameworks and shows how they differ.

| Criteria | LangGraph | AutoGen | CrewAI |
| --- | --- | --- | --- |
| Ease of Usage | Moderate complexity; requires understanding of graph theory | User-friendly; conversational approach simplifies interaction | Straightforward setup; designed for production use |
| Multi-Agent Support | Supports both single and multi-agent systems | Strong multi-agent capabilities with flexible interactions | Excels in structured role-based agent design |
| Tool Coverage | Integrates with a wide range of tools via LangChain | Supports various tools including code execution | Offers customizable tools and integration options |
| Memory Support | Advanced memory features for contextual awareness | Flexible memory management options | Supports multiple memory types (short-term, long-term) |
| Structured Output | Strong support for structured outputs | Good structured output capabilities | Robust support for structured outputs |
| Ideal Use Case | Best for complex task interdependencies | Great for dynamic, customizable agent interactions | Suitable for well-defined tasks with clear roles |

Now that we have a high-level understanding of the different multi-agent frameworks, we’ll choose CrewAI to implement our own AP automation system because it is straightforward to use and easy to set up.

Accounts Payable (AP) Automation

We’ll focus on building an AP system in this section. But before that let’s also understand what AP automation is and why it is needed.

Overview of AP Automation

AP automation simplifies managing invoices, payments, and supplier relationships by using AI to handle repetitive tasks like data entry and validation. It speeds up processes, reduces errors, and ensures compliance with detailed records. By streamlining workflows, it saves time, cuts costs, and strengthens vendor relationships, turning Accounts Payable into a smarter, more efficient process.

Typical Steps in AP

  1. Invoice Capture: Use OCR or AI-based tools to digitize and capture invoice data.
  2. Invoice Validation: Automatically verify invoice details (e.g., amounts, vendor details) using set rules or matching against Purchase Orders (POs).
  3. Data Extraction & Categorization: Extract specific data fields (vendor name, invoice number, amount) and categorize expenses to relevant accounts.
  4. Approval Workflow: Route invoices to the correct approvers, with customizable approval rules based on vendor or amount.
  5. Matching & Reconciliation: Automate 2-way or 3-way matching (invoice, PO, and receipt) to check for discrepancies.
  6. Payment Scheduling: Schedule and process payments based on payment terms, early payment discounts, or other financial policies.
  7. Reporting & Analytics: Generate real-time reports for cash flow, outstanding payables, and vendor performance.
  8. Integration with ERP/Accounting System: Sync with ERP or accounting software for seamless financial records management.

Here’s a typical flow of AP automation, along with the technology that’s used in each step.
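
As an illustration of the matching step in the list above, here is a minimal sketch of a 3-way match in plain Python. The `LineTotals` record, the `three_way_match` function, the tolerance value, and the sample figures are all hypothetical and chosen only for this example; real AP systems usually match line by line and apply configurable tolerances.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineTotals:
    quantity: int
    amount: Decimal


def three_way_match(invoice: LineTotals, purchase_order: LineTotals,
                    receipt_quantity: int,
                    tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of discrepancies; an empty list means the documents match."""
    issues = []
    if invoice.quantity != purchase_order.quantity:
        issues.append("invoice quantity differs from PO quantity")
    if invoice.quantity != receipt_quantity:
        issues.append("invoice quantity differs from received quantity")
    if abs(invoice.amount - purchase_order.amount) > tolerance:
        issues.append("invoice amount differs from PO amount")
    return issues


# Made-up sample documents for illustration only.
po = LineTotals(quantity=10, amount=Decimal("500.00"))
good_invoice = LineTotals(quantity=10, amount=Decimal("500.00"))
bad_invoice = LineTotals(quantity=12, amount=Decimal("600.00"))

print(three_way_match(good_invoice, po, receipt_quantity=10))
print(three_way_match(bad_invoice, po, receipt_quantity=10))
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing currency amounts.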

Implementing AP Automation

Now that we’ve learnt what a multi-agent system is and what AP involves, it’s time to put that into practice.

Here are the agents that we’ll create and orchestrate using CrewAI:

  1. Invoice Data Extraction Agent: Extracts key invoice details (vendor name, amount, due date) using the multimodal capability of GPT-4o for OCR and data parsing.
  2. Validation Agent: Ensures accuracy by verifying extracted data, checking for matching details, and flagging discrepancies.
  3. Payment Processing Agent: Prepares payment requests, validates them, and initiates payment execution.

This setup delegates tasks efficiently, with each agent focusing on a specific step, enhancing reliability and overall workflow performance.
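
When agents hand data to each other, it can help to agree on a small data contract. The sketch below assumes the extraction step is asked to return JSON with the three fields mentioned above; the `ExtractedInvoice` class, the field names, and the sample payload are hypothetical and not part of the walkthrough that follows, which passes plain text between agents.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedInvoice:
    vendor_name: str
    total_due: Decimal
    due_date: date


def parse_extracted_invoice(payload: str) -> ExtractedInvoice:
    """Parse a JSON string into a typed record, failing loudly on bad data."""
    data = json.loads(payload)
    return ExtractedInvoice(
        vendor_name=str(data["vendor_name"]).strip(),
        total_due=Decimal(str(data["total_due"])),
        due_date=date.fromisoformat(data["due_date"]),
    )


# Made-up payload for illustration only.
sample = '{"vendor_name": "Acme Supplies", "total_due": "154.06", "due_date": "2023-02-26"}'
invoice = parse_extracted_invoice(sample)
print(invoice)
```

A missing field raises `KeyError` and a malformed date raises `ValueError`, so bad extractions surface before any payment logic runs.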

Here’s a visualisation of how the flow will look.


Code:

First, install the ‘crewai’ and ‘crewai_tools’ packages using pip. The leading ‘!’ runs the command from a notebook cell.

!pip install crewai crewai_tools

Next we’ll import necessary classes and modules from the ‘crewai’ and ‘crewai_tools’ packages.

from crewai import Agent, Crew, Process, Task
from crewai_tools import VisionTool

Next, import the ‘os’ module for interacting with the operating system. Set the OpenAI API key and model name as environment variables. Define the URL of the image to be processed.

import os
os.environ["OPENAI_API_KEY"] = "YOUR OPEN AI API KEY"  # replace with your own key
os.environ["OPENAI_MODEL_NAME"] = "gpt-4o-mini"
image_url = "https://cdn.create.microsoft.com/catalog-assets/en-us/fc843d45-e3c4-49d5-8cc6-8ad50ef1c2cd/thumbnails/616/simple-sales-invoice-modern-simple-1-1-f54b9a4c7ad8.webp"
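
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As an alternative, here is a small sketch that reads the key from the environment and prompts only if it is missing. The helper name `ensure_api_key` is hypothetical; it uses only the standard library.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "OPENAI_API_KEY", prompt=getpass) -> str:
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in cell output.
        key = prompt(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} is required")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the agents would then replace the hard-coded assignment.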

Create an instance of the VisionTool class that we imported from crewai_tools. This tool uses the multimodal (vision) capability of OpenAI’s GPT-4o family of models to read the invoice image.

vision_tool = VisionTool()

Now we’ll create the three agents that we need for the invoice processing workflow:

  • image_text_extractor: Extracts text from the invoice image.
  • invoice_data_analyst: Validates the extracted data against user-defined rules and approves or rejects the invoice.
  • payment_processor: Processes the payment if it is approved.

image_text_extractor = Agent(
   role="Image Text Extraction Specialist",
   backstory="You are an expert in text extraction, specializing in using AI to process and analyze textual content from images, specifically images of invoices that need to be paid. Make sure you use the tools provided.",
   goal="Extract and analyze text from images efficiently using AI-powered tools. You should get the text from {image_url}",
   allow_delegation=False,
   verbose=True,
   tools=[vision_tool],
   max_iter=1
)
invoice_data_analyst = Agent(
   role="Invoice Data Validation Analyst",
   goal="Validate the data extracted from the invoice. In case the conditions are not met, you should return the error message.",
   backstory="You're a meticulous analyst with a keen eye for detail. You're known for your ability to read through the invoice data and validate the data based on the conditions provided.",
   max_iter=1,
   allow_delegation=False,
   verbose=True,
)
payment_processor = Agent(
   role="Payment Processing Specialist",
   goal="Process the payment for the invoice if the payment is approved.",
   backstory="You're a payment processing specialist who is responsible for processing the payment for the invoice if the payment is approved.",
   max_iter=1,
   allow_delegation=False,
   verbose=True,
)

Defining Agents, which are the personas in the multi-agent system

Next, we’ll define the three tasks that these agents will perform:

  • text_extraction_task: Assigns the ‘image_text_extractor’ agent to extract text from the provided image.
  • invoice_data_validation_task: Assigns the ‘invoice_data_analyst’ agent to validate the invoice and approve it for payment based on rules defined by the user.
  • payment_processing_task: Assigns the ‘payment_processor’ agent to process the payment if it is validated and approved.

text_extraction_task = Task(
   agent=image_text_extractor,
   description=(
       "Extract text from the provided image file. Ensure that the extracted text is accurate and complete, "
       "and ready for any further analysis or processing tasks. The image file provided may contain various text elements, "
       "so it's crucial to capture all readable text. The image file is an invoice, and we need to extract the data from it to process the payment."
   ),
   expected_output="A string containing the full text extracted from the image."
)
# We can define the conditions which we want the agent to validate for payment processing.
# Currently I have created 2 conditions which should be met in the invoice before it's paid.
invoice_data_validation_task = Task(
   agent=invoice_data_analyst,
   description=(
       "Validate the data extracted from the invoice and ensure that these 2 conditions are met:\n"
       "1. Total due should be between 0 and 2000.00 dollars.\n"
       "2. The date of invoice should be after Dec 2022."
   ),
   expected_output=(
       "If both conditions are met, return 'Payment approved'.\n"
       "Else, return 'Payment not approved' followed by an error message describing the unmet condition."
   )
)
payment_processing_task = Task(
   agent=payment_processor,
   description=(
       "Process the payment for the invoice if the payment is approved. In case there is an error, return 'Payment not approved'."
   ),
   expected_output="'Payment processed successfully' if the payment was approved and processed; otherwise 'Payment not approved'."
)

Tasks performed by each agent
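
Because the two validation rules are simple, they can also be checked deterministically and used to cross-check the LLM’s decision. The sketch below is one possible reading of the rules, not part of the original notebook: it assumes the 0 to 2000.00 range is inclusive and that "after Dec 2022" means strictly after 31 December 2022. The function names and error strings are hypothetical.

```python
from datetime import date
from decimal import Decimal
from typing import List

MAX_TOTAL_DUE = Decimal("2000.00")
# Assumption: "after Dec 2022" is read as strictly after 31 December 2022.
LAST_DISALLOWED_DATE = date(2022, 12, 31)


def check_invoice(total_due: Decimal, invoice_date: date) -> List[str]:
    """Return one error message per unmet condition (empty list if all pass)."""
    errors = []
    # Assumption: both ends of the 0 to 2000.00 range are allowed.
    if not (Decimal("0") <= total_due <= MAX_TOTAL_DUE):
        errors.append("Total due is outside the 0 to 2000.00 range.")
    if invoice_date <= LAST_DISALLOWED_DATE:
        errors.append("Invoice date is not after Dec 2022.")
    return errors


def decision(total_due: Decimal, invoice_date: date) -> str:
    errors = check_invoice(total_due, invoice_date)
    if not errors:
        return "Payment approved"
    return "Payment not approved: " + " ".join(errors)


print(decision(Decimal("1500.00"), date(2023, 3, 1)))
print(decision(Decimal("2500.00"), date(2023, 3, 1)))
print(decision(Decimal("1500.00"), date(2022, 11, 15)))
```

If the rule-based decision and the agent’s decision disagree, that is a good candidate for routing to a human reviewer.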

Once we have created the agents and the tasks they will perform, we’ll initialise our Crew with both. The process is sequential, i.e., the tasks run in the order in which they are listed.

# Note: If any changes are made in the agents and/or tasks, we need to re-run this cell for changes to take effect.
crew = Crew(
   agents=[image_text_extractor, invoice_data_analyst, payment_processor],
   tasks=[text_extraction_task, invoice_data_validation_task, payment_processing_task],
   process=Process.sequential,
   verbose=True
)

Finally, we’ll run our crew and store the output in the “result” variable, passing in the URL of the invoice image that we want to process.

result = crew.kickoff(inputs={"image_url": image_url})
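
Downstream code usually needs a machine-readable status rather than free text. The helper below is a hypothetical sketch that maps the final message to a status, based on the phrases used in the task definitions above. It assumes the crew’s final output can be converted to text with `str(result)`.

```python
def payment_status(final_output: str) -> str:
    """Map the crew's final text to 'rejected', 'paid', or 'unknown'."""
    text = final_output.lower()
    # Check the rejection phrase first so a mixed message is never treated as paid.
    if "not approved" in text:
        return "rejected"
    if "processed successfully" in text:
        return "paid"
    return "unknown"


# In the notebook this would be called as: payment_status(str(result))
print(payment_status("Payment processed successfully"))
print(payment_status("Payment not approved: total due exceeds the limit"))
```

Treating anything unrecognised as "unknown" keeps unexpected model output from being silently counted as a successful payment.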

Here are some sample outputs for different scenarios/conditions for invoice validation:

Sample approved invoice
Case 1: All the validation conditions met and invoice processed successfully by the AI agent.
Case 2: Invoice total due greater than the total due limit. Payment not approved by the AI agent.
Case 3: Invoice date before the allowed date. Payment not approved by the AI agent.

If you want to try the above example, here’s a Colab notebook for the same. Just set your OpenAI API key and experiment with the flow yourself!


Sounds simple? There are a few challenges that we’ve overlooked while building this small proof of concept.

Challenges of Implementing AI in AP Automation

  1. Integration with Existing Systems: Integrating AI with existing ERP systems can create data silos and disrupt workflows if not done properly.
  2. Employee Resistance: Adapting to automation may face pushback; training and clear communication are key to easing the transition.
  3. Data Quality: AI depends on clean, consistent data. Poor data quality leads to errors, making source accuracy essential.
  4. Initial Investment: While cost-effective long-term, the upfront investment in software, training, and integration can be significant.

Nanonets is an enterprise-grade tool designed to eliminate all the hassles for you and provide a seamless experience, effortlessly managing the complexities of accounts payable.

Conclusion

In summary, LLM-powered multi-agent systems provide a scalable and intelligent solution for automating tasks like Accounts Payable, combining specialized roles and advanced comprehension to streamline workflows.

We’ve covered the ideas behind multi-agent systems and coded a simple CrewAI application that processes invoices. Extending the system should be as easy as adding more agents and tasks and orchestrating them with the right process.
