Challenges in Using Agents

While AI agents offer significant potential, there are also several challenges that need to be addressed to ensure their successful implementation:

Use Case Understanding

Determining the appropriate level of complexity for an agent is critical. Over-engineering can lead to unnecessarily complex and inefficient systems, while under-engineering can limit the agent's ability to solve complex problems. It's important to identify tasks that require flexibility and model-driven decision-making, as opposed to simpler tasks that can be handled with predefined workflows. Care should also be taken when using an agentic workflow to replicate existing human workflows: existing processes are often shaped by the limitations of manual decision-making, and an agentic system might be able to improve on them. For example, a human might have written a function to scrape a website for data when the same data is available programmatically through RSS feeds or API calls. Replicating the existing process by implementing a web-scraping tool would carry over that inefficiency. Giving the agent the autonomy to do its own searches and execute the right code might lead to better results in this case.

Evaluation

Evaluating agents is complex due to the multiple possible failure points in planning, tool execution, and efficiency. Traditional evaluation metrics for LLMs, such as perplexity or BLEU score, are not sufficient for evaluating the performance of AI agents. Evaluation must consider the agent's ability to plan, execute actions, and achieve its goals effectively. This requires developing new evaluation metrics and methodologies that can assess the agent's overall performance and identify areas for improvement.
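One way to make this concrete is to score a recorded agent run along each of the dimensions named above. The sketch below is illustrative only: the `Step` and `Trajectory` types, the `evaluate` function, and the scoring formulas are assumptions made for this example, not an established benchmark or library API.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str        # name of the tool the agent called (hypothetical)
    succeeded: bool  # whether the tool call returned without error


@dataclass
class Trajectory:
    steps: list[Step]
    goal_achieved: bool


def evaluate(trajectory: Trajectory, expected_tools: list[str],
             step_budget: int) -> dict[str, float]:
    """Score one agent run on planning, tool execution, efficiency, and outcome."""
    used = [s.tool for s in trajectory.steps]
    n_steps = len(trajectory.steps)

    # Planning: fraction of the expected tools that the agent actually chose.
    planning = (
        sum(t in used for t in expected_tools) / len(expected_tools)
        if expected_tools else 1.0
    )
    # Tool execution: fraction of tool calls that succeeded.
    execution = sum(s.succeeded for s in trajectory.steps) / n_steps if n_steps else 0.0
    # Efficiency: 1.0 within the step budget, shrinking as the run exceeds it.
    efficiency = min(1.0, step_budget / n_steps) if n_steps else 0.0

    return {
        "planning": planning,
        "execution": execution,
        "efficiency": efficiency,
        "goal_achieved": float(trajectory.goal_achieved),
    }
```

Reporting the scores separately, rather than as a single number, shows where a run went wrong: an agent can reach its goal while still wasting steps or repeatedly failing tool calls.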

Latency

Agents can be slower and more costly due to the multiple steps required to accomplish a task. The iterative nature of agent-based systems, which often involves multiple interactions with the LLM and external tools, can lead to increased latency. This can be a significant concern for applications that require real-time responses or high throughput. Reflection, while improving performance, can also increase latency.
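The way latency accumulates can be approximated with simple arithmetic. The sketch below assumes a strictly sequential loop in which every step makes one LLM call and one tool call, and in which each reflection pass adds one extra LLM call per step. The function name, parameters, and this cost model are illustrative assumptions, not measurements.

```python
def estimate_latency(steps: int, llm_seconds: float, tool_seconds: float,
                     reflection_passes: int = 0) -> float:
    """Estimate end-to-end latency (seconds) for a sequential agent loop."""
    # Each step makes one LLM call followed by one tool call.
    per_step = llm_seconds + tool_seconds
    # Assumption: each reflection pass costs one additional LLM call per step.
    per_step += reflection_passes * llm_seconds
    return steps * per_step
```

Under these assumptions, a five-step task with a 2-second LLM call and a 0.5-second tool call takes 12.5 seconds, and one reflection pass per step raises that to 22.5 seconds.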

Security

Agents are exposed to attacks that subvert their intended purpose. Not only are they susceptible to prompt injection, they can also execute code (e.g., through code interpreters or APIs), which poses a significant security risk if code execution is not carefully controlled. Agents also often rely on tools and send data to them. Tools themselves can be compromised, especially if their implementations are not transparent (e.g., a tool that calls a closed API which leaks data). Furthermore, agents might make mistakes when dealing with sensitive operations, and insufficient control over an agent's actions and permissions can lead to unauthorised access to sensitive data or system resources. Security measures must be implemented to protect against these attacks and mistakes.
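One common control is to route every tool call through a gate that denies unknown tools by default and requires explicit approval for sensitive ones. The following is a minimal sketch of that idea. `GuardedToolbox`, `ToolPermissionError`, and the `approve` callback are hypothetical names introduced for illustration; they do not come from any particular agent framework, and a real deployment would also need sandboxing and input validation.

```python
from typing import Any, Callable


class ToolPermissionError(Exception):
    """Raised when the agent requests a tool call it is not allowed to make."""


class GuardedToolbox:
    def __init__(self, tools: dict[str, Callable[..., Any]],
                 sensitive: set[str],
                 approve: Callable[[str, dict[str, Any]], bool]) -> None:
        self._tools = tools          # allowlist: only these tools can run
        self._sensitive = sensitive  # tool names that need explicit approval
        self._approve = approve      # e.g. a human-in-the-loop confirmation

    def call(self, name: str, **kwargs: Any) -> Any:
        # Deny by default: anything not on the allowlist is rejected.
        if name not in self._tools:
            raise ToolPermissionError(f"tool {name!r} is not on the allowlist")
        # Sensitive tools run only if the approval callback says yes.
        if name in self._sensitive and not self._approve(name, kwargs):
            raise ToolPermissionError(f"call to {name!r} was not approved")
        return self._tools[name](**kwargs)
```

Keeping the check outside the model means a successful prompt injection can change what the agent asks for, but not what it is permitted to do.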

Infrastructure Support

Server-side multi-agent systems potentially involve multiple model-hosting infrastructures, and user-side agentic systems may require ad-hoc deployment of the resources that tools need to function. This may call for more complex infrastructure orchestration solutions. For example, for an image captioning tool that our agent uses only sporadically, we can either keep a self-hosted fine-tuned model running 24/7 or spin up a GPU server to host the model only when the agent invokes the tool. When invocations are infrequent, the latter can lead to much greater cost savings.
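The trade-off between the two hosting options can be estimated with a back-of-the-envelope calculation. The sketch below assumes per-second billing at a flat hourly rate and a fixed startup time for every on-demand invocation. The function names, the billing model, and any numbers plugged in are hypothetical and do not reflect a specific provider's pricing.

```python
def always_on_cost(hourly_rate: float, hours: float) -> float:
    """Cost of keeping the model server running for the whole period."""
    return hourly_rate * hours


def on_demand_cost(hourly_rate: float, invocations: int,
                   seconds_per_invocation: float,
                   startup_seconds: float) -> float:
    """Cost of starting a server per invocation (assumes per-second billing)."""
    # Every invocation pays for startup time plus the actual inference time.
    billed_seconds = invocations * (startup_seconds + seconds_per_invocation)
    return hourly_rate * billed_seconds / 3600
```

With made-up inputs of $2 per hour, a 720-hour month, and 100 invocations of 30 seconds each with a 150-second startup, the always-on option costs $1,440 and the on-demand option costs $10. The on-demand option does make each invocation slower by the startup time, so the choice also depends on how much latency the application can tolerate.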

TL;DR

Adopting agentic AI is not without its challenges. The use case must be well understood in order to apply the technology appropriately. Even then, we need to know how to measure success, address latency and security issues, and provision appropriate infrastructure to scale the solutions.