Tech Giants Deploy Autonomous AI Agents: Legal Systems Struggle to Catch Up

Tech giants Google and Microsoft are racing to equip AI agents with enough autonomy to handle tasks without constant oversight. Veteran software engineer Jay Prakash Thakur has been running tests late into the night and over weekends, exploring how these programs might one day place food orders or build mobile apps independently. His efforts highlight gaps in both operations and legal accountability as companies deploy these self-directed systems.

AI agents are applications designed to work with minimal human input, tackling chores from customer support to invoice processing. Conventional chatbots can draft messages or sift through expense records when prompted. Microsoft and other firms expect agents to handle more elaborate functions, all under light supervision.

One high-ambition idea calls for multi-agent networks, where dozens of independent bots collaborate to replace entire work teams. The payoff for businesses is clear: slashed labor costs and faster throughput. Industry tracker Gartner forecasts that agentic systems will resolve 80 percent of routine customer service questions by 2029. Marketplaces like Fiverr report that searches for “ai agent” have soared over 18,000 percent in recent months.

Thakur, largely self-taught and based in California, has pushed to stay on the cutting edge. His day role at Microsoft doesn’t involve agents, but he has spent spare hours building prototypes with AutoGen, Microsoft’s open-source agent toolkit, which he first tried while working at Amazon in 2024. Amazon has since introduced a rival called Strands; Google offers its own Agent Development Kit.

He says he assembled multi-agent mock-ups with only a few lines of code. Leveraging AutoGen’s library, he linked agents into automated sequences. With the arrival of Strands and Google’s kit, engineers now have fresh options for launching systems of their own.
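A basic AutoGen pipeline really can fit in a handful of lines. Below is a minimal sketch, assuming the classic pyautogen (v0.2) API; the model name, key, and task are placeholders rather than Thakur’s actual setup.

```python
# Minimal two-agent sequence in the style Thakur describes; a sketch
# assuming the classic pyautogen (v0.2) API, with placeholder values.
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]}

# Worker agent that drafts answers with the LLM.
assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)

# Proxy agent that stands in for the human and relays the task automatically.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",       # run without pausing for a person
    max_consecutive_auto_reply=3,   # cap the automated back-and-forth
    code_execution_config=False,    # don't execute any generated code
)

# Kick off the automated sequence: the proxy sends the task and the two
# agents exchange messages until a stop condition is reached.
user_proxy.initiate_chat(
    assistant,
    message="List three tools for building a mobile app and summarize their usage terms.",
)
```

Chaining more agents is largely a matter of adding more such pairs, which is what makes the low barrier to entry both appealing and, as Thakur found, risky.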

Identifying fault when an autonomous agent makes a costly error has been Thakur’s biggest headache. When several agents operate in a shared environment, reconstructing a failure feels like stitching together separate reports from multiple witnesses. “It’s often impossible to pinpoint responsibility,” he says.

Benjamin Softness, a lawyer who moved from Google to King & Spalding, told an audience at a Media Law Resource Center conference in San Francisco that claimants tend to sue those with the deepest pockets. Firms will need to brace for liability if agents go awry—even when a casual tinkerer triggered the mistake. “I don’t think anybody is hoping to get through to the consumer sitting in their mom’s basement on the computer,” he said. Insurers have begun offering policies that cover AI chatbot mishaps.

Many of Thakur’s experiments string together agents to run with scant human oversight. In one proof of concept, he aimed to replace software developers with three AI units. One scoured the web for specialized tools needed for app creation. Another summarized each service’s usage terms. A third, in theory, would follow those guidelines to code a complete application.

Testing that chain revealed a critical oversight. The search agent flagged a tool boasting “unlimited requests per minute for enterprise users.” The summarization agent then dropped the “enterprise users” qualifier from its summary. That misled the coding agent into firing unlimited queries from a free account. The test ended harmlessly, yet in production that flood of requests could have crashed the system.
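The failure mode is easy to reproduce in miniature. In the hypothetical sketch below, plain functions stand in for the three LLM agents; the names and return values are invented for illustration, not taken from Thakur’s code.

```python
# Hypothetical reconstruction of the lost-qualifier bug; each function
# stands in for one LLM agent. All names and values are invented.

def search_agent() -> str:
    # Finds the raw claim on a tool's pricing page.
    return "Unlimited requests per minute for enterprise users."

def summarizer_agent(raw: str) -> str:
    # Lossy summary that silently drops the qualifying clause -- the bug.
    return "Unlimited requests per minute."

def coding_agent(terms: str) -> dict:
    # Configures the generated client from the summary it was handed.
    rate_limit = None if "unlimited" in terms.lower() else 60
    return {"requests_per_minute": rate_limit}  # None means no throttling

config = coding_agent(summarizer_agent(search_agent()))
assert config["requests_per_minute"] is None  # free account, no throttle: trouble
```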

In another scenario, he designed a restaurant-ordering framework that could take custom meal requests across different cuisines. A customer might type “burgers and fries” into a chatbox. One agent would determine a fair price and convert the free text into a structured recipe format. That data would then be handed off to robot chefs, which Thakur simulated with varied cooking styles, since he lacks an actual kitchen or real robots.
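Thakur hasn’t published his schema, but the structured hand-off he describes might look something like the sketch below, where the RecipeOrder fields are assumptions made for illustration.

```python
# One plausible shape for the order hand-off; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class RecipeOrder:
    dish: str                                        # e.g. "burger"
    sides: list[str] = field(default_factory=list)   # e.g. ["fries"]
    notes: list[str] = field(default_factory=list)   # e.g. ["extra naan"]
    price_usd: float = 0.0                           # set by the pricing agent

# The pricing agent turns "burgers and fries" into a structured order...
order = RecipeOrder(dish="burger", sides=["fries"], price_usd=9.50)

# ...and the simulated robot chefs consume that structure downstream.
def robot_chef(order: RecipeOrder) -> str:
    return f"Cooking {order.dish} with {', '.join(order.sides) or 'no sides'}"

print(robot_chef(order))  # Cooking burger with fries
```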

About 90 percent of simulated requests ran smoothly. Trouble emerged when orders included more than five items. “I want onion rings” became “extra onions” in the recipe. Directives such as “extra naan” sometimes went missing. In a live environment, such mistakes could endanger customers with allergies.

He also tried a shopping-comparison agent that hunted for the lowest price. It surfaced a bargain on one site yet linked to a costlier product page at another vendor. Had the system proceeded with an automatic purchase, the buyer would have overspent, Thakur warns.
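One obvious safeguard, implied by the failure rather than described by Thakur, is to re-check the price on the destination page before any automatic purchase goes through. The sketch below uses invented prices.

```python
# Hypothetical pre-purchase guard for the comparison-shopping failure;
# the prices are invented for illustration.

def safe_to_buy(quoted_price: float, page_price: float, tolerance: float = 0.01) -> bool:
    """Abort if the linked page's price doesn't match the quoted bargain."""
    if abs(quoted_price - page_price) > tolerance:
        print(f"Abort: quoted ${quoted_price:.2f}, but page shows ${page_price:.2f}")
        return False
    return True

safe_to_buy(19.99, 34.99)  # the mismatch Thakur warns about -> abort
```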

Even single-agent tools like ChatGPT can make major blunders. Last year, an airline chatbot invented a coupon code that a court deemed legally enforceable. This month, Anthropic apologized after its AI produced a flawed citation in a legal filing. Software developer Naveen Chatlapalli recalls a human-resources bot approving leave requests it should have denied and a note-taking bot sharing confidential details with the wrong team.

With those simpler setups, engineers can usually trace the glitch and insert extra human checkpoints. Thakur says a quick customer confirmation step would have caught the restaurant-order errors, but that undercuts the idea of speeding up workflows. “We want to save time for our customers,” he says. “That’s where it’s still making mistakes.”
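That checkpoint can be as simple as echoing the parsed order back to the customer and halting the pipeline until they approve, as in this minimal sketch with invented order data.

```python
# Minimal human-in-the-loop checkpoint; the order data is invented.

def confirm_with_customer(original_text: str, parsed_items: list[str]) -> bool:
    print(f"You asked for: {original_text}")
    print(f"We understood: {', '.join(parsed_items)}")
    return input("Proceed? [y/n] ").strip().lower() == "y"

if not confirm_with_customer("I want onion rings", ["extra onions"]):
    # The mismatch is caught here -- at the cost of the speed agents promise.
    raise SystemExit("Order sent back for re-parsing.")
```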

Some teams are building an overseeing “judge” agent that monitors peers, flags inconsistencies and intervenes before errors cascade. That supervisory layer might detect that a request for onion rings was accidentally parsed as generic onions.
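At its simplest, the judge can be a deterministic cross-check rather than another LLM. The sketch below uses a naive substring comparison as a stand-in; the function name and data are illustrative.

```python
# Illustrative rule-based "judge" that audits another agent's output;
# in practice this layer might itself be an LLM agent.

def judge(original_request: str, parsed_items: list[str]) -> list[str]:
    """Flag parsed items that never appeared in the customer's own words."""
    flags = []
    for item in parsed_items:
        if item.lower() not in original_request.lower():
            flags.append(f"'{item}' not in request -- possible mis-parse")
    return flags

for issue in judge("I want onion rings", ["extra onions"]):
    print("JUDGE:", issue)  # 'extra onions' not in request -- possible mis-parse
```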

Freelancer Mark Kashef, founder of Prompt Advisers on Fiverr, warns that chasing complexity too early can bloat systems into corporate-style bureaucracy. This spring, he advised a West African government to focus on creating one high-value agent rather than dozens of minor utilities.

As agent networks grow more elaborate, companies must sort out who foots the bill when bots botch an order or trigger a big loss. At the conference in San Francisco, OpenAI senior legal counsel Joseph Fireman and other lawyers noted that existing laws can hold clients partly liable for an agent’s missteps if users were clearly warned about the bot’s scope and constraints.

Several legal experts propose that businesses contracting for agentic services secure agreements shifting liability back to the technology provider. Everyday consumers have little leverage, and some may ask agents to review the fine print for them. “There will be interesting questions about whether agents can bypass privacy policies and terms of service,” Rebecca Jacobs, associate general counsel at Anthropic, said at the session.

Attorney and researcher Dazza Greenwood counsels caution. “If you have a 10 percent error rate with ‘add onions,’ that to me is nowhere near release,” he says. “Work your systems out so that you’re not inflicting harm on people to start with.”

For now, real users cannot simply hand over tasks entirely to AI agents and walk away.

Correction 4:30 EDT 05/23/25: An earlier version misstated who made a remark on corporate liability. Benjamin Softness was the speaker, not Joseph Fireman.
