Recently, Apple published its excellent research aimed at better understanding the “thinking” capabilities and limitations of Large Language Models (LLMs) and Large Reasoning Models (LRMs). The paper’s findings have further confirmed my observations and conclusions regarding the smart approach to using Generative AI in day-to-day software engineering activities and motivated me to compile our findings at Competera.

Illusion of absolute power

A powerful FOMO effect is prompting many engineers and managers to aggressively adopt Generative AI for all aspects of their work, to boost productivity and profits. This trend is fueled by a constant stream of marketing that showcases successes and ignores failures, which is a clear case of survivorship bias. This narrative creates the myth that a single person, sometimes even without deep specialized knowledge, can take any idea to production simply by being surrounded by AI tools. While not entirely false, this attractive idea masks the numerous pitfalls that await the vast majority of complex projects.

We probably can't measure yet which is the greater danger: the blind faith of the "vibe-coding" cohort, or the rigid opposition of those who reject any form of AI-augmented programming. I suspect that, as always, the wisest path is found somewhere between these two extremes. Nevertheless, rather than debating those opinions, let's focus on some principles rooted in common sense.

Coming down to earth

Reportedly, around 30% of new code at Google is now written with AI-generated suggestions. I tend to believe that all of those suggestions are carefully supervised by expert engineers working within a mature development process. And as Sundar Pichai recently confirmed in an interview, the resulting increase in engineering velocity at Google is about 10%.

One might think that 10% is too little - especially those who believe that nowadays almost anyone can create any product from scratch just by prompting the right Generative AI tools. Organization-wise, however, I would treat this velocity increase at Google as a substantial optimization of the company's economy. Achieving this number in an environment of non-trivial architectures, high load, strict security requirements, and sophisticated engineering processes could not have been easy; it was most likely a time-consuming process of careful, iterative adoption and improvement under full human control - a substantial company investment in itself.

Google’s case is indicative: given the number of engineers involved, it serves as a representative statistical sample. It creates an opportunity to set realistic expectations for Generative AI capabilities and confirms the findings and conclusions we made at Competera R&D.

High-quality code and architecture

I expect that the code available for LLM training is of variable quality. To keep the bar for code quality control high and maintain a sufficiently critical eye when reviewing LLM-generated code, I find it useful to work from a deliberately exaggerated assumption: that the training data follows a positively skewed distribution, with most of its mass in the low- to medium-quality range and a long tail of rare, high-quality examples.

When it comes to architecture design, while I admit that LLMs are good at straightforward solutions, boilerplate code, and generic algorithms, they can struggle to generate properly architected solutions for complex cases requiring deep domain knowledge and vast contextual information. This limitation is expected, as they don't truly think like humans. This is why Generative AI agents and agentic workflows are becoming increasingly popular; they help to augment and guardrail the capabilities of LLMs by using deterministic tools and by grounding them in external knowledge from vector and graph databases.
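To make the guardrailing idea above concrete, here is a minimal sketch of the pattern: an LLM suggestion is never trusted directly, but must first pass cheap, deterministic checks. The `llm_generate` function is a hypothetical stand-in for a real model call, and the known-answer test is an assumed example - this is an illustration of the pattern, not any particular framework's API.

```python
import ast

def llm_generate(task: str) -> str:
    # Hypothetical stand-in for a real LLM call; returns a canned
    # suggestion here so the sketch is self-contained and runnable.
    return "def area(w, h):\n    return w * h"

def deterministic_guardrails(code: str) -> bool:
    """Reject suggestions that fail cheap, deterministic checks."""
    try:
        ast.parse(code)  # must at least be syntactically valid Python
    except SyntaxError:
        return False
    namespace: dict = {}
    exec(code, namespace)  # load the suggestion in an isolated namespace
    # Known-answer test: a deterministic check the output must satisfy.
    return namespace["area"](3, 4) == 12

suggestion = llm_generate("Write a function computing rectangle area.")
print("accepted" if deterministic_guardrails(suggestion) else "rejected")
# → accepted
```

Real agentic workflows extend this loop with retries, tool calls, and retrieval from vector or graph databases, but the core principle is the same: probabilistic generation, deterministic verification.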

Based on my observations, current LLM architectures have very limited potential for achieving human-like thinking capabilities. Before we can reliably delegate mission-critical tasks to them without additional grounding, guardrailing, and augmentation, we will first need to introduce different algorithms that more accurately reflect how the human brain thinks.

Finding the middle ground

While we shouldn't look at Generative AI through rose-tinted glasses, these tools have already proven that they can significantly increase our productivity. Despite their imperfect, far-from-human cognition, when used properly - like any tool - they save us significant time by taking over the grunt work they can perform much faster than any human.

Based on extensive empirical findings from Competera’s R&D, which develops a complex pricing platform, I have identified a set of principles essential for leveraging the strengths of Generative AI while mitigating its weaknesses. The most crucial ones for maximizing productivity, minimizing failures, and achieving a positive economic effect are the following:

  1. Initially, don't expect an LLM to create a complex product from scratch with a single prompt. Instead, take ownership of the architectural design and begin by feeding the LLM very simple tasks. For an exaggerated example, start small with something as basic as autocompleting for-loops in your code editor. From there, gradually delegate more complex and context-rich problems to LLMs and agents. Throughout this process, you must critically analyze the results to define the optimal boundary between human and AI responsibilities - a boundary that you will need to constantly re-evaluate as LLMs and agents evolve.

  2. Treat Generative AI tools as you would a junior engineer. Just like a junior, they require a clear task description, sufficient context, and continuous oversight. You remain fully responsible for reviewing, understanding, and refining the code, as well as for any consequences of its behavior in production.

  3. While you should provide LLMs with sufficient context, you should also limit their unit of work. This minimizes the complexity the AI has to manage and leads to better results.

  4. Ensure that related code, technical documentation, and domain knowledge combine into a high-quality, structured contextual input for the LLM. Our experience confirms that providing this context improves the quality and correctness of the generated code - great in, great out.
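The last principle can be sketched as a simple prompt-assembly step. The section names and the example task below are assumptions chosen for illustration; the point is only that code, documentation, and domain knowledge arrive as one explicitly structured input rather than an unstructured dump.

```python
from textwrap import dedent

def build_context_prompt(task: str, code: str, docs: str, domain: str) -> str:
    """Combine related code, technical documentation, and domain
    knowledge into a single structured prompt for the LLM."""
    return dedent(f"""\
        ## Task
        {task}

        ## Related code
        {code}

        ## Technical documentation
        {docs}

        ## Domain knowledge
        {domain}
        """)

# Hypothetical example of assembling context for a small pricing task.
prompt = build_context_prompt(
    task="Add a discount guardrail to apply_price().",
    code="def apply_price(sku, price): ...",
    docs="apply_price() persists the final SKU price.",
    domain="Discounts must never exceed 30% of the base price.",
)
print(prompt)
```

Keeping the structure explicit also makes it easy to audit what the model saw when reviewing its output.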

 

The Transformative Impact of AI in Retail Pricing

Our work at Competera directly embodies the human-supervised AI principles discussed throughout this article, especially within retail pricing. Our R&D team builds advanced pricing solutions where AI fundamentally transforms how prices are set.

We deploy sophisticated models that analyze billions of data points—from demand elasticity to competitor actions—to optimize SKU-level prices. This empowers retailers to move from reactive pricing to proactive, data-backed decisions.

Critically, while our platform provides unparalleled recommendations, its true strength lies in combining these AI outputs with deterministic and statistical checks, under the crucial oversight of data scientists from the engineering side and domain experts such as pricing managers and pricing architects. It’s a prime example of how human expertise, much like an architect guiding AI-generated code, remains central to achieving robust, real-world impact.
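As a toy illustration of combining a model's recommendation with a deterministic check - assuming a textbook constant-elasticity demand model, not our production models - the flow might look like this. Under demand Q = Q0 · (P/P0)^e with elasticity e < -1, the profit-maximizing price is P* = c · e / (1 + e), and a business-rule guardrail then clamps the recommendation into approved bounds:

```python
def optimal_price(unit_cost: float, elasticity: float) -> float:
    """Profit-maximizing price under constant-elasticity demand:
    P* = c * e / (1 + e), valid only for elastic demand (e < -1)."""
    if elasticity >= -1:
        raise ValueError("demand must be elastic (e < -1)")
    return unit_cost * elasticity / (1 + elasticity)

def guardrail(price: float, floor: float, cap: float) -> float:
    """Deterministic check: clamp the model's recommendation into
    business-approved bounds before it reaches the shelf."""
    return min(max(price, floor), cap)

raw = optimal_price(unit_cost=10.0, elasticity=-3.0)  # 15.0
final = guardrail(raw, floor=11.0, cap=14.0)          # capped at 14.0
print(raw, final)
# → 15.0 14.0
```

The numbers and bounds are made up; what matters is the shape of the pipeline - a statistical recommendation followed by deterministic, human-defined checks, with domain experts owning the bounds.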

Conclusion

Many powerful Generative AI tools can enhance productivity, but they are not a replacement for human expertise and critical thinking. They work best when given clear instructions, sufficient context, and continuous supervision, much like a junior engineer. One should not expect LLMs to create complex products from scratch with a single prompt. Instead, start with small tasks and gradually delegate more complex problems, always retaining responsibility for review and understanding. While there is no magic solution or one-size-fits-all approach, sensible and controlled use of LLMs can significantly optimize development processes. Therefore, to leverage their strengths and mitigate weaknesses, approach LLMs with realistic expectations and focus on augmenting, not replacing, human intellect and architectural design.

Evolution is unstoppable, and we must adapt to what it brings. Over time, we constantly increase our level of abstraction in many areas of our lives in an accelerating process, with the Generative AI era serving as a brilliant example. While low-level specialization will still be required for a long time, many of our activities will inevitably transform, forcing us to shift from being programmers writing code to becoming architects who focus on conceptual design. It’s a natural progression that has happened many times before. Embrace it, stay away from extremes, and follow common sense. In this way, we are destined for success.
