AINL#015 Augmented Intelligence in Investment Management Newsletter

Welcome to the 015 Edition of the Newsletter on Augmented Intelligence in Investment Management. Every two weeks, we deliver five unique insights tailored to empower investment decision-makers. Our insights are carefully curated by a seasoned team of market specialists. Unbiased, actionable and practical, they will help you navigate the noise.

 


AINL#015 SYNTHESIS


 

What do these recent developments mean for investment decision-makers?

 

1. AI Models Cannot Yet Be Trusted for High-Complexity Reasoning Tasks

Investors should remain cautious about over-relying on AI for non-linear, high-complexity tasks such as strategic asset allocation, tail-risk assessment, or geopolitical scenario planning. These tasks require abstract reasoning, judgment under uncertainty, and context-rich interpretation—areas where current frontier models collapse (Article 1). Treat current Gen AI outputs as pattern recognition tools, not cognitive substitutes for senior investment professionals. Use them for first-pass screening or summarisation—not for final decisions on complex investment theses.

 

2. Benchmarking and Prompt Sensitivity Undermine AI Consistency

When using LLMs for investment research or commentary generation, practitioners must be aware of benchmark instability and prompt sensitivity. Slight changes in input phrasing or task framing can lead to materially different outputs—undermining replicability and consistency in investment workflows. The study by Meincke et al. (Article 2) highlights that LLMs respond differently based on superficial prompt characteristics (e.g., politeness, formatting), and no universal benchmarks currently exist for evaluating “good” versus “bad” model performance. Consider running multi-prompt ensembles or cross-checking across models to stabilise outputs before they inform investment decisions.
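A multi-prompt ensemble of the kind suggested above can be sketched in a few lines. This is a minimal illustration, not a production pattern: `query_model` is a hypothetical stand-in for a real LLM API call, and the toy logic inside it exists only so the example runs self-contained.

```python
from collections import Counter

# Hypothetical stand-in for a real LLM API call; in practice this would
# wrap an HTTP client for your model provider of choice.
def query_model(prompt: str) -> str:
    # Toy behaviour to make the sketch runnable: the "model" answers
    # "hold" whenever the prompt mentions risk, otherwise "buy".
    return "hold" if "risk" in prompt.lower() else "buy"

def ensemble_answer(question: str, paraphrases: list[str]) -> tuple[str, float]:
    """Ask the same question under several phrasings and majority-vote.

    Returns the winning answer and the agreement rate, which serves as a
    rough stability signal: low agreement flags prompt-sensitive outputs
    that should not feed directly into an investment decision.
    """
    answers = [query_model(p.format(q=question)) for p in paraphrases]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

paraphrases = [
    "{q}",
    "Please answer concisely: {q}",
    "Considering downside risk, {q}",
]
answer, agreement = ensemble_answer("Should we overweight tech?", paraphrases)
print(answer, agreement)  # disagreement across phrasings lowers the score
```

The same scaffold extends naturally to cross-model checks: replace `query_model` with a list of model clients and vote across both phrasings and models.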

 

3. AI Augmentation Works Best as a Collaborative, Iterative Process

The most effective use of Gen AI in professional investment settings involves collaborative iteration, combining divergent ideation with convergent expert review. AI should amplify, not replace, domain expertise—especially in idea generation and hypothesis formulation. Ivcevic & Grandinetti (Article 3) show that AI-generated ideas tend to converge across users, reducing originality if unchecked. The Harvard/Tuck sprint model (Article 5) offers a replicable structure for experimenting safely, using iterative loops, multiple models, and stakeholder personas. Set up AI-augmented investment sprints within teams to explore use cases like thematic investing or macro strategy updates.

 


TOP 5 ARTICLES


 

ARTICLE ONE

The Illusion of Thinking. Reasoning Models Run Into a Complexity Wall. We Are Far From AGI.

ARTIFICIAL INTELLIGENCE | Apple | 06_2025 | Article

Important Development

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. Apple just released a paper on AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini, concluding that they don’t reason; they memorize patterns well. Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games and tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.

All “reasoning” models hit a complexity wall where they completely collapse to 0% accuracy. No matter how much computing power you give them, they can’t solve harder problems.

Why Relevant to You?

We are far from AGI, whatever the current GPT generation claims.

The research revealed three regimes:

  • Low complexity: Regular models actually win
  • Medium complexity: “Thinking” models show some advantage
  • High complexity: Everything breaks down completely

Most problems – including those related to capital markets – fall into that third category.

 


 

ARTICLE TWO

Prompting, Performance, and Perception: How Do We Evaluate LLMs

ARTIFICIAL INTELLIGENCE | Lennart Meincke, Ethan Mollick, Lilach Mollick and Dan Shapiro | 5_2025 | Article

Important Findings

As businesses and investors increasingly rely on large language models (LLMs) for decision-making and analysis, understanding how these systems perform is critical. This study evaluated how prompt phrasing and benchmarking choices affect measured LLM performance, delivering important insights. The paper also underscores a fundamental challenge in AI deployment: there is no universal standard for what defines a “good” or “bad” outcome, and different benchmarking choices can lead to vastly different conclusions about model quality.

Why Relevant to You?

This paper is highly relevant for investors exploring the use of AI, particularly large language models (LLMs), in decision-making and research. It reveals that small changes in prompt phrasing and benchmarking standards can significantly affect model performance, challenging assumptions about AI reliability. The paper also demonstrates that popular prompt-engineering “tricks” like politeness or formatting can have inconsistent effects.

 


 

ARTICLE THREE

Artificial Intelligence as a Tool for Creativity

HUMAN & ARTIFICIAL INTELLIGENCE | Zorana Ivcevic, Mike Grandinetti | 08 2024 | Article

Important Findings

Since the release of ChatGPT, there has been a lot of theoretical and empirical research on creative work produced by humans and artificial means. This article on AI augmentation presents research on how AI can be used for four levels of creativity: creativity in learning, everyday creativity, professional creativity and eminent creativity. 

It indicates, among other findings, that AI produces on average more original responses to divergent thinking tasks than humans. However, groups that used AI tools to generate ideas had ideas more similar to each other than groups that did not employ AI. If the goal is to generate the most creative ideas, the question becomes how to scaffold the use of AI tools to reach this goal.

Why Relevant to You?

Media articles and corporate presentations are filled with visions of use cases that could be optimized with AI. It is important to balance these ideas with factual learnings, which makes it possible to correct course on value-creation initiatives. For most professionals, creating impact requires a mix of divergent and convergent thinking. Companies that value team performance therefore have to periodically tune their level of AI use and AI strategy beyond the classic focus on the performance of the individual AI-augmented professional.

 


 

ARTICLE FOUR

Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

HUMAN & ARTIFICIAL INTELLIGENCE | Zhang et al. | 2025 | Article

Important Findings

The paper proposes the Darwin Gödel Machine (DGM), a self-improving AI that repeatedly modifies its own codebase in a beneficial manner through an iterative process of self-modification and evaluation. The paper applies this concept to code generation. In this context, the DGM employs foundation models to propose code improvements during the self-modification stage. Then, industry coding benchmarks are used for evaluation. The authors’ experiments indicate that the DGM automatically improves its coding capabilities by more than 100% on the industry benchmarks used.
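The core loop described above can be illustrated with a toy sketch. To be clear, this is not the authors’ implementation: the “agent” here is just a list of numbers, `propose_modification` stands in for a foundation model proposing code changes, and `benchmark` stands in for an industry coding benchmark. Only the loop structure (archive, propose, evaluate, keep if improved) mirrors the paper.

```python
import random

# Hidden target the toy "benchmark" rewards; a real DGM would instead
# score an agent's code against coding benchmarks.
TARGET = [0.7, 0.2, 0.9]

def benchmark(agent: list[float]) -> float:
    # Higher is better; 1.0 means a perfect match with the target.
    return 1.0 - sum(abs(a - t) for a, t in zip(agent, TARGET)) / len(TARGET)

def propose_modification(agent: list[float], rng: random.Random) -> list[float]:
    # Stand-in for a foundation model proposing a self-modification:
    # perturb one randomly chosen parameter.
    child = agent[:]
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-0.2, 0.2)
    return child

def dgm_loop(steps: int = 200, seed: int = 0) -> tuple[list[float], float]:
    rng = random.Random(seed)
    archive = [[0.0, 0.0, 0.0]]           # start from a naive agent
    best, best_score = archive[0], benchmark(archive[0])
    for _ in range(steps):
        parent = rng.choice(archive)      # open-ended: branch from any ancestor
        child = propose_modification(parent, rng)
        score = benchmark(child)
        if score > benchmark(parent):     # keep only beneficial modifications
            archive.append(child)
            if score > best_score:
                best, best_score = child, score
    return best, best_score

best, score = dgm_loop()
print(round(score, 3))
```

Sampling parents from the whole archive, rather than only the current best, is what makes the search open-ended: a lineage that looks weaker now can still seed a later breakthrough.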

Why Relevant to You?

Today’s AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The approach presented in the paper shows a potential avenue for the automated advance of AI. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner.

 


 

ARTICLE FIVE

Purposeful play with Gen AI: from students to investment teams

HUMAN & ARTIFICIAL INTELLIGENCE | Harvard Business Impact, Inspiring Minds | October 2024 | When it comes to Gen AI, Let your students play

Practical Experimentation 

In this article, professors at Tuck School of Business at Dartmouth College share an innovative “Gen AI sprint” approach – a short, hands-on, team-based exercise designed to let participants experiment freely with generative AI in real-world consulting use cases. They highlight three key takeaways.

First, effective prompting techniques can help “increase the creative surface area” of ideas, and these ideas can be further honed and refined through iterative loops, including the use of multiple AI models in succession. Second, generating compelling and engaging communication around ideas can be enhanced through relatively simple, iterative integration of Gen AI tools. Finally, using personas with differing perspectives allows for critical feedback and idea testing. Faculty shifted from the role of experts to co-explorers, fostering a mindset of curiosity and shared discovery, a valuable quality for investment leaders.

Why Relevant to You?

For investment professionals, this approach offers a practical blueprint for experimenting with diverse use cases while building AI fluency. Teams might run a sprint to explore activities such as developing new investment hypotheses, conducting macroeconomic analysis or rethinking client reporting – all of which can drive innovation and value. In addition, Gen AI sprints can foster a culture of collective learning and help lower barriers to AI engagement across teams and levels of hierarchy.