AINL#015 Augmented Intelligence in Investment Management Newsletter

Welcome to the 015 edition of the Newsletter on Augmented Intelligence in Investment Management. Every two weeks, we deliver five unique insights tailored to empower investment decision-makers. Our insights are carefully curated by a seasoned team of market specialists. Unbiased, actionable and practical, they will help you navigate the noise.
AINL#015 SYNTHESIS
What do these recent developments mean for investment decision-makers?
1. AI Models Cannot Yet Be Trusted for High-Complexity Reasoning Tasks
Investors should remain cautious about over-relying on AI for non-linear, high-complexity tasks such as strategic asset allocation, tail-risk assessment, or geopolitical scenario planning. These tasks require abstract reasoning, judgment under uncertainty, and context-rich interpretation—areas where current frontier models collapse (Article 1). Treat current Gen AI outputs as pattern recognition tools, not cognitive substitutes for senior investment professionals. Use them for first-pass screening or summarisation—not for final decisions on complex investment theses.
2. Benchmarking and Prompt Sensitivity Undermine AI Consistency
When using LLMs for investment research or commentary generation, practitioners must be aware of benchmark instability and prompt sensitivity. Slight changes in input phrasing or task framing can lead to materially different outputs—undermining replicability and consistency in investment workflows. The study by Meincke et al. (Article 2) highlights that LLMs respond differently based on superficial prompt characteristics (e.g., politeness, formatting), and no universal benchmarks currently exist for evaluating “good” versus “bad” model performance. Consider running multi-prompt ensembles or cross-checking across models to stabilise outputs before they inform investment decisions.
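The multi-prompt ensemble idea can be operationalised along the lines of the sketch below. The `ask` callable, the toy paraphrases and the stub model are hypothetical stand-ins for whichever model interface a team actually uses; this is an illustration of the stabilisation pattern, not a production implementation.

```python
from collections import Counter

def ensemble_answer(question, paraphrases, ask):
    # Pose the same question under several phrasings and keep the majority answer,
    # reporting the agreement rate as a rough stability signal.
    answers = [ask(p.format(q=question)) for p in paraphrases]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / len(answers)

# Stubbed model call for illustration only; a real deployment would query an LLM.
def toy_model(prompt):
    return "overweight" if "outlook" in prompt else "neutral"

paraphrases = [
    "Given the data, what stance is recommended? {q}",
    "Considering the outlook, what stance do you suggest? {q}",
    "Please advise on positioning: {q}",
]
answer, agreement = ensemble_answer("EU equities, next quarter", paraphrases, toy_model)
print(answer, round(agreement, 2))
```

A low agreement rate is itself useful information: it flags questions where the model's output is prompt-sensitive and should not inform a decision without human review.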
3. AI Augmentation Works Best as a Collaborative, Iterative Process
The most effective use of Gen AI in professional investment settings involves collaborative iteration, combining divergent ideation with convergent expert review. AI should amplify, not replace, domain expertise—especially in idea generation and hypothesis formulation. Ivcevic & Grandinetti (Article 3) show that AI-generated ideas tend to converge across users, reducing originality if unchecked. The Harvard/Tuck sprint model (Article 5) offers a replicable structure for experimenting safely, using iterative loops, multiple models, and stakeholder personas. Set up AI-augmented investment sprints within teams to explore use cases like thematic investing or macro strategy updates.
TOP 5 ARTICLES
ARTICLE ONE
The Illusion of Thinking: Reasoning Models Hit a Complexity Wall. We Are Far From AGI.
ARTIFICIAL INTELLIGENCE | Apple | 06_2025 | Article
Important Development
Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. Apple just released a paper on AI “reasoning” models such as Claude, DeepSeek-R1, and o3-mini, concluding that they don’t truly reason: they memorise patterns well. Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games and tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.
All “reasoning” models hit a complexity wall where they completely collapse to 0% accuracy. No matter how much computing power you give them, they can’t solve harder problems.
Why Relevant to You?
We are far from AGI, no matter what is claimed about the current GPT generation.
The research revealed three regimes:
- Low complexity: Regular models actually win
- Medium complexity: “Thinking” models show some advantage
- High complexity: Everything breaks down completely
Most problems, including those related to capital markets, fall into that third category.
ARTICLE TWO
Prompting, Performance, and Perception: How Do We Evaluate LLMs?
ARTIFICIAL INTELLIGENCE | Lennart Meincke, Ethan Mollick, Lilach Mollick and Dan Shapiro | 05_2025 | Article
Important Findings
As businesses and investors increasingly rely on large language models (LLMs) for decision-making and analysis, understanding how these systems perform is critical. This study evaluated how LLM performance is measured and how sensitive the results are to evaluation choices, delivering important insights. The paper also underscores a fundamental challenge in AI deployment: there is no universal standard for what defines a “good” or “bad” outcome, and different benchmarking choices can lead to vastly different conclusions about model quality.
Why Relevant to You?
This paper is highly relevant for investors exploring the use of AI, particularly large language models (LLMs), in decision-making and research. It reveals that small changes in prompt phrasing and benchmarking standards can significantly affect model performance, challenging assumptions about AI reliability. The paper also demonstrates that popular prompt-engineering “tricks” such as politeness or formatting can have inconsistent effects.
ARTICLE THREE
Artificial Intelligence as a Tool for Creativity
HUMAN & ARTIFICIAL INTELLIGENCE | Zorana Ivcevic, Mike Grandinetti | 08_2024 | Article
Important Findings
Since the release of ChatGPT, there has been a lot of theoretical and empirical research on creative work produced by humans and artificial means. This article on AI augmentation presents research on how AI can be used for four levels of creativity: creativity in learning, everyday creativity, professional creativity and eminent creativity.
It indicates, among other findings, that AI produces on average more original responses to divergent thinking tasks than humans. However, groups that used AI tools to generate ideas produced ideas more similar to each other than groups that did not employ AI. If the goal is to generate the most creative ideas, the question becomes how to scaffold the use of AI tools to reach it.
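One simple way to monitor the convergence effect described above is to track pairwise similarity across a team’s ideas over time. The token-overlap (Jaccard) measure below is a deliberately crude illustration of that monitoring idea, not a method from the article; the sample idea lists are invented.

```python
def jaccard(a, b):
    # Token-overlap similarity between two short idea statements (0 = disjoint, 1 = identical).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def mean_pairwise_similarity(ideas):
    # Average similarity over all idea pairs; higher values signal convergence.
    pairs = [(i, j) for i in range(len(ideas)) for j in range(i + 1, len(ideas))]
    return sum(jaccard(ideas[i], ideas[j]) for i, j in pairs) / len(pairs)

ai_assisted = ["buy green bonds", "buy green infrastructure", "buy green equities"]
unassisted = ["short volatility", "rotate into small caps", "hedge duration risk"]
print(mean_pairwise_similarity(ai_assisted), mean_pairwise_similarity(unassisted))
```

If AI-assisted sessions consistently score higher on such a measure than unassisted ones, that is a signal to inject divergence, for example by varying prompts, models or personas, before converging on a shortlist.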
Why Relevant to You?
Media articles and corporate presentations are filled with visions of use cases that could be optimised with AI. It is important to balance these visions with evidence-based findings, which allows organisations to steer initiatives toward genuine value creation. For most professionals, creating impact requires a mix of divergent and convergent thinking. Companies that value team performance therefore have to periodically tune their level of AI use and AI strategy beyond the classic focus on the performance of the individual AI-augmented professional.
ARTICLE FOUR
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
HUMAN & ARTIFICIAL INTELLIGENCE | Zhang et al. | 2025 | Blog, Article
Important Findings
The paper proposes the Darwin Gödel Machine (DGM), a self-improving AI that repeatedly modifies its own codebase in a beneficial manner through an iterative process of self-modification and evaluation. The paper applies this concept to code generation. In this context, the DGM employs foundation models to propose code improvements during the self-modification stage; industry coding benchmarks are then used for evaluation. The authors’ experiments indicate that the DGM automatically improves its coding capabilities by more than 100% on the industry benchmarks used.
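The propose-evaluate-archive loop at the heart of this approach can be caricatured in a few lines. The sketch below replaces the paper’s foundation-model code edits and coding benchmarks with a trivial numeric task and fixed perturbations; it illustrates only the shape of the loop, not the authors’ implementation.

```python
def evaluate(agent, benchmark):
    # Fitness: negative mean absolute error on the benchmark tasks (0 is perfect).
    return -sum(abs(agent(x) - y) for x, y in benchmark) / len(benchmark)

def dgm_loop(seed, benchmark, generations=10):
    # Archive every variant (open-ended search) and always branch from the current best.
    archive = [(seed, evaluate(seed, benchmark))]
    for _ in range(generations):
        parent, _ = max(archive, key=lambda entry: entry[1])
        for delta in (-1, 1):  # stand-in for foundation-model-proposed code edits
            child = (lambda a, d: lambda x: a(x) + d)(parent, delta)
            archive.append((child, evaluate(child, benchmark)))
    return max(score for _, score in archive)

benchmark = [(x, x + 3) for x in range(10)]  # toy target behaviour: f(x) = x + 3
best = dgm_loop(lambda x: x, benchmark)
print(best)
```

The key design choice mirrored here is that the archive keeps every variant rather than only the current champion, so the search can later branch from seemingly inferior ancestors.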
Why Relevant to You?
Today’s AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The approach presented in the paper shows a potential avenue for the automated advancement of AI. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner.
ARTICLE FIVE
Purposeful play with Gen AI: from students to investment teams
HUMAN & ARTIFICIAL INTELLIGENCE | Harvard Business Impact, Inspiring Minds | 10_2024 | When It Comes to Gen AI, Let Your Students Play
Practical Experimentation
In this article, professors at Tuck School of Business at Dartmouth College share an innovative “Gen AI sprint” approach – a short, hands-on, team-based exercise designed to let participants experiment freely with generative AI in real-world consulting use cases. They highlight three key takeaways.
First, effective prompting techniques can help “increase the creative surface area” of ideas, and these ideas can be further honed and refined through iterative loops, including the use of multiple AI models in succession. Second, generating compelling and engaging communication around ideas can be enhanced through relatively simple, iterative integration of Gen AI tools. Finally, using personas with differing perspectives allows for critical feedback and idea testing. Faculty shifted from the role of experts to co-explorers, fostering a mindset of curiosity and shared discovery, a valuable quality for investment leaders.
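The persona technique mentioned above can be wired up with a handful of prompt templates. The personas, templates and `toy_model` stub below are illustrative inventions rather than the sprint’s actual materials; a real sprint would route the prompts to one or more LLMs.

```python
# Hypothetical personas for critiquing an investment idea from different angles.
PERSONAS = {
    "risk officer": "Flag the single biggest downside risk in this idea: {idea}",
    "client advisor": "Would a conservative client accept this idea, and why? {idea}",
    "quant analyst": "What data would be needed to validate this idea? {idea}",
}

def persona_review(idea, ask):
    # Route the same idea through each persona's prompt and collect the critiques.
    return {name: ask(tpl.format(idea=idea)) for name, tpl in PERSONAS.items()}

# Stubbed model call for illustration only.
def toy_model(prompt):
    return f"critique: {prompt.split(':')[0].lower()}"

reviews = persona_review("Overweight AI infrastructure", toy_model)
for persona, critique in reviews.items():
    print(persona, "->", critique)
```

Running the same idea through deliberately different personas surfaces objections a single generic prompt tends to miss, which is precisely the critical-feedback effect the sprint format exploits.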
Why Relevant to You?
For investment professionals, this approach offers a practical blueprint for experimenting with diverse use cases while building AI fluency. Teams might run a sprint to explore activities such as developing new investment hypotheses, conducting macroeconomic analysis or rethinking client reporting – all of which can drive innovation and value. In addition, Gen AI sprints can foster a culture of collective learning and help lower barriers to AI engagement across teams and levels of hierarchy.
