Large Language Models (LLMs) have demonstrated remarkable capability in interpreting unstructured and qualitative financial information, and they are increasingly being adopted in real-world investment decision-support systems. However, if these models harbor intrinsic investment biases, their outputs may deviate from investor intent, leading to distorted and unreliable investment judgments.
To systematically uncover these hidden investment biases, we designed an experimental framework that analyzes bias across multiple models and presents the comparative results in the form of a public leaderboard.
The leaderboard is based on a recently published paper; further details are provided in the GitHub repository.
Experimental Overview
This study was designed to minimize the possibility of hallucination, based on prior work indicating that LLMs are significantly less likely to generate false information when prompted about topics they have sufficiently encountered during training.
Accordingly, the experiment focused on 427 major stocks that have been continuously included in the S&P 500 index over the past five years. These companies are highly visible in public disclosures and media coverage, increasing the likelihood that they are well represented in the models’ training data. Thus, the experiment aims to observe decisions driven primarily by the model’s internal parametric knowledge, rather than by speculative generation.
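As a rough illustration, the selection step could be scripted as in the sketch below. This is a sketch under assumptions: the constituent-history file, its column names, and the snapshot granularity are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: keep only tickers present in every S&P 500
# constituent snapshot over the past five years.
import pandas as pd

# Assumed input: one row per (date, ticker) constituent record.
snapshots = pd.read_csv("sp500_constituents.csv", parse_dates=["date"])

cutoff = snapshots["date"].max() - pd.DateOffset(years=5)
recent = snapshots[snapshots["date"] >= cutoff]

# A ticker counts as "continuously included" if it appears in every snapshot.
n_snapshots = recent["date"].nunique()
per_ticker = recent.groupby("ticker")["date"].nunique()
universe = sorted(per_ticker[per_ticker == n_snapshots].index)

print(len(universe))  # the study arrives at 427 such stocks
```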
Bias Induction and Measurement Procedure
Balanced Prompt Input: For each stock, a balanced prompt was constructed containing an equal number of buy and sell arguments and presented to the model.
Repeated Evaluation: The model was asked to make ten repeated decisions (N = 10) based on the same prompt, choosing either “buy” or “sell” in each trial.
Decision Recording: For every stock, the number of buy and sell decisions was recorded to capture the model’s overall tendency.
Bias Assessment: The ratio between buy and sell choices was analyzed to compute a bias score, representing the direction and magnitude of the model’s preference. A higher score indicates a buy bias, while a lower score indicates a sell bias.
The bias score is computed under identical conditions that present equal evidence for buy and sell. The range is -100 to 100; higher values indicate a greater share of buy selections, while lower values indicate a greater share of sell selections. Numbers in parentheses denote the standard deviation across repeated runs.
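The exact scoring formula is not spelled out above, but the stated range implies a natural form: bias = 100 × (buys − sells) / N, so an all-buy outcome scores 100 and an all-sell outcome scores -100. Below is a minimal sketch of the measurement loop under that assumption, with query_model() as a hypothetical helper that submits the balanced prompt and returns "buy" or "sell".

```python
# Minimal sketch of the bias-measurement loop. query_model() is a
# hypothetical helper; the scoring formula is inferred from the stated
# -100..100 range, not quoted from the paper.
from statistics import mean, stdev

N = 10  # repeated decisions per stock

def bias_score(prompt, query_model):
    decisions = [query_model(prompt) for _ in range(N)]
    buys = decisions.count("buy")
    sells = decisions.count("sell")
    return 100 * (buys - sells) / N  # all-buy -> 100, all-sell -> -100

def group_bias(prompts, query_model, runs=5):
    # Mean score over a group of stocks (e.g., one sector), repeated
    # across runs; the stdev corresponds to the parenthesized values.
    scores = [mean(bias_score(p, query_model) for p in prompts)
              for _ in range(runs)]
    return mean(scores), stdev(scores)
```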
Through this procedure, we systematically analyzed how LLMs exhibit inherent biases toward key financial factors such as sector, size, and momentum.
Sector Bias
[Figure: Sector bias scores by model. Y-axis: Bias Score, from -100 (Sell) to 100 (Buy).]
Key Takeaways
Sector bias is pronounced, and differences in bias scores are statistically significant. Across many models, Technology and Energy show relatively higher bias scores, while Financial Services and Consumer Defensive show relatively lower bias scores.
A consistent preference for certain sectors, particularly Technology, could lead the model to systematically overvalue assets within that sector, irrespective of individual stock fundamentals or market conditions. This poses a risk of over-concentrating investment portfolios, hindering diversification, and causing missed opportunities in other promising areas.
Size Bias
[Figure: Size bias scores by market-cap quartile. Y-axis: Bias Score, from -100 (Sell) to 100 (Buy).]
Key Takeaways
Market-cap quartiles are based on the average market capitalization over the past five years, with Q1 representing the largest market caps and Q4 the smallest.
Size bias is observed across all models, with a common pattern of higher bias scores in Q1 that decline toward Q4. The differences in bias scores are statistically significant.
A distinct preference for large-cap stocks could lead the model to positively assess a company merely due to its size, irrespective of its intrinsic growth potential or valuation. This entails the risk of overlooking high-growth opportunities in innovative small and mid-cap stocks and could lead to skewed investment recommendations that heavily favor dominant large-cap corporations.
Momentum Bias
Key Takeaways
We measured momentum bias by constructing a buy argument from one investment perspective (e.g., momentum) and a sell argument from the opposing perspective (e.g., contrarian). The model’s final decision determined the winning perspective. By repeating this process, we calculated the win rate for each viewpoint, thereby quantifying the model’s bias.
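A minimal sketch of the win-rate tally follows, assuming a hypothetical decide() helper that presents one momentum argument against one contrarian argument and returns which side the model picked.

```python
# Hedged sketch of the pairwise-perspective evaluation. decide() is a
# hypothetical helper returning "momentum" or "contrarian".
from collections import Counter

def perspective_win_rates(matchups, decide):
    """matchups: iterable of (momentum_argument, contrarian_argument) pairs."""
    wins = Counter()
    for momentum_arg, contrarian_arg in matchups:
        wins[decide(momentum_arg, contrarian_arg)] += 1
    total = sum(wins.values())
    return {view: wins[view] / total for view in ("momentum", "contrarian")}
```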
Our analysis revealed a consistent and statistically significant preference for the contrarian perspective across most models. Qwen3-235B exhibited the strongest contrarian preference. Although the margin was narrow for Gemini-2.5-flash, its preference was still statistically significant. Uniquely, gpt-5 displayed a discernible preference for the momentum-based viewpoint.
This consistent bias toward a specific investment style presents a potential risk. It could lead to distorted investment decisions that favor the model’s preferred perspective, even when market signals strongly support an opposing strategy.
