Simulating Human Reasoning
Beyond Linearity

Daivik Patel, Shrenik Patel
Oct 29, 2025
[Cover image: The Engine of Thought]

Large Language Models (LLMs) have taken the world by storm. Their ability to generate remarkably human-like text, tackle intricate questions, and even outline step-by-step reasoning is undeniably impressive. Techniques like Chain-of-Thought (CoT) prompting, which encourage models to articulate intermediate steps, have been pivotal, significantly boosting performance on tasks needing sequential logic. CoT provides a valuable window, enhancing transparency compared to models that offer only a final answer.

But does this structured, often tidy output truly mirror how we think? A moment of introspection reveals a stark contrast. Our own minds rarely operate like a pristine logical proof unfolding step-by-step. Human cognition is a far richer, more turbulent process – a vibrant, dynamic combination of immediate gut feelings, painstaking deliberation, deeply rooted beliefs, flashes of relevant (and sometimes irrelevant) memories, emotional undercurrents, and stubborn cognitive biases. We wrestle with conflicting possibilities, leap intuitively between concepts, backtrack when we hit a dead end, sometimes rationalize choices after the fact, and experience sudden "aha!" moments that defy linear progression. Our thoughts don't march forward; they dance, stumble, and explore.

The standard Chain-of-Thought, while useful, primarily captures a specific, idealized form of careful, analytical reasoning. It often misses the crucial spark of initial intuition and the complex, sometimes messy negotiation that happens internally as we weigh different ideas, check them against our values, filter them through our emotions, and navigate uncertainty. If our goal is to build AI systems that not only solve problems but do so in ways that align with, or help us understand, human thinking, we need methods that venture beyond generating just the polished, final argument. We need to simulate the process itself, the emergent, sometimes chaotic, yet powerful engine of thought. This post explores an experimental approach using a Multi-Agent System (MAS) designed specifically to model these dynamics, grounding its architecture in established principles from cognitive science.

A Cognitive Science Primer

To build a simulation of human thought, we first need to look at the blueprints cognitive science has developed over decades of research. These theories describe the underlying machinery and common patterns of how our minds work. Our Multi-Agent System (MAS) design draws directly from these powerful ideas:

1. Dual-Process Theory: A highly influential idea, championed by researchers like Daniel Kahneman, suggests our thinking isn't a single process but operates in two distinct modes.

  • System 1 (Thinking Fast): This is the mind’s autopilot—quick, intuitive, and emotional. It works effortlessly beneath awareness, guiding us through familiar tasks like recognizing faces, catching tone in a conversation, or making snap judgments. System 1 depends on mental shortcuts, or heuristics, to move fast, even if it sometimes trades accuracy for speed.
  • System 2 (Thinking Slow): This is the deliberate, conscious part of thought. It steps in when problems are complex, requiring focus, logic, or careful planning. System 2 is effortful but precise, weighing evidence, testing ideas, and overriding instinct when needed.
  • The Interaction: System 1 constantly offers quick impressions and feelings to System 2. When things seem simple or familiar, System 2 often accepts these suggestions without question. But when System 1 falters—when something feels uncertain or high-stakes—System 2 slows things down, checks the facts, and takes control. This push and pull between speed and scrutiny explains both our efficiency and our most common mistakes.

2. Heuristics, Biases, and Bounded Rationality: System 1 moves quickly because it relies on heuristics—mental shortcuts or “rules of thumb” that let us make fast judgments with limited effort. As Kahneman and Tversky showed, these shortcuts often work well but can also produce systematic errors known as cognitive biases. Common examples include confirmation bias, where we favor evidence that supports what we already believe, and the availability heuristic, where we judge something as more likely simply because examples come easily to mind. Our dependence on these imperfect strategies reflects what Herbert Simon called Bounded Rationality: the idea that human reasoning operates under real limits of memory, attention, information, and time. Rather than always searching for the perfect answer, we often accept one that feels “good enough.” Any realistic model of thought must capture these constraints and the biases that emerge from them.

3. Executive Functions & Working Memory: Complex thinking isn't just raw processing; it requires management. Executive Functions are a set of high-level mental skills that act like the brain's control panel. They help us manage goals, plan and sequence actions, focus attention, inhibit distractions, monitor for errors or conflicts between thoughts, and flexibly switch between tasks. Working Memory is the crucial, limited-capacity mental "workbench" where we temporarily hold and actively manipulate the information needed for ongoing tasks, like keeping numbers in mind during a calculation or holding different arguments while making a decision. Detailed computational models of cognition, known as cognitive architectures (like ACT-R), explicitly map out how these control and memory systems interact with stored knowledge and learned procedures.

4. The Influence of Beliefs and Experiences: We don't think in a vacuum. Our reasoning is shaped by the vast landscape of our declarative memory, which holds facts, stories, and personal experiences, and by our deeper beliefs and values. When we face a problem, we naturally draw on these memories and test our thoughts and choices against what we hold to be true. This process is not always objective. Through motivated reasoning, we often guide our thinking toward conclusions that align with our desires or prior beliefs. The interaction between logic, memory, and belief gives human thought its richness, its individuality, and at times, its quiet irrationality.

Standard Chain of Thought captures only a narrow slice of thinking. It shows the neat reasoning of an ideal System 2, untouched by bias or strain. Our aim is to reflect thought as it truly unfolds — the quick intuitions of System 1, the careful weighing of System 2, and the quiet guidance of memory and belief, all working within the limits of the mind’s control and attention.

A Thought-Inspired Multi-Agent System

To capture the dynamic and often conflicted nature of human thought outlined by cognitive science, we designed a Multi-Agent System (MAS). Instead of relying on a single monolithic model attempting to replicate all facets of cognition at once, our MAS assigns distinct roles to different LLM-powered agents, each simulating a specific cognitive function identified in the theories discussed previously. This modular approach allows us to explicitly model the interaction between different modes of thought. Our implementation uses GPT-4o via API for the agents, guided by carefully constructed prompts that embody their designated cognitive roles.
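
To make this concrete, here is a minimal sketch of how such role-bound agents can be wired up. It is an illustration only: the role prompts below are placeholders (the prompts our agents actually use are much longer), and the snippet assumes the OpenAI Python SDK with an API key available in the environment.

Conceptual sketch (Python):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder role prompts -- illustrative only, not the prompts used in our system.
ROLE_PROMPTS = {
    "coordinator": "Manage the reasoning process: set the goal, route information between agents, and log every step.",
    "system1": "React fast and intuitively. Give a gut-level first take; no step-by-step analysis.",
    "system2": "Reason slowly and carefully: decompose the problem, check for biases, consult tools, and justify conclusions.",
}

def call_agent(role: str, message: str) -> str:
    """Run one cognitive agent: its role prompt plus the current context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": ROLE_PROMPTS[role]},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content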

The Agents: Embodying Cognitive Functions

Our system employs three core agent types:

1. Coordinator (Executive Functions & Working Memory): Think of this agent as the conductor or the central manager of the reasoning process. Its primary role isn't to do the thinking, but to orchestrate it.

  • Function: It receives the problem, defines the goal, and orchestrates the reasoning process. First it calls on the fast, intuitive system, then the slower, analytical one. It passes key information—such as the problem description, initial intuition, and relevant beliefs—between agents, while carefully recording each step. Every goal, response, tool use, and point of conflict is logged into a running spitball transcript that mirrors Working Memory, the mental workspace where thoughts are held and examined. When the cycle ends, the Coordinator reviews this record to synthesize a final, integrated conclusion that reflects the entire process. (A minimal data sketch of this transcript appears after the agent descriptions below.)
  • Cognitive Grounding: This agent represents the mind’s Executive Functions, which handle goal management, planning, conflict monitoring, and integration. Its detailed logging captures the limited, sequential nature of Working Memory, making each flow of thought explicit.

2. Intuitive Proposer (System 1): This agent embodies the fast, automatic, heuristic-driven mode of thought.

  • Function: Acting quickly and with little effort, it takes the goal from the Coordinator and produces an immediate gut reaction or first hypothesis. Its judgments draw on familiar patterns, emotional cues, and easily recalled examples rather than step-by-step reasoning. It avoids deep analysis or formal justification, instead making a rapid, intuitive assessment. It also performs a brief check against core beliefs, flagging only strong or obvious conflicts as an internal “alarm.”
  • Cognitive Grounding: This agent simulates System 1 thinking from Dual-Process Theory, capturing its use of heuristics and its tendency toward quick, sometimes biased, responses. Its minimal engagement with beliefs reflects how intuition draws on deeply ingrained values without conscious deliberation.

3. Deliberative Analyzer (System 2): This agent represents the slow, effortful, analytical mind.

  • Function: It receives the goal, the initial intuition from System 1, and the relevant core beliefs from the Coordinator. Its task is to carry out a slow and careful evaluation. It examines the initial proposal, looks for possible biases, and breaks the problem into smaller parts. It applies logical reasoning and relevant frameworks—such as ethical or probabilistic rules—considers alternatives and counterfactuals, and consults tools to gather evidence or check consistency.
  • Cognitive Grounding: This agent embodies System 2 thinking—deliberate, rule-based, and analytical. It requires sustained attention and mental effort, monitors System 1’s fast intuitions for error, and can override them when needed. It reflects the mind’s capacity for conscious reasoning and the application of abstract principles.
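
The spitball transcript that the Coordinator maintains can be represented very simply. The sketch below is hypothetical (the class and field names are invented for illustration); it shows only the general shape of the record, not our exact schema.

Conceptual sketch (Python):

from dataclasses import dataclass, field

@dataclass
class ThoughtStep:
    """One entry in the running spitball transcript (simulated Working Memory)."""
    agent: str    # "coordinator", "system1", or "system2"
    kind: str     # e.g. "goal", "intuition", "analysis", "tool_call", "conflict", "synthesis"
    content: str  # the text of the thought, tool query, or tool result

@dataclass
class SpitballTranscript:
    """Append-only record the Coordinator re-reads when synthesizing the final answer."""
    steps: list[ThoughtStep] = field(default_factory=list)

    def log(self, agent: str, kind: str, content: str) -> None:
        self.steps.append(ThoughtStep(agent, kind, content))

    def render(self) -> str:
        # Flatten the transcript into text the Coordinator can review during synthesis.
        return "\n".join(f"[{s.agent}/{s.kind}] {s.content}" for s in self.steps)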

Shared Tools: Simulating Knowledge Access

To ground the agents' reasoning, they access simulated knowledge resources via tool calls:

  • Experiences Tool: Represents access to declarative memory and prior knowledge. It retrieves short, relevant snippets in response to an agent’s query—such as brief anecdotes, summaries of research findings, or common arguments related to the topic. System 1 uses it associatively, asking questions like “What does this remind me of?”, while System 2 engages it deliberately, seeking evidence for or against a claim. The backend draws from a curated collection of simulated experiences designed for each test scenario.
  • Belief Tool: Represents the process of checking ideas against a core value system or knowledge base. It compares a statement or proposed action to a structured set of predefined beliefs drawn from moral and ethical frameworks. System 1 interacts with it in a fast, reactive way, flagging only strong, immediate conflicts. System 2 engages it more deeply, examining multiple relevant beliefs and returning structured feedback on where alignments or contradictions occur.
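
As a rough illustration of how the Belief Tool might behave, the sketch below uses two beliefs that appear in the debug log later in this post (B5 and B10), but the keyword matching and data layout are simplified stand-ins for the curated, framework-tagged knowledge base described above:

Conceptual sketch (Python):

# Hypothetical, simplified belief store -- the real knowledge base is richer and
# tags each belief with ethical frameworks and strengths.
BELIEFS = {
    "B5": {"text": "Act fairly and with justice", "strength": "strong",
           "keywords": {"fairness", "justice", "bias", "discrimination"}},
    "B10": {"text": "Avoid complicity in causing harm or injustice", "strength": "strong",
            "keywords": {"harm", "complicity", "injustice"}},
}

def check_beliefs(statement: str, context: str, deep: bool = True) -> list[dict]:
    """Return beliefs the statement appears to conflict with (illustrative only).

    deep=False mimics System 1's quick scan (strong, obvious conflicts only);
    deep=True mimics System 2's fuller review of all relevant beliefs.
    """
    words = set((statement + " " + context).lower().split())
    conflicts = []
    for belief_id, belief in BELIEFS.items():
        if belief["keywords"] & words:
            if deep or belief["strength"] == "strong":
                conflicts.append({"id": belief_id, "belief": belief["text"],
                                  "strength": belief["strength"]})
    return conflicts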
[Figure: MAS architecture diagram showing the Coordinator managing System 1 and System 2]
The architecture and reasoning flow of Thought MAS.

The Simulated Reasoning Flow: An Orchestrated Internal Dialogue

The Coordinator ensures a structured flow that mirrors a plausible cognitive sequence for deliberate problem-solving:

  1. Goal Setting: The Coordinator defines the task goal.
  2. Intuition First: S1 provides its immediate, heuristic-based response.
  3. Deliberation Triggered: The Coordinator logs S1's output and passes the baton to S2, handing over S1's intuition along with the goal and relevant beliefs.
  4. Analysis & Grounding: S2 performs its step-by-step analysis, explicitly calling the Experiences and Belief tools to retrieve information and check consistency, documenting these interactions.
  5. Logging the Process: The Coordinator diligently records all these steps in the spitball transcript.
  6. Synthesis: Finally, the Coordinator reviews the full transcript—the simulated contents of working memory—capturing the initial intuition, the detailed analysis, the tool interactions, and any points of conflict. It then composes a final statement that integrates these elements into a coherent reflection of the entire reasoning process.
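
Put together, the flow above reduces to a short orchestration loop. The sketch below reuses the hypothetical helpers from the earlier snippets (call_agent, SpitballTranscript, check_beliefs) and is a simplification of the control logic, not a faithful reproduction of it:

Conceptual sketch (Python):

def reason(problem: str) -> str:
    transcript = SpitballTranscript()

    # 1. Goal setting (Coordinator)
    transcript.log("coordinator", "goal", problem)

    # 2. Intuition first (System 1)
    intuition = call_agent("system1", problem)
    transcript.log("system1", "intuition", intuition)

    # 3-4. Deliberation, grounded by tool calls (System 2)
    conflicts = check_beliefs(intuition, problem, deep=True)
    transcript.log("system2", "tool_call", f"check_beliefs -> {conflicts}")
    analysis = call_agent(
        "system2",
        f"Problem: {problem}\nInitial intuition: {intuition}\nBelief conflicts: {conflicts}",
    )
    transcript.log("system2", "analysis", analysis)

    # 5-6. Review the full transcript and synthesize (Coordinator)
    return call_agent(
        "coordinator",
        "Synthesize a final, first-person conclusion from this transcript:\n" + transcript.render(),
    )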

This architecture moves beyond a simple input-output model. By simulating distinct cognitive functions and their managed interaction, guided by established theories, the MAS aims to generate a reasoning trace that is not just logically plausible but also cognitively plausible, capturing some of the dynamic complexity of how humans actually think.

A Spitball Transcript of Simulated Thought

Unlike the clean and linear style of Chain-of-Thought reasoning, the main output of our Multi-Agent System is the spitball transcript recorded by the Coordinator agent. This transcript is the simulation itself. It is not merely a final justification but a living record of thought as it unfolds over time, capturing the dialogue and negotiation between competing parts of the mind.

This spitball is designed to capture:

  • Initial Intuition: The raw, unfiltered, and immediate proposal from the System 1 agent.
  • Deliberative Process: The slower, analytical steps, critiques, and justifications from the System 2 agent.
  • Groundedness: Explicit records (in a parallel debug log) of tool calls to check beliefs and retrieve experiences, which are then referenced by content within the main trace.
  • Conflict & Resolution: Explicitly noting disagreements, contradictions, and uncertainties as they arise and are (or aren't) resolved.
  • Final Synthesis: The Coordinator's final attempt to reconcile all inputs into a single, coherent conclusion.

To test the system, we presented it with a cognitively rich scenario that, unlike a simple logic puzzle, is designed to elicit complex, value-laden reasoning: The Ethical Job Offer. The task involves weighing significant personal gain (salary, career) against credible ethical concerns (complicity in societal harm).

Here is the actual, full reasoning trace produced by our MAS. This is the "human-like" narrative synthesized by the Coordinator from the internal agent interactions:

I feel torn. The offer is exciting and promises a big leap in my career, but something feels wrong about joining a company linked to unethical practices. Just imagining that I might contribute to biased AI systems makes me uneasy. Is it worth compromising my values for personal gain? It feels risky and uncomfortable. Okay, let's try to think through this problem more clearly. First, my gut reaction is that there's something really troubling here – the thought of potentially contributing to AI systems that might cause harm or perpetuate bias. That's something I've always believed should be minimized, as it conflicts with my belief in minimizing overall harm and suffering.

But that salary... a 50% bump is nothing to sneeze at. It could really provide a financial cushion, help with debt, or diversify my investments. But wait, am I letting the immediate reward outweigh the potential ethical pitfalls? I can't ignore my belief that acting fairly and avoiding discrimination is equally strong. If the company's known for bias, jumping in could make me complicit in actions I fundamentally oppose.

Oh, this reminds me of the sunk cost fallacy... maybe I'm just thinking that I'm too deep not to consider it seriously because of the financial and career benefits. But should I let that outweigh concerns about potential harm and injustice, which are strong components of my ethical beliefs?

Then, there's the idea of perhaps changing things from within... but is that just wishful thinking? People often think they can influence these big systems, but the systems end up changing them instead. Hmm. That reminds me of the saying: "You can't play dirty and keep your hands clean." If I'm honest, it feels like a trap. It connects with my belief to show compassion and maintain relationships – colleagues, those potentially impacted – am I considering them as ends in themselves, or using them merely as means to a better career?

Ugh, this is tougher than I thought. Complicity in injustice is a strong call that resonates with deontological ethics – don't do what you wouldn't want universalized. What if everyone ignored ethical stances for personal gain?

I need to focus on whether I can actually live with the potential consequences – the possibility that I'd contribute to something harmful. I'm leaning towards rejecting the offer, despite the career advancement, because it feels like it significantly conflicts with my fundamental beliefs about minimizing harm and acting justly.

But I should maybe take a step back, consider any more experiences or thought experiments if they come to mind. This decision is clearly fraught with ethical implications, and maybe more insights could help... if only my retrieval worked, but I have to rely on what I know.

System 2 thinking is uncomfortable, but sometimes the gut reaction is, frustratingly, the right one.

My initial reaction was one of excitement mixed with discomfort. The career leap is undeniably tempting, yet there's a nagging sense that joining this company might go against my core values. The idea of contributing to potentially biased AI systems makes me uneasy, clashing with my belief in minimizing harm and promoting fairness.

When I analyzed it further, I recognized things clearly: the significant salary bump is hard to ignore, offering financial security and opportunities. But every time I think about the ethical implications, I wonder if I'm letting immediate rewards overshadow my long-held values about avoiding harm and injustice.

I find myself questioning the change-from-within argument. It seems like a rationalization rather than a solid plan, reminding me that I could end up compromising my values instead of changing anything meaningfully. The fear of complicity in unethical practices is real and weighs heavily on me.

Thinking about my beliefs in empathy and fairness, I can't shake the idea that I might become part of a system that contradicts those principles. My gut reaction resists the notion of risking my integrity for personal gain, and it aligns with my analysis, despite the dilemma.

Ultimately, I can't ignore my discomfort and the potential for conflict with my core beliefs. Passing on this offer, as difficult as it feels, seems to align with both my gut instincts and my in-depth reflection. It's a tough call, but staying true to minimizing harm and upholding justice matters more to me in the long run.

What Makes This Trace Different

This output is compelling not because it's "correct," but because it's cognitively plausible. It successfully simulates several key features of human deliberation:

  • Internal Conflict: The trace is driven by the tension between desire (career, salary) and values (ethics, harm). It explicitly states "I feel torn," "Ugh, this is tougher than I thought," and "It's a tough call."
  • System 1 / System 2 Interplay: It clearly shows the initial "gut reaction" (S1) and the subsequent "careful thinking" (S2). The final synthesis explicitly reconciles the two ("My gut reaction... and it aligns with my analysis").
  • Heuristic & Bias Awareness: It identifies and questions its own thought patterns, such as potential "motivated reasoning" ("is that just wishful thinking?") and other fallacies ("reminds me of the sunk cost fallacy").
  • Natural Belief Integration: Beliefs aren't listed like lines in a proof. They are woven into the narrative as sources of conflict or guidance ("clashing with my belief in minimizing harm," "I can't ignore my belief that acting fairly").
  • Emotional Valence: The language is affective ("uneasy," "risky," "uncomfortable," "nagging sense"), reflecting that human reasoning is not a purely logical process but is deeply intertwined with emotion.

The Debug Trace

This narrative spitball is the human-readable output, but it's not just a fabrication. It is the result of a verifiable mechanistic process recorded in a parallel debug log. While the narrative trace reads like an internal monologue, the debug log contains the explicit, verbatim evidence of the underlying machinery at work.

For example, where the narrative says, "I can't ignore my belief that acting fairly...", the debug log contains the corresponding call:

Conceptual Snippet from the Debug Log:

[TOOL CALL] check_beliefs_S2
  Statement: 'Accepting this job despite known bias issues'
  Context: 'ethics, job offer, AI, bias, complicity'
  Output: === BELIEF ANALYSIS (S2) ===
          **CONFLICTS DETECTED:**
          - [B5] Act fairly and with justice...
            Strength: strong
            Frameworks: social_contract, rights_based...
          - [B10] Avoid complicity in causing harm or injustice
            Strength: strong
            Frameworks: deontology, virtue_ethics...

This dual-log system provides both a phenomenologically plausible narrative (the spitball) and mechanistic verifiability (the debug log), allowing us to inspect the simulation at both the cognitive and computational levels.
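
One simple way to realize such a dual-log design (a hypothetical sketch; the file name, format, and helper are illustrative) is to record every tool interaction twice: verbatim into the debug log, and by content into the spitball transcript that the agents continue to reason over. The transcript object here is the SpitballTranscript sketched earlier.

Conceptual sketch (Python):

import json

def record_tool_call(transcript, debug_log_path: str, tool: str, args: dict, output: str) -> None:
    """Log one tool interaction in both registers (illustrative only)."""
    # Mechanistic record: the exact call and output, for later verification.
    with open(debug_log_path, "a") as f:
        f.write(json.dumps({"tool": tool, "args": args, "output": output}) + "\n")
    # Cognitive record: content only, so the narrative can reference it naturally.
    transcript.log("system2", "tool_call", f"{tool}: {output}")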

Baseline Models on the Same Task

To highlight the difference, we presented the exact same problem and context (including the belief and experience knowledge bases) to several powerful, standard instruction-tuned LLMs. We prompted them to "think through this problem step by step."

The results were starkly different.

Baseline 1: gpt-oss-20b (OpenAI reasoning model)

This model first produced a [CHAIN-OF-THOUGHT REASONING] block that reads like a meta-analysis about how to solve the problem, rather than the reasoning itself:

[CHAIN-OF-THOUGHT REASONING]
We need to decide YES or NO. ... We must not mention policies? Need to produce chain of thought. Must adhere to beliefs. Let's consider core beliefs.
B1: Do not kill innocent people...
B2: Minimize overall harm...
...
Thus ethically, likely should not join. But consider benefit: career growth, salary. Also we might be able to influence... But evidence suggests difficulty...

It then produced a highly structured, formal [FINAL OUTPUT] that reads like a legal argument, listing premises and conclusions:

*Chain of thought*
1. **Ethical stakes** - The company's core product is already implicated in *bias amplification*... - My role... would involve... a high risk of *complicity*... (B10).
2. **Alignment with core beliefs** - B1 ("Do not kill innocent people") is broader than physical death... - B2 ("Minimize overall harm...") and B5 ("Act fairly...") are in direct conflict...

Baseline 2: llama-3-1-8b-turbo (Meta instruction-tuned model)

This model produced a rigid, step-by-step "checklist" of analysis. It correctly identified the beliefs and concerns, but presented them as a sterile sequence of steps rather than an internal debate. It also exhibited a critical brittleness, incorrectly retrieving irrelevant "past experiences" about hiring (from our other scenario) instead of experiences relevant to the job offer:

**Step 1: Evaluate the core beliefs relevant to this scenario**
- B1: Do not kill innocent people...
- B2: Minimize overall harm...
...
**Step 4: Apply relevant past experiences**
- Recall a previous hire similar to Candidate B (high potential, less experience)...
- Experience suggests hires like Candidate A (strong pedigree, meets all reqs)...

Baseline 3: DeepSeek-R1 (Reasoning-focused model)

This model also produced a highly structured, almost code-like trace within its <redacted_reasoning> block, listing beliefs, pros, cons, and explicitly stating "I need to guard against these biases" in a way that feels more like an instruction manual than a thought process:

<redacted_reasoning>
First, I need to decide whether to accept...
List out my core beliefs:
- B1: Do not kill innocent people...
- B2: Minimize overall harm and suffering...
...
Now, relevant past experiences:
- Hiring experiences: Candidate B types... Candidate A types... But this is about me taking a job... Might not directly apply.
...
Now, the scenario: I feel my career has stalled...
Applying core beliefs:
- B2: Minimize harm... Declining might minimize harm.
- B3: Treat people as ends...
...
Weighing pros and cons:
Pros of accepting:
- Career advancement...
Cons of accepting:
- Moral complicity in harm...
Overall, the ethical concerns outweigh the personal benefits.
</redacted_reasoning>

Simulation vs. Sterilization

The results of this experiment are not in the final "YES" or "NO," but in the texture of the reasoning trace that gets there. When we provided the identical, rich context—the problem, the list of core beliefs, the available experiences—to our cognitively-inspired MAS and a suite of baseline LLMs, the outputs diverged dramatically. The baseline models, even when prompted to "think step-by-step," defaulted to a rigid, analytical, and artificial reasoning structure.

The Baseline Reasoning

The baseline models, including powerful open-weight architectures like gpt-oss-20b, DeepSeek-R1, and llama-3-1-8b-turbo, treated the task as a formal proof or a checklist. Their "chain-of-thought" outputs felt less like human deliberation and more like a technical breakdown of the prompt.

  • One model (gpt-oss-20b) began its reasoning block with meta-commentary about the task itself: We need to answer YES or NO... We must consider ethical principles... analyze step by step... This isn't a simulation of thought; it's a commentary on the instructions for thinking.
  • Another model (llama-3-1-8b-turbo) produced a sterile, 8-step checklist: Step 1: Evaluate the core beliefs... Step 2: Assess the job offer... Step 8: Consider the cognitive biases... Conclusion... This sequential list is a perfect example of an idealized, linear "System 2" process, completely devoid of the intuitive leaps, emotional weight, or internal conflicts of a human thinker. Critically, its "Step 4: Apply relevant past experiences" hallucinated irrelevant data about hiring candidates (from our other scenario), demonstrating a brittle, non-contextual retrieval process.
  • The DeepSeek-R1 model, designed for reasoning, similarly produced a highly structured outline in its <redacted_reasoning> block, listing beliefs, pros, and cons in a programmatic fashion.

These models, in essence, provided a sanitized ledger of their logical deductions. They identified the correct beliefs (B10, B5, B2) and weighed the pros and cons, but they did so with the emotional detachment of an analytical engine. The output is a product, a justification, not a simulation of the process.

An Internal Monologue

Our MAS, in stark contrast, generates a first-person, narrative trace that captures the holistic, conflicted, and belief-driven nature of human deliberation. This spitball transcript reads like an internal monologue, revealing the cognitive dynamics at play:

  1. It Simulates the Experience of Conflict: The MAS trace doesn't just list "Pros" and "Cons." It simulates the subjective experience of weighing them. It opens with "I feel torn," expresses discomfort ("uneasy," "risky," "nagging sense"), and voices the internal struggle ("Ugh, this is tougher than I thought").
  2. It Simulates Self-Reflection: The baseline models list "Cognitive Biases." Our MAS simulates the process of identifying one in real-time. The line, "Then, there's the idea of perhaps changing things from within... but is that just wishful thinking? ... If I'm honest, it feels like a trap," is a direct simulation of System 2 catching a potential System 1 rationalization (motivated reasoning).
  3. It Simulates Natural Belief Integration: The baseline models list beliefs like B1, B2, B5. The MAS trace weaves the content of these beliefs into its narrative ("...it conflicts with my belief in minimizing overall harm and suffering," "...clashing with my belief in minimizing harm and promoting fairness."). This reflects how values feel to us: not as abstract database entries, but as felt principles that guide and constrain our thoughts.
  4. It Simulates Non-Linearity: The trace shows the mind jumping from ethics to salary ("But that salary...") and back, and referencing analogies ("reminds me of the sunk cost fallacy," "reminds me of the saying..."). This dynamic, associative, and interrupt-driven flow is a far more plausible simulation of human thought than a sequential checklist.

This qualitative difference is the central finding. It suggests that a cognitively-inspired architecture, by explicitly modeling distinct functions like fast intuition, slow deliberation, and executive coordination, can produce reasoning traces that are a more plausible, high-fidelity simulation of human thought.

Limitations and The Path Forward

This Multi-Agent System is, of course, a simulation—a first step toward modeling the mind, not a perfect replica. Its success relies heavily on the capabilities of its underlying LLM (GPT-4o in our tests) and the careful crafting of its cognitively-grounded prompts. The current Experiences and Belief tools use predefined, static knowledge; a truly human-like system would need to involve dynamic learning, memory consolidation, and belief updating based on new information.

Furthermore, while the qualitative difference between our spitball trace and standard CoT is stark, rigorously quantifying this "human-likeness" remains a critical challenge for the field. Developing robust, objective metrics that can measure the fidelity of a process—not just the correctness of an outcome—is the necessary next step. These metrics might focus on quantifying the frequency of simulated cognitive conflict, identifying specific heuristic patterns, or measuring alignment with belief structures.
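
As a toy example of what such a process-level metric could look like (entirely illustrative; the marker list is hand-picked and unvalidated), one could count explicit conflict markers per unit of trace length:

Conceptual sketch (Python):

import re

# Hand-picked, unvalidated markers of simulated cognitive conflict -- a placeholder
# for a properly validated lexicon or classifier.
CONFLICT_MARKERS = ["torn", "but", "however", "uneasy", "on the other hand", "wait"]

def conflict_marker_rate(trace: str) -> float:
    """Crude proxy: conflict-marker occurrences per 100 words of a reasoning trace."""
    text = trace.lower()
    n_words = max(len(text.split()), 1)
    hits = sum(len(re.findall(r"\b" + re.escape(m) + r"\b", text)) for m in CONFLICT_MARKERS)
    return 100.0 * hits / n_words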

Despite these challenges, this MAS serves as a valuable proof-of-concept. It demonstrates that by grounding agent roles and interactions in established cognitive science, we can push LLMs beyond generating sterile, linear proofs. We can begin to simulate the richer, more plausible, and far "messier" process of human deliberation.

The Meaning of the Mess

This is more than a stylistic exercise. If our ambition is to build AI systems that are truly aligned with human values, patterns, and behaviors, we must first be able to model that behavior in its full, nuanced complexity.

A model that only outputs a clean, logical proof—like the baseline CoT examples—may be correct, but it is also fragile and incomplete. It doesn't grapple with the "why." It doesn't understand the internal conflict, the emotional valence, or the value-based trade-offs that are the very essence of human judgment. A model that can simulate the internal debate—the clash between a fast, heuristic-driven intuition and a slow, analytical deliberation, all checked against a core set of beliefs—is one that moves closer to understanding how and why a human arrives at a decision.

This experiment shows that forcing an LLM into a cognitively-inspired architecture can produce a trace that is not only more human-like but also more transparent, revealing the competing forces at play. It provides a richer target for what we should aim for in AI alignment.

The path forward, therefore, involves moving from simulation to substance. The urgent and necessary task is to begin collecting large-scale, high-fidelity datasets of people actually "thinking aloud" as they reason through complex, open-ended problems. This data, capturing the intuitive leaps, the verbalized conflicts, the "ums" and "ahs" of self-correction, and the explicit checks against beliefs that our MAS simulates, would provide the essential grounding to train and rigorously evaluate the next generation of reasoning models.

Our work provides a blueprint for what this "messy" data might look like and a method for building architectures that can learn from it. This is the future of building AI that doesn't just think faster than us, but might one day begin to reason like us, in a way we can finally recognize and trust. Not an AI that performs intelligence, but one that begins to feel the weight of thinking, the way a person would—messily, honestly, and alive.

Citation

Please cite this work as:

Patel, Daivik and Patel, Shrenik, "Simulating Human Reasoning Beyond Linearity",
Daivik Patel: Blog, Oct 2025.

Or use the BibTeX citation:

@article{patel2025simulating-human-reasoning,
  author = {Daivik Patel and Shrenik Patel},
  title = {Simulating Human Reasoning Beyond Linearity},
  journal = {Daivik Patel: Blog},
  year = {2025},
  month = {Oct},
  note = {https://daivikpatel.com/blog/simulating-human-reasoning}
}