Article Image Created using Artificial Intelligence (DALL-E 3)

Inside the Private Thoughts of AI

How DeepSeek’s Inner Monologue Redefines What We Expect From Language Models

Robert McDermott
7 min read · Feb 10, 2025

--

The Rise of Machine Introspection

Most humans (though not all) experience an inner monologue, a continuous stream of self-talk that guides us through daily tasks, reflections, and problem-solving. When faced with a complex challenge, this inner dialogue might sound something like:

“Alright, let me think this through… If I start by doing it this way, it should work… but wait — hold on. That won’t work because I’m forgetting about that issue with XYZ. Right, okay, scratch that idea. Hmm… maybe if I come at it from a different angle? What if I try combining this method with that other one I used last time? No, no, that’s too complicated — it’ll just make things worse. Ugh, why is this trickier than I thought? Alright, focus. What’s the simplest solution? Oh! Wait a second… if I tweak this part and adjust that, it might actually work. Yes! That’s it!”

This human-like inner monologue might be the key to achieving Artificial General Intelligence (AGI). Until recently, interacting with advanced large language models (LLMs) like GPT-4o or Claude 3.5 Sonnet meant providing a prompt and watching the model begin its response almost instantaneously. The speed was astonishing, as if the model required no time to think or plan.

However, with the advent of new “reasoning” models like OpenAI’s o1, the dynamics shifted. These models appear to slow down and deliberate over the prompt before responding. OpenAI’s reasoning models, o1 and o3-mini, indicate they are “thinking” (e.g., “reasoned for 4 seconds”), and users can expand that section to reveal a high-level summary of the reasoning process.

But DeepSeek-R1 takes this to another level. Its inner monologue during reasoning is strikingly anthropomorphic. It resembles a human trying to figure something out, complete with self-doubt (“Hmm, did I make a mistake?”), backtracking, and “aha” moments of insight. Remarkably, DeepSeek-R1 doesn’t seem to realize that users can observe its inner monologue. For example, when I asked it to think of a secret I would later try to guess, part of its monologue read:

“Okay, the user is trying to guess the secret I came up with. Let me think. They first asked directly what the secret was, but I told them they need to guess. Then they guessed “shoebox,” which was wrong. I offered a hint, but they went with “lighthouse” next. Hmm, the actual secret is “pineapple.” So “lighthouse” isn’t correct. But how do I respond? The user might be expecting a yes or no, but I should keep it friendly and encouraging. Maybe add another hint to guide them closer. Let’s see, “lighthouse” is a structure, maybe related to the sea. The real secret is a fruit. So maybe the next hint could relate to that. Let me mention something about categories or types. Also, using emojis to keep it light. Yeah, that should work. Alright, let’s tell them it’s not “lighthouse,” give a hint about being a fruit, and encourage another guess.”

How did these internal monologues emerge as a form of reasoning? While OpenAI’s methods remain unknown due to the company’s closed nature (despite its name), DeepSeek-R1’s development process has been shared publicly. Surprisingly, the inner monologue was an emergent property, discovered through pure reinforcement learning (RL), specifically Group Relative Policy Optimization (GRPO), without any supervised fine-tuning (SFT).
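
To give a flavor of GRPO’s core mechanic, here is a minimal sketch (my own illustration, not DeepSeek’s training code, and the reward values are made up): for each prompt, the policy samples a group of candidate responses, scores each one, and grades every response relative to its own group, which removes the need for a separate critic model.

    import statistics

    def group_relative_advantages(rewards: list[float]) -> list[float]:
        """GRPO's key idea: normalize each sampled response's reward
        against its own group's mean and standard deviation."""
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
        return [(r - mean) / std for r in rewards]

    # Four sampled answers to one prompt: two scored correct, two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]

Responses that beat their group’s average get a positive advantage and are reinforced; the rest are discouraged.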

The Birth of Machine “Aha Moments”

Most LLMs today rely on supervised fine-tuning (SFT), where they are trained on curated examples. DeepSeek-R1-Zero, however, took a radically different path: it learned to reason without any SFT, relying solely on reinforcement learning.

Here’s how it worked:

  • Rewards Over Instructions: Instead of being given examples, the model was incentivized to generate answers that maximized two rewards: accuracy (correct answers) and format (structuring its thoughts between <think> and </think> tags). A minimal sketch of such a reward follows this list.
  • Self-Evolving Reasoning: Over thousands of RL iterations, the model began “thinking” in ways researchers never explicitly taught. It started self-verifying (“Wait, did I make a mistake?”), reflecting (“Let me try another approach”), and even expressing anthropomorphic “aha moments” mid-process.
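
Here is a toy version of that rule-based reward (a simplified illustration under my own assumptions; DeepSeek’s actual reward functions and answer checking are more involved):

    import re

    def reward(response: str, expected_answer: str) -> float:
        """Toy rule-based reward combining the two signals described above:
        a format reward for using <think> tags and an accuracy reward."""
        total = 0.0
        # Format reward: the reasoning must be wrapped in <think> tags.
        if re.search(r"<think>.*?</think>", response, re.DOTALL):
            total += 0.5
        # Accuracy reward: the text after the thinking block must contain
        # the expected final answer.
        final = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
        if expected_answer in final:
            total += 1.0
        return total

    print(reward("<think>2 + 2... carry the... yes, 4.</think> The answer is 4.", "4"))  # 1.5

Because the reward only checks the outcome and the format, the content of the monologue itself is unconstrained, and that is exactly the space in which the self-verification behavior emerged.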

One striking example from DeepSeek’s research paper shows the model pausing mid-calculation to rethink, producing the groundbreaking “aha moment”:

“Wait, wait. Wait. That’s an aha moment I can flag here. Let’s reevaluate this step-by-step to identify if the correct sum can be…”

This wasn’t programmed; it emerged from trial-and-error learning. Like a student mastering calculus through practice, the model taught itself to allocate more computational “time” (tokens) to harder problems, mimicking human deliberation.

You can read the full DeepSeek research paper here: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

The DeepSeek-R1 Research Paper

Why Speed Isn’t Everything

After a couple of months of using reasoning models like o1 and, more recently, DeepSeek-R1 and o3-mini, I’ve grown to appreciate the transparency they offer and the perceptible effort they put into providing the best response. In contrast, GPT-4o and Claude 3.5 start responding instantly, which feels transactional and low-effort. There’s no insight into how they arrived at their conclusions, and it all seems rushed. Without a visible “thinking” phase, I’m left wondering: Did it consider alternatives? Is it confident or just guessing?

Reasoning models may not always outperform their faster counterparts, but they take their time and show their work, especially DeepSeek-R1. This is an important shift in AI. In fields like education, scientific research, and healthcare, understanding an AI’s reasoning process is as vital as the answer itself. DeepSeek’s approach, rewarding structured thought over raw speed, points to a future where AI isn’t just a tool but a reasoning partner.

The development of transparent inner monologues as part of the reasoning process offers significant value:

  • Trust: Seeing the model’s reasoning allows us to vet its logic.
  • Learning: The inner monologue often includes helpful insights (e.g., “This formula applies here because…”).
  • Engagement: It feels collaborative, like brainstorming with a partner.

Getting Access to DeepSeek-R1

DeepSeek-R1 is currently free to use. To access it, visit https://chat.deepseek.com/ and either create an account or sign in with your Google account.

Once you’re logged in, the chat defaults to the more traditional DeepSeek-V3 model, which is similar to GPT-4o. To use the R1 reasoning model, select the “DeepThink (R1)” option:

The DeepSeek chat interface

Running DeepSeek Privately on your Own Computer

There are potential security and privacy concerns about using a Chinese AI platform (one US Senator even proposed a bill that would make using DeepSeek a crime). While there may be valid privacy concerns with using the model via the company’s website or mobile app (DeepSeek security breach), it’s perfectly safe to download the open model weights and run them privately on your own hardware. The full model weights are available at: https://huggingface.co/deepseek-ai/DeepSeek-R1.

At 671 billion parameters, the model is massive. Unless you have a powerful server with multiple GPUs, you’ll likely need to run a quantized or distilled version.
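
A rough back-of-the-envelope estimate makes the hardware problem concrete (weights only; the KV cache and activations need additional memory, and the parameter counts here come from the model cards):

    def weight_gb(params_billion: float, bits_per_param: int) -> float:
        """Approximate memory needed just to hold the weights, in GB."""
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    print(weight_gb(671, 8))  # full R1 at 8-bit:        ~671 GB
    print(weight_gb(671, 4))  # full R1 at 4-bit:        ~336 GB
    print(weight_gb(14, 4))   # distilled 14B at 4-bit:  ~7 GB

Even aggressively quantized, the full model needs hundreds of gigabytes of memory, while a 4-bit distilled 14B model fits comfortably on an ordinary laptop.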

The easiest way to try DeepSeek-R1 locally is via Ollama. Ollama’s model library offers 1.5B, 7B, 8B, 14B, 32B, and 70B distilled versions of R1. You can download and run them with a few commands from: https://ollama.com/library/deepseek-r1.

The DeepSeek-R1 models available in the Ollama model library

With the 1.5B version, you can privately try DeepSeek-R1 on almost any computer, no GPU required. Here’s an example of running DeepSeek-R1 from the command line on a laptop:

Running the DeepSeek-R1 distilled 1.5B model on a laptop with Ollama.
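
If you’d rather script the model than chat with it interactively, here is a minimal sketch that queries the same local model through Ollama’s HTTP API (this assumes the Ollama server is running and that you’ve already pulled the deepseek-r1:1.5b tag; the prompt is just an example):

    import json
    import urllib.request

    # Non-streaming request to the local Ollama server (default port 11434).
    payload = {
        "model": "deepseek-r1:1.5b",
        "prompt": "How many prime numbers are there between 1 and 20?",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["response"]

    # The distilled R1 models print their inner monologue between
    # <think> and </think> tags before giving the final answer.
    print(answer)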

For a better experience, you can run DeepSeek-R1 14B via Open WebUI on the same laptop to get a nice ChatGPT-like user interface:

Running the DeepSeek-R1 distilled 14B model on a laptop with Open WebUI + Ollama

To set up Ollama and Open WebUI for running LLMs like DeepSeek locally, refer to my previous article for instructions: https://blog.cubed.run/llm-zero-to-hero-with-ollama-913e50d6b7f0

Reproducing the DeepSeek-R1 results in other models

The reinforcement learning approach used for DeepSeek-R1 can be applied to other foundation models to develop reasoning capabilities and “aha moments” at minimal cost. For instance, the TinyZero project applied this method to a 3 billion parameter model and reproduced the “aha moment” and internal monologue for just $30 of compute. The announcement on X:

TinyZero aha moment announcement on X.

Another project, R1-V, used this method to add reasoning to a 2 billion parameter vision model, achieving performance on par with a 72 billion parameter model for just $3 of compute: https://github.com/Deep-Agent/R1-V.

The R1-V project: https://github.com/Deep-Agent/R1-V.

Conclusion: From Instant Answers to Thoughtful Solutions

For quick facts or simple coding tasks, I’ll still use GPT-4o and Claude, but for more complex or critical work, I now go directly to the reasoning models. OpenAI’s o1 and o3-mini (and hopefully “o3 Deep-Research” soon) and the full DeepSeek-R1 model are my go-tos when I want a reasoning partner rather than just a tool. I’ll often run both o1 and DeepSeek-R1 on the same problem, to see which one provides the better answer, or to merge the best parts of each into a single solution.

This new “thinking” reasoning capability isn’t a gimmick; it’s proof that AI can evolve sophisticated problem-solving strategies organically. As the open-source community builds on DeepSeek’s work, I anticipate that more models will prioritize thoughtful reasoning over speed.
