The context window
In recent years, large language models such as GPT-4o, Claude 3.7 Sonnet, or DeepSeek-V3 have changed the way we interact with technology. We can now talk directly to these systems in apps such as ChatGPT and receive direct answers to our questions. The models seem to remember previous conversations and provide contextual feedback.
But how does this actually work? Technically, LLMs only generate responses to individual queries; their neural networks have no built-in memory. They merely appear to remember, even though they have no memory of their own.
If you have a longer conversation with a Large Language Model (LLM), the so-called context window comes into play. With each turn, the LLM receives not only the latest request and response, but all previous messages as well. The longer the conversation lasts, the longer the text fed into the language model becomes.
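To make this concrete, here is a minimal sketch of a chat loop. Everything in it (the `history` list, the `build_request` helper) is hypothetical and only illustrates the principle: the full conversation is resent to the model on every turn.

```python
# Minimal sketch: on each turn, the ENTIRE conversation history is sent
# to the model, not just the newest message. All names here are
# illustrative, not a real API.

history = []  # grows with every exchange

def build_request(user_message):
    """Append the new message and return the full history as the model input."""
    history.append({"role": "user", "content": user_message})
    return history  # the model sees all previous turns, every time

build_request("What is a context window?")
request = build_request("Can you give an example?")
print(len(request))  # the input grows with each turn
```

This is why a chatbot "remembers": the memory lives in the request, not in the neural network.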
However, LLMs have a limitation: a fixed context window defines how much text (measured in tokens) they can process at once. If a conversation grows beyond that limit, the earliest questions and answers fall out of the window. The model then loses the context of the conversation and gives less precise or even poor answers.
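A common way to handle this is to trim the oldest messages once the history exceeds the budget. The sketch below assumes a tiny, made-up token budget and approximates token counts by word counts; real systems use the model's own tokenizer.

```python
# Sketch: trimming old messages once the conversation exceeds a fixed
# token budget. Word count stands in for real tokenization here.

MAX_TOKENS = 50  # illustrative, deliberately tiny budget

def count_tokens(message):
    """Crude approximation: one word = one token."""
    return len(message["content"].split())

def trim_to_window(history, max_tokens=MAX_TOKENS):
    """Drop the oldest messages until the history fits the budget."""
    trimmed = list(history)
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest turn falls out of the window first
    return trimmed

history = [
    {"role": "user", "content": "word " * 40},  # old turn, 40 "tokens"
    {"role": "user", "content": "word " * 20},  # recent turn, 20 "tokens"
]
kept = trim_to_window(history)
```

Here the old 40-word turn is dropped, and the model simply never sees it again: exactly the "forgetting" described above.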
But large language models are developing rapidly, and their context windows are growing ever larger. Some models, such as Gemini 1.5 or Qwen-2.5-Turbo, already handle a million tokens or more, and this trend is continuing.
At this scale, entire books will soon fit into a Large Language Model (LLM) in a single request. This opens up ever better possibilities for information processing and knowledge integration.
The situation is different for smaller models, especially local AI systems. Here, context windows still play a decisive role: these models are more compact and therefore have significantly smaller context windows, which limits how much information they can take in and process at once.
This has a direct impact on where such systems can be used. Smaller models have to be cleverer with their limited context: applications built on them need strategies to select and prioritize the most important information.
One example is Retrieval Augmented Generation (RAG), which pulls only the relevant pieces of knowledge from a database in response to a query and feeds them to the LLM, so the entire document never has to fit into the context window. But more on this in another blog post.
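As a tiny preview of the idea, here is a deliberately naive RAG sketch. The in-memory "database" and the keyword-overlap scoring are assumptions for illustration; real systems use vector embeddings and a proper retrieval index.

```python
# Minimal RAG sketch: retrieve only the most relevant snippet and put
# just that snippet into the prompt. Naive keyword overlap stands in
# for real embedding-based search.

documents = [
    "The context window limits how many tokens an LLM can process at once.",
    "RAG retrieves relevant snippets from a database at query time.",
    "Local models often have smaller context windows than cloud models.",
]

def retrieve(query, docs, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query):
    """Feed only the retrieved snippet to the model, not the whole corpus."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point is the shape of `build_prompt`: the model receives one short, relevant snippet instead of every document, which is what makes RAG attractive for small context windows.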
I hope I was able to give you an understanding of what the context window is and hope you continue to have fun with AI. Until next time.
Best regards
Thomas from the Quantyverse
P.S.: Visit my website Quantyverse.ai for products, bonus content, blog posts and more