We are living in a time in which Large Language Models are having a major impact. Thanks to chat interfaces such as ChatGPT, Claude or Le Chat by Mistral, anyone can now use this AI technology. What makes it so special is that it makes knowledge and certain forms of intelligence accessible to people - much like electricity or drinking water in many parts of the world.
But what exactly are Large Language Models? Put simply, you can think of them as text-based assistants that can recognize patterns and generate suitable answers to questions based on large amounts of training data. For example, if you ask for a recipe with coconut milk, ginger and vegetables, the LLM will probably suggest a curry recipe - simply because it has learned that these ingredients often go together. But how does an LLM actually "learn" and where does its extensive knowledge come from?
How does a Large Language Model learn?
A Large Language Model (LLM) is essentially based on a neural network with a special transformer architecture - but more on this in another article. These networks can be trained to develop certain skills. In the case of LLMs, this means that they are fed a huge amount of text - we are talking about data volumes in the range of several terabytes, which include large parts of the Internet such as Wikipedia, books, news articles and many other texts.
During training, the model learns to recognize patterns and correlations in texts. As a result, it can predict which word is most likely to follow next. This creates a kind of rough "world model" - albeit a limited one.
For example, if the model reads: "Today the... is shining", it calculates from previously learned patterns that the next word is probably "sun" and not "banana". This is exactly how meaningful texts are created.
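If you are curious what this looks like under the hood, here is a minimal sketch in Python. It assumes the Hugging Face transformers library and PyTorch are installed, and it uses the small, freely available GPT-2 model purely as an illustration; the exact words and probabilities it prints will differ from model to model, but the principle of scoring every possible next word is the same.

```python
# Minimal sketch: how a language model scores possible next words.
# Assumes the Hugging Face "transformers" library and PyTorch are installed;
# GPT-2 is used here only because it is small and freely available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sun is shining and the sky is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary entry, per position

# Turn the scores at the last position into probabilities for the next word
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)

for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode(token_id.item()).strip()!r}: {p.item():.1%}")
```

With a prompt like this, a word such as "blue" usually ends up near the top of the list, while something like "banana" gets a vanishingly small probability - which is exactly the pattern matching described above.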
The model doesn't truly understand things in a human sense. It has no real grasp of concepts such as gravity or physics; it simply reproduces relationships found in its training data, generating and combining words very quickly on the basis of probabilities.
Nevertheless, large language models are real all-rounders and can support us in many areas of everyday life and work. Imagine you need help writing a blog post, want to quickly summarize a long article or explain complicated topics in an understandable way - this is exactly where these AI systems come into play. They can also be a great help when programming by generating code snippets or assisting with troubleshooting. All these applications have one thing in common: they make our work easier and faster. However, you should always check the answers for correctness, especially when something doesn't feel quite right.
LLMs come in different sizes. The best models such as GPT-4o, o1 or Claude 3.7 Sonnet are very large and require a lot of computing power, which is why they have to be operated in data centers. This also means that we do not have complete control over how our data is processed or stored.
Meanwhile, there are also small but powerful Large Language Models such as Gemma 2 and Gemma 3 from Google. These smaller variants, often referred to as Small Language Models (SLMs), are compact enough to run locally - in other words, on a home computer or even on a smartphone.
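To make this a bit more concrete, here is a tiny sketch of what running such a model locally can look like in Python. It again assumes the Hugging Face transformers library; the model name google/gemma-2-2b-it is just one example of a small instruction-tuned model (it requires accepting Google's license on Hugging Face and a few gigabytes of RAM), and tools like Ollama or LM Studio let you do the same without writing any code.

```python
# Minimal sketch: running a Small Language Model on your own machine.
# Assumes the Hugging Face "transformers" library is installed and the chosen
# model fits into memory; "google/gemma-2-2b-it" is only one possible choice.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-2-2b-it",
)

prompt = "Explain in one sentence what a Large Language Model is."
result = generator(prompt, max_new_tokens=60, do_sample=False)

# Everything happens locally: no request leaves your computer.
print(result[0]["generated_text"])
```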
My focus in the Quantyverse is on these local application possibilities with SLMs - I develop applications that run directly on my own hardware. These small models can't compete with the big ones from OpenAI or Anthropic, but even the smallest SLMs are getting more and more impressive and are already very helpful for many tasks. If you're curious about what I'm developing, feel free to drop by or create an account to stay up to date.
I hope this blog article has given you a good insight and that I have been able to introduce you to Large Language Models without getting too technical.
See you next time and best regards
Thomas from the Quantyverse
P.S.: Visit my website Quantyverse.ai for products, bonus content, blog posts and more