Why I'm starting a devlog
There are many good reasons to document projects. For me personally, the most important ones are:
- Looking back: how have I developed and what have I learned?
- Sharing inspiration: others can benefit from my journey
- Self-reflection: understanding my work better and questioning it critically
Based on these considerations, I am starting this devlog. I will record my development steps here - as a reminder for myself, and perhaps also as motivation or inspiration for others.
Ollama as a backend
I rely on local AI because I am convinced that AI systems should run on private hardware. There are various solutions for running Large Language Models (LLMs) directly on your own computer, but I am particularly fond of Ollama, so I will use it as the LLM backend for my AI applications.
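Before writing any application code, it helps to confirm that the local server is reachable and that a model is available. Here is a minimal sketch, assuming the official `ollama` Python package is installed (`pip install ollama`) and the Ollama server is running on its default port:

```python
import ollama

# Quick sanity check: ask the local Ollama server for its installed models.
# This raises a connection error if the server is not running.
print(ollama.list())

# Pull the model used in the examples below, if it is not installed yet.
ollama.pull("gemma2:2b")
```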
QV Ollama SDK
Ollama impresses not only with how easy it makes running LLMs, but also with its efficiency, and its Python API is user-friendly and straightforward. Still, when I started building an AI chat application, I quickly realized that I needed a small solution for conversations with a full conversation history: the LLM should receive not just the latest message, but the entire context of previous requests and responses. To simplify this, I developed a small Python SDK that stores messages in a chat history. The LLM's response can either be streamed or returned as a whole. Here is an example of a simple, non-streamed response.
```python
from qv_ollama_sdk import OllamaChatClient
# Create a client with a system message
client = OllamaChatClient(
    model_name="gemma2:2b",
    system_message="You are a helpful assistant."
)
# Simple chat - uses Ollama's default parameters
response = client.chat("What is the capital of France?")
print(response)
# Continue the conversation
response = client.chat("And what is its population?")
print(response)
# Set specific parameters only when you need them
client.temperature = 1.0 # Using property setter
client.max_tokens = 500 # Using property setter
client.set_parameters(num_ctx=2048) # For multiple parameters
# Get conversation history
history = client.get_history()
```
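Under the hood, keeping a full conversation history with Ollama simply means resending the growing list of messages on every call; the SDK hides this bookkeeping. As a rough illustration of the idea (a sketch of the concept, not the SDK's actual implementation), here is how it looks with the plain `ollama` package:

```python
import ollama

# The full conversation: the system message plus every user/assistant turn.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    # Append the new user message, then send the WHOLE history to the model.
    messages.append({"role": "user", "content": user_message})
    response = ollama.chat(model="gemma2:2b", messages=messages)
    answer = response["message"]["content"]
    # Store the model's answer so the next call has the full context.
    messages.append({"role": "assistant", "content": answer})
    return answer

print(chat("What is the capital of France?"))
print(chat("And what is its population?"))  # "its" is resolved via the history
```

This is exactly why the follow-up question "And what is its population?" works in the SDK example above: the model sees the earlier exchange about France in every request.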
Or, if you want to stream the LLM's answer:
```python
from qv_ollama_sdk import OllamaChatClient
client = OllamaChatClient(model_name="gemma2:2b")
# Stream the response
for chunk in client.stream_chat("Explain quantum computing."):
    print(chunk, end="", flush=True)
```
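For completeness: streaming works the same way with the plain `ollama` package by passing `stream=True`, which turns the call into an iterator of partial responses. Again, this is a sketch of the underlying mechanism, not the SDK's code:

```python
import ollama

# With stream=True, ollama.chat yields chunks as the model generates tokens.
stream = ollama.chat(
    model="gemma2:2b",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```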
If you want to try it out, here is the link to the GitHub repo:
Maybe this little helper is useful for you too. Next, I will build a super simple user interface for interacting with an LLM.
Until the next devlog, best regards
Thomas from the Quantyverse
P.S.: Visit my website Quantyverse.ai for products, bonus content, blog posts, and more.