Self-hosted AI
# Introduction
If, like me, after using AI models like GPT you want to experiment a little on your own hardware, I hope the following can help you.
Ollama is an open-source tool that runs large language models (LLMs) directly on a local machine.
This is my simplest setup to get the basics working.
I use Docker containers, rather than Python virtual environments, mainly because they are self-contained.
The following sets up containers to run the Tinyllama model with a very basic Gradio-based web interface.
# Prerequisites
## Hardware
Obviously the bigger the better…but you can get by with not much. My sandpit system is actually an old Xiaomi Mi A1 smartphone running Alpine Linux.
## Docker
Installing Docker on your specific hardware is well documented, see Get started with Docker; personally, I SSH into my machine and use the command-line Docker Engine.
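A quick sanity check that both Docker and docker-compose are installed and on your PATH:

```
# Both commands should print a version number
docker --version
docker-compose --version
```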
## Git
Git is needed to clone the example repository in the next section.
# Running
Download the repository:

```
git clone https://github.com/bryansplace/local_ollama_chatbot.git
```

Move into the project folder:

```
cd local_ollama_chatbot
```

Build and start the containers:

```
docker-compose up -d --build
```

Pull the LLM model into the ollama container:

```
docker exec -it ollama ollama pull tinyllama
```
Open your web browser to, e.g., 192.168.x.xxx:7860 to open the chatbot interface. Type in your message to the chatbot, submit, and wait for a reply.
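If the interface does not respond, you can check the Ollama service directly with curl, hitting the same endpoint the chatbot uses (adjust the address to wherever your containers are running):

```
# Ask the tinyllama model for a single, non-streamed reply
curl http://localhost:11434/api/generate -d '{"model": "tinyllama", "prompt": "Hello", "stream": false}'
```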
# Explanation
## Docker
Starting with the docker-compose.yaml file, we need two 'services':
- `ollama`, which serves the LLM model on port 11434 by default.
- `chatbot`, the web interface using Gradio.
```
version: '3.8'

services:
  ollama:
    image: ollama/ollama:0.6.3
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  chatbot:
    build: ./chatbot
    container_name: chatbot
    restart: unless-stopped
    ports:
      - "7860:7860"
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

volumes:
  ollama_data:
```
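To confirm both services came up cleanly, the usual docker-compose commands are handy:

```
# Show the state of the two services defined in docker-compose.yaml
docker-compose ps

# Follow the logs of both containers
docker-compose logs -f
```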
## Ollama
The standard Docker Hub [ollama image](https://hub.docker.com/r/ollama/ollama) is used, without GPU support. The specific 0.6.3 version was the latest at the time of writing.
No LLM (large language model) is included in the image, so the specific model 'tinyllama' needs to be pulled in as shown above. The same model name must be used when Gradio requests the response in the chatbot app.py code below.
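You can confirm which models have been pulled by listing them inside the container:

```
# List the models currently available to the ollama container
docker exec -it ollama ollama list
```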
Once the simple tinyllama model is working, other models can be played with, depending on your hardware and patience; pulling a different model works the same way.
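For example (mistral here is just one model from the Ollama library; pick whatever fits your hardware), pull it into the container and update the model name in app.py to match:

```
# Pull an alternative model into the running ollama container
docker exec -it ollama ollama pull mistral
```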
## Chatbot
In a subdirectory named chatbot, there are three files:
1) app.py: the Python code for the interface
2) requirements.txt: the required Python dependencies
3) Dockerfile: the instructions to build the chatbot container.
### app.py
This launches the web server on port 7860, listening on all interfaces so it can be reached from your browser. The interface is minimal: an input box and a response box.
```
import gradio as gr
import requests
import os

# Base URL of the Ollama service; set by docker-compose, falls back to localhost
OLLAMA_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

def chat_with_ollama(prompt):
    # Send the prompt to Ollama's generate endpoint and return the full reply
    response = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "tinyllama", "prompt": prompt, "stream": False}
    )
    if response.status_code == 200:
        return response.json()["response"]
    else:
        return "Error communicating with Ollama."

iface = gr.Interface(
    fn=chat_with_ollama,
    inputs=gr.Textbox(label="Input"),
    outputs=gr.Textbox(label="Ollama's Response"),
    title="Bryan's Chatbot",
    description="Talk to my AI"
)

if __name__ == "__main__":
    iface.launch(server_name="0.0.0.0", server_port=7860)
```
### requirements.txt
```
gradio
requests
```
### Dockerfile
```
FROM python:3.10

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "app.py"]
```
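If you edit app.py, for example to point it at a different model, rebuild just the chatbot service:

```
# Rebuild and restart only the chatbot container
docker-compose up -d --build chatbot
```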