## Introduction
Wrangling complex Kubernetes commands, debugging cryptic CI/CD errors, and writing one-off bash scripts are daily realities in DevOps. While AI assistants like ChatGPT are useful, sending internal scripts, infrastructure details, or proprietary documentation to a third-party service is a non-starter for many organizations. The solution is to build your own.
This tutorial guides you through creating a completely private, locally-hosted AI DevOps assistant. It will run on your machine, answer questions about your tools, and even learn from your internal documentation. You will learn to set up a local Large Language Model (LLM) with Ollama, serve it through a standard API, and enhance its capabilities with a simple but powerful Retrieval-Augmented Generation (RAG) pattern.
## Why Go Local? The Case for a Private DevOps Brain
Building your own assistant isn’t just a novelty; it’s a strategic advantage.
- Absolute Privacy and Security: Your queries, scripts, and internal data never leave your machine. This is critical for handling sensitive information like infrastructure code, logs, and proprietary runbooks.
- No API Costs: Forget per-token fees and surprise API bills. Once the open-source models are downloaded, you can run inference as often as you need at no additional cost.
- Offline Capability: Your assistant is available even without an internet connection, making it a reliable tool in restricted or on-the-go environments.
- Deep Customization: You control the AI’s behavior. You can tailor its knowledge, personality, and expertise to match your team’s specific tech stack and workflows.
## The Tech Stack: Our Open-Source Toolkit
We’ll use a handful of powerful, lightweight tools to assemble our assistant:
- Ollama: A brilliant command-line tool that makes it incredibly simple to download, manage, and run open-source LLMs like Llama 3 locally.
- Llama 3: The “brain” of our assistant. We’ll use the 8-billion-parameter instruction-tuned model (`llama3:8b-instruct`), which offers an excellent balance of performance and resource efficiency.
- Python: The glue that holds everything together. We’ll write a simple script to orchestrate the logic of our assistant.
- Your Documentation: The real magic. We’ll use your existing Markdown files, cheat sheets, and runbooks as a custom knowledge base.
## Step 1: Setting Up Your Local LLM with Ollama
First, let’s get a powerful language model running on your machine. Ollama makes this process a breeze.
### Install Ollama
On macOS or Linux, open your terminal and run the official installation script:
```bash
# Download and install Ollama
curl -fsSL https://ollama.com/install.sh | sh
```
For Windows, you can download the installer from the Ollama website.
### Pull a Model
With Ollama installed, you can now pull a model from the Ollama library. We’ll use Llama 3’s 8B instruction-tuned version. It’s powerful enough for complex DevOps tasks and runs well on most modern laptops.
```bash
# Download the Llama 3 8B instruction-tuned model
ollama pull llama3:8b-instruct
```
This download is several gigabytes, so it may take a few minutes depending on your internet connection. You can also try other models, such as `codellama` for a more code-focused assistant or `mistral` as an alternative general-purpose model.
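Before moving on, you can confirm that the model landed on disk by querying Ollama's local REST API, which listens on port 11434 by default. The quick check below uses only the Python standard library (the script name and exact output are just for illustration):

```python
# check_models.py - sanity check that the Ollama server is running and the model is present.
# Assumes Ollama's default local API address (http://localhost:11434).
import json
import urllib.request

# The /api/tags endpoint lists the models available locally
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    print(model["name"])  # "llama3:8b-instruct" should appear after the pull completes
```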
### Test the Model
You can immediately chat with your model directly from the command line to ensure it’s working.
```bash
# Start a chat session with the model
ollama run llama3:8b-instruct
```
Try asking it a simple DevOps question:
What does `kubectl rollout restart deployment` do?

You should get a concise and accurate answer. To exit the chat, type `/bye`.
## Step 2: Building the Assistant Logic in Python
Now that we have a local LLM, let’s build a Python script to interact with it programmatically. This will allow us to add custom logic and a specialized system prompt.
First, install the necessary Python library.
```bash
pip install ollama
```
Next, create a Python file named assistant.py. We’ll start with a basic script that defines the assistant’s persona and asks a question.
```python
import ollama
import sys

# Define the expert persona for our DevOps assistant
SYSTEM_PROMPT = """
You are a senior DevOps engineer and an expert in Linux, Docker, Kubernetes, Terraform, and CI/CD pipelines.
Your responses should be clear, concise, and include executable code examples where possible.
When a user asks a question, provide the best possible answer based on your knowledge.
"""

def main():
    """Main function to run the assistant."""
    if len(sys.argv) < 2:
        print("Usage: python assistant.py \"<your question>\"")
        sys.exit(1)

    user_query = sys.argv[1]

    response = ollama.chat(
        model='llama3:8b-instruct',
        messages=[
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': user_query},
        ]
    )

    print(response['message']['content'])

if __name__ == "__main__":
    main()
```
Run it from your terminal:
```bash
python assistant.py "How do I copy a file from a Docker container to my host machine?"
```
The script will send your query, along with our custom system prompt, to the local Llama 3 model and print the response. You now have a specialized, command-line DevOps expert.
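For longer answers, you may prefer to stream the output instead of waiting for the complete response. Here is a minimal variation of the same call (with a trimmed-down system prompt and a hard-coded question, purely for brevity) that prints tokens as they arrive:

```python
import ollama

SYSTEM_PROMPT = "You are a senior DevOps engineer. Be clear and concise."  # trimmed persona for brevity
user_query = "How do I copy a file from a Docker container to my host machine?"

# stream=True makes ollama.chat return an iterator of partial message chunks
stream = ollama.chat(
    model='llama3:8b-instruct',
    messages=[
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': user_query},
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a partial piece of the assistant's message
    print(chunk['message']['content'], end='', flush=True)
print()
```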
## Step 3: Supercharging with RAG (Retrieval-Augmented Generation)
Our assistant is smart, but it doesn’t know about your specific environment. It doesn’t know your team’s runbooks, your company’s Terraform module standards, or your private server hostnames. Let’s fix that with RAG.
RAG is a technique where you “retrieve” relevant information from your own documents and “augment” the LLM’s prompt with that information. This gives the model the context it needs to provide highly relevant, specific answers.
The flow is simple:
- Find: Search a local folder of documents for content related to the user’s query.
- Augment: Inject the found content directly into the prompt.
- Generate: Ask the LLM to answer the user’s question using the provided context.
### A Simple RAG Implementation
Let’s create a docs/ directory and add a sample document.
`docs/k8s-conventions.md`:

```markdown
# Kubernetes Naming Conventions

- **Production Namespace:** All production workloads must be deployed in the `prod-main` namespace.
- **Staging Namespace:** All staging workloads must be deployed in the `staging-alpha` namespace.
- **Restarting Deployments:** To perform a rolling restart, use the command `kubectl rollout restart deployment <deployment-name> -n <namespace>`.
```
Now, let’s update assistant.py to include our RAG logic. We’ll add a simple function to search our docs/ folder. For this tutorial, we’ll use a basic keyword search; for more advanced use cases, you could integrate a vector database like Chroma DB. (A lighter embedding-based variant is sketched right after the script below.)
```python
import ollama
import sys
import os

# Define the expert persona for our DevOps assistant
SYSTEM_PROMPT = """
You are a senior DevOps engineer and an expert in Linux, Docker, Kubernetes, Terraform, and CI/CD pipelines.
Your responses should be clear, concise, and include executable code examples where possible.
When a user asks a question, first use the provided context to form your answer. If the context is not relevant, rely on your general knowledge.
"""

def find_relevant_docs(query: str, docs_path: str = "docs") -> str:
    """
    A simple keyword-based search to find relevant documents.
    """
    query_keywords = set(query.lower().split())
    best_match_content = ""
    highest_score = 0

    if not os.path.exists(docs_path):
        return ""

    for filename in os.listdir(docs_path):
        if filename.endswith(".md"):
            with open(os.path.join(docs_path, filename), 'r') as f:
                content = f.read()
                content_keywords = set(content.lower().split())
                score = len(query_keywords.intersection(content_keywords))
                if score > highest_score:
                    highest_score = score
                    best_match_content = content

    return best_match_content

def main():
    """Main function to run the RAG-powered assistant."""
    if len(sys.argv) < 2:
        print("Usage: python assistant.py \"<your question>\"")
        sys.exit(1)

    user_query = sys.argv[1]

    # 1. Find relevant context from local documents
    context = find_relevant_docs(user_query)

    # 2. Augment the user query with the context
    augmented_prompt = f"Context:\n---\n{context}\n---\nUser Question: {user_query}"

    print("--- Found Relevant Context ---\n", context)
    print("----------------------------\n")

    # 3. Generate a response from the LLM
    response = ollama.chat(
        model='llama3:8b-instruct',
        messages=[
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': augmented_prompt},
        ]
    )

    print(response['message']['content'])

if __name__ == "__main__":
    main()
```
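If keyword overlap turns out to be too coarse for your documentation, one possible middle step before adopting a full vector database like Chroma DB is to rank documents by embedding similarity. The sketch below illustrates the idea; it assumes you have pulled an embedding model such as nomic-embed-text (`ollama pull nomic-embed-text`) and is not part of the assistant script above.

```python
import os
import math
import ollama

# Assumed embedding model; pull it first with `ollama pull nomic-embed-text`
EMBED_MODEL = "nomic-embed-text"

def embed(text: str) -> list[float]:
    """Get an embedding vector for a piece of text from the local Ollama server."""
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_relevant_docs_embeddings(query: str, docs_path: str = "docs") -> str:
    """Return the document whose embedding is closest to the query's embedding."""
    query_vec = embed(query)
    best_content, best_score = "", -1.0
    if not os.path.exists(docs_path):
        return ""
    for filename in os.listdir(docs_path):
        if filename.endswith(".md"):
            with open(os.path.join(docs_path, filename), "r") as f:
                content = f.read()
            score = cosine_similarity(query_vec, embed(content))
            if score > best_score:
                best_score, best_content = score, content
    return best_content
```

For a larger documentation set you would compute and store these embeddings once rather than re-embedding every file per query, which is precisely the bookkeeping a vector database handles for you.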
## Putting It All Together: A Real-World Example
Now, run the assistant again with a question that can benefit from our custom knowledge base:
```bash
python assistant.py "How do I restart the 'api-gateway' deployment in production?"
```
Without RAG, the model would give a generic answer, forcing you to look up the correct namespace.
With RAG, the output will be far more useful:
```text
--- Found Relevant Context ---
# Kubernetes Naming Conventions
- **Production Namespace:** All production workloads must be deployed in the `prod-main` namespace.
- **Staging Namespace:** All staging workloads must be deployed in the `staging-alpha` namespace.
- **Restarting Deployments:** To perform a rolling restart, use the command `kubectl rollout restart deployment <deployment-name> -n <namespace>`.
----------------------------
```
Based on your documentation, the production namespace is `prod-main`. To restart the 'api-gateway' deployment, you should run the following command:
```bash
kubectl rollout restart deployment api-gateway -n prod-main
```

The assistant correctly retrieved the context and used it to provide a precise, immediately usable command.
## Conclusion
You have successfully built a powerful, private, and context-aware AI DevOps assistant using only open-source tools. By combining Ollama's ease of use with a simple RAG pattern, you've created a system that can be tailored to your exact needs, enhancing both your productivity and your security posture.
This is just the beginning. You can expand your assistant's knowledge by adding more documents, improve its retrieval by integrating a vector database, or even give it the ability to execute commands with tools like `subprocess`.
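As a rough illustration of that last idea, the hedged sketch below extracts the first fenced bash block from a model response and runs it only after explicit confirmation. The helper name and the response_text parameter are hypothetical, and executing model-generated commands should always be treated with care:

```python
import re
import subprocess

def run_suggested_command(response_text: str) -> None:
    """Extract the first fenced bash block from a response and run it after confirmation."""
    match = re.search(r"```(?:bash|sh)?\n(.*?)```", response_text, re.DOTALL)
    if not match:
        print("No runnable command found in the response.")
        return
    command = match.group(1).strip()
    print(f"Suggested command:\n{command}")
    if input("Run this command? [y/N] ").strip().lower() == "y":
        subprocess.run(command, shell=True, check=False)
```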
What's the first task you'd automate with your new assistant? Share your ideas in the comments below.