A Step-by-Step Guide to Automated Threat Modeling with LLMs in Your DevSecOps Pipeline

In the world of DevOps, speed is king. But as we accelerate development with continuous integration and deployment, security can’t be an afterthought. Traditional threat modeling—a crucial practice for identifying security risks—is often a manual, time-consuming process that creates a bottleneck. By the time a security review is complete, the code has already moved on.

What if we could make threat modeling as agile as our development process? This is where Large Language Models (LLMs) enter the picture. By integrating AI into our DevSecOps pipeline, we can automate security analysis, provide developers with immediate feedback, and truly “shift security left.”

This guide will walk you through a step-by-step process to build an automated threat modeling workflow using LLMs directly within your CI/CD pipeline. You’ll learn how to transform a manual security checkpoint into a seamless, automated, and intelligent part of your development lifecycle.

What is Threat Modeling and Why Automate It?

Threat modeling is a structured process for identifying potential threats, vulnerabilities, and mitigations in a system. It helps teams proactively address security risks before they become real-world incidents.

The STRIDE Framework: A Quick Refresher

One of the most popular threat modeling methodologies is STRIDE, a mnemonic for six categories of threats:

  • Spoofing: Impersonating another user or system.
  • Tampering: Modifying data or code without authorization.
  • Repudiation: Denying having performed an action.
  • Information Disclosure: Exposing sensitive data to unauthorized parties.
  • Denial of Service: Making a system unavailable to legitimate users.
  • Elevation of Privilege: Gaining higher-level access than authorized.
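
To make these categories concrete for the kind of small web service we automate later in this guide, here is an illustrative (and deliberately non-exhaustive) mapping of each STRIDE category to an example threat you might find in a Flask API:

# Illustrative mapping of STRIDE categories to example threats for a small
# Flask API. These are hypothetical examples, not an exhaustive checklist.
STRIDE_EXAMPLES = {
    "Spoofing": "An attacker forges a session token to act as another user.",
    "Tampering": "Request parameters are modified to change another user's record.",
    "Repudiation": "Actions are not logged, so a user can deny deleting data.",
    "Information Disclosure": "A debug endpoint leaks stack traces and secrets.",
    "Denial of Service": "An unauthenticated endpoint accepts unbounded uploads.",
    "Elevation of Privilege": "A missing role check lets a user reach admin routes.",
}

for category, example in STRIDE_EXAMPLES.items():
    print(f"{category}: {example}")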

The Case for Automation

Manual threat modeling sessions are invaluable but suffer from several drawbacks in a fast-paced environment:

  • They are slow: A single session can take hours or days to schedule and conduct.
  • They require expertise: Finding skilled security architects is a challenge.
  • They are inconsistent: The quality of the output can vary depending on the participants.

Automating threat modeling with LLMs addresses these issues by providing instant, consistent, and scalable security analysis on every code change. This doesn’t replace human experts but rather empowers them, freeing them up to focus on complex, high-risk issues while the automation handles the routine checks.

Leveraging LLMs for Security Analysis

LLMs are well suited to threat modeling because they can understand the context of code, system architecture, and user stories. They can reason about potential flaws based on patterns learned from vast amounts of code and security documentation.

The key to unlocking this capability is crafting a high-quality prompt. A well-designed prompt acts as your directive to the AI, guiding it to perform a specific and structured security analysis.

Your prompt should include four key components:

  1. Persona: Instruct the LLM to act as an expert DevSecOps or application security engineer.
  2. Context: Provide the necessary information about your application, such as source code, Infrastructure as Code (Terraform, etc.), and package dependencies.
  3. Task: Clearly state the goal. For us, it’s to perform a threat analysis using the STRIDE framework.
  4. Format: Specify the output format. Structured formats like JSON are ideal for programmatic processing in a pipeline.
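
Putting those four components together, a minimal prompt skeleton might look like the sketch below. The wording and the app_context placeholder are purely illustrative; the full prompt we actually use appears in Step 2.

# Minimal prompt skeleton illustrating the four components.
# The app_context placeholder is an assumption for illustration only.
app_context = "...source code, IaC, and dependency files go here..."

persona = "You are an expert DevSecOps engineer performing a security review."          # 1. Persona
context = f"Application context:\n{app_context}"                                        # 2. Context
task = "Identify threats using the STRIDE framework and suggest mitigations."           # 3. Task
output_format = "Respond with a single JSON object containing a list named 'threats'."  # 4. Format

prompt = "\n\n".join([persona, context, task, output_format])
print(prompt)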

A Step-by-Step Implementation Guide

Let’s build a practical workflow to automate threat modeling for a simple Python Flask application. We’ll use GitHub Actions as our CI/CD platform and the OpenAI API for LLM access.

Step 1: Gather the Context in Your Pipeline

The first step is to collect all relevant files that describe the application and its infrastructure. A simple shell script in your CI job can concatenate these files into a single context block.

For our example, we’ll gather the application code, Dockerfile, and requirements.txt.

#!/usr/bin/env bash
# .github/scripts/gather_context.sh

# This script concatenates key application files into a single text block
# to be used as context for the LLM.

set -euo pipefail

echo "--- Gathering application context for threat modeling ---"

# Create a temporary file for the context
CONTEXT_FILE="threat_model_context.txt"
> $CONTEXT_FILE # Clear the file if it exists

# Add file content with clear separators
echo "### File: src/app.py ###" >> $CONTEXT_FILE
cat src/app.py >> $CONTEXT_FILE
echo -e "\n" >> $CONTEXT_FILE

echo "### File: Dockerfile ###" >> $CONTEXT_FILE
cat Dockerfile >> $CONTEXT_FILE
echo -e "\n" >> $CONTEXT_FILE

echo "### File: requirements.txt ###" >> $CONTEXT_FILE
cat requirements.txt >> $CONTEXT_FILE
echo -e "\n" >> $CONTEXT_FILE

echo "--- Context gathered successfully ---"

Step 2: Design the LLM Prompt and Script

Next, we’ll create a Python script that reads the gathered context, constructs a detailed prompt, and sends it to the LLM API. This script will also parse the JSON response.

# .github/scripts/run_threat_model.py

import os
import sys
import json
from openai import OpenAI

# It's recommended to use environment variables for API keys
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
CONTEXT_FILE = "threat_model_context.txt"

def generate_threat_model(context):
    """
    Calls the LLM to generate a threat model based on the provided context.
    """
    # This prompt is the core of our automation. It's detailed and specific.
    system_prompt = """
    You are an expert DevSecOps security engineer performing automated threat modeling.
    Your task is to analyze the provided application context and identify potential threats
    using the STRIDE framework (Spoofing, Tampering, Repudiation, Information Disclosure,
    Denial of Service, Elevation of Privilege).
    """

    user_prompt = f"""
    Based on the following application context, please perform a threat analysis.
    For each threat you identify, provide the following details:
    - threat_id: A unique identifier (e.g., STRIDE-001).
    - category: The STRIDE category (e.g., "Spoofing").
    - description: A clear and concise description of the threat.
    - mitigation: A recommended, actionable mitigation strategy for developers.

    Return your response as a single, well-formed JSON object containing a list
    named "threats". Do not include any other text or explanations outside of the JSON.

    Application Context:
    ---
    {context}
    ---
    """

    try:
        response = client.chat.completions.create(
            model="gpt-4-turbo",  # Or your preferred model
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            response_format={"type": "json_object"},
            temperature=0.2, # Lower temperature for more deterministic output
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"Error calling LLM API: {e}", file=sys.stderr)
        sys.exit(1)

def format_as_markdown(threat_model_json):
    """
    Formats the JSON output from the LLM into a Markdown table for PR comments.
    """
    if not threat_model_json or "threats" not in threat_model_json:
        return "No threats identified or an error occurred during analysis."

    markdown = "## Automated Threat Model Report\n\n"
    markdown += "| ID | Category | Threat Description | Mitigation Recommendation |\n"
    markdown += "|----|----------|--------------------|---------------------------|\n"

    for threat in threat_model_json["threats"]:
        markdown += f"| {threat.get('threat_id', 'N/A')} | {threat.get('category', 'N/A')} | {threat.get('description', '')} | {threat.get('mitigation', '')} |\n"
    
    return markdown

if __name__ == "__main__":
    if not os.path.exists(CONTEXT_FILE):
        print(f"Error: Context file '{CONTEXT_FILE}' not found.", file=sys.stderr)
        sys.exit(1)
        
    with open(CONTEXT_FILE, 'r') as f:
        app_context = f.read()

    threat_model = generate_threat_model(app_context)
    markdown_output = format_as_markdown(threat_model)
    
    # Save the output to a file to be used by the GitHub Actions workflow
    with open("threat_model_report.md", "w") as f:
        f.write(markdown_output)

    print("Threat model report generated successfully.")

Step 3: Integrate into Your CI/CD Pipeline

Now, let’s tie it all together in a GitHub Actions workflow that runs on every pull request. This workflow will use the scripts we just created and post the results as a comment on the PR.

# .github/workflows/threat_modeling.yml

name: Automated Threat Modeling

on:
  pull_request:
    branches: [ main ]

permissions:
  contents: read
  pull-requests: write # Required to post comments on PRs

jobs:
  llm_threat_model:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install openai

      - name: Gather application context
        run: |
          chmod +x .github/scripts/gather_context.sh
          ./.github/scripts/gather_context.sh

      - name: Run LLM-based threat analysis
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python .github/scripts/run_threat_model.py

      - name: Post threat model report to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('threat_model_report.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });

Note: Don’t forget to add your OPENAI_API_KEY to your repository’s secrets in GitHub.

Step 4: Review the Results

With this workflow in place, every pull request will automatically trigger a threat model analysis. The results will be posted directly as a comment, giving developers immediate, context-aware security feedback right where they work.

This creates a powerful feedback loop, enabling teams to discuss and mitigate potential threats before they are merged into the main branch.
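
If you later want the pipeline to do more than comment, one option is to extend the prompt so each threat also includes a severity field, have the script write the raw JSON to a file, and fail the job when high-severity findings appear. A hedged sketch, assuming a hypothetical threat_model_report.json artifact and a "severity" key that the prompt in Step 2 would need to request:

# Hypothetical gating step: assumes run_threat_model.py also writes the raw JSON
# to threat_model_report.json and that the prompt has been extended to request a
# "severity" field for each threat.
import json
import sys

with open("threat_model_report.json") as f:
    threats = json.load(f).get("threats", [])

high = [t for t in threats if t.get("severity", "").lower() == "high"]
if high:
    print(f"{len(high)} high-severity threat(s) found; failing the check.", file=sys.stderr)
    sys.exit(1)
print("No high-severity threats reported.")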

Best Practices and Overcoming Challenges

While powerful, this approach requires careful implementation.

Handling Hallucinations and False Positives

LLMs can “hallucinate” or produce inaccurate information. The output should be treated as a set of recommendations to be reviewed, not as infallible truth. The goal is to augment human intelligence, not replace it. Use the generated report as a starting point for a conversation with your security team.
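
One lightweight way to reduce noise is to validate the structure of the LLM’s response before posting it, dropping entries that are missing required fields or that use an unrecognized category. A minimal sketch, with field names matching the prompt from Step 2:

# Minimal structural validation of the LLM's response before it is reported.
REQUIRED_FIELDS = {"threat_id", "category", "description", "mitigation"}
STRIDE_CATEGORIES = {
    "Spoofing", "Tampering", "Repudiation",
    "Information Disclosure", "Denial of Service", "Elevation of Privilege",
}

def filter_valid_threats(threat_model_json: dict) -> list[dict]:
    """Keep only threats that have all required fields and a known STRIDE category."""
    valid = []
    for threat in threat_model_json.get("threats", []):
        if REQUIRED_FIELDS.issubset(threat) and threat.get("category") in STRIDE_CATEGORIES:
            valid.append(threat)
    return valid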

Data Privacy and Security

Be cautious about sending proprietary or sensitive source code to public LLM APIs. For organizations with strict data privacy requirements, consider using services like Azure OpenAI Service, which offers enhanced privacy controls, or self-hosting an open-source model using frameworks like vLLM or Ollama.
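
Because the script in Step 2 uses the OpenAI client, switching to a self-hosted, OpenAI-compatible endpoint is often just a matter of changing the client configuration. For example, Ollama exposes an OpenAI-compatible API locally; the base URL and model name in the sketch below are assumptions that depend on your deployment.

from openai import OpenAI

# Point the same client at a locally hosted, OpenAI-compatible endpoint.
# Base URL and model name are assumptions; adjust them for your setup.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default OpenAI-compatible endpoint
    api_key="ollama",                      # placeholder; Ollama does not check this value
)

response = client.chat.completions.create(
    model="llama3.1",  # any model you have pulled locally
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)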

Iterating on Your Prompts

Prompt engineering is an iterative process. If the results aren’t meeting your expectations, refine your prompt. You can make it more specific, provide better examples (few-shot prompting), or adjust the context you provide to improve the quality of the analysis.
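
For example, a small few-shot addition to the user prompt, showing the model one example of the exact output shape you want, often improves consistency. The example threat below is purely illustrative, and base_user_prompt stands in for the STRIDE instructions built in run_threat_model.py.

# A purely illustrative few-shot example appended to the user prompt from Step 2.
base_user_prompt = "...the STRIDE analysis instructions from Step 2..."

few_shot_example = """
Example of the expected output format:
{
  "threats": [
    {
      "threat_id": "STRIDE-001",
      "category": "Information Disclosure",
      "description": "Debug mode is enabled, exposing stack traces to clients.",
      "mitigation": "Disable debug mode in production and return generic error pages."
    }
  ]
}
"""

user_prompt = base_user_prompt + few_shot_example
print(user_prompt)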

Conclusion

Integrating LLMs into your DevSecOps pipeline for automated threat modeling represents a significant leap forward in our ability to build secure software at scale. By transforming a manual, periodic review into a continuous, automated process, we can catch vulnerabilities earlier, reduce friction between development and security, and ultimately ship more resilient products.

This guide provides a starting point. The real power comes from adapting this framework to your unique environment, refining your prompts, and integrating the feedback into your team’s culture.

Start small, pick a single service, and try generating your first automated threat model today. We’d love to hear about your experiments and what you learn. Share your thoughts and questions in the comments below.