
How We Built GitHub Integration for Prompt Management

A technical deep dive into how Montage scans GitHub repos for hardcoded prompts, generates replacement PRs, and keeps prompts in sync, all from the dashboard.


One of the most common questions we hear from teams evaluating Montage is: "I have hundreds of prompts hardcoded in my codebase. How do I migrate?"

We built GitHub integration to answer that question. Connect your repo, and Montage will find your prompts, let you import them, and generate PRs to replace the hardcoded strings with SDK calls.

Here's how we built it.

The Problem

Production codebases have prompts everywhere:

// In an API route
const response = await openai.chat.completions.create({
  messages: [{ role: "system", content: "You are a helpful..." }],
});

// In a utility function
const SUPPORT_PROMPT = `As a customer support agent, you should...`;

// In a config file
export const prompts = {
  onboarding: "Welcome the user and ask about their goals...",
};

Finding all of these manually is tedious. Migrating them one by one is slow. We wanted to automate the entire flow.

Phase 1: OAuth and Repository Access

The foundation is GitHub OAuth. When a user connects their GitHub account, we store an access token that lets us read their repositories and create pull requests on their behalf.

We implemented token refresh to handle expiring tokens automatically. When a token is close to expiration, we refresh it transparently before making API calls.
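The refresh-before-call logic can be sketched roughly as follows. This is an illustrative sketch, not Montage's actual implementation; the `StoredToken` shape, the five-minute margin, and the helper names are all assumptions.

```typescript
// Hypothetical token store shape -- illustrative, not Montage's actual schema.
interface StoredToken {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Refresh a little early so a token never expires mid-request (margin is an assumption).
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

function needsRefresh(token: StoredToken, now: number = Date.now()): boolean {
  return token.expiresAt - now < REFRESH_MARGIN_MS;
}

// Wrap every GitHub API call: transparently swap in a fresh token when needed.
async function withFreshToken(
  token: StoredToken,
  refresh: (t: StoredToken) => Promise<StoredToken>,
): Promise<StoredToken> {
  return needsRefresh(token) ? refresh(token) : token;
}
```

The caller never sees an expired token: every API path goes through `withFreshToken` first.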

Phase 2: Prompt Detection

The interesting challenge is finding prompts in code. For JavaScript and TypeScript, we use the TypeScript compiler's AST to walk the syntax tree, visiting call expressions and identifying message arrays, variable declarations, and inline prompt content. This gives us precise structural detection that understands parent calls, argument positions, and nesting.
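A minimal sketch of what that AST walk looks like, using the TypeScript compiler API. The heuristics here are deliberately simplified (two cases instead of Montage's full set), and the names are illustrative:

```typescript
import * as ts from "typescript";

interface Detection {
  text: string;
  kind: "message-content" | "prompt-variable";
}

// Simplified sketch: walk the syntax tree and collect string literals that
// look like prompts, either as `content:` values or prompt-named variables.
function detectPrompts(source: string): Detection[] {
  const sf = ts.createSourceFile("snippet.ts", source, ts.ScriptTarget.Latest, true);
  const found: Detection[] = [];

  const isStringy = (n: ts.Node): n is ts.StringLiteral | ts.NoSubstitutionTemplateLiteral =>
    ts.isStringLiteral(n) || ts.isNoSubstitutionTemplateLiteral(n);

  const visit = (node: ts.Node) => {
    // Case 1: { role: "...", content: "..." } objects inside message arrays.
    if (
      ts.isPropertyAssignment(node) &&
      ts.isIdentifier(node.name) && node.name.text === "content" &&
      isStringy(node.initializer)
    ) {
      found.push({ text: node.initializer.text, kind: "message-content" });
    }
    // Case 2: variables with prompt-like names assigned string literals.
    if (
      ts.isVariableDeclaration(node) &&
      ts.isIdentifier(node.name) && /prompt/i.test(node.name.text) &&
      node.initializer && isStringy(node.initializer)
    ) {
      found.push({ text: node.initializer.text, kind: "prompt-variable" });
    }
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return found;
}
```

Because we have real nodes rather than text matches, each hit carries its parent chain for free, which is what makes the confidence scoring below possible.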

For Python and other languages, we use a multi-pattern regex approach as a fallback. Across both methods, we detect:

  1. LLM API calls like messages: [{ role: "system", content: "..." }]
  2. Template literals assigned to variables with prompt-like names
  3. String constants like const SYSTEM_PROMPT = "..."
  4. Python f-strings like f"You are a {role} assistant..."
  5. Docstrings used as prompt templates
  6. Array patterns with messages arrays containing role/content objects
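The regex fallback can be sketched as a small set of patterns run per line. These three patterns are illustrative stand-ins; the real detector uses many more patterns plus surrounding context:

```typescript
// Simplified fallback patterns -- a sketch, not the production pattern set.
const PROMPT_PATTERNS: RegExp[] = [
  /(?:const|let|var)\s+[A-Z_]*PROMPT[A-Z_]*\s*=\s*[`"']/, // const SYSTEM_PROMPT = "..."
  /content:\s*["'`]/,                                     // messages role/content pairs
  /f["']You are a \{?/,                                   // Python f-string prompts
];

function looksLikePrompt(line: string): boolean {
  return PROMPT_PATTERNS.some((p) => p.test(line));
}
```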

Each detected prompt gets a confidence score based on the detection method, variable name, and surrounding context. AST-detected prompts in a recognized API call get high confidence automatically. Regex-detected prompts are scored based on pattern strength and naming conventions.
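Conceptually, the scoring looks something like this. The specific weights and the name-boost heuristic are assumptions for illustration; only the ordering (AST API call > AST variable > regex, with naming as a boost) reflects what the text above describes:

```typescript
type Method = "ast-api-call" | "ast-variable" | "regex";

// Hypothetical scoring function; the weights are illustrative, not Montage's.
function scoreDetection(method: Method, variableName?: string): number {
  let score =
    method === "ast-api-call" ? 0.95 // recognized API call: high confidence
    : method === "ast-variable" ? 0.8
    : 0.5;                           // regex baseline
  // Prompt-like naming conventions raise confidence for regex hits.
  if (method === "regex" && variableName && /prompt|system|instruction/i.test(variableName)) {
    score += 0.2;
  }
  return Math.min(score, 1);
}
```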

Phase 3: Import Flow

Detected prompts are presented in a review interface with:

  • The original source code with syntax highlighting
  • Editable name and slug fields (with AI-generated suggestions)
  • Role selectors for each message (System/User/Assistant)
  • Confidence badges so users know which detections to trust
  • Bulk import to select all or pick individually

The key UX decision was making this a review step, not an automatic import. We show what we found and let the user confirm, edit, or skip each detection.

Phase 4: Replacement Code Generation

After importing a prompt, Montage generates replacement code: what the user's code should look like once the hardcoded string is swapped for an SDK call.

For a simple case:

// Before
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful support agent..." },
    { role: "user", content: userMessage },
  ],
});

// After
import { Montage } from "@montage-sh/sdk";
const montage = new Montage({ apiKey: process.env.MONTAGE_API_KEY });
const prompt = await montage.get("support-agent");
const compiled = prompt.compile({ userMessage });

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: compiled.messages,
});

This is straightforward for standalone prompts. It gets complex when prompts are inline in function calls, nested in conditionals, or mixed with dynamic runtime data. We built safety checks that flag prompts where auto-replacement might not preserve the original behavior.
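One such safety check can be sketched with the same compiler API: a template literal with `${...}` substitutions means the prompt mixes in runtime values, so auto-replacement may not preserve behavior. This is a simplified illustration of one check, not the full set:

```typescript
import * as ts from "typescript";

// Flag snippets whose prompt strings contain runtime interpolation
// (template expressions like `Hello ${name}`), which need manual review.
function hasDynamicParts(snippet: string): boolean {
  const sf = ts.createSourceFile("x.ts", snippet, ts.ScriptTarget.Latest, true);
  let dynamic = false;
  const visit = (n: ts.Node) => {
    if (ts.isTemplateExpression(n)) dynamic = true; // template with substitutions
    ts.forEachChild(n, visit);
  };
  visit(sf);
  return dynamic;
}
```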

Phase 5: Auto-PR Creation

The one-click PR flow works like this:

  1. User clicks "Create PR" on an imported prompt
  2. Montage generates the replacement code diff
  3. We create a new branch on the user's repo
  4. We commit the file change with the replacement
  5. We open a PR with a description explaining the change

The PR description includes context: which prompt was replaced, links to the prompt in Montage, and instructions for setting up the SDK.
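Steps 3-5 map onto two GitHub REST endpoints: create a ref, then open a pull request. The sketch below uses Node 18+'s global `fetch` against GitHub's documented endpoints; the branch naming scheme, PR body format, and `main` base branch are illustrative assumptions:

```typescript
const GITHUB_API = "https://api.github.com";

// Branch naming is an assumption for illustration.
function branchNameFor(slug: string): string {
  return `montage/replace-${slug}`;
}

// Hypothetical PR body: prompt identity, dashboard link, SDK setup note.
function prBody(slug: string, dashboardUrl: string): string {
  return [
    "This PR replaces a hardcoded prompt with a Montage SDK call.",
    "",
    `- Prompt: \`${slug}\``,
    `- Manage it here: ${dashboardUrl}`,
    "",
    "Set `MONTAGE_API_KEY` in your environment before merging.",
  ].join("\n");
}

async function openReplacementPr(opts: {
  token: string; owner: string; repo: string;
  baseSha: string; slug: string; dashboardUrl: string;
}): Promise<void> {
  const headers = {
    Authorization: `Bearer ${opts.token}`,
    Accept: "application/vnd.github+json",
    "Content-Type": "application/json",
  };
  const branch = branchNameFor(opts.slug);

  // Create a branch pointing at the base commit.
  await fetch(`${GITHUB_API}/repos/${opts.owner}/${opts.repo}/git/refs`, {
    method: "POST", headers,
    body: JSON.stringify({ ref: `refs/heads/${branch}`, sha: opts.baseSha }),
  });

  // (Committing the replacement file on that branch happens here.)

  // Open the PR from the new branch.
  await fetch(`${GITHUB_API}/repos/${opts.owner}/${opts.repo}/pulls`, {
    method: "POST", headers,
    body: JSON.stringify({
      title: `Replace hardcoded prompt "${opts.slug}" with Montage SDK`,
      head: branch, base: "main",
      body: prBody(opts.slug, opts.dashboardUrl),
    }),
  });
}
```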

Phase 6: Sync Detection

After importing, Montage tracks the source file. We store the file's SHA hash at import time and use GitHub webhooks to detect when the file changes.
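The comparison itself is cheap because git blob SHAs are deterministic: SHA-1 over `blob <size>\0<content>`. A sketch of the check, assuming the stored hash is a git blob SHA (helper names are illustrative):

```typescript
import { createHash } from "node:crypto";

// Compute a git blob SHA locally: SHA-1 over "blob <byte length>\0<content>".
function gitBlobSha(content: string): string {
  const body = Buffer.from(content, "utf8");
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(Buffer.concat([header, body])).digest("hex");
}

// A tracked file is out of sync when its current content hashes differently
// from the SHA we stored at import time.
function isOutOfSync(storedSha: string, currentContent: string): boolean {
  return gitBlobSha(currentContent) !== storedSha;
}
```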

When a tracked file is modified, we show a sync indicator in the dashboard. The user can review the diff and decide whether to re-import, ignore the change, or manually update the prompt.

This also works in reverse: when a user edits a prompt in Montage, they can push the change back to GitHub via a "push-back" PR.

Lessons Learned

AST parsing was worth the investment. Structural detection understands what text patterns can't: which call a string lives in, its argument position, and how deeply it's nested, which is why AST hits earn high confidence. We still fall back to regex for Python and for edge cases where AST parsing doesn't apply, but AST-first detection catches structure that regex alone would miss.

Auto-replacement has limits. About 60-70% of real-world prompts can be cleanly auto-replaced. The rest involve inline API calls, complex variable interpolation, or dynamic message construction that requires manual migration. We learned to frame this as guidance rather than an error: "here's the replacement code, you may need to adapt it."

Webhooks are essential for sync. Polling for file changes doesn't scale. GitHub webhooks let us detect changes in real-time and keep the sync status accurate.

The manual prompt picker was a late addition that turned out to be essential. When automatic detection misses a prompt, users can select lines in the source code viewer and import them manually. This catches the long tail of unusual prompt patterns.

What's Next

We're working on smarter replacement code that handles more complex cases: inline API calls, conditional prompts, and multi-file prompt compositions. Our AST-based replacement already handles straightforward variable declarations and falls back to regex for trickier patterns, and we're pushing that boundary further.

The goal is to make migration from hardcoded to managed as close to zero-effort as possible.

technical · github · engineering

Written by Jeremy Seicianu