HatmanStack/ragstack-mcp

RAGStack MCP Server

MCP (Model Context Protocol) server for RAGStack knowledge bases. It lets AI assistants search and chat with your knowledge base, upload documents and media to it, and scrape websites into it.

Installation

# Using uvx (recommended - no install needed)
uvx ragstack-mcp
 
# Or install globally
pip install ragstack-mcp

Configuration

Get your GraphQL endpoint and API key from the RAGStack dashboard under Settings → API Key.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (Mac) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
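If the config file already lists other servers, hand-editing it risks clobbering them. The following is a minimal sketch of merging the entry programmatically, assuming a config with a top-level mcpServers object like the one above. The helper names add_ragstack_server and update_config_file are hypothetical and not part of ragstack-mcp.

```python
import json
from pathlib import Path


def add_ragstack_server(config: dict, endpoint: str, api_key: str) -> dict:
    """Return a copy of an MCP client config with the ragstack-kb entry added."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["ragstack-kb"] = {
        "command": "uvx",
        "args": ["ragstack-mcp"],
        "env": {
            "RAGSTACK_GRAPHQL_ENDPOINT": endpoint,
            "RAGSTACK_API_KEY": api_key,
        },
    }
    return merged


def update_config_file(path: Path, endpoint: str, api_key: str) -> None:
    """Read the config file if it exists, add the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_ragstack_server(existing, endpoint, api_key)
    path.write_text(json.dumps(updated, indent=2))
```

Existing entries under mcpServers are preserved; only the ragstack-kb key is added or overwritten.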

Amazon Q CLI

Edit ~/.aws/amazonq/mcp.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

Cursor

Open Settings → MCP Servers → Add Server, or edit .cursor/mcp.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

VS Code + Cline

Edit .vscode/cline_mcp_settings.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

VS Code + Continue

Edit ~/.continue/config.json and add an entry to the mcpServers array:

{
  "mcpServers": [
    {
      "name": "ragstack-kb",
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  ]
}

Available Tools

search_knowledge_base

Search for relevant documents in the knowledge base.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The search query |
| max_results | int | No | 5 | Maximum results to return |
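Your AI assistant normally issues tool calls for you, but it can help to see what one looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request; the sketch below builds one for search_knowledge_base using the parameters from the table. The helper name build_tool_call and the example query are illustrative only.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


message = build_tool_call(
    1,
    "search_knowledge_base",
    {"query": "authentication best practices", "max_results": 5},
)
print(message)
```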

chat_with_knowledge_base

Ask questions and get AI-generated answers with source citations.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Your question |
| conversation_id | string | No | null | ID to maintain conversation context |

start_scrape_job

Scrape a website into the knowledge base.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Starting URL to scrape |
| max_pages | int | No | 50 | Maximum pages to scrape |
| max_depth | int | No | 3 | How deep to follow links (0 = start page only) |
| scope | string | No | "HOSTNAME" | SUBPAGES, HOSTNAME, or DOMAIN |
| include_patterns | list[str] | No | null | Only scrape URLs matching these glob patterns |
| exclude_patterns | list[str] | No | null | Skip URLs matching these glob patterns |
| scrape_mode | string | No | "AUTO" | AUTO, FAST (HTTP only), or FULL (browser) |
| cookies | string | No | null | Cookie string for authenticated sites |
| force_rescrape | bool | No | false | Re-scrape even if content unchanged |

Scope values:

  • SUBPAGES - Only URLs under the starting path
  • HOSTNAME - All pages on the same subdomain
  • DOMAIN - All subdomains of the domain

Scrape mode values:

  • AUTO - Try fast mode, fall back to full for SPAs
  • FAST - HTTP only, faster but may miss JavaScript content
  • FULL - Uses headless browser, handles all JavaScript
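To make the scope values and glob patterns concrete, here is a rough sketch of how a crawler might decide whether to follow a URL. It is an approximation for illustration, not the RAGStack implementation: the function names are hypothetical, DOMAIN is simplified to "last two hostname labels", and the server's glob matching may treat `/` differently from Python's fnmatch.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def in_scope(start_url: str, candidate_url: str, scope: str = "HOSTNAME") -> bool:
    """Approximate the SUBPAGES, HOSTNAME, and DOMAIN scope values."""
    start, cand = urlparse(start_url), urlparse(candidate_url)
    start_host = (start.hostname or "").lower()
    cand_host = (cand.hostname or "").lower()
    if scope == "SUBPAGES":
        # Same host, and the path sits at or under the starting path.
        base = start.path.rstrip("/")
        return cand_host == start_host and (
            cand.path == base or cand.path.startswith(base + "/")
        )
    if scope == "HOSTNAME":
        return cand_host == start_host
    if scope == "DOMAIN":
        # Simplification: treat the last two labels as the registrable domain.
        root = ".".join(start_host.split(".")[-2:])
        return cand_host == root or cand_host.endswith("." + root)
    raise ValueError(f"unknown scope: {scope}")


def passes_patterns(url: str, include_patterns=None, exclude_patterns=None) -> bool:
    """Apply exclude patterns first, then require an include match if any are given."""
    if exclude_patterns and any(fnmatchcase(url, p) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(fnmatchcase(url, p) for p in include_patterns)
    return True
```

With a start URL of https://react.dev/reference, a link to https://react.dev/learn falls outside SUBPAGES but inside HOSTNAME under this sketch.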

get_scrape_job_status

Check the status of a scrape job.

| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string | Yes | The scrape job ID |
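Scrape jobs run asynchronously, so a caller typically polls this tool until the job finishes. The sketch below shows one way to do that with an injected status function. The status names (COMPLETED, FAILED, CANCELLED) are assumptions, since this README does not list them, and wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal status names; check the real responses before relying on them.
ASSUMED_TERMINAL = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    terminal: frozenset = ASSUMED_TERMINAL,
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_status(job_id) until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        time.sleep(interval)
```

Here get_status would wrap a call to get_scrape_job_status and extract the status field from its response.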

list_scrape_jobs

List recent scrape jobs.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | int | No | 10 | Maximum jobs to return |

upload_document_url

Get a presigned URL to upload a document or media file.

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Name of the file (e.g., 'report.pdf', 'meeting.mp4') |

Supported formats:

  • Documents: PDF, DOCX, XLSX, HTML, TXT, CSV, JSON, XML, EML, EPUB, Markdown
  • Images: JPG, PNG, GIF, WebP, AVIF, BMP, TIFF
  • Video: MP4, WebM
  • Audio: MP3, WAV, M4A, OGG, FLAC

Video and audio files are transcribed with Amazon Transcribe and segmented for search.
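Before requesting an upload URL, a client can check the filename against the supported formats listed above. This is a small sketch of such a check; the extension spellings (for example "md" for Markdown and "jpeg" as an alias for JPG) are assumptions derived from the format names, and classify_upload is a hypothetical helper.

```python
from pathlib import PurePath
from typing import Optional

# Extensions inferred from the supported-formats list; spellings are assumptions.
SUPPORTED = {
    "document": {"pdf", "docx", "xlsx", "html", "txt", "csv", "json", "xml", "eml", "epub", "md"},
    "image": {"jpg", "jpeg", "png", "gif", "webp", "avif", "bmp", "tiff"},
    "video": {"mp4", "webm"},
    "audio": {"mp3", "wav", "m4a", "ogg", "flac"},
}


def classify_upload(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if it is not listed."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None
```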

upload_image_url

Get a presigned URL to upload an image (step 1 of image upload workflow).

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Name of the image file (e.g., 'photo.jpg') |

Supported formats: JPEG, PNG, GIF, WebP, AVIF, BMP, TIFF

generate_image_caption

Generate an AI caption for an uploaded image using a vision model (step 2, optional).

| Parameter | Type | Required | Description |
|---|---|---|---|
| s3_uri | string | Yes | S3 URI returned by upload_image_url |

submit_image

Finalize an image upload and trigger indexing (step 3).

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| image_id | string | Yes | - | Image ID from upload_image_url |
| caption | string | No | null | Primary caption |
| user_caption | string | No | null | User-provided caption |
| ai_caption | string | No | null | AI-generated caption |
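The three image tools are meant to be called in sequence. The sketch below strings them together with an injected call_tool function standing in for whatever MCP client you use. The response field names (image_id, s3_uri, caption) are assumptions based on the parameter descriptions above, and the file transfer itself is delegated to a caller-supplied upload_file function because the presigned response format is not documented here.

```python
from typing import Callable, Optional

ToolCaller = Callable[[str, dict], dict]


def run_image_workflow(
    call_tool: ToolCaller,
    upload_file: Callable[[dict], None],
    filename: str,
    user_caption: Optional[str] = None,
    use_ai_caption: bool = True,
) -> dict:
    """Run upload_image_url, optionally generate_image_caption, then submit_image."""
    # Step 1: request a presigned upload target, then transfer the file.
    target = call_tool("upload_image_url", {"filename": filename})
    upload_file(target)

    submit_args = {"image_id": target["image_id"]}  # assumed response key
    if user_caption:
        submit_args["user_caption"] = user_caption

    # Step 2 (optional): ask the vision model for a caption.
    if use_ai_caption:
        result = call_tool("generate_image_caption", {"s3_uri": target["s3_uri"]})
        submit_args["ai_caption"] = result["caption"]  # assumed response key

    # Step 3: finalize the upload and trigger indexing.
    return call_tool("submit_image", submit_args)
```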

Configuration Tools (Read-Only)

get_configuration

Get all current RAGStack configuration settings organized by category.

Returns settings for:

  • Chat: Models, quotas, system prompt, document access
  • Metadata Extraction: Enabled, model, mode (auto/manual), max keys
  • Query-Time Filtering: Filter generation, multi-slice retrieval settings
  • Public Access: Which endpoints allow unauthenticated access
  • Document Processing: OCR backend, image caption prompt
  • Media Processing: Transcribe language, speaker diarization, segment duration
  • Budget: Alert thresholds

Note: Read-only. To modify settings, use the admin dashboard (Cognito auth required).


Metadata Analysis Tools

These tools help understand and optimize metadata extraction and filtering.

get_metadata_stats

Get statistics about metadata keys extracted from documents.

Returns key names, data types, occurrence counts, sample values, and status.

get_filter_examples

Get AI-generated filter examples for metadata-based search queries.

Returns filter patterns with name, description, use case, and JSON filter syntax.

Filter syntax reference:

  • Basic operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists
  • Logical operators: $and, $or
  • Example: {"topic": {"$eq": "genealogy"}}
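To show how these operators compose, here is a small local evaluator that checks a metadata dictionary against a filter. It is an illustration of the syntax only, not the server's matching logic; in particular, how missing keys behave under each operator is an assumption, and matches is a hypothetical helper.

```python
_MISSING = object()  # sentinel for keys absent from the metadata

COMPARATORS = {
    "$eq": lambda v, t: v == t,
    "$ne": lambda v, t: v != t,
    "$gt": lambda v, t: v is not _MISSING and v > t,
    "$gte": lambda v, t: v is not _MISSING and v >= t,
    "$lt": lambda v, t: v is not _MISSING and v < t,
    "$lte": lambda v, t: v is not _MISSING and v <= t,
    "$in": lambda v, t: v in t,
    "$nin": lambda v, t: v not in t,
    "$exists": lambda v, t: (v is not _MISSING) == bool(t),
}


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if metadata satisfies every clause in the filter."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            value = metadata.get(key, _MISSING)
            for op, target in condition.items():
                if not COMPARATORS[op](value, target):
                    return False
    return True


doc = {"topic": "genealogy", "year": 1900}
print(matches(doc, {"topic": {"$eq": "genealogy"}}))
```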

get_key_library

Get the complete metadata key library with all discovered keys.

Returns all keys available for filtering with data types and sample values.

check_key_similarity

Check if a proposed metadata key is similar to existing keys.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| key_name | string | Yes | - | Proposed key name to check |
| threshold | float | No | 0.8 | Similarity threshold (0.0-1.0) |

Use this before adding documents with new keys to avoid duplicates.
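For intuition about what a 0.8 threshold means, the sketch below scores key names with Python's difflib.SequenceMatcher. This README does not say which similarity metric the tool uses, so treat this as a stand-in technique and not the actual algorithm; similar_keys is a hypothetical helper.

```python
from difflib import SequenceMatcher


def similar_keys(proposed: str, existing: list, threshold: float = 0.8) -> list:
    """Return (key, score) pairs at or above the threshold, best match first."""
    def normalize(key: str) -> str:
        return key.lower().replace("-", "_")

    scored = [
        (key, SequenceMatcher(None, normalize(proposed), normalize(key)).ratio())
        for key in existing
    ]
    hits = [(key, score) for key, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(similar_keys("author", ["authors", "topic", "Author"]))
```

Under this stand-in metric, "author" would be flagged as a near-duplicate of both "Author" and "authors", while "topic" falls well below the threshold.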

analyze_metadata

Trigger metadata analysis to discover keys and generate filter examples.

Note: This is a long-running operation (1-2 minutes). It samples up to 1000 vectors and uses LLM analysis.

Run this after ingesting new documents or when filter generation isn't working as expected.


Usage Examples

Once configured, ask your AI assistant in natural language:

Search & Chat:

  • "Search my knowledge base for authentication best practices"
  • "What does our documentation say about API rate limits?"
  • "What was discussed in the team meeting about deadlines?" (searches video/audio transcripts)

Web Scraping:

  • "Scrape the React docs at react.dev/reference"
  • "Check the status of my scrape job"

Document, Image & Media Upload:

  • "Upload a new document called quarterly-report.pdf"
  • "Upload this image and generate a caption for it"
  • "Upload the meeting recording meeting-2024-01.mp4"

Metadata Analysis:

  • "What metadata keys are available for filtering?"
  • "Analyze the metadata in my knowledge base"
  • "Show me the filter examples"
  • "Check if 'author' is similar to any existing keys"

Configuration:

  • "What are my current RAGStack settings?"
  • "What model is being used for chat?"
  • "Is multi-slice retrieval enabled?"
  • "What are my quota limits?"
  • "What language is configured for transcription?"

Environment Variables

| Variable | Required | Description |
|---|---|---|
| RAGSTACK_GRAPHQL_ENDPOINT | Yes | Your RAGStack GraphQL API URL |
| RAGSTACK_API_KEY | Yes | Your RAGStack API key |
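When launching the server from a script or wrapper, a quick check that both variables are set can save a confusing startup failure. A minimal sketch follows; missing_variables is a hypothetical helper, not something ragstack-mcp exports.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("RAGSTACK_GRAPHQL_ENDPOINT", "RAGSTACK_API_KEY")


def missing_variables(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_variables() with no argument inspects the current process environment; passing a dict makes the check easy to test.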

Development

# From a clone of the repository, install dependencies
cd src/ragstack-mcp
uv sync
 
# Run locally
uv run ragstack-mcp
 
# Build package
uv build
 
# Publish to PyPI
uv publish

License

MIT