2026-05-21 00:00 UTCOriginal source9 min readUpdated: 2026-06-27 00:25 UTC

Build a Coding Assistant with Weaviate MCP: RAG over Code & Docs

Use Weaviate's built-in MCP server to give Claude Code, Cursor, and VS Code hybrid search over your codebase and docs. No glue code.

SourceWeaviate Blog

Last week I asked Claude Code to implement something relatively trivial in my codebase. Three turns in, the conversation used up >80K tokens and Claude was still missing some crucial information I'd forgotten to include. That's the loop you fall into without retrieval: paste too little and the agent guesses, paste too much and pay for context the agent isn't using.

Most teams solve this with RAG over the codebase. The typical setup is a vector database plus a custom MCP server process bridging the two. Weaviate simplifies this: the MCP server is built into the database, at /v1/mcp on the same port as the REST API. One env var enables it. The same hybrid search you'd use for any other Weaviate workload powers code retrieval, with the BM25 half keeping function identifiers like connect_to_local matchable and the vector half finding semantic intent like "how do I init a client."

This post walks through building a coding assistant on top of that built-in MCP server: ingest a codebase, ingest its docs, connect Claude Code, Cursor, and VS Code, and run real queries. Topics covered:

Why your coding assistant needs more than its training data

Why Weaviate MCP fits this job

Step 1: Run Weaviate with MCP enabled

Step 2: Design the schema

Step 3: Chunk and ingest the codebase

Step 4: Chunk and ingest documentation

Step 5: Connect Claude Code, Cursor, and VS Code

Try it out

Agent runbook: autonomous setup

Why your coding assistant needs more than its training data

LLMs ship with a fixed cutoff and zero knowledge of your private code. The naive workaround is to dump files into the prompt. That has three problems.

Cost: Tokens in context are billed. Every turn. A 200-file Python project doesn't fit, and even the parts that do are billed continuously while the agent reasons.

Stale context: Once a file is in the prompt, it's frozen. If the agent changes a function and then needs to read it again, it has to reload the whole file. There's no live link between the model's view and the on-disk truth.

Wrong granularity: Even when files fit, the model spends attention on the wrong parts. Imports. Module-level boilerplate. The function the agent is actually editing competes for context with a hundred lines of from x import y.

Retrieval solves all three. Index the codebase once, store the chunks in a vector database, and let the LLM client pull only what it needs per query. That's RAG. Coding assistants haven't had a clean way to talk to such a database without a custom shim. That's where MCP comes in.

Why Weaviate MCP fits this job

The Model Context Protocol (MCP) is the standardized way for LLM clients like Claude Code, Cursor, and VS Code to call out to external tools. Weaviate v1.37.1 exposes its core operations as MCP tools directly, on a Streamable HTTP endpoint at /v1/mcp. Four tools are surfaced:

weaviate-collections-get-config — let the LLM inspect what collections exist and what properties they have

weaviate-tenants-list — list tenants when you're using multi-tenancy

weaviate-query-hybrid — run hybrid (BM25 + vector) search

weaviate-objects-upsert — write objects back, only when write access is enabled

Hybrid search is the most concrete reason this stack works for a coding assistant. Code is a mix of identifiers and intent. BM25 nails the identifiers. Vectors nail the intent. A query like "where do we handle retry on 429?" wants both at once: vectors find the semantically related retry code, BM25 anchors on 429 as an exact token. Pure-vector retrieval drops the integer match. Pure-BM25 misses any wording the user didn't already know. Hybrid wins on this kind of mixed-intent query.

Operational simplicity is the second reason. Competing stacks run an MCP server alongside the vector database. That's a second service to watch in production. Weaviate ships the MCP server inside the database, on the same port, with the same auth. The thing you have to monitor is just Weaviate.

The third reason is multi-tenancy. One Weaviate instance can hold many codebases, each isolated as a tenant. For an organization with multiple repos, that's one cluster instead of one-per-team.

Weaviate MCP vs function calling comes up as a natural question. Function calling is per-LLM-API. MCP is transport-level: any client that speaks MCP can call any server that speaks MCP, no rewrites. Build the retrieval once, use it from Claude Code, Cursor, and VS Code without translating.

Step 1: Run Weaviate with MCP enabled

The MCP server is disabled by default. Two environment variables turn it on:

docker-compose.yml

services: weaviate: image: cr.weaviate.io/semitechnologies/weaviate:1.37.1 ports:

'8080:8080'
'50051:50051'

environment: MCP_SERVER_ENABLED: 'true' MCP_SERVER_WRITE_ACCESS_ENABLED: 'true' AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' DEFAULT_VECTORIZER_MODULE: 'text2vec-openai' ENABLE_MODULES: 'text2vec-openai' OPENAI_APIKEY: ${OPENAI_APIKEY}

MCP_SERVER_ENABLED exposes the read tools (config, tenants, hybrid query). MCP_SERVER_WRITE_ACCESS_ENABLED adds the upsert tool, which lets the agent write findings back. Skip it if you only want retrieval.

Honestly, I'd skip it for a while even if you think you want write-back. Read-only MCP covers most of what a coding agent actually does, and you sidestep a class of failure modes where the agent writes nonsense back into your knowledge base. Turn write access on once you have a specific use case that justifies the risk.

Auth for production

This example uses AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' so the post stays focused on the MCP wiring. For any networked deployment, enable an API key and add Authorization: Bearer to your client config — Weaviate's MCP server respects standard auth and RBAC. See §"Going further" for the RBAC permissions involved.

Bring it up and confirm the endpoint is alive. Streamable HTTP requires an initialize handshake before any other call, so the liveness probe sends one:

docker compose up -d curl -sf -X POST http://localhost:8080/v1/mcp \ -H 'Content-Type: application/json' \ -H 'Accept: application/json, text/event-stream' \ -d '{ "jsonrpc":"2.0","id":0,"method":"initialize", "params":{ "protocolVersion":"2025-03-26", "capabilities":{}, "clientInfo":{"name":"curl","version":"1"} } }'

A live MCP server returns a JSON-RPC envelope describing the server's capabilities and sets an Mcp-Session-Id response header that subsequent calls must echo back. You don't need to parse the body for a sanity check — a non-empty response is enough. If you get a connection refused or a 404, MCP isn't enabled.

Step 2: Design the schema

Two collections, one for code chunks and one for documentation chunks, sharing the same Weaviate instance:

import weaviate from weaviate.classes.config import Property, DataType, Tokenization, Configure

client = weaviate.connect_to_local()

client.collections.create( name="CodeChunks", properties=[ Property(name="content", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="symbol", data_type=DataType.TEXT, tokenization=Tokenization.LOWERCASE), Property(name="file_path", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), Property(name="language", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), Property(name="repo", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), ], vector_config=Configure.Vectors.text2vec_openai(), )

client.collections.create( name="DocChunks", properties=[ Property(name="content", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="title", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="source_url", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), ], vector_config=Configure.Vectors.text2vec_openai(), )

The tokenization choices matter. symbol uses lowercase so connect_to_local is one token instead of three. file_path, language, and repo use field so they match exactly. Prose properties (content, title) use word. If this section feels familiar, the tokenization post explains why each method belongs where.

Step 3: Chunk and ingest the codebase

Naive line-based chunking destroys code. A function split across two chunks loses its signature on one side and its body on the other. The fix is to chunk along syntactic boundaries: one chunk per function, one per class, one per top-level statement.

Python's standard library is enough for Python code (no extra dependency). For other languages, tree-sitter is the standard choice (10+ languages, AST-aware, fast).

import ast from pathlib import Path

def chunk_python_file(path: Path) -> list[dict]: """Emit one chunk per top-level function or class.""" source = path.read_text() tree = ast.parse(source) lines = source.splitlines() chunks = [] for node in ast.iter_child_nodes(tree): if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)): start = node.lineno - 1 end = node.end_lineno chunks.append({ "content": "\n".join(lines[start:end]), "symbol": node.name, "file_path": str(path), "language": "python", "repo": "my-service", }) return chunks

Batch-ingest with the standard Weaviate batch pattern:

code_chunks = client.collections.get("CodeChunks")

with code_chunks.batch.dynamic() as batch: for py_file in Path("./src").rglob("*.py"): for chunk in chunk_python_file(py_file): batch.add_object(properties=chunk)

Weaviate vectorizes each chunk on insert via the configured text2vec-openai module, so there's no separate embedding step. Replace the vectorizer with text2vec-voyageai (Voyage's voyage-code-2 is purpose-built for code) or text2vec-cohere if you want a different embedding family.

Step 4: Chunk and ingest documentation

Prose chunks differently. Headings are real boundaries. A useful default is "split on H2, then on H3 if a section is too long":

import re

def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]: sections = re.split(r"(?m)^## ", text) chunks = [] for section in sections: if len(section) docker-compose.yml /dev/null; then echo "STEP_1_READY" break fi sleep 2 done

Verify the MCP endpoint is reachable. Streamable HTTP requires an

initialize handshake first; tools/list then needs the Mcp-Session-Id

header returned by initialize.

SID=$(curl -s -i -X POST http://localhost:8080/v1/mcp \ -H 'Content-Type: application/json' \ -H 'Accept: application/json, text/event-stream' \ -d '{"jsonrpc":"2.0","id":0,"method":"initialize","params":{ "protocolVersion":"2025-03-26","capabilities":{}, "clientInfo":{"name":"curl","version":"1"}}}' \ | awk -F': ' 'tolower($1)=="mcp-session-id"{print $2}' | tr -d '\r\n')

if [ -z "$SID" ]; then echo "STEP_1_MCP_FAILED: no Mcp-Session-Id returned. MCP may be disabled." exit 1 fi

curl -sf -o /dev/null -X POST http://localhost:8080/v1/mcp \ -H 'Content-Type: application/json' \ -H 'Accept: application/json, text/event-stream' \ -H "Mcp-Session-Id: $SID" \ -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \ && echo "STEP_1_MCP_OK" \ || { echo "STEP_1_MCP_FAILED"; exit 1; }

Success criteria: stdout contains both STEP_1_READY and STEP_1_MCP_OK. If only STEP_1_READY shows, MCP is disabled or the version is wrong.

Step 2: Create collections and ingest data

Save the following as $WEAVIATE_DIR/ingest.py and run it with python3 ingest.py. Idempotent: collections that already exist are reused; objects are inserted (duplicates may accumulate, see note at end).

#!/usr/bin/env python3 """Ingest a codebase and its docs into Weaviate. Run after setup.sh.""" import ast import os import re import sys from pathlib import Path

import weaviate from weaviate.classes.config import Property, DataType, Tokenization, Configure

CODE_DIR = Path(os.environ["CODE_DIR"]).resolve() DOCS_DIR = Path(os.environ.get("DOCS_DIR", CODE_DIR)).resolve() REPO_LABEL = CODE_DIR.name

client = weaviate.connect_to_local() try:

---- Schema (idempotent) ----

if not client.collections.exists("CodeChunks"): client.collections.create( name="CodeChunks", properties=[ Property(name="content", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="symbol", data_type=DataType.TEXT, tokenization=Tokenization.LOWERCASE), Property(name="file_path", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), Property(name="language", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), Property(name="repo", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), ], vector_config=Configure.Vectors.text2vec_openai(), ) if not client.collections.exists("DocChunks"): client.collections.create( name="DocChunks", properties=[ Property(name="content", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="title", data_type=DataType.TEXT, tokenization=Tokenization.WORD), Property(name="source_url", data_type=DataType.TEXT, tokenization=Tokenization.FIELD), ], vector_config=Configure.Vectors.text2vec_openai(), )

---- Code: AST-chunk every .py file ----

code = client.collections.get("CodeChunks") code_count = 0 with code.batch.dynamic() as batch: for py in CODE_DIR.rglob("*.py"): if any(part.startswith(".") or part in ("node_modules", "venv", ".venv", "pycache") for part in py.parts): continue try: source = py.read_text(encoding="utf-8") tree = ast.parse(source) except (SyntaxError, UnicodeDecodeError): continue lines = source.splitlines() for node in ast.iter_child_nodes(tree): if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)): chunk = "\n".join(lines[node.lineno - 1:node.end_lineno]) batch.add_object(properties={ "content": chunk, "symbol": node.name, "file_path": str(py), "language": "python", "repo": REPO_LABEL, }) code_count += 1 print(f"STEP_2_CODE_CHUNKS={code_count}")

---- Docs: heading-chunk every .md file ----

docs = client.collections.get("DocChunks") doc_count = 0 with docs.batch.dynamic() as batch: for md in DOCS_DIR.rglob("*.md"): if any(part.startswith(".") for part in md.parts): continue try: text = md.read_text(encoding="utf-8") except UnicodeDecodeError: continue sections = re.split(r"(?m)^## ", text) for sec in sections: sec = sec.strip() if not sec: continue if len(sec) > 1500: for sub in re.split(r"(?m)^### ", sec): sub = sub.strip() if sub: batch.add_object(properties={ "content": sub, "title": md.stem, "source_url": f"file://{md.resolve()}", }) doc_count += 1 else: batch.add_object(properties={ "content": sec, "title": md.stem, "source_url": f"file://{md.resolve()}", }) doc_count += 1 print(f"STEP_2_DOC_CHUNKS={doc_count}")

if code_count + doc_count == 0: print("STEP_2_FAILED: no chunks ingested. Check CODE_DIR / DOCS_DIR paths.", file=sys.stderr) sys.exit(2) print("STEP_2_INGEST_OK") finally: client.close()

Success criteria: stdout contains STEP_2_INGEST_OK and the chunk counts are non-zero. If both counts are zero, the input paths point to empty or unsupported content.

Note on idempotency: re-running this script appends objects rather than upserting. For a clean re-ingest, delete the collections first:

curl -X DELETE http://localhost:8080/v1/schema/CodeChunks curl -X DELETE http://localhost:8080/v1/schema/DocChunks

Step 3: Wire up the LLM client

Pick the block matching the agent's host client. All three point at the same Weaviate endpoint.

Claude Code (preferred, single command):

claude mcp add weaviate http://localhost:8080/v1/mcp --transport http

If the claude CLI is not on PATH, write the config directly:

python3 -c ' import json, os, pathlib p = pathlib.Path(os.path.expanduser("~/.claude.json")) data = json.loads(p.read_text()) if p.exists() else {} data.setdefault("mcpServers", {})["weaviate"] = { "type": "http", "url": "http://localhost:8080/v1/mcp" } p.write_text(json.dumps(data, indent=2)) print("STEP_3_CLAUDE_OK") '

Cursor (~/.cursor/mcp.json):

python3 -c ' import json, os, pathlib p = pathlib.Path(os.path.expanduser("~/.cursor/mcp.json")) p.parent.mkdir(parents=True, exist_ok=True) data = json.loads(p.read_text()) if p.exists() else {} data.setdefault("mcpServers", {})["weaviate"] = {"url": "http://localhost:8080/v1/mcp"} p.write_text(json.dumps(data, indent=2)) print("STEP_3_CURSOR_OK") '

VS Code with Copilot (project-local .vscode/mcp.json):

mkdir -p .vscode cat > .vscode/mcp.json <<'JSON' { "servers": { "weaviate": { "type": "http", "url": "http://localhost:8080/v1/mcp" } } } JSON echo "STEP_3_VSCODE_OK"

Success criteria: the appropriate STEP_3_*_OK marker prints, and the next interactive turn surfaces a weaviate-query-hybrid tool in the client's tool list. If the tool does not appear, restart the client.

Step 4: Verify retrieval works

This block confirms end-to-end retrieval without involving the LLM client. Streamable HTTP again needs the initialize handshake first so the session ID can be reused on the actual tools/call. Replace the query string with something the agent expects to be in the ingested codebase.

curl -sX POST http://localhost:8080/v1/mcp \ -H 'Content-Type: application/json' \ -H 'Accept: application/json, text/event-stream' \ -H "Mcp-Session-Id: $SID" \ -d '{ "jsonrpc":"2.0", "id":1, "method":"tools/call", "params":{ "name":"weaviate-query-hybrid", "arguments":{ "collection_name":"CodeChunks", "query":"how does FastAPI handle dependency injection", "limit":3 } } }' | python3 -m json.tool

Success criteria: the response contains a result key with non-empty content and the matched chunks reference real files in $CODE_DIR. If the response is empty, ingest produced zero matchable chunks for this query — try a query you know is in the codebase.

Done

The agent now has hybrid search over the ingested codebase and docs. From the LLM client, queries that previously hallucinated will instead return citations from real files. To re-ingest after the codebase changes, delete the two collections (curl block above) and re-run ingest.py.

Going further

A few directions worth picking up after the basics work:

Multi-repo organizations: Switch the schema to multi-tenant and create one tenant per repo. The weaviate-tenants-list MCP tool gives the agent a way to discover them.

Auth and RBAC: Weaviate's MCP server respects standard authentication. Three RBAC permissions (read_mcp, create_mcp, update_mcp) control who can do what. Hand out read-only MCP credentials to most users; reserve write access for trusted agents.

Agent write-back: With write access on, the agent can persist its own findings (postmortem notes, cross-references it discovered, summaries of long-running work) back into Weaviate. The next session inherits them. This is what shifts Weaviate from a passive retrieval engine into long-term memory for the coding agent.

Embedding choice: text2vec-openai is fine for most code. Voyage's voyage-code-2 is better if you can swap it in. For fully local setups, point Weaviate at an Ollama-served embedding model.

Summary

A coding assistant that doesn't know your code is a generic chatbot. The build is straightforward: index the codebase along syntactic boundaries, index the docs along heading boundaries, point your LLM client at the database via MCP, and let the agent retrieve only what it needs per query.

Weaviate simplifies this setup by running the MCP server inside the database. Hybrid search and multi-tenancy are enabled in the MCP service by default and writing directly to the database is one env var away. Spin it up, point Claude Code at it, and start asking questions about your actual code.

To go deeper:

The Weaviate MCP server release notes

The weaviate/mcp-server-weaviate GitHub repository

The official MCP protocol docs

The tokenization post for the per-property tokenization rationale

Related blog posts: What is agentic RAG and Hybrid search explained

Ready to start building?

Check out the Quickstart tutorial, or build amazing apps with a free trial of Weaviate Cloud (WCD).

GitHub

Forum

X (Twitter)

Don't want to miss another blog post?

By submitting, I agree to the Terms of Service and Privacy Policy.