Memgraph Ingester. Speed up your AI agent
Memgraph Ingester ingests Java codebase structure and engineering context into Memgraph as a queryable knowledge graph, enabling AI agents to reason over code and project knowledge via graph queries instead of raw text search — improving accuracy, reducing cost, and speeding up analysis. Supports parallel ingestion, watch mode, and integration with Claude, Codex, Gemini, and GitHub Copilot.
Key points
- Creates two graphs: Code (source structure) and Memory (engineering context) under a project scope.
- Uses JavaParser for Java 25 syntax, with optional classpath for better symbol resolution.
- Supports parallel ingestion (4-8 threads optimal) and watch mode for automatic re-ingestion.
- Provides scripts to integrate with Claude, Codex, Gemini, and GitHub Copilot AI agents.
Memgraph Ingester. Speed up your AI agent!
Ingests the structural model of a Java codebase into Memgraph as a queryable code + memory knowledge graph, combining source structure with persistent engineering context (decisions, rules, findings, etc.).
Optionally paired with the Memgraph MCP server, this lets your AI agent reason over both code and accumulated project knowledge via graph queries instead of raw text search, improving accuracy, reducing cost, and speeding up analysis.
An MCP server is not required: the mgconsole utility can query the graph directly, which also reduces token usage.
You can use the code in this repo as-is, or fork it and customize it to your needs. Memgraph is free too. Please submit any issues or pull requests.
What it does
Memgraph Ingester creates two project-scoped graphs for a Java codebase:
- A Code graph under (:Project)-[:CONTAINS]->(:Code)
- A Memory graph under (:Project)-[:HAS_MEMORY]->(:Memory)
Every code and memory node is scoped by a project property, so multiple Java codebases can share the same Memgraph instance without collisions.
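Because every node carries the project scope, queries should pass the project name as a parameter rather than splicing it into the query text. The sketch below wraps the Verify query used later in this README in a small Java record. ScopedQuery and lastIngested are hypothetical names, and the sketch assumes the :Project anchor stores the --project value in its name property (as the Verify query's p.name suggests). With a Bolt client such as the Neo4j Java driver, the pair would be passed as session.run(q.cypher(), q.params()).

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of the ingester): keeps the project name in a
 * query parameter instead of concatenating it into the Cypher text.
 */
record ScopedQuery(String cypher, Map<String, Object> params) {

    /** The README's Verify query, restricted to a single project. */
    static ScopedQuery lastIngested(String project) {
        String cypher = """
                MATCH (p:Project {name: $project})-[:CONTAINS]->(c:Code)
                RETURN p.name, c.sourceRoots, c.lastIngested""";
        // $project is bound at execution time, so the name never needs escaping.
        return new ScopedQuery(cypher, Map.of("project", project));
    }
}
```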
The Code graph stores Java source structure in a queryable, persistent form. The ingester walks the source tree with JavaParser and symbol resolution, then writes packages, files, classes, interfaces, annotations, methods, fields, inheritance, and within-project call relationships.
The parser is configured for Java 25 syntax. It should handle most sources written for earlier Java versions too, but JavaParser is not a javac replacement and may still miss unsupported or edge-case constructs.
The Memory graph stores durable engineering context: decisions, ADRs, rules, findings, tasks, risks, questions, ideas, and domain notes. Memory items can refer to stable :CodeRef nodes, which are resolved back to the current code graph after ingestion. This lets agents query both structure (code) and knowledge (memory) without relying only on raw text search.
See doc/MEMORY.md for the Memory usage guide with prompt examples and Cypher recipes. See SCHEMA.md for the full graph model.
Requirements
- Required: Java 25 JRE to run
- Required: Memgraph instance (or Docker)
- Optional: Java 25 SDK, Maven 3.9+ to build
- Optional: mgconsole
Quick start
Download the latest JAR (v6.0.7 at the time of writing):
wget https://github.com/ousatov-ua/memgraph-ingester/releases/download/v6.0.7/memgraph-ingester.jar
Run Memgraph
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0
Ingest the project:
Without classpath libraries (weaker symbol resolution):
cd /path/to/your/java/project
java -jar path/to/memgraph-ingester.jar \
  --source path/to/src \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories \
  --apply-schema
With classpath libraries (better symbol resolution), for example in a Maven project:
cd /path/to/your/java/project
CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
java -jar path/to/memgraph-ingester.jar \
  --source path/to/src \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories \
  --apply-schema \
  --classpath "$CP"
Append knowledge for your agent
GitHub Copilot
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
  | bash -s -- my-project
Claude
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
  | bash -s -- my-project
Codex
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
  | bash -s -- my-project
Gemini
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
  | bash -s -- my-project
Enable the Memgraph MCP server for your AI agent (examples below), or put mgconsole on your PATH.
Going further
Maven dependency (optional)
groupId: io.github.ousatov-ua, artifactId: memgraph-ingester
Start Memgraph
With Docker Compose:
cd memgraph-platform
docker-compose up -d
With Docker only:
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0
Bolt listens on localhost:7687.
Build the ingester
git clone https://github.com/ousatov-ua/memgraph-ingester.git
cd memgraph-ingester
mvn clean package -Pshade -DskipTests
This produces a shaded fat JAR at target/memgraph-ingester.jar.
Alternatively, download the prebuilt shaded JAR from the Releases page.
Apply the schema (one-time per Memgraph instance)
cat src/main/resources/io/github/ousatov/tools/memgraph/cypher/create-schema.cypher | mgconsole --host localhost --port 7687
Creates uniqueness constraints and lookup indexes for both the code graph and the memory graph. Safe to re-run — existing constraints are reported and skipped.
Alternatively, let the ingester apply the schema. This command applies it to the Memgraph database first, then ingests the project:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --apply-schema
The following command additionally wipes all data in the Memgraph database first, then applies the schema and ingests the project:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-all \
  --apply-schema
Ingest a project
This will wipe the Code graph for this project first:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code
This wipes both the Code and Memory graphs for this project first:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --wipe-project-memories
Verify
MATCH (p:Project)-[:CONTAINS]->(c:Code) RETURN p.name, c.sourceRoots, c.lastIngested;
You should see your project with a fresh lastIngested timestamp.
CLI options
| Option | Short | Required | Default | Description |
|---|---|---|---|---|
| --source | -s | yes | | Root directory to scan (e.g. src/main/java) |
| --bolt | -b | yes | | Bolt URL, e.g. bolt://localhost:7687 |
| --project | -P | yes | | Logical project name. Namespaces all nodes. |
| --user | -u | no | | Memgraph username (empty by default) |
| --pass | -p | no | | Memgraph password (empty by default) |
| --threads | -t | no | 1 | Parser threads. Each thread gets its own Bolt session. |
| --wipe-project-code | | no | false | Delete this project's code graph before ingesting |
| --wipe-project-memories | | no | false | Delete this project's memory graph before ingesting |
| --apply-schema | | no | false | Apply schema before ingesting |
| --wipe-all | | no | false | Wipe all data (schema will be dropped first) |
| --incremental | | no | false | Skip files whose last-modified timestamp matches the stored value |
| --watch | -w | no | false | Watch for changes in the source directory and automatically re-ingest |
| --classpath | | no | | Additional classpath entries (JARs) for symbol resolution, separated by the platform path separator. Improves CALLS edge and type resolution coverage. |
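The --classpath value is one string whose entries are separated by the platform path separator (: on Linux and macOS, ; on Windows), which is also the format mvn dependency:build-classpath prints by default. Here is a minimal sketch of splitting such a value in Java, assuming empty segments should simply be ignored. ClasspathEntries is a hypothetical helper, not the ingester's code. Each resulting JAR could then be handed to a JavaParser type solver such as JarTypeSolver, though how the ingester wires this up internally is an assumption.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a --classpath value into individual entries. */
final class ClasspathEntries {
    private ClasspathEntries() {}

    static List<Path> parse(String classpath) {
        List<Path> entries = new ArrayList<>();
        if (classpath == null || classpath.isBlank()) {
            return entries;
        }
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows;
        // quote it so it is treated literally by the regex-based split.
        for (String part : classpath.split(Pattern.quote(File.pathSeparator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) { // skip empty segments such as "a.jar::b.jar"
                entries.add(Path.of(trimmed));
            }
        }
        return entries;
    }
}
```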
--wipe-project-code only affects code nodes matching the given --project; other codebases in the same Memgraph instance are untouched, and the :Project anchor remains. --wipe-project-memories only affects memory nodes matching the given --project; the code graph and the :Project anchor remain.
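The --incremental option in the table above compares each file's last-modified timestamp with a stored value. The sketch below shows that comparison under two assumptions: the stored timestamps have already been loaded into a Map (for example from the graph), and a file whose timestamp cannot be read is re-ingested rather than skipped. IncrementalFilter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical sketch of an incremental check based on last-modified times. */
final class IncrementalFilter {
    private final Map<Path, Long> storedMillis; // timestamps recorded at the previous ingest

    IncrementalFilter(Map<Path, Long> storedMillis) {
        this.storedMillis = storedMillis;
    }

    /** True when the file on disk still has the timestamp recorded last time. */
    boolean shouldSkip(Path file) {
        try {
            return isUnchanged(file, Files.getLastModifiedTime(file).toMillis());
        } catch (IOException e) {
            return false; // timestamp unreadable: re-ingest to be safe
        }
    }

    /** Pure comparison, separated out so it can be tested without touching disk. */
    boolean isUnchanged(Path file, long currentMillis) {
        Long stored = storedMillis.get(file);
        return stored != null && stored == currentMillis; // never seen before: not skipped
    }
}
```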
Parallel ingestion
Large codebases ingest faster with multiple parser threads:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --wipe-project-code \
  --threads 8
Each thread holds its own JavaParser and its own Bolt session. The Driver itself is shared.
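The thread model described above (per-worker parser and session, one shared driver) can be sketched as follows. GraphDriver and GraphSession are hypothetical stand-ins for the real Bolt driver and session so that the example runs without a database, and the round-robin split of files across workers is an assumption; the ingester may distribute work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one Bolt session; not a real driver type. */
interface GraphSession extends AutoCloseable {
    void write(String file);

    @Override
    void close(); // narrowed: no checked exception
}

/** Hypothetical stand-in for the shared driver that hands out sessions. */
interface GraphDriver {
    GraphSession openSession();
}

final class ParallelIngest {
    private ParallelIngest() {}

    /** Spreads files round-robin over `threads` workers; returns the number written. */
    static int ingest(List<String> files, int threads, GraphDriver driver) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int w = 0; w < threads; w++) {
                final int worker = w;
                tasks.add(() -> {
                    int done = 0;
                    // Worker-owned resources: a parser would live here too. The session
                    // is opened once per worker and closed when the worker finishes.
                    try (GraphSession session = driver.openSession()) {
                        for (int i = worker; i < files.size(); i += threads) {
                            session.write(files.get(i));
                            done++;
                        }
                    }
                    return done;
                });
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```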
Realistic speedup — don't expect linear scaling. JavaParser work is CPU-bound and parallelizes well, but Memgraph Community serializes writes internally, so the write path bottlenecks quickly:
| Threads | Typical speedup | Bottleneck |
|---|---|---|
| 1 | 1× (baseline) | Sequential parse + write |
| 4 | ~2.5–3× | Write serialization starts |
| 8 | ~3–4× | Diminishing returns |
| 16+ | ~3–4× | Writes fully saturated |
4–8 threads is the sweet spot on most machines. Values higher than your CPU core count rarely help.
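One way to see why the curve flattens is Amdahl's law: if a fraction s of the work is serialized (here, the writes), the speedup with n threads is 1 / (s + (1 - s) / n). The value s = 0.2 used below is an illustrative assumption, not a measurement of the ingester, but it gives 2.5× at 4 threads, about 3.3× at 8, and 4× at 16, roughly in line with the table.

```java
/** Amdahl's law: ideal speedup with n threads when a fraction s of the work is serial. */
final class Amdahl {
    private Amdahl() {}

    static double speedup(double serialFraction, int threads) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || threads < 1) {
            throw new IllegalArgumentException("expected 0 <= s <= 1 and threads >= 1");
        }
        // The serial part takes the same time at any thread count; the rest divides by n.
        return 1.0 / (serialFraction + (1.0 - serialFraction) / threads);
    }
}
```

Under this model the speedup can never exceed 1 / s (5× for s = 0.2), no matter how many threads are added.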
Determinism note: with --threads > 1, file processing order is non-deterministic. MERGE is idempotent, so results are identical, but log order will vary between runs.
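The determinism note relies on MERGE being match-or-create: applying the same write twice, or in a different order, leaves the same graph. A toy model makes this concrete by treating each MERGE as adding a unique key to a set. It is a simplification (real MERGE matches whole patterns), MergeOrderDemo is a hypothetical name, and the guarantee only holds because each file issues the same writes whichever thread processes it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Memgraph): MERGE-style writes as idempotent set insertions. */
final class MergeOrderDemo {
    private MergeOrderDemo() {}

    /** Applies every file's writes, processing files in the given order. */
    static Set<String> ingest(List<List<String>> writesPerFile) {
        Set<String> graph = new TreeSet<>();
        for (List<String> fileWrites : writesPerFile) {
            for (String key : fileWrites) {
                graph.add(key); // match-or-create: a repeated key changes nothing
            }
        }
        return graph;
    }

    /** Same writes, but with the files processed in a shuffled order. */
    static Set<String> ingestShuffled(List<List<String>> writesPerFile, long seed) {
        List<List<String>> order = new ArrayList<>(writesPerFile);
        Collections.shuffle(order, new Random(seed));
        return ingest(order);
    }
}
```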
Watch mode
For active development, use --watch (or -w) to monitor the source directory for changes. The ingester will automatically re-ingest modified .java files, update call edges, and refresh code references whenever a change is detected:
java -jar target/memgraph-ingester.jar \
  --source /path/to/your/java/project/src/main/java \
  --bolt bolt://localhost:7687 \
  --project my-project \
  --watch
Watch mode uses Java's WatchService for efficient OS-level notifications and includes a small debounce delay to handle multiple rapid writes (e.g., from IDE saves). It recursively watches all subdirectories under the --source root.
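Two details of this approach are easy to miss. A WatchService reports events only for directories registered directly, so watching a tree recursively means registering every subdirectory; and a debounce collects events until the source has been quiet for a while before re-ingesting. The sketch below shows both. WatchSketch, Debouncer, and the quiet period are hypothetical, the sketch does not register directories created after startup, and the clock is passed in explicitly so the debounce logic can be tested.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashSet;
import java.util.Set;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

final class WatchSketch {
    private WatchSketch() {}

    /** Registers root and every directory below it; returns how many were registered. */
    static int registerAll(WatchService ws, Path root) throws IOException {
        int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                dir.register(ws, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
                count[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    /** Collects changed .java files and releases them only after a quiet period. */
    static final class Debouncer {
        private final long quietMillis;
        private final Set<Path> pending = new LinkedHashSet<>();
        private long lastEventMillis;

        Debouncer(long quietMillis) {
            this.quietMillis = quietMillis;
        }

        void onEvent(Path file, long nowMillis) {
            if (file.toString().endsWith(".java")) { // ignore non-Java files
                pending.add(file);                   // repeated saves collapse into one entry
                lastEventMillis = nowMillis;
            }
        }

        /** Returns the batch to re-ingest, or an empty set while events are still arriving. */
        Set<Path> drainIfQuiet(long nowMillis) {
            if (pending.isEmpty() || nowMillis - lastEventMillis < quietMillis) {
                return Set.of();
            }
            Set<Path> batch = new LinkedHashSet<>(pending);
            pending.clear();
            return batch;
        }
    }
}
```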
Using with AI agents
This repo ships scripts designed to be run in any project that has been ingested. Each one adds instructions to the agent's context file telling the AI agent how to scope queries to the right project, how the schema is shaped, when to reach for the graph vs. filesystem search, and how to use Memories for durable decisions and follow-up context.
Per-repo setup
CLAUDE
Use the bundled init-memgraph-claude.sh script, which fetches the template, substitutes the project name, and appends the result to the local CLAUDE.md.
Run it from inside the repo you just ingested.
Point at the script in your local checkout:
/path/to/memgraph-ingester/script/init-memgraph-claude.sh my-project
Or fetch and run it straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
  | bash -s -- my-project
Commit the updated CLAUDE.md. Claude Code reads it on every session start.
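The fetch, substitute, and append steps the script performs can be mirrored in Java, for example when generating the agent file from another tool. This is a sketch only: AgentNotes is a hypothetical helper, and the {{PROJECT}} placeholder is an assumption, since the real template's placeholder syntax is not shown here. Fetching the template is left out; the sketch starts from its text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper mirroring the init script's substitute-and-append steps. */
final class AgentNotes {
    private AgentNotes() {}

    /** Assumed placeholder; the real template may use different syntax. */
    static final String PLACEHOLDER = "{{PROJECT}}";

    static String render(String template, String project) {
        return template.replace(PLACEHOLDER, project); // literal, non-regex replacement
    }

    /** Appends to the agent file (e.g. CLAUDE.md), creating it if it does not exist. */
    static void appendTo(Path agentFile, String rendered) throws IOException {
        Files.writeString(agentFile, "\n" + rendered, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```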
CODEX
Use the bundled init-memgraph-codex.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested.
Point at the script in your local checkout:
/path/to/memgraph-ingester/script/init-memgraph-codex.sh my-project
Or fetch and run it straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
  | bash -s -- my-project
Commit the updated AGENTS.md. Codex reads it on every session start.
GEMINI
Use the bundled init-memgraph-gemini.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested.
Point at the script in your local checkout:
/path/to/memgraph-ingester/script/init-memgraph-gemini.sh my-project
Or fetch and run it straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
  | bash -s -- my-project
Commit the updated AGENTS.md. Gemini reads it on every session start.
GITHUB COPILOT
Use the bundled init-memgraph-github.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.
Run it from inside the repo you just ingested.
Point at the script in your local checkout:
/path/to/memgraph-ingester/script/init-memgraph-github.sh my-project
Or fetch and run it straight from GitHub:
curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
  | bash -s -- my-project
Commit the updated AGENTS.md. GitHub
[truncated for AI cost control]