
Memgraph Ingester. Speed up your AI agent

Memgraph Ingester ingests Java codebase structure and engineering context into Memgraph as a queryable knowledge graph, enabling AI agents to reason over code and project knowledge via graph queries instead of raw text search — improving accuracy, reducing cost, and speeding up analysis. Supports parallel ingestion, watch mode, and integration with Claude, Codex, Gemini, and GitHub Copilot.

Key points

  • Creates two graphs: Code (source structure) and Memory (engineering context) under a project scope.
  • Uses JavaParser for Java 25 syntax, with optional classpath for better symbol resolution.
  • Supports parallel ingestion (4-8 threads optimal) and watch mode for automatic re-ingestion.
  • Provides scripts to integrate with Claude, Codex, Gemini, and GitHub Copilot AI agents.

Why it matters

This matters because AI agents can query code structure and engineering context through graph queries instead of relying on raw text search.


Ingests the structural model of a Java codebase into Memgraph as a queryable code + memory knowledge graph, combining source structure with persistent engineering context (decisions, rules, findings, etc.).

Optionally paired with the Memgraph MCP server, this enables your AI agent to reason over both code and accumulated project knowledge via graph queries instead of raw text search, improving accuracy, reducing cost, and speeding up analysis.

MCP is not required: the mgconsole utility can query the graph directly, which also reduces token usage.

You can use the code in this repo as-is, or fork it and customize it to your needs. Memgraph is free too. Please submit any issues or pull requests.

What it does

Memgraph Ingester creates two project-scoped graphs for a Java codebase:

A Code graph under (:Project)-[:CONTAINS]->(:Code)

A Memory graph under (:Project)-[:HAS_MEMORY]->(:Memory)

Every code and memory node is scoped by a project property, so multiple Java codebases can share the same Memgraph instance without collisions.

The Code graph stores Java source structure in a queryable, persistent form. The ingester walks the source tree with JavaParser and symbol resolution, then writes packages, files, classes, interfaces, annotations, methods, fields, inheritance, and within-project call relationships.

The parser is configured for Java 25 syntax. It should handle most sources written for earlier Java versions too, but JavaParser is not a javac replacement and may still miss unsupported or edge-case constructs.

The Memory graph stores durable engineering context: decisions, ADRs, rules, findings, tasks, risks, questions, ideas, and domain notes. Memory items can refer to stable :CodeRef nodes, which are resolved back to the current code graph after ingestion. This lets agents query both structure (code) and knowledge (memory) without relying only on raw text search.

See doc/MEMORY.md for the Memory usage guide with prompt examples and Cypher recipes. See SCHEMA.md for the full graph model.

Requirements

Required: Java 25 JRE to run

Required: Memgraph instance (or Docker)

Optional: Java 25 SDK, Maven 3.9+ to build

Optional: mgconsole

Quick start

Download the latest JAR (v6.0.7 at the time of writing):

wget https://github.com/ousatov-ua/memgraph-ingester/releases/download/v6.0.7/memgraph-ingester.jar

Run Memgraph

docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0

Ingest the project:

Without classpath libraries (weaker symbol resolution):

    cd /path/to/your/java/project
    java -jar path/to/memgraph-ingester.jar \
      --source path/to/src \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --wipe-project-memories \
      --apply-schema

With classpath libraries (better symbol resolution). Example for Maven projects:

    cd /path/to/your/java/project
    CP=$(mvn -q dependency:build-classpath -DincludeScope=test -Dmdep.outputFile=/dev/stdout 2>/dev/null)
    java -jar path/to/memgraph-ingester.jar \
      --source path/to/src \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --wipe-project-memories \
      --apply-schema \
      --classpath "$CP"
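The classpath string built above is a single value whose entries are joined by the platform path separator (":" on Linux and macOS, ";" on Windows). As a rough illustration of how such a value can be broken into individual JAR entries, here is a minimal sketch. The class ClasspathSplitter and its split method are hypothetical names used only for this example; they are not taken from the ingester's source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helper only; not part of memgraph-ingester. */
final class ClasspathSplitter {

    /** Splits a classpath string on the given separator, dropping blank entries. */
    static List<String> split(String classpath, String separator) {
        List<String> entries = new ArrayList<>();
        if (classpath == null || classpath.isBlank()) {
            return entries;
        }
        // Quote the separator so it is treated literally, not as a regex.
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Linux/macOS and ";" on Windows.
        String cp = String.join(File.pathSeparator, "lib/a.jar", "lib/b.jar");
        for (String entry : split(cp, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Passing the separator explicitly keeps the helper easy to exercise on any platform; production code would normally pass File.pathSeparator.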

Append knowledge for your agent

GitHub Copilot

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
      | bash -s -- my-project

Claude

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
      | bash -s -- my-project

Codex

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
      | bash -s -- my-project

Gemini

    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
      | bash -s -- my-project

Enable the Memgraph MCP server for your AI agent (examples are given below), or put mgconsole on your PATH.

Going further

Maven dependency (optional)

groupId: io.github.ousatov-ua
artifactId: memgraph-ingester

Start Memgraph

With Docker Compose:

    cd memgraph-platform
    docker-compose up -d

With plain Docker:

docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage:3.9.0

Bolt listens on localhost:7687.

Build the ingester

    git clone https://github.com/ousatov-ua/memgraph-ingester.git
    cd memgraph-ingester
    mvn clean package -Pshade -DskipTests

This produces a shaded fat JAR at target/memgraph-ingester.jar.

Alternatively, use the published shaded fat JAR from the releases page.

Apply the schema (one-time per Memgraph instance)

cat src/main/resources/io/github/ousatov/tools/memgraph/cypher/create-schema.cypher | mgconsole --host localhost --port 7687

Creates uniqueness constraints and lookup indexes for both the code graph and the memory graph. Safe to re-run — existing constraints are reported and skipped.

You can also use the CLI. This command applies the schema to the Memgraph database first, then ingests the project:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --apply-schema

The next command wipes all data in the Memgraph database first, then applies the schema and ingests the project:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-all \
      --apply-schema

Ingest a project

This wipes the Code graph for this project first:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code

This wipes both the Code and Memory graphs for this project first:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --wipe-project-memories

Verify

MATCH (p:Project)-[:CONTAINS]->(c:Code) RETURN p.name, c.sourceRoots, c.lastIngested;

You should see your project with a fresh lastIngested timestamp.

CLI options

| Option | Short | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --source | -s | yes | | Root directory to scan (e.g. src/main/java) |
| --bolt | -b | yes | | Bolt URL, e.g. bolt://localhost:7687 |
| --project | -P | yes | | Logical project name. Namespaces all nodes. |
| --user | -u | no | | Memgraph username (empty by default) |
| --pass | -p | no | | Memgraph password (empty by default) |
| --threads | -t | no | 1 | Parser threads. Each thread gets its own Bolt session. |
| --wipe-project-code | none | no | false | Delete this project's code graph before ingesting |
| --wipe-project-memories | none | no | false | Delete this project's memory graph before ingesting |
| --apply-schema | none | no | false | Apply schema before ingesting |
| --wipe-all | none | no | false | Wipe all data (schema will be dropped first) |
| --incremental | none | no | false | Skip files whose last-modified timestamp matches the stored value |
| --watch | -w | no | false | Watch for changes in the source directory and automatically re-ingest |
| --classpath | none | no | | Additional classpath entries (JARs) for symbol resolution, separated by the platform path separator. Improves CALLS edge and type resolution coverage. |

--wipe-project-code only affects code nodes matching the given --project; other codebases in the same Memgraph instance are untouched, and the :Project anchor remains. --wipe-project-memories only affects memory nodes matching the given --project; the code graph and the :Project anchor remain.
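The --incremental option is described as skipping files whose last-modified timestamp matches the stored value. The check itself is simple, and the sketch below shows one way it could look. It assumes an in-memory map standing in for wherever the ingester actually keeps the timestamp (presumably the graph), and IncrementalFilter, needsIngest, and markIngested are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of a timestamp-based skip check; not the ingester's code. */
final class IncrementalFilter {

    // Assumption: a plain map replaces the persistent store of timestamps.
    private final Map<String, Long> storedLastModified = new HashMap<>();

    /** Returns true when the file is unknown or its timestamp differs from the stored one. */
    boolean needsIngest(String path, long lastModifiedMillis) {
        Long stored = storedLastModified.get(path);
        return stored == null || stored != lastModifiedMillis;
    }

    /** Records the timestamp that was current when the file was ingested. */
    void markIngested(String path, long lastModifiedMillis) {
        storedLastModified.put(path, lastModifiedMillis);
    }

    public static void main(String[] args) {
        IncrementalFilter filter = new IncrementalFilter();
        System.out.println(filter.needsIngest("Foo.java", 1_000L)); // true: never seen
        filter.markIngested("Foo.java", 1_000L);
        System.out.println(filter.needsIngest("Foo.java", 1_000L)); // false: unchanged
        System.out.println(filter.needsIngest("Foo.java", 2_000L)); // true: modified
    }
}
```

Comparing for inequality rather than "newer than" also catches files whose timestamp moved backwards, for example after a checkout of an older revision.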

Parallel ingestion

Large codebases ingest faster with multiple parser threads:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --wipe-project-code \
      --threads 8

Each thread holds its own JavaParser and its own Bolt session. The Driver itself is shared.
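The pattern of one shared driver with per-thread parser and session can be sketched with a fixed thread pool and a ThreadLocal. The example below uses only the JDK: SharedDriver and WorkerContext are hypothetical stand-ins for the real Bolt driver and the parser/session pair, so it shows the threading shape only, not the ingester's actual classes.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch: shared driver, one context per worker thread. */
final class PerThreadWorkers {

    /** Hypothetical stand-in for a shared, thread-safe driver. */
    static final class SharedDriver {
        final AtomicInteger contextsOpened = new AtomicInteger();

        WorkerContext openContext() {
            return new WorkerContext(contextsOpened.incrementAndGet());
        }
    }

    /** Hypothetical stand-in for one thread's parser + session pair. */
    static final class WorkerContext {
        private final int id;

        WorkerContext(int id) {
            this.id = id;
        }

        String process(String file) {
            return file + "@ctx" + id;
        }
    }

    static Set<String> ingest(List<String> files, int threads, SharedDriver driver) {
        // Each pool thread lazily creates its own context on first use.
        // A real implementation would also close each session when done.
        ThreadLocal<WorkerContext> context = ThreadLocal.withInitial(driver::openContext);
        Set<String> results = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String file : files) {
            pool.execute(() -> results.add(context.get().process(file)));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("ingestion did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while ingesting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        SharedDriver driver = new SharedDriver();
        Set<String> done = ingest(List.of("A.java", "B.java", "C.java", "D.java"), 2, driver);
        System.out.println(done.size() + " files, " + driver.contextsOpened.get() + " contexts");
    }
}
```

Which context handles which file varies between runs, but the number of contexts never exceeds the thread count.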

Realistic speedup — don't expect linear scaling. JavaParser work is CPU-bound and parallelizes well, but Memgraph Community serializes writes internally, so the write path bottlenecks quickly:

| Threads | Typical speedup | Bottleneck |
| --- | --- | --- |
| 1 | 1× (baseline) | Sequential parse + write |
| 4 | ~2.5–3× | Write serialization starts |
| 8 | ~3–4× | Diminishing returns |
| 16+ | ~3–4× | Writes fully saturated |

4–8 threads is the sweet spot on most machines. Values higher than your CPU core count rarely help.

Determinism note: with --threads > 1, file processing order is non-deterministic. MERGE is idempotent, so results are identical, but log order will vary between runs.
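To see why processing order does not change the outcome, it helps to model MERGE as "create the node if its key is absent, otherwise leave it in place". The toy model below uses a sorted map in place of the graph. IdempotentMerge and mergeAll are hypothetical names, and the model assumes every write for a given key carries the same value, which is what makes it order-independent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Toy model of merge-by-key semantics; not Memgraph's implementation. */
final class IdempotentMerge {

    /** Applies each (key, value) write with create-if-absent semantics. */
    static Map<String, String> mergeAll(List<Map.Entry<String, String>> writes) {
        Map<String, String> nodes = new TreeMap<>();
        for (Map.Entry<String, String> write : writes) {
            // Assumption: repeated writes for a key always carry the same value.
            nodes.putIfAbsent(write.getKey(), write.getValue());
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> writes = new ArrayList<>(List.of(
                Map.entry("Class:com.example.Foo", "Foo.java"),
                Map.entry("Class:com.example.Bar", "Bar.java"),
                Map.entry("Class:com.example.Foo", "Foo.java"))); // duplicate, as on re-ingest
        Map<String, String> first = mergeAll(writes);
        Collections.shuffle(writes, new Random(42));
        Map<String, String> second = mergeAll(writes);
        System.out.println(first.equals(second)); // prints true
    }
}
```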

Watch mode

For active development, use --watch (or -w) to monitor the source directory for changes. The ingester will automatically re-ingest modified .java files, update call edges, and refresh code references whenever a change is detected:

    java -jar target/memgraph-ingester.jar \
      --source /path/to/your/java/project/src/main/java \
      --bolt bolt://localhost:7687 \
      --project my-project \
      --watch

Watch mode uses Java's WatchService for efficient OS-level notifications and includes a small debounce delay to handle multiple rapid writes (e.g., from IDE saves). It recursively watches all subdirectories under the --source root.
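The two mechanisms named above, recursive registration and debouncing, can be sketched with the JDK alone. WatchService registers one directory at a time, so a recursive watch has to walk the tree and register each directory. The debounce shown here is one common approach and not necessarily the ingester's: remember the last event time per path and release a path only after it has been quiet for a set interval. WatchSketch, registerTree, and ChangeDebouncer are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collapses bursts of events per path; time is passed in to keep it deterministic. */
final class ChangeDebouncer {
    private final long quietMillis;
    private final Map<String, Long> lastEventAt = new LinkedHashMap<>();

    ChangeDebouncer(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Records an event; a repeated event for the same path pushes its release time back. */
    void onEvent(String path, long nowMillis) {
        lastEventAt.put(path, nowMillis);
    }

    /** Returns and forgets the paths that have been quiet for at least quietMillis. */
    List<String> drainReady(long nowMillis) {
        List<String> ready = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastEventAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= quietMillis) {
                ready.add(entry.getKey());
                it.remove();
            }
        }
        return ready;
    }
}

final class WatchSketch {

    /** Registers root and every directory below it; returns how many were registered. */
    static int registerTree(Path root, WatchService watcher) throws IOException {
        int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                dir.register(watcher,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_MODIFY,
                        StandardWatchEventKinds.ENTRY_DELETE);
                count[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("watch-demo");
        Files.createDirectories(root.resolve("a").resolve("b"));
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            System.out.println("registered directories: " + registerTree(root, watcher));
        }
        ChangeDebouncer debouncer = new ChangeDebouncer(200);
        debouncer.onEvent("Foo.java", 0);
        debouncer.onEvent("Foo.java", 150);            // second save restarts the quiet period
        System.out.println(debouncer.drainReady(300)); // prints []
        System.out.println(debouncer.drainReady(350)); // prints [Foo.java]
    }
}
```

A full watcher would also register directories created after startup, which this sketch leaves out.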

Using with AI agents

This repo ships scripts designed to be run in any project that has been ingested. They append instructions that tell AI agents how to scope queries to the right project, how the schema is shaped, when to reach for the graph vs. filesystem search, and how to use Memories for durable decisions and follow-up context.

Per-repo setup

CLAUDE

Use the bundled init-memgraph-claude.sh script, which fetches the template, substitutes the project name, and appends the result to the local CLAUDE.md.

Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-claude.sh my-project

    # Or fetch-and-run straight from GitHub:
    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-claude.sh \
      | bash -s -- my-project

Commit the updated CLAUDE.md. Claude Code reads it on every session start.

CODEX

Use the bundled init-memgraph-codex.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-codex.sh my-project

    # Or fetch-and-run straight from GitHub:
    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-codex.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. Codex reads it on every session start.

GEMINI

Use the bundled init-memgraph-gemini.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-gemini.sh my-project

    # Or fetch-and-run straight from GitHub:
    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-gemini.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. Gemini reads it on every session start.

GITHUB COPILOT

Use the bundled init-memgraph-github.sh script, which fetches the template, substitutes the project name, and appends the result to the local AGENTS.md.

Run it from inside the repo you just ingested:

    # Point at the script in your local checkout
    /path/to/memgraph-ingester/script/init-memgraph-github.sh my-project

    # Or fetch-and-run straight from GitHub:
    curl -s https://raw.githubusercontent.com/ousatov-ua/memgraph-ingester/refs/heads/main/script/init-memgraph-github.sh \
      | bash -s -- my-project

Commit the updated AGENTS.md. GitHub

[truncated for AI cost control]