Show HN: AI/ML benchmark for local LLM inference and XGBoost training on GPU/CPU

An open-source benchmark suite that runs a full GPU/CPU benchmark with one command, covering Ollama LLM inference and XGBoost training, and generates an interactive HTML report.

Key points

  • Supports Ollama LLM (3B-14B parameter models) and XGBoost training/inference benchmarks
  • Single command execution with automatic HTML report and Streamlit dashboard
  • Optional encrypted result upload to help build a reference database
  • Supports CPU and NVIDIA GPU; partial support for AMD GPU

Why it matters

This matters because it gives a one-command, reproducible way to compare local hardware on Ollama LLM inference (3B-14B parameter models) and XGBoost training/inference.

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

Objective

One command → a full GPU/CPU benchmark & an interactive HTML report

You can now measure your consumer GPU and/or CPU performance on typical Artificial Intelligence and Machine Learning workloads in a controlled way, with some pre‑set reference results.

The reproducible benchmarks cover:

  • Ollama LLMs (token latency & throughput on various 3B → 14B parameter models)
  • XGBoost (training & inference on the HIGGS dataset, on 100k → 10M+ rows)
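As a rough illustration of what "token latency & throughput" can mean for an Ollama run, the sketch below derives both from the timing fields that Ollama's /api/generate endpoint includes in its final response (eval_count, eval_duration, prompt_eval_duration, with durations in nanoseconds). The function name and the sample numbers are illustrative and not taken from this repository; the suite may compute its metrics differently.

```python
def generation_metrics(response: dict) -> dict:
    """Derive throughput and per-token latency from an Ollama /api/generate response.

    Ollama reports all durations in nanoseconds.
    """
    ns_per_second = 1e9
    eval_count = response["eval_count"]
    eval_seconds = response["eval_duration"] / ns_per_second
    prompt_seconds = response.get("prompt_eval_duration", 0) / ns_per_second
    return {
        "tokens_per_second": eval_count / eval_seconds,
        "ms_per_token": 1000 * eval_seconds / eval_count,
        "prompt_seconds": prompt_seconds,
    }


# Made-up numbers: 200 tokens generated in 4 s, prompt processed in 0.5 s.
sample = {
    "eval_count": 200,
    "eval_duration": 4_000_000_000,
    "prompt_eval_duration": 500_000_000,
}
metrics = generation_metrics(sample)
print(metrics)
```

With these sample values the result is 50 tokens per second and 20 ms per generated token.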

Everything is orchestrated by a single YAML file (ai_bench_suite.yaml) and a runner script (run_suite.py), so you can launch an entire set of tests with one command.

Results are visible:

  • immediately at the end of the benchmark, in a notebook that is automatically produced, with comparison against a handful of reference systems;
  • on a regularly updated Streamlit dashboard, to better interact with a growing number of results: https://ai-ml-gpu-bench.streamlit.app

Quick start

git clone https://github.com/albedan/ai-ml-gpu-bench
cd ai-ml-gpu-bench
uv run run_suite.py

For Ollama benchmarks, make sure Ollama is installed and running at http://localhost:11434. To automatically pull missing Ollama models during the full benchmark:

uv run run_suite.py --autopull
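If you want to check by hand what --autopull would have to download, the sketch below may help. It assumes Ollama's /api/tags endpoint, which lists locally installed models; the helper names are hypothetical and this is not the code used by run_suite.py.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def installed_models(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return the set of locally installed model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return {model["name"] for model in payload.get("models", [])}


def missing_models(wanted, installed):
    """Models requested in the config that are not installed yet, in config order."""
    return [name for name in wanted if name not in installed]


# Offline demonstration with a hand-written "installed" set.
print(missing_models(["phi3:3.8b", "qwen3:4b"], {"qwen3:4b"}))
```

On a machine where Ollama is running, `installed_models()` supplies the real set; a return value of None means nothing answered at http://localhost:11434.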

What happens during a run

1. A unique run_id is generated.

2. The benchmarks specified in the configuration YAML file are executed.

3. The results of each test are recorded in two separate CSVs for XGBoost and Ollama (if both are selected).

4. The Jupyter notebook is executed and exported to HTML; it opens automatically in the browser (the bars with a thick border are those from the just‑completed run).

5. If you’d like to help grow the reference result base, the two CSVs are encrypted (RSA 4096 bit) and uploaded to Filebin, submitting only technical data (opt-out available).

6. A daily ingestion process imports new results and publishes them to the Streamlit dashboard. More on the architecture underneath: https://allaboutdata.substack.com/p/benchmarking-ai-and-ml-on-local-cpugpus
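The first three steps above can be pictured with a minimal sketch like the following. It is not the suite's actual code: the column names (run_id, timestamp, rows, device, train_seconds) and the timing values are hypothetical, and the run_id format used by run_suite.py is not documented here.

```python
import csv
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_id() -> str:
    """Random identifier shared by every row written during one run."""
    return uuid.uuid4().hex


def append_result(csv_path: Path, row: dict) -> None:
    """Append one benchmark result, writing the header only for a new file."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


run_id = new_run_id()
timestamp = datetime.now(timezone.utc).isoformat()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "xgb.csv"
    # Hypothetical columns and made-up timings, one row per test.
    append_result(path, {"run_id": run_id, "timestamp": timestamp,
                         "rows": 100_000, "device": "cpu", "train_seconds": 1.9})
    append_result(path, {"run_id": run_id, "timestamp": timestamp,
                         "rows": 100_000, "device": "cuda", "train_seconds": 0.4})
    with path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))

print(len(rows), "rows written for run", run_id)
```

Tagging every row with the same run_id is what lets a later step, such as the notebook, tell the just-completed run apart from the reference results.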

What to expect: two examples

Multiple machines benchmarked on Deepseek-R1 14B via Ollama (Streamlit dashboard):

XGBoost tested on the full HIGGS dataset, both with GPU and CPU (Streamlit dashboard):

Official results are regularly updated and published on the Streamlit dashboard: https://ai-ml-gpu-bench.streamlit.app

For convenience, a quick Jupyter notebook is immediately shown at the end of the benchmark.

Get started!

Requirements

Make sure you have installed at least the must‑have components below

| Requirement | Why it’s needed | How to install | Required? |
|---|---|---|---|
| Python ≥ 3.13 | Runtime for the scripts | https://www.python.org/ | Must |
| uv 0.8.x | Super‑fast package manager & lock‑file generator | https://docs.astral.sh/uv/getting-started/installation/ | Must |
| CUDA ≥ 12.x | GPU benchmark (XGBoost + CuPy, Ollama) | NVIDIA Driver + https://developer.nvidia.com/cuda-downloads | Optional (only if a GPU is selected in the YAML) |
| Ollama (running at http://localhost:11434) | LLM benchmark via REST API | https://ollama.com/download | Optional (only if you want to test LLMs) |
| Ollama Models | Models specified in ai_bench_suite.yaml (comment models to exclude them, verify installation with ollama list) | https://ollama.com/library | Optional (only if you want to test LLMs) |

Environment setup

In a local folder, just clone this repository:

git clone https://github.com/albedan/ai-ml-gpu-bench

Python 3.13.* will be automatically installed (unless already present) via uv.

Configuration: ai_bench_suite.yaml

All benchmark parameters are in this YAML file.

System defaults for your machine name and hardware details will be used, unless you manually override any of these three fields.

machine_info:
  machine: ""  # Choose your preferred computer name (default: hostname)
  cpu: ""      # Please specify your CPU
  gpu: ""      # Please specify your GPU
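The fallback behaviour described above (blank fields are replaced by system defaults) could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, the CPU default relies on Python's platform module, and GPU detection is left out, so the real runner may resolve these values differently.

```python
import platform
import socket


def resolve_machine_info(overrides: dict) -> dict:
    """Fill blank or missing machine_info fields with system defaults."""
    defaults = {
        "machine": socket.gethostname(),
        "cpu": platform.processor() or platform.machine(),
        "gpu": "unknown",  # assumption: GPU detection is out of scope for this sketch
    }
    return {key: (overrides.get(key) or default) for key, default in defaults.items()}


# "RTX 4070" is only an example of a manual override.
info = resolve_machine_info({"machine": "", "cpu": "", "gpu": "RTX 4070"})
print(info)
```

Empty strings count as "not set", so only the fields you actually fill in override the detected values.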

Commenting out an LLM model entry in the ollama section excludes it from the benchmark. If possible, leave them all as provided, to help collect results on a standard set of LLMs.

For the active (i.e., uncommented) LLMs, you can either:

  • Have the benchmark check for them and pull any that are missing, by passing the --autopull flag. This can take a few minutes, depending on your connection.
  • Check manually whether they’re available with ollama list, and install them with ollama pull [model_name].

Every combination listed in rows × gpu (for XGBoost) and models × gpu (for Ollama) is run automatically during the benchmark.
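To make the "every combination" rule concrete, here is a small sketch that expands rows × gpu and models × gpu into a flat list of runs. The dictionary layout is an assumption: the key names rows, gpu and models come from the text above, but the "xgboost" section name, the value types and the sample values are illustrative and may not match ai_bench_suite.yaml.

```python
from itertools import product

# Assumed, simplified view of the parsed YAML configuration.
config = {
    "xgboost": {"rows": [100_000, 1_000_000], "gpu": [True, False]},
    "ollama": {"models": ["phi3:3.8b", "qwen3:4b"], "gpu": [True, False]},
}


def plan_runs(cfg: dict) -> list:
    """Expand the configuration into one entry per benchmark run."""
    xgb = cfg.get("xgboost", {})
    oll = cfg.get("ollama", {})
    xgb_runs = [
        {"suite": "xgboost", "rows": rows, "gpu": gpu}
        for rows, gpu in product(xgb.get("rows", []), xgb.get("gpu", []))
    ]
    ollama_runs = [
        {"suite": "ollama", "model": model, "gpu": gpu}
        for model, gpu in product(oll.get("models", []), oll.get("gpu", []))
    ]
    return xgb_runs + ollama_runs


runs = plan_runs(config)
for run in runs:
    print(run)
```

Two row counts and two models, each crossed with GPU on/off, give eight runs in total, which is why trimming either list in the YAML shortens the benchmark quickly.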

Execution

A single command reads the configuration YAML file and orchestrates the test execution, logging and result visualization.

Simply run:

uv run run_suite.py

The first run may take a bit longer because uv creates the environment and installs Python and the package dependencies automatically. Ollama must already be installed for the LLM benchmarks. To pull the models specified in the YAML file automatically, add --autopull.

--autopull downloads missing Ollama models, but it does not install or update Ollama itself.

At startup, the Ollama suite checks your installed Ollama version against the latest stable release and prints a warning if an update is recommended. The benchmark continues either way.
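A version check of this kind boils down to comparing two version strings numerically. The sketch below shows one way to do it; the helper names and the example version numbers are made up, and how the suite actually obtains the latest stable release is not described here. The installed version can be read from Ollama's /api/version endpoint.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn '0.12.3' or 'v0.12.3' into (0, 12, 3); non-numeric suffixes are ignored."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def update_recommended(installed: str, latest: str) -> bool:
    """True when the installed version is older than the latest one."""
    return parse_version(installed) < parse_version(latest)


# Illustrative version numbers only.
if update_recommended("0.9.6", "v0.12.3"):
    print("Warning: a newer Ollama release is available; continuing anyway.")
```

Comparing integer tuples rather than raw strings matters here: as plain text, "0.9.6" sorts after "0.12.3", which would hide the update.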

The --fast option restricts the benchmark to a subset made of the smaller, faster models.

Common commands

| Goal | Command | Notes |
|---|---|---|
| Run the full benchmark suite | uv run run_suite.py | Default: XGBoost + Ollama |
| Run only Ollama benchmarks | uv run run_suite.py --suite ollama | Requires Ollama running at http://localhost:11434 |
| Run only XGBoost benchmarks | uv run run_suite.py --suite xgboost | Useful if you do not want to run LLM tests |
| Run a faster Ollama subset | uv run run_suite.py --suite ollama --fast | Uses only the smaller/faster models from the YAML |
| Pull missing Ollama models automatically | uv run run_suite.py --autopull | Downloads models only; it does not install or update Ollama |
| Skip encrypted result upload | uv run run_suite.py --no-upload-results | Keeps all result files local |
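For readers who want to see how these flags fit together, here is a hypothetical argparse definition that mirrors the commands listed above. It is only a sketch: the real run_suite.py may parse its arguments differently, and treating an omitted --suite as "run both" is an assumption based on the documented default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="run_suite.py")
    parser.add_argument("--suite", choices=["ollama", "xgboost"], default=None,
                        help="run only one suite (default: both)")
    parser.add_argument("--fast", action="store_true",
                        help="use only the smaller/faster Ollama models")
    parser.add_argument("--autopull", action="store_true",
                        help="pull missing Ollama models before running")
    parser.add_argument("--no-upload-results", action="store_true",
                        help="keep all result files local")
    return parser


args = build_parser().parse_args(["--suite", "ollama", "--fast"])
suites = [args.suite] if args.suite else ["xgboost", "ollama"]
print(suites, args.fast, args.autopull, args.no_upload_results)
```

The flags are independent switches, so they can be combined freely, for example --suite ollama --fast --no-upload-results.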

Privacy

| 📦 | Detail |
|---|---|
| Result sharing | If enabled, CSV results are encrypted with a public/private key scheme and uploaded to Filebin. |
| Uploaded data | Only technical benchmark data is submitted. No prompts, model outputs, datasets, notebooks, or raw system files are uploaded. |
| Opt‑out | Use --no-upload-results to skip encryption and upload entirely. |

Output

CSV: result files xgb.csv and ollama.csv are written, one row per benchmark, with metrics and basic machine metadata.

Notebook (bench_results_analysis_altair.ipynb): executed and opened automatically in the browser. It lets you explore the newly obtained results and compare them with reference benchmarks.

❓ Q&A

| Q | A |
|---|---|
| Can I run just a subset of the benchmark? | ✅ Sure! Just edit ai_bench_suite.yaml, for instance by commenting out the LLMs you don't want to try, or by selecting GPU only or CPU only. |
| I don't have a GPU. Is it for me as well? | ✅ Yes, you can run the benchmark as it is (it will automatically skip the GPU benchmarks). |
| Can I run the benchmark on an AMD GPU? | 🟨 Partially: Ollama will leverage the GPU, while XGBoost will (likely) run on CPU only. |
| I have an Nvidia GPU, but XGBoost runs on CPU only | ℹ️ Please verify the CUDA toolkit installation by running nvidia-smi and nvcc -V in a terminal. The first confirms that the driver can see an Nvidia GPU; the second shows the installed CUDA toolkit version. |
| Can I run the bench on an old machine (10+ years)? | ✅ Yes, you can! I suggest editing ai_bench_suite.yaml to include only the smaller LLMs (phi3:3.8b and qwen3:4b). The benchmark was tested on a 15-year-old Intel i5-560M with 8GB of RAM. |
| I am experiencing issues when downloading the sample dataset | ℹ️ Please verify the system certificates. If Python 3.13 is not your system interpreter (i.e., it was installed automatically via uv), running uv add pip-system-certs can solve the problem. |
| I have another problem | ❗ Please open an issue here on GitHub. |
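The CUDA check suggested in the Q&A can also be scripted. The sketch below simply reports whether nvidia-smi and nvcc -V can be found and run successfully; the helper names are hypothetical, and a positive result does not by itself guarantee that XGBoost will use the GPU.

```python
import shutil
import subprocess


def tool_output(command: list):
    """Return the tool's stdout, or None if it is missing or exits with an error."""
    if shutil.which(command[0]) is None:
        return None
    try:
        done = subprocess.run(command, capture_output=True, text=True,
                              timeout=15, check=True)
    except (subprocess.SubprocessError, OSError):
        return None
    return done.stdout


def cuda_status() -> dict:
    """Check the two commands mentioned in the Q&A."""
    return {
        "nvidia_driver": tool_output(["nvidia-smi"]) is not None,
        "cuda_toolkit": tool_output(["nvcc", "-V"]) is not None,
    }


status = cuda_status()
print(status)
```

If nvidia_driver is True but cuda_toolkit is False, the GPU is visible to the driver while the CUDA toolkit is missing or not on the PATH.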

🔚 Thanks for testing!

If you find an issue or have an idea, open an Issue – or, even better, a Pull Request!

Whenever possible, please keep result sharing enabled to help grow the references! 🚀

Happy benchmarking and experimenting!

About

A suite to benchmark CPU/GPU Python performance in training ML models and running local LLMs

License

MIT license


Releases

Latest release: v0.6.3 (May 11, 2026); 7 releases in total.
