Show HN: AI/ML benchmark for local LLM inference and XGBoost training on GPU/CPU
An open-source benchmark suite that runs a full GPU/CPU benchmark with one command, covering Ollama LLM inference and XGBoost training, and generates an interactive HTML report.
Article intelligence
Key points
- Supports Ollama LLM (3B-14B parameter models) and XGBoost training/inference benchmarks
- Single command execution with automatic HTML report and Streamlit dashboard
- Optional encrypted result upload to help build a reference database
- Supports CPU and NVIDIA GPU; partial support for AMD GPU
Why it matters
This matters because it gives a reproducible way to compare consumer GPUs and CPUs on local LLM inference (3B-14B parameter models via Ollama) and on XGBoost training/inference.
Technical impact
Results may inform hardware and model selection for local inference and training, and provide reference points for evaluation benchmarks.
Objective
One command → a full GPU/CPU benchmark & an interactive HTML report
You can now measure your consumer GPU and/or CPU performance on typical Artificial Intelligence and Machine Learning workloads in a controlled way, with some pre‑set reference results.
The reproducible benchmarks cover:
- Ollama LLMs (token latency & throughput on various 3B → 14B parameter models)
- XGBoost (training & inference on the HIGGS dataset, on 100k → 10M+ rows)
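As a rough illustration of the throughput metric: Ollama's `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (in nanoseconds). The sketch below derives tokens per second from those two fields. The function name is hypothetical, and this is not necessarily how the suite computes its own figures.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response body."""
    eval_count = response["eval_count"]            # number of generated tokens
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative values only, not a measured result.
sample = {"eval_count": 250, "eval_duration": 5_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # prints "50.0 tokens/s"
```

Prompt processing can be measured the same way from `prompt_eval_count` and `prompt_eval_duration`.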
Everything is orchestrated by a single YAML file (ai_bench_suite.yaml) and a runner script (run_suite.py), so you can launch an entire set of tests with one command.
Results are visible:
- immediately at the end of the benchmark, in a notebook that is automatically produced, with comparison against a handful of reference systems;
- on a regularly updated Streamlit dashboard, to better interact with a growing number of results: https://ai-ml-gpu-bench.streamlit.app
Quick start
git clone https://github.com/albedan/ai-ml-gpu-bench
cd ai-ml-gpu-bench
uv run run_suite.py
For Ollama benchmarks, make sure Ollama is installed and running at http://localhost:11434. To automatically pull missing Ollama models during the full benchmark:
uv run run_suite.py --autopull
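Before launching the Ollama suite, it can be useful to confirm that the server is reachable. The following is a minimal sketch using only the standard library and Ollama's `GET /api/version` endpoint; the function name `ollama_version` is hypothetical and not part of this repository.

```python
import json
import urllib.error
import urllib.request


def ollama_version(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return the Ollama server version string, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version} is running" if version else "Ollama is not reachable")
```

If the function returns None, start Ollama before running the benchmark.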
What happens during a run
1. A unique run_id is generated.
2. The benchmarks specified in the configuration YAML file are executed.
3. The results of each test are recorded in two separate CSVs for XGBoost and Ollama (if both are selected).
4. The Jupyter notebook is executed and exported to HTML; it opens automatically in the browser (the bars with a thick border are those from the just‑completed run).
5. If you’d like to help grow the reference result base, the two CSVs are encrypted (RSA 4096 bit) and uploaded to Filebin, submitting only technical data (opt-out available).
6. A daily ingestion process imports new results and publishes them to the Streamlit dashboard.
More on the architecture underneath: https://allaboutdata.substack.com/p/benchmarking-ai-and-ml-on-local-cpugpus
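The first steps of a run (generating a run_id and recording results to a CSV) could be sketched as follows. This is an illustration under assumptions, not the code in run_suite.py: the helper names and the column names (`run_id`, `model`, `tokens_per_s`) are hypothetical.

```python
import csv
import tempfile
import uuid
from pathlib import Path


def new_run_id() -> str:
    """Return a random identifier shared by every row of one benchmark run."""
    return uuid.uuid4().hex


def append_result(csv_path: Path, row: dict) -> None:
    """Append one result row, writing the header first if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Demo in a temporary folder; the metric value is a placeholder, not a measurement.
with tempfile.TemporaryDirectory() as tmp:
    run_id = new_run_id()
    target = Path(tmp) / "ollama.csv"
    append_result(target, {"run_id": run_id, "model": "qwen3:4b", "tokens_per_s": 0.0})
    print(target.read_text())
```

Tagging every row with the same run_id is what lets a later step pick out the rows of the just-completed run.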
What to expect: two examples
Multiple machines benchmarked on Deepseek-R1 14B via Ollama (Streamlit dashboard):
XGBoost tested on the full HIGGS dataset, both with GPU and CPU (Streamlit dashboard):
Official results are regularly updated and published on the Streamlit dashboard: https://ai-ml-gpu-bench.streamlit.app
For convenience, a quick Jupyter notebook is immediately shown at the end of the benchmark.
Get started!
Requirements
Make sure you have installed at least the must‑have components below:
| Requirement | Why it’s needed | How to install | Required? |
|---|---|---|---|
| Python ≥ 3.13 | Runtime for the scripts | https://www.python.org/ | Must |
| uv 0.8.x | Super‑fast package manager & lock‑file generator | https://docs.astral.sh/uv/getting-started/installation/ | Must |
| CUDA ≥ 12.x | GPU benchmark (XGBoost + CuPy, Ollama) | NVIDIA Driver + https://developer.nvidia.com/cuda-downloads | Optional (only if a GPU is selected in the YAML) |
| Ollama (running at http://localhost:11434) | LLM benchmark via REST API | https://ollama.com/download | Optional (only if you want to test LLMs) |
| Ollama Models | Models specified in ai_bench_suite.yaml (comment models to exclude them, verify installation with ollama list) | https://ollama.com/library | Optional (only if you want to test LLMs) |
Environment setup
In a local folder, just clone this repository:
git clone https://github.com/albedan/ai-ml-gpu-bench
Python 3.13.* will be automatically installed (unless already present) via uv.
Configuration: ai_bench_suite.yaml
All benchmark parameters are in this YAML file.
System defaults for your machine name and hardware details will be used, unless you manually override any of these three fields.
machine_info:
  machine: ""  # Choose your preferred computer name (default: hostname)
  cpu: ""      # Please specify your CPU
  gpu: ""      # Please specify your GPU
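The "use system defaults unless overridden" behavior for these three fields can be sketched as below. This is only an assumption about the general approach: the function name is hypothetical, and GPU detection is left as a placeholder because it depends on vendor tooling.

```python
import platform
import socket


def resolve_machine_info(overrides: dict) -> dict:
    """Fill empty machine_info fields with values detected from the system."""
    defaults = {
        "machine": socket.gethostname(),
        # platform.processor() can be empty on some systems, so fall back to the arch.
        "cpu": platform.processor() or platform.machine(),
        "gpu": "unknown",  # placeholder: real detection is outside this sketch
    }
    # An empty string in the YAML counts as "not overridden".
    return {key: (overrides.get(key) or default) for key, default in defaults.items()}


print(resolve_machine_info({"machine": "", "cpu": "My CPU", "gpu": ""}))
```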
Commenting out an LLM model entry in the ollama section excludes it from the benchmark. If possible, leave them all as provided, to help collect results on a standard set of LLMs.
For the active LLMs (i.e., uncommented), you can:
- Have the benchmark automatically check and, if necessary, pull them for you by using the --autopull flag. This can take a few minutes, depending on your connection.
- Verify manually whether they’re available with ollama list, and install them with ollama pull [model_name].
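The manual check can also be scripted. The sketch below compares a list of configured models against the body of Ollama's `GET /api/tags` response (which lists local models under a `models` key) and prints the pull commands still needed. The function name is hypothetical, and the payload here is hand-written for illustration.

```python
def missing_models(configured: list[str], tags_payload: dict) -> list[str]:
    """Return configured models absent from an Ollama /api/tags response body."""
    installed = {m["name"] for m in tags_payload.get("models", [])}
    return [name for name in configured if name not in installed]


# Hand-written payload in the shape returned by GET /api/tags.
payload = {"models": [{"name": "qwen3:4b"}]}
for name in missing_models(["phi3:3.8b", "qwen3:4b"], payload):
    print(f"ollama pull {name}")  # prints "ollama pull phi3:3.8b"
```

Note that the comparison is an exact string match, so a model configured without a tag would not match an installed entry such as `name:latest`.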
Every combination listed in rows × gpu (for XGBoost) and models × gpu (for Ollama) is run automatically during the benchmark.
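Running every combination amounts to a Cartesian product of the configured lists. The sketch below shows the idea with `itertools.product`; the dictionary layout and the values are assumptions for illustration, and the real structure of ai_bench_suite.yaml may differ.

```python
from itertools import product

# Hypothetical values; the real lists come from ai_bench_suite.yaml.
xgboost_cfg = {"rows": [100_000, 1_000_000], "gpu": [False, True]}
ollama_cfg = {"models": ["phi3:3.8b", "qwen3:4b"], "gpu": [False, True]}

# One benchmark run per (rows, gpu) and per (model, gpu) pair.
xgboost_runs = list(product(xgboost_cfg["rows"], xgboost_cfg["gpu"]))
ollama_runs = list(product(ollama_cfg["models"], ollama_cfg["gpu"]))

for rows, use_gpu in xgboost_runs:
    print(f"XGBoost: {rows} rows on {'GPU' if use_gpu else 'CPU'}")
for model, use_gpu in ollama_runs:
    print(f"Ollama: {model} on {'GPU' if use_gpu else 'CPU'}")
```

With two entries in each list, this yields four XGBoost runs and four Ollama runs, so trimming either list shortens the benchmark proportionally.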
Execution
A single command reads the configuration YAML file and orchestrates the test execution, logging and result visualization.
Simply run:
uv run run_suite.py
The first run may take a bit longer because uv creates the environment and installs Python and the package dependencies automatically. Ollama must already be installed for the LLM benchmarks. To pull the models specified in the YAML file automatically, add --autopull.
--autopull downloads missing Ollama models, but it does not install or update Ollama itself.
At startup, the Ollama suite checks your installed Ollama version against the latest stable release and prints a warning if an update is recommended. The benchmark continues either way.
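A version check of this kind needs a numeric comparison, because comparing version strings directly would rank "0.9.6" above "0.12.0". The sketch below is one simple way to do it; the function names and the version numbers are placeholders, and it does not handle pre-release suffixes such as "-rc1".

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.12.3' or 'v0.12.3' into (0, 12, 3) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def update_recommended(installed: str, latest: str) -> bool:
    """True when the installed version is older than the latest release."""
    return parse_version(installed) < parse_version(latest)


# Placeholder version numbers, used only to exercise the comparison.
if update_recommended("0.9.6", "0.12.0"):
    print("Warning: a newer Ollama release is available; continuing anyway.")
```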
To shorten the run, use the --fast option, which benchmarks only a subset of the smaller, faster models.
Common commands
| Goal | Command | Notes |
|---|---|---|
| Run the full benchmark suite | uv run run_suite.py | Default: XGBoost + Ollama |
| Run only Ollama benchmarks | uv run run_suite.py --suite ollama | Requires Ollama running at http://localhost:11434 |
| Run only XGBoost benchmarks | uv run run_suite.py --suite xgboost | Useful if you do not want to run LLM tests |
| Run a faster Ollama subset | uv run run_suite.py --suite ollama --fast | Uses only the smaller/faster models from the YAML |
| Pull missing Ollama models automatically | uv run run_suite.py --autopull | Downloads models only; it does not install or update Ollama |
| Skip encrypted result upload | uv run run_suite.py --no-upload-results | Keeps all result files local |
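For readers who want to see how such flags fit together, here is a minimal `argparse` sketch that accepts the four options listed above. It is an assumption about the interface shape only; the actual parser in run_suite.py may be built differently or accept more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags (a sketch, not the real one)."""
    parser = argparse.ArgumentParser(prog="run_suite.py")
    parser.add_argument("--suite", choices=["ollama", "xgboost"], default=None,
                        help="run a single suite (default: both)")
    parser.add_argument("--fast", action="store_true",
                        help="benchmark only the smaller/faster Ollama models")
    parser.add_argument("--autopull", action="store_true",
                        help="pull missing Ollama models before running")
    parser.add_argument("--no-upload-results", action="store_true",
                        help="keep all result files local")
    return parser


args = build_parser().parse_args(["--suite", "ollama", "--fast"])
print(args.suite, args.fast, args.autopull, args.no_upload_results)
# prints "ollama True False False"
```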
Privacy
| 📦 | Detail |
|---|---|
| Result sharing | If enabled, CSV results are encrypted with a public/private key scheme and uploaded to Filebin. |
| Uploaded data | Only technical benchmark data is submitted. No prompts, model outputs, datasets, notebooks, or raw system files are uploaded. |
| Opt‑out | Use --no-upload-results to skip encryption and upload entirely. |
Output
- CSV: result files xgb.csv and ollama.csv are written, one row per benchmark, with metrics and basic machine metadata.
- Notebook (bench_results_analysis_altair.ipynb): executed and opened automatically in the browser. It lets you explore the newly obtained results and compare them with reference benchmarks.
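The CSV output can also be inspected outside the notebook. The sketch below averages one metric per model using only the standard library. The column names (`model`, `tokens_per_s`) and the sample rows are hypothetical, so check the header of your own ollama.csv before adapting it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_model(csv_file, metric: str = "tokens_per_s") -> dict:
    """Average a numeric column per model from an open CSV file object."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["model"]].append(float(row[metric]))
    return {model: mean(values) for model, values in grouped.items()}


# In-memory sample with made-up numbers, standing in for a real result file.
sample = io.StringIO(
    "run_id,model,tokens_per_s\n"
    "a1,qwen3:4b,40.0\n"
    "a1,qwen3:4b,44.0\n"
    "a1,phi3:3.8b,50.0\n"
)
print(mean_by_model(sample))
```

To use it on a real file, pass an open handle, for example `open("ollama.csv", newline="")`, and adjust the column names as needed.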
❓ Q&A
| Q | A |
|---|---|
| Can I run just a subset of the benchmark? | ✅ Sure! Edit ai_bench_suite.yaml, for instance by commenting out LLMs you don't want to test, or by restricting the run to GPU or CPU only. |
| I don't have a GPU. Is it for me as well? | ✅ Yes, you can run the benchmark as it is (it will automatically skip the GPU benchmarks). |
| Can I run the benchmark on an AMD GPU? | 🟨 Partially: Ollama will leverage the GPU, while XGBoost will (likely) run on CPU only. |
| I have an NVIDIA GPU, but XGBoost runs on CPU only | ℹ️ Please verify the CUDA toolkit installation by running nvidia-smi and nvcc -V in a terminal. The first confirms that the driver detects an NVIDIA GPU; the second shows the installed CUDA toolkit version. |
| Can I run the bench on an old machine (10+ years)? | ✅ Yes, you can! I suggest editing ai_bench_suite.yaml to include only the smaller LLMs (phi3:3.8b and qwen3:4b). The benchmark was tested on a 15-year-old Intel i5-560M with 8 GB of RAM. |
| I am experiencing issues when downloading the sample dataset | ℹ️ Please verify the system certificates. If Python 3.13 is not your system interpreter (i.e., it was installed automatically via uv), running uv add pip-system-certs can solve the problem. |
| I have another problem | ❗ Please open an issue here on GitHub. |
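For the CUDA question in the table above, a quick first check is whether the two diagnostic commands are on the PATH at all. The following sketch uses `shutil.which` for that; the function name is hypothetical, and finding the commands does not by itself prove that XGBoost can use the GPU.

```python
import shutil


def cuda_tools_on_path() -> dict:
    """Report which of the two diagnostic commands can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in ("nvidia-smi", "nvcc")}


for tool, found in cuda_tools_on_path().items():
    print(f"{tool}: {'found' if found else 'not found on PATH'}")
```

If nvidia-smi is found but nvcc is not, the driver is likely installed while the CUDA toolkit is missing or not on the PATH.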
🔚 Thanks for testing!
If you find an issue or have an idea, open an Issue – or, even better, a Pull Request!
Whenever possible, please keep result sharing enabled to help grow the references! 🚀
Happy benchmarking and experimenting!
About
A suite to benchmark CPU/GPU Python performance in training ML models and running local LLMs
License
MIT license
Releases: 7 (latest: v0.6.3, May 11, 2026)