GLiNER2-PII: 0.3B open-source PII model outperforms OpenAI's Privacy Filter
GLiNER2-PII, a new open-source model with 0.3B parameters, achieves the highest span-level F1 among five compared systems on the SPY benchmark, ahead of OpenAI's Privacy Filter and three GLiNER-based detectors. It recognizes 42 PII entity types and is trained on a multilingual synthetic corpus of 4,910 annotated texts. The model is publicly available on Hugging Face.
Article intelligence
Key points
- Open-source 0.3B parameter model for PII detection
- Outperforms OpenAI Privacy Filter on SPY benchmark
- Recognizes 42 entity types across languages
- Available on Hugging Face for research and deployment
Why it matters
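This matters because a small, openly released 0.3B-parameter model gives teams a PII detector they can inspect and deploy themselves, and on the SPY benchmark it scores higher than OpenAI's Privacy Filter.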
Technical impact
The release may affect model selection and inference cost for PII detection pipelines, and how PII detectors are compared on benchmarks such as SPY.
GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction - Pioneer AI by Fastino Labs
ABSTRACT
Reliable detection of personally identifiable information (PII) is increasingly important across modern data-processing systems, yet the task remains difficult: PII spans are heterogeneous, locale-dependent, context-sensitive, and often embedded in noisy or semi-structured documents. We present GLiNER2-PII, a small 0.3B-parameter model adapted from GLiNER2 and designed to recognize a broad taxonomy of 42 PII entity types at character-span resolution. Training such systems, however, is constrained by the scarcity of shareable annotated data and the privacy risks associated with collecting real PII at scale. To address this challenge, we construct a multilingual synthetic corpus of 4,910 annotated texts using a constraint-driven generation pipeline that produces diverse, realistic examples across languages, domains, formats, and entity distributions. On the challenging SPY benchmark, GLiNER2-PII achieves the highest span-level F1 among five compared systems, including OpenAI Privacy Filter and three GLiNER-based detectors. We publicly release the model on Hugging Face to support further research and practical deployment of open PII detection systems.
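To make the span-level F1 metric concrete, here is a minimal sketch of how such a score is commonly computed. It assumes exact matching, where a prediction counts only if its start offset, end offset, and label all equal a gold annotation. The abstract does not state the matching protocol used for SPY, so this is an illustration and not the paper's evaluation code. The `Span` type, the `span_f1` function, the label names, and the example sentence are all hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type (hypothetical names, not the model's taxonomy)


def span_f1(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


text = "Contact Ana Silva at ana@example.com."
gold = {Span(8, 17, "person"), Span(21, 36, "email")}
# The predicted email span stops one character early, so it is not an exact match.
predicted = {Span(8, 17, "person"), Span(21, 35, "email")}

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, a boundary error of a single character costs both a false positive and a false negative, which is one reason character-span resolution makes the task demanding. Relaxed or partial-overlap matching would score the same prediction more generously.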