2026-06-08原文8 min readUpdated: 2026-06-08

Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem

The author developed Pakistan Notice Helper, a safety-focused AI tool for the Hugging Face Build Small Hackathon, designed to help people in Pakistan understand suspicious messages. The tool uses a small model (Qwen3.5 4B) to analyze text or screenshots, providing risk labels, explanations, and safe next steps. It supports English and Urdu, with the Urdu mode featuring a right-to-left layout and Urdu-language assessments. The article shares lessons on model selection, prompting, Urdu UX, and using Codex for rapid development.

SourceHugging Face Blog

Back to Articles

Team Article Published June 8, 2026

Upvote

Abid Ali Awan

kingabzpro

build-small-hackathon

For the Hugging Face Build Small Hackathon, I wanted to build something practical, local, and useful beyond a demo.

The result is Pakistan Notice Helper, a safety-focused AI tool that helps people in Pakistan understand suspicious messages before they click a link, call a number, share an OTP, or make a payment.

The idea came from a common problem: people regularly receive messages that look like they are from banks, couriers, tax authorities, traffic police, utilities, mobile operators, or government departments. Some are real. Many are scams. The hard part is not always reading the message. The hard part is knowing what to do next.

Pakistan Notice Helper is not an authenticity checker. It does not claim that a message is officially genuine or fraudulent. Instead, it works as a triage tool. It accepts text or a screenshot and returns a risk label, a short explanation, visible red flags, and safe next steps.

Why this fits Build Small

The project fits the Backyard AI track because it focuses on a specific local problem: scam-style notices and suspicious messages in Pakistan.

Instead of building a large general-purpose assistant, I wanted to see how far a small model could go when the scope was clear, the product behavior was well-defined, and the interface was designed around real users.

I initially tested a larger Qwen model, but the final production choice became Qwen3.5 4B Q8 through llama.cpp. It passed all high-risk scam cases and both screenshot cases in my ten-case evaluation. That made it a practical choice for a small-model safety assistant.

The project uses:

Hugging Face Space → custom Gradio frontend → queued Gradio Server endpoint → Modal endpoint → CUDA llama.cpp → Qwen3.5 4B Q8 MTP GGUF + vision projector

This gave me a small-model stack that could handle both text and screenshots while staying below the hackathon’s 32B model limit.

What the app does

Pakistan Notice Helper supports both English and Urdu. This was one of the most important product decisions because suspicious messages in Pakistan are often written in English, Urdu, Roman Urdu, or a mix of all three.

Urdu mode is not just a translated interface. When a user switches to Urdu, the app changes the layout to right-to-left, translates the headings, labels, risk cards, validation messages, and result controls, and also asks the model to generate the assessment in clear Urdu script.

This means the user can submit a suspicious message and receive the full safety response in Urdu, including the risk label, explanation, red flags, safe next steps, and optional reply draft when appropriate. For a local safety tool, that matters because advice is easier to trust and act on when it is written in the language people are most comfortable using.

The app looks for warning signs such as:

urgent threats or account suspension language;

requests for OTPs, PINs, passwords, CVVs, CNIC details, or card data;

suspicious payment links or personal mobile numbers;

impersonation of banks, telecom companies, couriers, tax authorities, or police;

prizes, refunds, jobs, or benefits that require an advance fee.

The tool then gives users safer next steps, such as verifying through independently found official channels instead of using the link or phone number inside the suspicious message.

What I learned while building it

This project taught me that building with small models is less about chasing the highest benchmark score and more about finding the right balance between quality, speed, cost, and product safety.

Small models work best when the scope is clear

One of the biggest lessons was that small models can work surprisingly well when the task is carefully bounded.

Pakistan Notice Helper does not need to be a general scam investigator. It needs to identify visible risk signals, avoid overclaiming, and give safe next steps. That made the product scope, prompt design, and output contract just as important as the model itself.

The app is designed to say: this looks risky, here are the warning signs, and here is what you should do safely next. It is not designed to say: this is definitely real or definitely fake.

Starting with a larger model

I started with Qwen3.6 27B, and the quality was excellent. In my testing, it handled suspicious messages very well and produced strong, reliable explanations.

The problem was deployment cost and practicality. The model required much more VRAM, a larger GPU machine, and longer recovery time during cold starts. For a hackathon demo with irregular traffic, that was not ideal. It worked, but it was too expensive and heavy for the kind of small, focused tool I wanted to build.

In terms of quality, I would rate the larger model around 95/100 for this task. But quality alone was not enough. I also had to think about cost, speed, cold starts, and whether the app could stay responsive.

Testing smaller local options

After that, I tried moving to a much smaller vision-language model, MiniCPM-V 4.6 Q8, with the hope that it could run more locally and reduce serving cost.

That experiment did not work well. It was very slow on GPU, and when I tried running it through ZeroGPU, I ran into quota and runtime issues. Even when the interface showed that I still had around 35 minutes of quota left, the app did not behave reliably. I am still not fully sure what caused those issues, but it made the deployment unstable.

I then reverted and deployed the model through Modal. The deployment itself was fast and started responding within a few seconds, but the model quality was not good enough. It struggled with detecting suspicious messages and failed too many of my test cases, so I had to drop it.

Finding the “Goldilocks” model

I then looked through the small open-source model rankings on Artificial Analysis and found what became the best fit for this project: Qwen3.5 4B.

It was small enough to stay aligned with the Build Small spirit, fast enough for the app experience, and capable enough for the safety behavior I needed. Compared with Qwen3.6 27B, I would rate it around 80/100 for this task, while the larger model was closer to 95/100.

But the tradeoff made sense.

The 4B model was cheaper to serve, faster to load, easier to deploy, and practical on a smaller Modal machine. That balance of model quality, speed, cost, and cold-start behavior made it the “Goldilocks” model for Pakistan Notice Helper.

Prompting and output contracts mattered a lot

Some early versions failed in useful ways.

Thinking mode consumed the 500-token output budget before returning the final structured JSON, so I disabled thinking for production. One dense Roman Urdu screenshot reached the original completion limit, so image requests now receive a larger token budget.

Another model response suggested an official-looking domain that had not been verified. That was a serious product issue, so I updated the system prompt to forbid invented URLs, phone numbers, organizations, and facts.

These fixes made the system safer and more predictable. The model was not just being asked to “detect scams.” It was being asked to follow a strict safety contract.

Urdu UX needed real product work

The Urdu interface also needed more work than I expected.

Direct translations sounded unnatural. Some headings needed different line heights. Mixed Urdu and Latin model names could reorder unexpectedly. Mobile controls needed more vertical space, especially in right-to-left layout.

I also tested a bundled Nastaliq webfont. It looked beautiful in isolation, but inside the product UI it reduced readability and made the interface feel less consistent. I removed it and returned to a system Arabic font stack while keeping the improved Urdu copy and RTL layout.

These were not just design details. They affected whether the app felt clear, usable, and trustworthy.

The main lesson

The final lesson was that the best model for a product is not always the biggest model.

For this project, Qwen3.6 27B gave the best raw quality, but Qwen3.5 4B gave the best product balance. It was small, fast, affordable, and good enough for the clearly defined task.

That tradeoff is exactly what made the project feel right for Build Small.

Building with Codex

Codex helped me move much faster across the project, especially because this was not just a simple model demo. Pakistan Notice Helper needed a custom frontend, a Gradio backend, a Modal-hosted llama.cpp server, screenshot support, Urdu mode, tests, documentation, and a safer output pipeline.

The full code is available in the GitHub repository.

I used Codex as an engineering collaborator rather than only a code generator. It helped inspect the existing repository, implement changes, run tests, debug issues, update documentation, and keep the Modal, Gradio, and llama.cpp setup aligned with the deployed system.

One of the most useful parts was building a custom HTML, CSS, and JavaScript interface while still keeping Hugging Face Spaces compatibility through Gradio Server. Instead of using the default Gradio component layout, the app uses a product-style frontend that talks to Gradio’s queued API routes and SSE protocol in the background.

This made the final Space feel more like a real local safety tool than a standard model playground. Codex also helped with repeated UI refinements, including the English/Urdu switch, mobile layout fixes, result cards, cached examples, trace controls, and cleaner documentation for the deployment flow.

For me, the biggest benefit was speed of iteration. I could describe the product behavior I wanted, review the implementation, test it, and then keep refining the app until the frontend, backend, model endpoint, and safety constraints worked together.

Privacy-safe traces

I also added an optional public trace feature so people can understand how the app is being used without exposing private user content.

The trace option is visible inside the app and can be disabled before each request. When enabled, it records only limited request-level metadata, not the full user message or screenshot. Text is redacted and capped. Images are represented through fixed summaries and are not stored.

The trace excludes raw screenshots, links, identifiers, generated explanations, reply drafts, errors, credentials, and any free-form model output that could accidentally repeat private details.

I also published the trace dataset so people can review the schema and see what kind of metadata is shared.

You can view the dataset here:

Pakistan Notice Helper public traces

This matters because sensitive information can leak from more than the original input. A model explanation, reply draft, extracted phone number, URL, or exception message can repeat personal details even when the raw message is removed. To avoid that, the trace system only publishes limited categories, booleans, counts, and fixed summaries.

The app still sends live text and images to the private Modal endpoint for inference, so I do not present this as anonymous local inference. Users are warned not to submit sensitive personal data, and public trace sharing can be turned off before each request.

Results so far

The small evaluation suite is not a real-world accuracy estimate, but it was useful for regression testing.

The final evaluation reached:

Measurement Result

Initial strict passes 9 of 10

Initial average score 89.5/100

Final regression passes 10 of 10

Final regression average score 100/100

High-risk scam cases All passed

Screenshot cases Both passed

MTP draft acceptance 222 of 440 tokens

Draft acceptance rate 50.5%

The most important result was not the score itself. It was that a scoped 4B model could preserve the safety behavior I needed after prompt, output-contract, and UI fixes.

What I would build next

The next major feature would be an agentic verification workflow.

Right now, Pakistan Notice Helper stops at triage. It reads the submitted text or screenshot, identifies visible risk signals, and gives safe next steps. In the next version, I want the app to go one step further and help verify whether the notice is original, copied, or already being discussed online as a scam.

For that workflow, I plan to use Olostep for web search and web scraping. The agent could search the web for current scam warnings, check whether similar messages have been reported by other people, identify the organization being impersonated, and compare the claims against independently discovered official sources.

I also plan to use the OpenAI Agents SDK to manage this verification workflow. The agent would not simply browse randomly. It would follow a controlled process: extract the likely organization, search for related warnings, scrape relevant pages, rank sources, and then present evidence alongside the model’s assessment.

This workflow needs strict safety boundaries. The agent must never trust links, phone numbers, or contact details inside the suspicious message. It should only use independently discovered sources and clearly separate evidence from inference.

There is also a product tradeoff. The current app usually returns results within around 5 seconds, and around 9 seconds on cold start. Adding web search, scraping, source checking, and agent reasoning would increase both inference cost and response time. A full verification workflow could take closer to 30 seconds, which is much slower than the current triage experience.

That is why I kept the hackathon version focused on fast safety guidance. For suspicious messages, speed matters. The current version helps users pause and avoid risky actions quickly, while the future agentic version could provide deeper verification when users are willing to wait longer.

Final thoughts

Pakistan Notice Helper is small by design.

It does not try to solve every type of fraud. It does not claim to verify official notices. It does not replace banks, couriers, telecom providers, or government portals.

Instead, it helps a user pause, notice red flags, and take a safer next step.

I built this project over three intense days of writing, deploying, testing, improving, and smoothing the rough edges. Most of the progress came from spending time with the product itself: testing real cases, noticing small failures, fixing prompts, adjusting the interface, and thinking through the engineering constraints behind every decision.

That process made the project much better. Because I spent more time testing and planning than simply adding features, I had more space to think about model size, latency, cold starts, cost, safety boundaries, Urdu usability, and what a small model could realistically do well.

I learned a lot about building with small models. The biggest lesson was that small models become powerful when the problem is local, focused, and carefully designed. They may not solve everything, but they can still help with real problems when the scope is honest.

We are surrounded by scams, fraud, and suspicious messages in this digital world. Pakistan Notice Helper is not the full answer, but it is a small start: one light in a dark sky, built to help people slow down, stay safer, and make better decisions before it is too late.

That is the kind of small AI I wanted to build for this hackathon: local, focused, honest about its limits, and useful for the people it is designed for.

Project links

You can try the app, review the code, and inspect the public trace dataset here:

Web app: Pakistan Notice Helper on Hugging Face Spaces

YouTube demo: Pakistan Notice Helper Demo: Check Suspicious SMS, Bills, Bank Alerts, and Notices

GitHub repository: kingabzpro/pakistan-notice-helper

Public trace dataset: Pakistan Notice Helper traces on Hugging Face Datasets

The GitHub repository includes the application code, deployment setup, documentation, model experiment notes, and other project details. The Space hosts the live app, while the dataset shows the privacy-safe trace format used to publish limited request-level metadata without exposing raw user messages or screenshots.

Note: The app is intended as safety guidance, not legal, financial, banking, or government verification advice.

Datasets mentioned in this article 1

Spaces mentioned in this article 1