More information

How Maskify works

Your data, under your control from start to finish.

Maskify masks personal information in documents before you share them. Detection runs on our own infrastructure, with no third parties involved, and we never store an unmasked document.

No clear-text documents

We never keep the original text. We only store the already-masked version.

You keep the map

The reversal map needed to recover the originals is generated in your browser and downloaded only to your disk.

Detection on our turf

Your text is processed on our own servers, never through external APIs or third parties.

A document's journey

Five steps. The only unmasked content that ever leaves your browser is the text sent to our detector, and it comes back masked.

1. You upload your document

The file is read right in your browser. It isn't written to disk or sent to any server yet.

2. We detect the private data

Your text passes through our own servers, where we analyse it with two complementary layers: a self-hosted AI model (no third parties) that understands context, and rules with mathematical validation (IBAN, ID numbers, cards, API keys…). For each document you choose whether to use both layers or rules only. The text is held in memory only while it is processed; nothing is stored.
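The two-layer choice can be pictured as a merge step. The TypeScript sketch below is an illustration under assumptions, not Maskify's code: the `Detection` shape, the `mergeDetections` name and the overlap policy (a rule hit wins because it passed a checksum or format check) are all hypothetical.

```typescript
// Hypothetical shape of one detected span: [start, end) character offsets.
interface Detection {
  start: number;
  end: number;
  type: string; // e.g. "PERSON", "IBAN"
  source: "model" | "rules";
}

type DetectionMode = "both" | "rules-only";

// Hypothetical merge: rule hits are always kept; model hits are added only in
// "both" mode and only where they don't overlap a validated rule hit.
function mergeDetections(
  modelHits: Detection[],
  ruleHits: Detection[],
  mode: DetectionMode,
): Detection[] {
  const accepted: Detection[] = [...ruleHits];
  if (mode === "both") {
    for (const hit of modelHits) {
      const overlaps = ruleHits.some((r) => hit.start < r.end && r.start < hit.end);
      if (!overlaps) accepted.push(hit);
    }
  }
  return accepted.sort((a, b) => a.start - b.start || a.end - b.end);
}
```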

3. You choose what to mask

In the editor you can switch each data type and each individual match on or off. All editing happens in your browser; the draft lives only in this tab.
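A minimal sketch of the two levels of toggling, assuming a hypothetical in-memory `EditorState` that holds the enabled types and the individually switched-off matches. A match is masked only when both levels allow it.

```typescript
// Hypothetical match produced by detection.
interface Match {
  id: string;
  start: number;
  end: number;
  type: string;
}

// Hypothetical editor state, kept only in memory for this tab.
interface EditorState {
  enabledTypes: Set<string>; // types switched on
  disabledMatches: Set<string>; // individual matches switched off
}

// A match is masked only if its type is on and the match itself wasn't switched off.
function selectedMatches(matches: Match[], state: EditorState): Match[] {
  return matches.filter(
    (m) => state.enabledTypes.has(m.type) && !state.disabledMatches.has(m.id),
  );
}
```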

4. You download the result

The masked file and the reversal map are built in your browser and downloaded to your disk. The map never touches our servers.
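Step 4 can be sketched as a single pass over the selected matches that writes the masked text and the reversal map together. The `[TYPE_n]` placeholder format, the rule that a repeated value reuses the same placeholder, and the map's shape are assumptions; this page doesn't specify them.

```typescript
interface Match {
  start: number; // inclusive offset
  end: number; // exclusive offset
  type: string;
}

// Hypothetical reversal map: placeholder -> original value.
type ReversalMap = Record<string, string>;

// Assumes the matches don't overlap; they are processed left to right.
function buildMaskedOutput(text: string, matches: Match[]): { masked: string; map: ReversalMap } {
  const map: ReversalMap = {};
  const byValue = new Map<string, string>(); // type + value -> placeholder, so repeats share a label
  const counters = new Map<string, number>(); // next number per type
  let masked = "";
  let cursor = 0;
  for (const m of [...matches].sort((a, b) => a.start - b.start)) {
    const value = text.slice(m.start, m.end);
    const key = `${m.type}\u0000${value}`;
    let placeholder = byValue.get(key);
    if (placeholder === undefined) {
      const n = (counters.get(m.type) ?? 0) + 1;
      counters.set(m.type, n);
      placeholder = `[${m.type}_${n}]`;
      byValue.set(key, placeholder);
      map[placeholder] = value;
    }
    masked += text.slice(cursor, m.start) + placeholder;
    cursor = m.end;
  }
  masked += text.slice(cursor);
  return { masked, map };
}

// With the map, the original text can be restored locally.
function unmask(masked: string, map: ReversalMap): string {
  let out = masked;
  for (const [placeholder, value] of Object.entries(map)) {
    out = out.split(placeholder).join(value);
  }
  return out;
}
```

In a browser, the map could then be serialised with `JSON.stringify(map)`, wrapped in a `Blob` and offered as a .json download, which never involves the network.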

5. We only store what's already masked

On download, we create a history entry with the already-masked document. The original is never kept, nor are any of the values detected in it.

The PDF case

PDFs come with layout, images and signatures worth keeping. To redact them for real, without leaving the data hidden inside the file, we handle them differently: each page is turned into an image and the sensitive data is covered before the file is saved.

1. Reading the PDF

The PDF opens right in your browser. We extract only the text and the position of each word; images, signatures and layout stay exactly as they were, and none of them leave your machine.
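To trace a detection back to the page, the extracted words need their positions kept alongside the flat text. This sketch assumes a simplified, hypothetical `TextItem` (a string plus its box in PDF points). Libraries such as PDF.js expose similar data through `page.getTextContent()`, but this page doesn't say which library Maskify uses.

```typescript
// Simplified, hypothetical text item: the string plus its box in PDF points.
interface TextItem {
  str: string;
  x: number;
  y: number; // baseline, measured up from the bottom of the page (PDF convention)
  width: number;
  height: number;
}

interface Segment {
  start: number; // offset of this item's text in the flat string
  end: number;
  page: number;
  item: TextItem;
}

// Join every page's items into one flat string for the detector, remembering
// where each item landed so a detected span can be traced back to page boxes.
function flattenPages(pages: TextItem[][]): { text: string; segments: Segment[] } {
  let text = "";
  const segments: Segment[] = [];
  pages.forEach((items, pageIndex) => {
    for (const item of items) {
      const start = text.length;
      text += item.str;
      segments.push({ start, end: text.length, page: pageIndex, item });
      text += " "; // separator so words from adjacent items don't merge
    }
    text += "\n"; // page break
  });
  return { text, segments };
}

// Which items does the detected span [start, end) touch?
function segmentsFor(span: { start: number; end: number }, segments: Segment[]): Segment[] {
  return segments.filter((s) => s.start < span.end && span.start < s.end);
}
```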

2. Editing as text

The editor treats the PDF like any text document: auto-detection, toggle types on/off, click each match. The original PDF stays safe in your browser and is never uploaded anywhere.

3. Export: PDF as image

When you download the redacted PDF, each page becomes an image with white rectangles over the sensitive data. The new file contains no original text, so copy/paste recovers nothing.
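Painting the rectangles means converting each word's box from PDF space (points, y measured up from the bottom) to the pixels of the rendered page image (y measured down from the top). A sketch with hypothetical names; the per-character split in `subBox` is a rough approximation we assume for matches that cover only part of a text item.

```typescript
// PDF points; y is the baseline measured from the bottom of the page.
interface Box { x: number; y: number; width: number; height: number }
// Canvas pixels; origin at the top-left corner.
interface Rect { left: number; top: number; width: number; height: number }

// Hypothetical helper: `scale` is pixels per PDF point (e.g. 2 for a 2x render),
// `padding` widens the rectangle slightly so glyph edges are fully covered.
function toCanvasRect(box: Box, pageHeight: number, scale: number, padding = 1): Rect {
  // PDF's y axis points up from the bottom; the canvas y axis points down from the top.
  const top = (pageHeight - (box.y + box.height)) * scale;
  return {
    left: box.x * scale - padding,
    top: top - padding,
    width: box.width * scale + 2 * padding,
    height: box.height * scale + 2 * padding,
  };
}

// Hypothetical helper: a match may cover only characters [from, to) of an item.
// Assumes roughly uniform glyph widths; real code could measure each glyph.
function subBox(box: Box, length: number, from: number, to: number): Box {
  const perChar = box.width / Math.max(length, 1);
  return { ...box, x: box.x + from * perChar, width: (to - from) * perChar };
}
```

With the Canvas API, each rectangle would then be filled via `ctx.fillStyle = "white"` and `ctx.fillRect(left, top, width, height)` before the page is encoded as an image.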

4. The trade-off

You lose text selection and vector sharpness: the redacted PDF is an image. In return you get real privacy: the data isn't hidden behind a rectangle, it simply isn't in the file. If you also need the text, download the masked .txt.

5. Recover and re-redact

Your original PDF is kept only in this browser, and only the most recent ones are cached. When you reopen a document from History with your reversal map, we rebuild the redacted PDF exactly as the first time. From another browser or device, the editor asks you to upload the original again.
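The page only says that the most recent originals are kept. One plausible eviction policy is least-recently-used, sketched below with an in-memory `Map`. The class name and the `limit` value are hypothetical, and a real browser cache would more likely persist the file bytes in IndexedDB.

```typescript
// Hypothetical cache: keeps at most `limit` entries and evicts the least recently used first.
class RecentCache<V> {
  private readonly limit: number;
  private entries = new Map<string, V>(); // a Map iterates in insertion order

  constructor(limit: number) {
    this.limit = limit;
  }

  get(id: string): V | undefined {
    const value = this.entries.get(id);
    if (value !== undefined) {
      // Re-insert so this entry counts as the most recently used.
      this.entries.delete(id);
      this.entries.set(id, value);
    }
    return value;
  }

  set(id: string, value: V): void {
    this.entries.delete(id);
    this.entries.set(id, value);
    while (this.entries.size > this.limit) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```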

What we do
  • We strip the sensitive text bytes: the redacted PDF holds none of the original data.
  • Images, signatures and layout are preserved on each page.
  • The original PDF never touches our servers — it lives only in your browser.
Trade-offs
  • The redacted PDF is an image: you can't select text in it.
  • Graphs and diagrams lose a bit of sharpness when turned into an image.
  • Each page takes up a little more space than in the original PDF.
Recovery
  • Same browser: with the .json map we rebuild the redacted PDF instantly.
  • Different browser / device: the editor asks you to re-upload the original PDF along with the map.
  • No original PDF available: download the masked .txt or a text-only PDF.

The .docx case

Unlike PDF, the .docx stays editable: we remove the sensitive text bytes directly in the XML instead of rasterising the page. We cover body, headers, footers, notes and comments; optionally we also clear author metadata.

1. Reading the .docx

A .docx is a ZIP of XML. We unzip it in your browser and walk the parts that hold text: body, headers, footers, footnotes and comments. Nothing leaves the machine.
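The parts listed above follow the usual WordprocessingML file names inside the ZIP. A sketch that picks them out of a list of entry names; the function name is hypothetical, and matching by file name is a simplification we assume here, since a robust reader would follow the package relationships (`word/_rels/document.xml.rels`) instead.

```typescript
// Text-bearing parts of a .docx package, by conventional ZIP entry path.
const TEXT_PARTS: RegExp[] = [
  /^word\/document\.xml$/, // body
  /^word\/header\d*\.xml$/, // headers
  /^word\/footer\d*\.xml$/, // footers
  /^word\/footnotes\.xml$/, // footnotes
  /^word\/comments\.xml$/, // comments
];

// Hypothetical helper: keep only the entries whose text should be scanned.
function textParts(entryNames: string[]): string[] {
  return entryNames.filter((name) => TEXT_PARTS.some((re) => re.test(name)));
}
```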

2. Detection on flat text

We concatenate the extracted text and run it through the same detector we use for TXT and PDF. Layout, fonts, tables and images stay untouched.

3. In-place replacement

Each match is replaced by splitting the matching XML run. Formatting around the mark is preserved. The original bytes of the sensitive data are removed from the file — unlike PDF, no rasterising is needed.
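Splitting runs is easiest to see on a simplified model where each run is its formatting (`<w:rPr>`) plus its text. In this hypothetical sketch, a match given as offsets into the concatenated paragraph text may span several runs; text outside the match keeps its run's formatting, and the label inherits the formatting of the run where the match starts. Real runs can also hold tabs, breaks and fields, which this model ignores.

```typescript
// Simplified, hypothetical run: its <w:rPr> XML (kept verbatim) plus its text.
interface Run {
  props: string;
  text: string;
}

// Replace [start, end), given as offsets into the concatenated text of `runs`, by `label`.
function replaceInRuns(runs: Run[], start: number, end: number, label: string): Run[] {
  const out: Run[] = [];
  let offset = 0;
  let labelPlaced = false;
  for (const run of runs) {
    const runStart = offset;
    const runEnd = offset + run.text.length;
    offset = runEnd;
    if (runEnd <= start || runStart >= end) {
      out.push(run); // run untouched by the match
      continue;
    }
    const before = run.text.slice(0, Math.max(0, start - runStart));
    const after = run.text.slice(Math.min(run.text.length, end - runStart));
    if (before) out.push({ props: run.props, text: before });
    if (!labelPlaced) {
      out.push({ props: run.props, text: label });
      labelPlaced = true;
    }
    if (after) out.push({ props: run.props, text: after });
  }
  return out;
}

// Serialise back to WordprocessingML runs, escaping the text for XML.
function toXml(runs: Run[]): string {
  const esc = (s: string) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return runs
    .map((r) => `<w:r>${r.props}<w:t xml:space="preserve">${esc(r.text)}</w:t></w:r>`)
    .join("");
}
```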

4. Author metadata (optional)

Word stores the author, the last editor and the company in file properties. If you tick «Clear author metadata» in the editor, those fields are emptied alongside the text redaction.
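In the .docx package, these properties live in `docProps/core.xml` (`dc:creator` for the author, `cp:lastModifiedBy` for the last editor) and in `docProps/app.xml` (`Company`). A sketch that empties those elements while keeping the tags; the function names are hypothetical, and the regular expressions are a shortcut we assume for brevity, where a real implementation might use an XML parser such as `DOMParser`.

```typescript
// Hypothetical helper: empty the text of the given elements, keeping the tags so the XML stays valid.
function clearElements(xml: string, tagNames: string[]): string {
  let out = xml;
  for (const tag of tagNames) {
    const escaped = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Matches <tag ...>content</tag>; self-closing <tag/> elements don't match and are left alone.
    const re = new RegExp(`(<${escaped}(?:\\s[^>]*)?>)[^<]*(</${escaped}>)`, "g");
    out = out.replace(re, "$1$2");
  }
  return out;
}

// Hypothetical helper: core.xml holds author and last editor; app.xml holds the company.
function clearAuthorMetadata(coreXml: string, appXml: string): { core: string; app: string } {
  return {
    core: clearElements(coreXml, ["dc:creator", "cp:lastModifiedBy"]),
    app: clearElements(appXml, ["Company"]),
  };
}
```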

5. Re-edit

The original .docx is cached only in this browser, and only the most recent ones are kept. When you reopen a document from History with your reversal map, we rebuild the redacted .docx in whatever style you pick. From another browser, the editor asks you to upload the original again.

What we do
  • We walk body, headers, footers, notes and comments.
  • We erase the content of tracked deletions and clear the author of comments.
  • Optional: we clear the Author, Last modified by and Company fields Word stores in file properties.
What we don't process
  • Text inside embedded images (no OCR).
  • The legacy binary .doc format: save it as .docx in Word first.
  • Documents with macros (.docm) or password-protected files.
Re-edit
  • Same browser: with the .json map we rebuild the redacted .docx in whatever style you pick.
  • Different browser / device: the editor asks you to re-upload the original .docx along with the map.
  • The downloaded .docx is still editable in Word/LibreOffice.
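For the two cleanups in the first list: tracked deletions keep the removed words in `<w:delText>` elements, and comments carry their author in the `w:author` attribute of `<w:comment>`. A regex-based sketch (an XML parser would be more robust); the function names are hypothetical.

```typescript
// Hypothetical helper: empty every <w:delText> so deleted words don't survive in the file.
function eraseTrackedDeletions(xml: string): string {
  return xml.replace(/(<w:delText(?:\s[^>]*)?>)[^<]*(<\/w:delText>)/g, "$1$2");
}

// Hypothetical helper: blank the w:author attribute on each <w:comment> start tag.
// The \b stops it from matching tags such as <w:commentRangeStart>.
function clearCommentAuthors(commentsXml: string): string {
  return commentsXml.replace(/<w:comment\b[^>]*>/g, (tag) =>
    tag.replace(/\sw:author="[^"]*"/, ' w:author=""'),
  );
}
```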

When the detector falls short

The model is good but not infallible: it can miss unusual names, your company's internal codes, or phrases only you know are sensitive. For those cases, each type in the editor has a «+» button and, optionally, a personal memory that respects privacy by design.

1. Select the text

If the detector misses a name, an internal code or any value only you know is sensitive, mark it by hand: drag the cursor over the text in the editor.

2. Click «+» on the type

Each data type (Person, Email, Phone, Secret…) has a «+» button. Clicking it immediately masks your selection with a label of that type.

3. Remember it?

Right after, a single prompt: «Remember for future documents?». We tell you exactly where it would be stored before you confirm.

4. Two routes, by data type

If the value has a structure (a phone number, an IBAN, a prefixed key), we keep only its regex skeleton; the actual value never crosses the network. If it's free-form (a name, an address), the literal value stays in this browser and never leaves it.
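Deriving a skeleton can be sketched as: keep a known prefix literally, then replace each digit run with `\d{n}`, each letter run with `[A-Za-z]{n}` and each whitespace run with a single `\s`. This is an assumed scheme with hypothetical function names, not necessarily Maskify's. Note that `\b` only matches next to a word character, so for values starting with `+` the sketch anchors with a `(?<!\w)` lookbehind instead.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical skeleton derivation: the returned pattern contains no digits or
// letters from the value itself, only the literal prefix and character-class counts.
function deriveSkeleton(value: string, literalPrefix: string): RegExp {
  if (!value.startsWith(literalPrefix)) {
    throw new Error("value must start with the literal prefix");
  }
  let body = escapeRegExp(literalPrefix);
  for (const [token] of value.slice(literalPrefix.length).matchAll(/\d+|[A-Za-z]+|\s+|./g)) {
    if (/^\d/.test(token)) body += `\\d{${token.length}}`;
    else if (/^[A-Za-z]/.test(token)) body += `[A-Za-z]{${token.length}}`;
    else if (/^\s/.test(token)) body += "\\s"; // one whitespace character
    else body += escapeRegExp(token); // punctuation kept literally
  }
  // \b needs a word character on one side; otherwise fall back to lookarounds.
  const start = /^\w/.test(value) ? "\\b" : "(?<!\\w)";
  const end = /\w$/.test(value) ? "\\b" : "(?!\\w)";
  return new RegExp(start + body + end, "g");
}
```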

5. Carried-over detection

Next time you load a document, your personal list runs on top of the standard detector. You manage all of it from your Profile, and you can drop entries as soon as you no longer need them.
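Carried-over detection could look like this sketch: the synced patterns and the local literals are both scanned over the new text, and their hits are added wherever the standard detector found nothing. The `PersonalMemory` shape, the function names and the "standard hits win" rule are assumptions.

```typescript
interface Hit {
  start: number;
  end: number;
  type: string;
  origin: "standard" | "memory";
}

// Hypothetical memory: synced regex skeletons (no values) plus local-only literal values.
interface PersonalMemory {
  patterns: { type: string; source: string; flags: string }[];
  literals: { type: string; value: string }[];
}

function memoryHits(text: string, memory: PersonalMemory): Hit[] {
  const hits: Hit[] = [];
  for (const p of memory.patterns) {
    const flags = p.flags.includes("g") ? p.flags : p.flags + "g"; // matchAll requires "g"
    for (const m of text.matchAll(new RegExp(p.source, flags))) {
      const start = m.index ?? 0;
      hits.push({ start, end: start + m[0].length, type: p.type, origin: "memory" });
    }
  }
  for (const lit of memory.literals) {
    if (!lit.value) continue;
    for (let i = text.indexOf(lit.value); i !== -1; i = text.indexOf(lit.value, i + lit.value.length)) {
      hits.push({ start: i, end: i + lit.value.length, type: lit.type, origin: "memory" });
    }
  }
  return hits;
}

// Keep every standard hit; add memory hits only where they don't overlap one.
function withMemory(standard: Hit[], text: string, memory: PersonalMemory): Hit[] {
  const extra = memoryHits(text, memory).filter(
    (h) => !standard.some((s) => h.start < s.end && s.start < h.end),
  );
  return [...standard, ...extra].sort((a, b) => a.start - b.start);
}
```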

In your account

Derived patterns

Only the value's skeleton is stored. The actual content you marked is discarded before anything leaves your browser.

  • Phone
    Mobiles +34 …
    /(?<!\w)\+34\s\d{3}\s\d{3}\s\d{3}\b/g
  • Account
    IBAN ES…
    /\bES\d{2}(?:[\s-]?\d{4}){5}\b/g
  • Secret
    Keys sk_live_…
    /\bsk_live_[A-Za-z0-9_-]{16,28}\b/g
Available on any device you sign in from.
What's stored: just regex and labels. Zero values.
In this browser

Literal values

For free-form text that can't be generalized without leaking content. Never crosses the network.

  • Person
    «Juan Martínez Ortega»
    kept in IndexedDB
  • Person
    «Lucía Pereira Ruiz»
    kept in IndexedDB
  • Address
    «Calle Mayor 12, 28013 Madrid»
    kept in IndexedDB
Only on this device and this browser. Doesn't sync.
Clearing site data in your browser empties this list.
When it goes to your account

When the value has a reusable skeleton: prefix + digit groups (phones), country + digits (IBAN), known key prefix (sk_live_…).

  • We store only the regex pattern
  • The actual value never leaves your browser
  • Available from any device
When it stays local

When the value is free-form and can't be generalized without leaking the content: people's names, addresses, free phrases, unstructured codes.

  • The literal value is kept in IndexedDB
  • Never crosses the network
  • Lives only on this device and this browser
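The routing rule above reduces to one question: does the value have a reusable shape? A sketch of that decision; the function name is hypothetical and the three shape tests are assumptions modelled on this page's examples (a phone with a country prefix, an IBAN, an `sk_live_` key), not Maskify's actual list.

```typescript
type Route = "account" | "local";

// Assumed shapes that have a reusable skeleton.
const STRUCTURED: RegExp[] = [
  /^\+\d[\d\s-]{6,18}\d$/, // phone with a country prefix, e.g. +34 612 345 678
  /^[A-Z]{2}\d{2}(?:[\s-]?[A-Z0-9]{4})*(?:[\s-]?[A-Z0-9]{1,4})?$/, // IBAN-like
  /^sk_live_[A-Za-z0-9_-]+$/, // known key prefix; other prefixes could be added
];

// Hypothetical router: structured values sync as a skeleton, everything else stays local.
function routeFor(value: string): Route {
  const v = value.trim();
  return STRUCTURED.some((re) => re.test(v)) ? "account" : "local";
}
```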
You manage the memory

Both lists (synced and local) are visible and removable from your Profile. Nothing is locked in: any entry you no longer want disappears with one click.

Under the hood

Open stack, no tricks. You can inspect every piece.

openai/privacy-filter

OpenAI's open-source (Apache-2.0) token classifier for 8 categories of personal data. We run it on our own servers.

github.com/openai/privacy-filter
Weights on Hugging Face

The ONNX q4-quantized weights are loaded once and reused for every request, so nothing is downloaded from outside for each analysis.
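"Loaded once and reused" is the lazy-singleton pattern. The server's language isn't stated, so this TypeScript sketch is only an illustration: `load` is a hypothetical loader (with onnxruntime-node it might wrap `InferenceSession.create(path)`), and sharing the pending promise means requests that arrive during start-up also trigger a single load.

```typescript
// Hypothetical model handle; a real server would hold an ONNX inference session.
interface Model {
  classify(text: string): Promise<string[]>;
}

// Hypothetical helper: start loading on first use and hand every later caller the same promise.
function lazySingleton<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        pending = undefined; // allow a retry if loading failed
        throw err;
      });
    }
    return pending;
  };
}
```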

huggingface.co/openai/privacy-filter
Complementary regex layer

IBAN (mod-97), Spanish DNI/NIE (mod-23), Luhn-checked cards and known API key prefixes: the validation the model wasn't trained for.
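The three checksums named above are standard and can be sketched as follows: the IBAN check (ISO 7064 MOD 97-10: move the first four characters to the end, map letters A=10 to Z=35, and the number mod 97 must equal 1), the DNI/NIE control letter (the number mod 23 indexes a fixed letter table, with a leading X/Y/Z replaced by 0/1/2), and the Luhn check for card numbers. The function names are hypothetical, and the key-prefix check is left out.

```typescript
// IBAN: computed character by character so no big integers are needed.
function isValidIban(input: string): boolean {
  const iban = input.replace(/[\s-]/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (const ch of rearranged) {
    const value = parseInt(ch, 36); // "0"-"9" -> 0-9, "A"-"Z" -> 10-35
    remainder = value < 10 ? (remainder * 10 + value) % 97 : (remainder * 100 + value) % 97;
  }
  return remainder === 1;
}

const DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE";

// DNI: 8 digits + letter. NIE: X/Y/Z + 7 digits + letter.
function isValidDniNie(input: string): boolean {
  const id = input.replace(/[\s-]/g, "").toUpperCase();
  const m = /^([XYZ]\d{7}|\d{8})([A-Z])$/.exec(id);
  if (!m) return false;
  const digits = m[1].replace(/^[XYZ]/, (c) => String("XYZ".indexOf(c)));
  return DNI_LETTERS[Number(digits) % 23] === m[2];
}

// Luhn: from the right, double every second digit (minus 9 if above 9); the sum must end in 0.
function isValidLuhn(input: string): boolean {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```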

Minimal storage

PostgreSQL holds your account and the already-masked documents. The original text never reaches disk: each entity stores its position, but its original value is left empty.
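The storage rule (position kept, value empty) can be sketched as the step that builds the history payload in the browser. The `DetectedEntity` and `StoredEntity` shapes and the function name are hypothetical; whether positions refer to the original or the masked text isn't stated here.

```typescript
// Hypothetical entity as the editor holds it; `value` must never leave the browser.
interface DetectedEntity {
  type: string;
  start: number;
  end: number;
  value: string;
}

// Hypothetical stored form: type and position survive, the value is always empty.
interface StoredEntity {
  type: string;
  start: number;
  end: number;
  value: "";
}

function toHistoryPayload(maskedText: string, entities: DetectedEntity[]) {
  const stored = entities.map(
    ({ type, start, end }): StoredEntity => ({ type, start, end, value: "" }),
  );
  return { maskedText, entities: stored };
}
```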

Try it with one of your documents

No signup required to mask. You only need a free account if you want to fetch the result from history later.
