Security | December 28, 2024 | 6 min read

PDF Redaction: Why Black Boxes Don't Actually Remove Data

Drawing black boxes over text in PDFs leaves sensitive data fully extractable. Here's the technical explanation of why, and what to do instead.

You've seen it a thousand times: a PDF with black rectangles covering social security numbers, addresses, or confidential terms. It looks redacted. It feels secure.

It's not.

Those black boxes are the digital equivalent of placing sticky notes over text—anyone can peel them back. This isn't a theoretical vulnerability. It's been exploited in court cases, government document leaks, and corporate data breaches.

Let me explain exactly why this happens and what you should do instead.

Inside a PDF File

To understand why black boxes fail, you need to understand how PDFs work internally.

A PDF isn't a simple image of a page. It's a structured document containing multiple types of data:

Content Streams: The actual text and graphics, encoded as a series of drawing commands. "Move to position X,Y, draw these characters in this font."

Annotations: Overlays added on top of content—highlights, sticky notes, stamps, and yes, those black rectangles people use for "redaction."

Metadata: Document properties like author, creation date, title, keywords, and revision history.

Form Data: Interactive field values, which persist even when the visible form changes.

Embedded Objects: Fonts, images, and other resources referenced by the content.

When you use a basic PDF editor to "redact" by drawing a black box, you're usually adding an annotation, or at best a filled shape painted after the text. Either way, the box sits on top of the content and nothing underneath is changed. The original text remains exactly where it was.

The Technical Reality

Here's a simplified view of what your "redacted" PDF actually contains:

Content Stream:
Position: 100, 200
Text: "SSN: 123-45-6789"

Annotation Layer:
Rectangle: 95, 195, 180, 215
Fill: Black

The annotation covers the text visually, but both exist independently in the file. Any tool that reads the content stream, which includes text extractors, search, and the selection tool in most viewers, can extract the text.
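To make the simplified view concrete, here is a minimal Python sketch using hand-written, uncompressed PDF syntax. The two byte strings and the helper name `shown_strings` are illustrative assumptions, not output from a real file. Real content streams are usually compressed and can encode text in other ways (hex strings, `TJ` arrays, custom font encodings).

```python
import re

# Hypothetical page content: begin text, set font, move to (100, 200), show a string.
CONTENT_STREAM = b"BT /F1 12 Tf 100 200 Td (SSN: 123-45-6789) Tj ET"

# Hypothetical annotation object: a square with a black interior covering that area.
ANNOTATION = b"<< /Type /Annot /Subtype /Square /Rect [95 195 180 215] /IC [0 0 0] >>"


def shown_strings(data: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    return [s.decode("latin-1") for s in re.findall(rb"\(([^()\\]*)\)\s*Tj", data)]


print(shown_strings(CONTENT_STREAM))  # the text is still in the content stream
print(shown_strings(ANNOTATION))      # the annotation holds geometry and color only
```

The annotation never references or modifies the text. It only describes a rectangle, which is why removing or ignoring it exposes everything underneath.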

How Anyone Can Recover the Data

Method 1: Copy and Paste

Open the PDF in any reader. Click and drag to select text under the black box. Paste into a text editor. Done.

This works because selection tools read from the content stream, which still contains the text. The black annotation is ignored during selection.

Method 2: Search

Press Ctrl+F. Type a word you suspect is under the redaction. If it finds a match on that page, the "redaction" failed.

Method 3: Text Extraction

Run the file through any PDF text extraction tool:
- pdftotext (command line)
- Online PDF to Word converters
- Any PDF library (pypdf, formerly PyPDF2; pdf.js; etc.)

These tools output all text content, ignoring annotation layers entirely.
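Compression does not help either. The sketch below, which uses only the Python standard library, scans raw file bytes for streams, inflates the ones that are Flate-compressed, and collects the strings shown with `Tj`. The function names are hypothetical, and this is a deliberately naive illustration: it ignores object structure, other stream filters, and font encodings, all of which real extraction tools handle properly.

```python
import re
import zlib


def _inflate(data: bytes) -> bytes:
    """Try to inflate a FlateDecode stream; fall back to the raw bytes."""
    try:
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return data  # not Flate-compressed, scan as-is


def extract_shown_text(pdf_bytes: bytes) -> list[str]:
    """Collect literal strings shown with Tj from every stream in the file."""
    found = []
    for match in re.finditer(rb"stream\r?\n(.*?)\r?\nendstream", pdf_bytes, re.DOTALL):
        content = _inflate(match.group(1))
        for raw in re.findall(rb"\(([^()\\]*)\)\s*Tj", content):
            found.append(raw.decode("latin-1"))
    return found


# Usage with a hand-built fragment standing in for a real file.
page = zlib.compress(b"BT /F1 12 Tf 100 200 Td (SSN: 123-45-6789) Tj ET")
fragment = b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n" + page + b"\nendstream\nendobj\n"
print(extract_shown_text(fragment))
```

Nothing in this loop looks at annotations at all, which mirrors why a box drawn on top has no effect on what comes out.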

Method 4: PDF Editing Software

Open in Adobe Acrobat or any PDF editor. Select the black rectangle. Delete it. See the original text.

This is often the first thing forensic analysts try when reviewing documents.

Real Cases Where This Failed

This isn't theoretical. Here are documented failures:

2009 - TSA Security Procedures

The Transportation Security Administration released airport screening procedures with portions "redacted." The text under the black boxes could simply be copied and pasted, exposing security protocols.

2019 - Manafort Court Filing

Attorneys for Paul Manafort submitted a legal brief with redacted sections. Media outlets simply copied the text to reveal confidential details about meetings with Russian contacts.

Countless FOIA Requests

Government agencies regularly release "redacted" documents that aren't actually redacted. Journalists and researchers routinely extract the hidden text.

These examples involved professionals who should know better. If law firms and federal agencies get this wrong, anyone can.

What Real Redaction Looks Like

True redaction requires modifying the content stream itself, not adding a layer on top. Here's what must happen:

1. Remove text from the content stream

The actual character data must be deleted from the PDF's internal structure. The bytes representing "SSN: 123-45-6789" should no longer exist in the file.

2. Replace with visual indicator

Where the text was, add a black rectangle directly in the content stream (not as an annotation). This shows readers that content was intentionally removed.

3. Strip metadata

Remove document properties that might contain sensitive information: author, subject, keywords, comments, revision history.

4. Flatten annotations

Convert any remaining annotations into regular content so there are no separate layers.

5. Re-save/re-encode

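The new file should be a clean PDF with no traces of the original content in its byte structure. Write it out as a fully rewritten file rather than an incremental save: incremental updates append changes to the end of the file and can leave the original objects intact.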
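Steps 1 and 2 can be illustrated on a toy content stream. This is a sketch under heavy assumptions: the text is a single uncompressed literal string shown with `Tj`, and the function name `redact_stream` is made up for the example. Real documents split text across operators, use font-specific encodings, and store copies elsewhere, so rely on an established redaction tool rather than code like this for anything that matters.

```python
import re


def redact_stream(stream: bytes, secret: bytes, rect: tuple[int, int, int, int]) -> bytes:
    """Empty every Tj string containing `secret`, then paint a black box in the stream."""

    def drop_if_secret(match: re.Match) -> bytes:
        # Replace the whole shown string so no part of it survives.
        return b"() Tj" if secret in match.group(1) else match.group(0)

    cleaned = re.sub(rb"\(([^()\\]*)\)\s*Tj", drop_if_secret, stream)

    # The `re` operator takes x, y, width, height; `f` fills it with the
    # current non-stroking color, set to black here with `0 0 0 rg`.
    x, y, width, height = rect
    box = b"\n0 0 0 rg %d %d %d %d re f" % (x, y, width, height)
    return cleaned + box


original = b"BT /F1 12 Tf 100 200 Td (SSN: 123-45-6789) Tj ET"
redacted = redact_stream(original, b"123-45-6789", (95, 195, 85, 20))
print(redacted.decode("latin-1"))
```

The key difference from an annotation is that the sensitive bytes are gone from the output, and the black rectangle is part of the page content rather than a separate object that can be deleted.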

Tools That Actually Redact

Adobe Acrobat Pro

Adobe includes a proper "Redact" tool separate from annotation tools. Critical steps:

1. Tools > Redact > Mark for Redaction
2. Select content
3. Click "Apply Redactions" (this step actually removes the content)
4. Tools > Protect > Remove Hidden Information
5. Save

Most failures happen because people mark for redaction but forget to apply. Marking just flags content—it doesn't remove it.

ActuallyRedactPDF

We built [ActuallyRedactPDF](/) specifically because we were tired of seeing redaction failures. Our approach: flatten the entire document to images.

When you redact with our tool, the PDF becomes essentially a high-resolution scan. There's no text layer to extract because we destroy it entirely. The underlying text doesn't just get hidden—it ceases to exist.

Trade-off: the output isn't searchable or selectable. But when security matters, that's the point.

Print to Image Workflow

The manual but reliable method:

1. Print PDF to high-resolution images (300+ DPI)
2. Edit images in any image editor
3. Draw black rectangles over sensitive areas
4. Combine images into a new PDF

This destroys the text because you're converting vector text to raster images. As long as the boxes are fully opaque, the images are saved flattened, and no OCR is run on the result, there is no text data to extract.

Verifying Your Redaction

Never trust—always verify. Before sending a "redacted" document:

1. The Copy Test

Open your redacted PDF. Try to select and copy text from under the black boxes. Paste into Notepad. If text appears, you've failed.

2. The Search Test

Ctrl+F for terms you know were in the sensitive areas. Any matches mean the text persists.

3. The Extraction Test

Run through a PDF-to-text converter. Check if sensitive content appears in the output.

4. The Annotation Check

In a PDF editor, try to select and delete the black rectangles. If they can be selected, moved, or deleted independently of the content, they're annotations, not true redactions.
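The search and extraction tests are easy to script once you have the extracted text as a string, for example the output of pdftotext saved to a file. The sketch below is one possible check, not a complete one: the function name, the term list, and the SSN pattern are assumptions you would adapt to your own documents.

```python
import re

# US Social Security number format, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_leaks(extracted_text: str, known_terms: list[str]) -> list[str]:
    """Return sensitive strings that still appear in text extracted from a redacted file."""
    lowered = extracted_text.lower()
    # Case-insensitive match for terms you know were in the redacted areas.
    leaks = [term for term in known_terms if term.lower() in lowered]
    # Pattern match for anything that still looks like an SSN.
    leaks += SSN_PATTERN.findall(extracted_text)
    return leaks


print(find_leaks("Name: Jane Doe  SSN: 123-45-6789", ["Jane Doe"]))
print(find_leaks("Name: [REDACTED]  SSN: [REDACTED]", ["Jane Doe"]))
```

An empty result only means these particular checks passed. It does not cover metadata, form fields, or text embedded in images, so the manual checks above still apply.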

We built a free redaction checker that automates this verification. Upload your document and it tests for common failures.

The Bottom Line

A black box drawn over text in a PDF is not redaction. It's a visual coverup that any motivated person can defeat in seconds.

Real redaction means removing content from the file structure, not hiding it. If you're handling sensitive information—legal documents, medical records, financial data, personal information—use tools that actually delete the data.

Then verify. Because when redaction fails, the consequences can be severe.


Ready to properly redact? [ActuallyRedactPDF](/) permanently removes content instead of hiding it.
