Easy Ways to Redact PDFs Using C#

Redacting a PDF means permanently removing sensitive or confidential information from the document. Syncfusion’s .NET PDF library provides an easy way to redact PDFs using C#.

Redaction isn’t just placing a colored box over text or an image. If the text under the colored area can still be copied, the content is still in the file and has not been redacted. Syncfusion performs true redaction: the content is removed from the document completely. Once content is redacted, the change cannot be undone, so always keep a backup of the master document.

Syncfusion PDF Library helps customers reach GDPR compliance by safely removing customer information from a PDF document. You can now distribute files securely by permanently removing confidential information such as financial account numbers, social security numbers, customer email addresses, phone numbers, and credit card information.

The PDF redaction feature is also available in WinForms, WPF, ASP.NET Web Forms, and ASP.NET MVC. Syncfusion PDF Library provides customization options for the redacted area, so you can use colored boxes or leave the area blank. You can specify custom text or redaction codes to appear over the redacted area.

Let’s start with code to redact a PDF using C#

Already referencing the required assemblies from NuGet? Great! Now we need to add the following namespaces to our class file. System.Drawing is included because the samples use RectangleF, PointF, and Color.

using System.Drawing;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;
using Syncfusion.Pdf.Redaction;

Here, we’ll just remove the email address from the PDF and leave the area blank.

PDF file before redaction

//Load a PDF document for redaction
PdfLoadedDocument ldoc = new PdfLoadedDocument("../../Input/RedactPDF.pdf");
//Get first page from document
PdfLoadedPage lpage = ldoc.Pages[0] as PdfLoadedPage;

//Create PDF redaction for the page
PdfRedaction redaction = new PdfRedaction(new RectangleF(340,120,140,20));

//Adds the redaction to loaded 1st page
lpage.Redactions.Add(redaction);

//Save the redacted PDF document to disk
ldoc.Save("RedactedPDF.pdf");

//Close the document instance
ldoc.Close(true);

As you can see in the screenshot, the email address has been removed from the PDF file without a trace, and the redacted content can no longer be found or selected.

Redacted PDF without text and color

Redact PDF with fill color

Now, we’ll load the same PDF file and redact it with a red fill. This completely removes the content from the PDF and paints the redacted area red.

//Create PDF redaction for the page
PdfRedaction redaction = new PdfRedaction(new RectangleF(340,120,140,20), System.Drawing.Color.Red);

//Adds the redaction to loaded page
lpage.Redactions.Add(redaction);

Text in the PDF redacted with red color

Redact PDF with code sets and entries

Certain PDF files, such as invoices and official government forms, contain text or images at fixed positions on the page. For example, the employee address on a W-4 tax form is always in the same place and can be redacted under US FOIA exemption (b) (6). The following code draws a black box over each redacted area and prints the exemption code on top of it.

//Create redaction area for redacting telephone number with code set.
RectangleF redactionBound = new RectangleF(50, 568, 120, 13);

PdfRedaction redaction = new PdfRedaction(redactionBound);
redaction.Appearance.Graphics.DrawRectangle(PdfBrushes.Black, new RectangleF(0, 0, redactionBound.Width, redactionBound.Height));
redaction.Appearance.Graphics.DrawString("(b) (6)", new PdfStandardFont(PdfFontFamily.Helvetica, 11), PdfBrushes.White, new PointF(0, 0));

//Adds the redaction to loaded page
lpage.Redactions.Add(redaction);

//Create redaction area for redacting address with code set.
RectangleF addressRedaction = new RectangleF(50, 592, 75, 13);
redaction = new PdfRedaction(addressRedaction);
redaction.Appearance.Graphics.DrawRectangle(PdfBrushes.Black, new RectangleF(0, 0, addressRedaction.Width, addressRedaction.Height));
redaction.Appearance.Graphics.DrawString("(b) (6)", new PdfStandardFont(PdfFontFamily.Helvetica, 11), PdfBrushes.White, new PointF(0, 0));
lpage.Redactions.Add(redaction);

PDF content redacted with code sets

Redact image in PDF – OCR

The PDF Library provides another great feature: it can run OCR on a scanned document image in a PDF and redact the recognized content using C#. Sometimes a scanned PDF file contains Social Security numbers (SSNs), employee identification numbers, addresses, or email IDs. In those cases, it is very hard to search manually for a specific pattern and redact it. Syncfusion offers an efficient way to find sensitive information in a PDF image using OCR and redact it from the PDF file.

Redact a Social Security Number from the PDF image
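
The pattern search described above can be tried on plain strings before any OCR is involved. The following is a minimal sketch, not Syncfusion code: `SsnFinder` and `FindSsnCandidates` are hypothetical names introduced here for illustration, and the pattern (three, two, and four digits, optionally separated by spaces or hyphens) is one reasonable assumption about how an SSN may appear in recognized text. It uses only .NET’s System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the Syncfusion API.
public static class SsnFinder
{
    // Assumed SSN shape: 3 digits, 2 digits, 4 digits, with optional spaces
    // or hyphens between the groups. The word boundaries keep the pattern
    // from matching inside a longer run of digits.
    private static readonly Regex SsnPattern =
        new Regex(@"\b\d{3}[ -]*\d{2}[ -]*\d{4}\b");

    // Returns every substring of the given line that looks like an SSN.
    public static List<string> FindSsnCandidates(string lineText)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(lineText))
        {
            return found;
        }

        foreach (Match match in SsnPattern.Matches(lineText))
        {
            found.Add(match.Value);
        }
        return found;
    }
}
```

For example, `SsnFinder.FindSsnCandidates("SSN: 123-45-6789")` returns a single candidate, while a ten-digit number produces none. A pattern like this only flags candidates, so OCR misreads (a letter O for a zero, for instance) can still slip through and are worth checking by hand.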

To redact content found by OCR, install the Syncfusion.PDF.OCR.WPF package from NuGet. Copy the Tesseract binaries and language data from the NuGet package location into your application, and pass their paths to the OCR processor. Add the following namespaces and code snippet to your class.

using System.Diagnostics;
using System.Drawing;
using System.Text.RegularExpressions;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Exporting;

//Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor(@"../../TesseractBinaries/3.02/"))
{
    //Load the PDF document 
    PdfLoadedDocument lDoc = new PdfLoadedDocument(@"../../Input/FormWithSSN.pdf");

    //Load the PDF page
    PdfLoadedPage loadedPage = lDoc.Pages[0] as PdfLoadedPage;
    //Language to process the OCR
    processor.Settings.Language = Languages.English;

    //Extract image and information from the PDF for processing OCR
    PdfImageInfo[] imageInfoCollection = loadedPage.ImagesInfo;

    foreach (PdfImageInfo imgInfo in imageInfoCollection)
    {
        Bitmap ocrImage = imgInfo.Image as Bitmap;
        //Skip entries that cannot be processed as a bitmap, so result is never null below
        if (ocrImage == null)
            continue;

        //Run OCR on the extracted image using the Tesseract language data
        OCRLayoutResult result = null;
        string text = processor.PerformOCR(ocrImage, @"../../LanguagePack/", out result);

        //Calculate the scale factors from image pixels to the image bounds on the PDF page
        float scaleX = imgInfo.Bounds.Width / ocrImage.Width;
        float scaleY = imgInfo.Bounds.Height / ocrImage.Height;
        
        //Get the text from page and lines.
        foreach (var page in result.Pages)
        {
            foreach (var line in page.Lines)
            {
                if (line.Text != null)
                {
                    //Regular expression for a Social Security number: 3, 2, and 4 digits, optionally separated by spaces or hyphens
                    var ssnMatches = Regex.Matches(line.Text, @"\b\d{3}[ -]*\d{2}[ -]*\d{4}\b");
                    if (ssnMatches.Count >= 1)
                    {
                        RectangleF redactionBound = new RectangleF(line.Rectangle.X * scaleX, line.Rectangle.Y * scaleY,
                            (line.Rectangle.Width - line.Rectangle.X) * scaleX, (line.Rectangle.Height - line.Rectangle.Y) * scaleY);
                        
                        //Create PDF redaction for the found SSN location
                        PdfRedaction redaction = new PdfRedaction(redactionBound);

                        //Adds the redaction to loaded page
                        loadedPage.Redactions.Add(redaction);


                    }
                }
            }
        }
    }

    //Save the redacted PDF document to disk
    lDoc.Save("RedactedPDF.pdf");
    lDoc.Close(true);

    //Open the redacted document in the default PDF viewer
    Process.Start("RedactedPDF.pdf");
}
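
The listing above converts OCR coordinates, which are measured in image pixels, into page coordinates before creating the redaction. The arithmetic can be sketched on its own as follows. This is an illustration under stated assumptions rather than Syncfusion code: `OcrBoundsMapper` and `ToPageBounds` are hypothetical names, the pixel box is assumed to be given as x, y, width, and height, and the image’s offset on the page is added in case the image does not start at the page origin.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; not part of the Syncfusion API.
public static class OcrBoundsMapper
{
    // pixelBox:          region found by OCR, in image pixels (x, y, width, height).
    // imageWidthPx/HeightPx: pixel size of the bitmap that was passed to OCR.
    // imageBoundsOnPage: where that image is drawn on the PDF page, in page units.
    public static RectangleF ToPageBounds(
        RectangleF pixelBox, int imageWidthPx, int imageHeightPx, RectangleF imageBoundsOnPage)
    {
        if (imageWidthPx <= 0 || imageHeightPx <= 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(imageWidthPx), "Image dimensions must be positive.");
        }

        // Horizontal scale uses widths; vertical scale uses heights.
        float scaleX = imageBoundsOnPage.Width / imageWidthPx;
        float scaleY = imageBoundsOnPage.Height / imageHeightPx;

        // Scale the box, then shift it by the image's position on the page.
        return new RectangleF(
            imageBoundsOnPage.X + pixelBox.X * scaleX,
            imageBoundsOnPage.Y + pixelBox.Y * scaleY,
            pixelBox.Width * scaleX,
            pixelBox.Height * scaleY);
    }
}
```

For a 1000 x 500 pixel image drawn at (0, 0) with a size of 500 x 250 on the page, both scale factors are 0.5, so a pixel box of (100, 50, 200, 20) maps to (50, 25, 100, 10). The listing above subtracts X from Width and Y from Height, which suggests the OCR result stores right and bottom edges in those fields; if so, convert them to a width and a height before calling a helper like this.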

A sample demonstrating the available redaction options in Syncfusion PDF Library can be downloaded from RedactPDF.zip

Wrap up

As you can see, Syncfusion .NET PDF Library provides easy and advanced options to redact PDFs using C#. With Syncfusion PDF Library, you can automate the process to ensure customers’ sensitive information is redacted efficiently without manual work, before sharing with third parties.

To evaluate our PDF redaction using C#, try our online demo. Take a moment to peruse the documentation, where you’ll find other options and features, all with accompanying code examples.

If you have any questions or require clarification about these features, please let us know in the comments below. You can also contact us through our support forum or Direct-Trac. We are happy to assist you!
