Edit/remove text/strings in/from PDF files/documents

qubodup 2010-04-20

I really need to get rid of "Word" in a PDF file I have. "Word" is a single word with no spaces, and it appears on multiple pages of the document.

My solution: convert the PDF to PostScript (ps), remove or replace "Word" in the plain-text ps file, then convert the result back to PDF.

  1. Install Ghostscript (provides ps2pdf), sed, and Xpdf (provides pdftops).
  2. Create the file removeWordFromPDF.bat with the following content:

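    @echo off
    rem Show the help text if no input file was given.
    if "%~1"=="" goto HELP
    
    echo Converting %1
    rem Convert the PDF to PostScript with pdftops (Xpdf).
    pdftops "%~1" "%~1temp.ps"
    echo sed
    rem Replace every "(Word)" string with an empty string "()".
    rem The g flag catches all occurrences on a line, not just the first.
    sed -e "s/(Word)/()/g" < "%~1temp.ps" > "%~1tempNoWord.ps"
    echo ps2pdf
    rem Convert the PostScript back to PDF.
    rem "call" lets this script continue if ps2pdf is itself a batch file.
    call ps2pdf "%~1tempNoWord.ps" "%~1NoWord.pdf"
    exit /b
    
    :HELP
    echo No input PDF file was given, so there is nothing to remove "Word" from.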
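  3. Note 1: PostScript encloses text strings in parentheses, so (Word) only matches a string that is exactly "Word". If "Word" is part of a longer string, remove one or both of the parentheses from both the pattern and the replacement in the sed line.
  4. Note 2: If "Word" contains non-ASCII (for example Unicode) letters, convert the PDF to ps first and open the ps file in a text editor to find out how those letters are encoded, then use that encoding in the sed line.
  5. Either add ";C:\Program Files\Ghostscript\gs8.70\lib" (adjust the version and location to match your installation) to the "Path" environment variable, along with the Xpdf and sed directories if they are not already on the path, or replace the pdftops/ps2pdf/sed commands in the script with full paths to the executables.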

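You can now drag and drop PDF files onto the .bat file. The cleaned copy is saved next to the original, with "NoWord.pdf" appended to the original file name.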
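
To illustrate Note 1, here is a small sketch of how the sed pattern behaves with and without the parentheses. The sample lines are made up and far simpler than real pdftops output, so treat them as an assumption. The commands are written for a POSIX shell (single quotes); in a .bat file, use double quotes around the sed expression instead. The g flag makes sed replace every match on a line, not just the first.

```shell
# Made-up sample lines standing in for text strings in the .ps file.
sample='(Word) Tj
(Word) Tj (Word) Tj
(WordPerfect) Tj'

# With parentheses: only strings that are exactly "Word" are emptied.
exact=$(printf '%s\n' "$sample" | sed -e 's/(Word)/()/g')

# Without parentheses: "Word" is removed wherever it occurs,
# including inside longer strings.
anywhere=$(printf '%s\n' "$sample" | sed -e 's/Word//g')

printf 'With parentheses:\n%s\n\n' "$exact"
printf 'Without parentheses:\n%s\n' "$anywhere"
```

With the parentheses, (WordPerfect) is left alone; without them it becomes (Perfect). Trying the pattern on a few lines copied from your own ps file first is a cheap way to check that it only hits what you intend.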
