Scanning and Storing PDFs Is a Bad Idea

I think scanning your paperwork and storing it as PDFs on your server isn’t much better than just keeping the paper!  Now, many of you who have read my other articles or talked to me know that I am totally sold on ECM (paper-less) technology.  So, why am I saying that storing PDFs isn’t a whole lot better?  Because the reason we store documents, whether on paper or digitally, is so that we can find them when we need them.

Can’t find the file:

If you are scanning and storing PDFs on a drive, it often becomes difficult to find the document you want.  You need to know the name of the document and the exact directory it is stored in.  This is not evident when you first begin scanning, because you have only a few documents.  But as time goes on and the documents pile up, if you don’t know the exact name you may have to open files one at a time until you find the one you want.  This is not much different from rummaging through a four-drawer filing cabinet, and it is just as time-consuming.

PDFs can be a security risk:

I think using PDFs as your archival format is not a good business decision, because your PDFs may contain malware or code that exploits a vulnerability in the program used to open them.  For example, a flaw in Adobe Reader 8.1 could allow hackers to include dangerous code in PDF files to take control of Windows computers.  Most current PDF formats except PDF/A allow JavaScript and executable files to be embedded in a PDF file.

Also, PDFs stored on your network are vulnerable to ransomware attacks.
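
To make the point about embedded JavaScript and files concrete, here is a rough Python sketch that looks for a few PDF keywords commonly associated with scripts, embedded files, and automatic actions.  The marker list and the function name are my own illustrative choices, and this is only a heuristic: a PDF can hide these keywords inside compressed or obfuscated objects, so an empty result does not mean a file is safe.

```python
# Keywords that often accompany scripts, embedded files, or auto-run actions.
# This list is an illustrative assumption, not a complete or official set.
RISKY_MARKERS = (b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile", b"/OpenAction")


def find_risky_markers(pdf_bytes):
    """Return the markers that appear as raw bytes in the file (heuristic only)."""
    return [marker.decode("ascii") for marker in RISKY_MARKERS if marker in pdf_bytes]
```

You would call it with the raw contents of a file, for example find_risky_markers(open(path, "rb").read()), and treat any hit as a reason to look more closely, not as proof of malware.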

Can’t open PDF file:

Not long ago, I tried to open a PDF that was an important contract a client sent me.  I got the following message on my computer monitor:  “There was an error opening this document. The file is damaged and could not be repaired.”  I did a Google search on “corrupt PDF” and found nearly 73,000,000 hits.  I found that there are programs to repair corrupt files and even companies that specialize in attempting to recover severely corrupted files.  Luckily, the client was able to find a copy of the PDF that was not corrupt; otherwise, the details of the contract could have been permanently lost.
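
If you do keep PDFs on a drive, it is worth checking them for obvious damage before you need them.  The Python sketch below is one simple way to do that, under the assumption that an intact PDF has a "%PDF-" header near the start and an "%%EOF" marker near the end.  The function name and the byte limits are my own choices, and passing this check does not guarantee that a file will open; it only catches files that are clearly truncated or are not PDFs at all.

```python
from pathlib import Path


def looks_like_intact_pdf(path, tail_bytes=2048):
    """Heuristic check: PDF header near the start, %%EOF marker near the end."""
    data = Path(path).read_bytes()
    has_header = b"%PDF-" in data[:1024]
    has_trailer = b"%%EOF" in data[-tail_bytes:]
    return has_header and has_trailer
```

Run over an archive folder on a schedule, a check like this could flag a damaged contract while a good copy still exists somewhere.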

Can’t secure the PDF files:

Storing your important documents on your network seems very easy.  You can even develop a good directory structure that makes storing and filing PDFs a snap.  But securing them is much more difficult.  How do you keep someone from renaming, moving, viewing (if they shouldn’t have access), or editing archived PDF documents?

I recently met with an organization that had done a wonderful job scanning their important documents and storing them in a very organized manner on their network.  They had a new employee who was assigned the task of scanning.  Everything was going just fine until someone could not find an important document.  It was then that they noticed that many files had been inadvertently moved to the wrong directories.  They didn’t have a good naming convention, an index, or anything else that could help them determine which files were in the wrong directories.  They had a mess on their hands.
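
An organization in that position had no record of where each file was supposed to be.  As a minimal sketch of what such a record could look like, the Python below fingerprints every file in a folder with SHA-256 and compares two snapshots to report files that were moved or renamed, changed in place, or lost.  The function names and report layout are hypothetical, and the sketch assumes no two files have identical contents.  It detects the problem after the fact; it does not prevent anyone from moving or editing a file.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(full, root)] = digest
    return manifest


def compare_manifests(old, new):
    """Report files that were moved or renamed, changed in place, or are missing."""
    new_by_hash = {}
    for path, digest in new.items():
        new_by_hash.setdefault(digest, []).append(path)

    report = {"moved": [], "changed": [], "missing": []}
    for path, digest in old.items():
        if new.get(path) == digest:
            continue  # same location, same content
        if digest in new_by_hash:
            # Same content now lives under a different path.
            report["moved"].append((path, new_by_hash[digest][0]))
        elif path in new:
            # Same path, but the content no longer matches.
            report["changed"].append(path)
        else:
            report["missing"].append(path)
    return report
```

Saving the result of build_manifest after each scanning session and comparing it with the next one would have shown exactly which files ended up in the wrong directories.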

Is a PDF really a PDF?

First off, did you know there are many different formats of PDF?  According to the Library of Congress, there are over 11 different PDF formats.  In addition, there are also “true” PDFs and image-only PDFs.  Many digital copiers and low-end scanners create an image-only PDF, while more sophisticated document scanners produce PDFs with the addition of a text layer.  And to make things even more complicated, the images in a PDF can be in JPEG, PNG, TIFF, or JPEG 2000 format.  A software engineer who develops tools for paper-less solutions told me that if you take all these factors into account, there are well over 32 different PDF formats.


So, what is the answer?

I don’t know how many times I have been told: “I don’t need an ECM (paper-less) system, I store my documents as PDFs on my network.”  Remember, the reason you are storing documents is to find them when you need them!  An ECM (paper-less) system allows you to find a document quickly without knowing its name or where it is stored.  With indexes and full-text search capability, in just seconds you have the document you need.  And a good ECM system will natively store your documents as TIFFs.
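
For readers curious how full-text search can find a document without its name, here is a toy Python sketch of the underlying idea, an inverted index that maps each word to the documents containing it.  It assumes the text of each scan has already been extracted (for example, by OCR), and the document IDs and function names are made up for illustration.  It is not how any particular ECM product is implemented.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With an index like this, a search for "lease acme" returns every scan that mentions both words, no matter what the file was named or which directory it landed in.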