Seeking Input on Creating a Searchable Digital Archive of Existing Contracts

I’m occasionally asked about how one might go about creating a searchable digital archive of a medium-sized company’s hard-copy contracts. It’s nothing I’ve had any direct experience with. If you have any suggestions, I’d be interested to hear them, and I suspect other readers would too.

About the author

Ken Adams is the leading authority on how to say clearly whatever you want to say in a contract. He’s author of A Manual of Style for Contract Drafting, and he offers online and in-person training around the world. He’s also chief content officer of LegalSifter, Inc., a company that combines artificial intelligence and expertise to assist with review of contracts.

10 thoughts on “Seeking Input on Creating a Searchable Digital Archive of Existing Contracts”

  1. One of the mid-sized companies I worked for used Laserfiche. In evaluating the options it was the most flexible and cost-effective means of creating a digital archive of paper contracts. They also used the program for other types of legal documentation. The archive did take quite a few person-hours to set up. But the end product was a user-friendly, searchable database of agreements. I loved that it could be searched by any word in the agreement. Having the archive literally saved me hours per week.

    I’ve also worked with SharePoint. It can be a good repository and costs are minimal if the company is already using SharePoint. A fair amount of effort is required to image the documents. It’s also not really searchable by terms within the document if the file is a PDF or image that hasn’t been run through OCR.

    The most labor-intensive piece is getting the documents from paper to an electronic version. Some vendors will image documents en masse and it’s worth checking around for prices.

  2. Set up an internal Wiki with the documents. Make sure you OCR any PDF documents, and you end up with a searchable, structured database. The ease of use in setting up and the ability for everyone to add documents makes it a good approach.

  3. I second Dave Taylor’s suggestion. They’re also completely searchable by any other search function that can look at the index (the newest versions of Windows can do it). The tricky part used to be scanning to searchable PDF. But it’s now built into Acrobat Pro.

    Having the documents themselves is but one part of the puzzle. I recommend grabbing a contract management system, like Novatus, in which to store the PDFs – makes tracking them much easier (as Kristin suggests).

  4. Here’s what worked well for my admin and me when I was the (solo) GC for a publicly-traded software company:

    1. From Microsoft Word, print the signature version to PDF, with a running header showing a unique version code (I normally use a hand-created timestamp, e.g., “2009-11-02 0650 CST”)

    2. When the contract is signed, scan the signed signature page(s) and append it / them to the PDF. You now have a searchable PDF without the possible uncertainties of the OCR process.

    3. Save the PDF (with a suitable file name including the contract effective date) in a simple folder-tree subdirectory structure, organized by company name. We saved all our contracts that way, whether we were the vendor or the customer.

    This wasn’t especially ‘elegant,’ but it had the virtues of simplicity and low cost.

    It made it very easy for my admin to send sales people, etc., copies of specific contracts when asked.

    And when eventually we were acquired by the giant in our field, this filing system made it easy for us to create an electronic data room for due diligence purposes, and also to answer specific questions from the acquirer’s in-house lawyers and outside counsel.
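The filing scheme in comment 4 (one folder per company, file names that include the effective date) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `contracts` root, the `slug` helper, and the exact file-name layout are hypothetical choices, not something the commenter specified.

```python
from datetime import date
from pathlib import PurePosixPath
import re


def slug(text: str) -> str:
    """Reduce a name to letters, digits, and dashes so it is safe in a path."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def contract_path(root: str, company: str, title: str, effective: date) -> PurePosixPath:
    """Return <root>/<company>/<effective date> <title>.pdf (hypothetical layout)."""
    file_name = f"{effective.isoformat()} {slug(title)}.pdf"
    return PurePosixPath(root) / slug(company) / file_name


# Example values are made up for illustration.
path = contract_path("contracts", "XYZ Corp.", "Software License", date(2009, 11, 2))
print(path)
```

Putting the ISO-format date first means that files within a company's folder sort chronologically by name, which is one reasonable way to make a data-room export easy to scan.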

  5. I second D.C. Toedt’s approach. For a few years now I have been sending all my contracts out for signature by emailing a PDF “printed” directly from Word to PDF. I have the other side email me a PDF scan of the signed contract, then I sign it, and then I swap in the scan of the fully signed signature page into the original PDF and email the fully signed contract back to them. This gives me a fully searchable (and easy to read) PDF of the signed contract. Not to mention it saves a lot of hassle and labor on exchanging hard copies.

  6. I am heartened to see suggestions that don’t involve buying into off-the-shelf document management solutions. Only misery and expense await down that road.

    I OCR all PDFs and use a tightly-sealed naming convention containing essential pieces of metadata (three or four depending on the contract type). All names are lower-case, keywords are separated by underscore and the date is always yyyy_mm_dd. No nested folders — everything goes into one folder and is backed up on the network drive. Retrieval is done via Google Desktop Search.

    DC and Andy: Are there concerns in terms of authentication related to swapping in pages? I can see why the answer would be no, but where the sig page is faxed back as is often the case you’ve got the body without the fax timestamp and the sig page with the fax timestamp.
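The flat, single-folder naming convention described earlier in comment 6 (lower-case, underscore-separated keywords, date as yyyy_mm_dd) lends itself to a small build-and-parse helper. The sketch below assumes the date always comes last and that keywords themselves contain no underscores; the commenter does not say which metadata fields are used, so the keywords shown are placeholders.

```python
from datetime import date
from pathlib import PurePath


def build_name(keywords: list[str], when: date) -> str:
    """Join lower-cased keywords and a yyyy_mm_dd date with underscores."""
    parts = [k.lower() for k in keywords] + [when.strftime("%Y_%m_%d")]
    return "_".join(parts) + ".pdf"


def parse_name(file_name: str) -> tuple[list[str], date]:
    """Split a file name back into its keywords and date.

    Assumes the last three underscore-separated fields are year, month, day.
    """
    parts = PurePath(file_name).stem.split("_")
    if len(parts) < 4:
        raise ValueError(f"expected keywords plus yyyy_mm_dd: {file_name!r}")
    *keywords, yyyy, mm, dd = parts
    return keywords, date(int(yyyy), int(mm), int(dd))


# Placeholder keywords: a counterparty and a contract type.
name = build_name(["Acme", "NDA"], date(2009, 11, 24))
keywords, signed_on = parse_name(name)
print(name, keywords, signed_on)
```

Because the metadata lives in the name itself, a plain desktop search or a directory listing is enough for retrieval, which is the point of the approach.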

  7. On all my contracts it is plain that the body of the contract was directly generated from a word processing document by software and the signature page was scanned. I don’t worry about someone using that fact as a basis for disputing the accuracy of the document because I have a record of the email I sent the other person with the original PDF for signature, and that is the PDF that I use to swap in the scanned signature page when it comes back. I have never had someone try to deny they signed a document (or deny that a particular provision was in what they signed), but if someone ever tries such a lame trick I plan to tell the person that: (1) I know that they signed the version I emailed them so no matter what they say I will continue to act on the belief that they signed it, (2) if the matter ever ends up in a lawsuit I will first box them into swearing under oath to their story and then hire the best computer forensics experts out there to go through all their company’s computers to disprove their sworn testimony, after which their credibility will be shot with the court and they will be unlikely to prevail in the action, and (3) if they now try to clean up or dispose of any of their computers it will look like an admission. It seems unlikely to me that many, if any, people would pursue a strategy of “that isn’t what I signed” in these days where computers retain digital fingerprints of everything one does. Who knows what computer forensics may, or may not, be able to find, and given that uncertainty denying a signature on a PDF document seems like a very risky strategy.

  8. @Theo, I pretty much agree with @AndyFromTuscon — a good email trail should be more than enough to prove up the contents of the signed document, assuming the other side were foolish enough to make a big stink about it.

    Two other points in that regard:

    1. After I merge the scanned signature pages into the PDF document, I email the complete merged PDF to the other side WITH AN EMAIL RECEIPT REQUEST — not only is that an appropriate professional courtesy, it also gives you that much more of an email trail.

    2. Partly as an aid to authentication, but mainly to avoid confusion during negotiations, I make it a practice to include the substance of the file name, including the date AND time (NOT automatically generated) AND the time zone as a running header on every page of each draft of the document.

    So, for example, if I were sending out an NDA draft that I just finished, between ABC, Inc. and XYZ Corp., then the document would have a running header, in 8-point Arial, such as “ABC-XYZ-NDA-2009-11-24-0840-CST” — then the Word file name would be “ABC-XYZ-NDA-2009-11-24-0840-CST.docx” and the PDF file name (if this were a signature version) would be “ABC-XYZ-NDA-2009-11-24-0840-CST.pdf”

    Also, @Theo, I use dashes instead of underscores — I’m given to understand that search engines have an easier time parsing long strings into their components when the components are separated by a dash.
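The version-code practice in comment 8 can be expressed as a short helper. This is a sketch, not the commenter's tooling: the function name and argument order are hypothetical, and the date and time are passed in explicitly because the comment stresses that the stamp is not automatically generated.

```python
from datetime import datetime


def version_code(party_a: str, party_b: str, doc_type: str,
                 drafted: datetime, tz_label: str) -> str:
    """Build a dash-separated code: parties, document type, date, time, zone.

    `drafted` is supplied by the drafter rather than read from the clock.
    """
    stamp = drafted.strftime("%Y-%m-%d-%H%M")
    return "-".join([party_a, party_b, doc_type, stamp, tz_label])


code = version_code("ABC", "XYZ", "NDA", datetime(2009, 11, 24, 8, 40), "CST")
word_file = code + ".docx"   # draft file name
pdf_file = code + ".pdf"     # signature-version file name
print(code, word_file, pdf_file)
```

Using the same string for the running header and both file names means any printed page can be matched to its source file without guesswork.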

  9. Thanks for the clarifications on sig pages; DC, I plan on adopting your header practice. Interestingly, I use underscores for the precise reason you use dashes: I read on a techie blog that search engines are better able to discern underscore_separated text as metadata. I guess we’ll never know, but wouldn’t be surprised if they were on a par in achieving what we both want in terms of searchability.
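For what it's worth, the dash-versus-underscore question has a concrete answer in at least one common setting. Many simple tokenizers split text on "word characters" as regular expressions define them, and in Python's `re` module the underscore counts as a word character while the dash does not. The sketch below shows only that behavior; it says nothing about how any particular search engine or desktop search tool tokenizes file names, since each uses its own rules.

```python
import re


def word_tokens(text: str) -> list[str]:
    """Split text into runs of regex word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)


dashed = word_tokens("ABC-XYZ-NDA-2009-11-24")
underscored = word_tokens("abc_xyz_nda_2009_11_24")

print(dashed)        # six separate tokens
print(underscored)   # one long token
```

Under this kind of tokenizer, a search for "NDA" would match the dashed name as a whole word but would need a substring or wildcard match to find the underscored one.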

