Making searchable screenshots

Note: this requires MacPorts.

It’s 3am and you’re hunting for that graph you clipped a couple of weeks ago. The report is due at 10 the next morning. You swore up and down you’d never do this again, but here we are: you can’t find the file or the reference.

If only you could search the text in the actual images themselves.

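Well then, here we are. All you need is a sudo port install tesseract, then run the script below against each image in the directory containing your screenshots. It performs OCR (optical character recognition) on the text in the file and gives you a PDF out the other end, named after the image’s modification time (ss_YYYYMMDDHHMMSS.pdf) and written to the current directory. The original image is then moved to the trash, so the script also needs the trash command installed. The PDFs are fully searchable, and should appear in Spotlight results as soon as they’ve been indexed.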

#!/bin/bash

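# Disable globbing and stop at the first failing command, so the
# original is only trashed if tesseract succeeded.
set -f
set -e

if [ ! -f "$1" ] ; then
    echo "File $1 does not exist" >&2 ; exit 1
fi

# OCR the image (English, automatic page segmentation) and write
# ss_<modification time>.pdf to the current directory.
tesseract -l eng --psm 3 "$1" "$(date -r "$1" +"ss_%Y%m%d%H%M%S")" pdf

# Move the original image to the trash.
trash -v -F "$1"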
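
The script handles one file per call, so a whole directory of screenshots needs a small wrapper. Here is a minimal sketch of one way to do it. The function name ocr_dir and the default script name ocr-screenshot.sh are made up for illustration, so point it at wherever you actually saved the script.

```shell
#!/bin/bash

# Run a single-file OCR script on every PNG/JPEG directly inside a directory.
# Usage: ocr_dir DIR [SCRIPT]
# SCRIPT defaults to ./ocr-screenshot.sh, a hypothetical name for the
# single-file script (an assumption, not something the tools require).
ocr_dir() {
    local dir="$1"
    local script="${2:-./ocr-screenshot.sh}"

    if [ ! -d "$dir" ]; then
        echo "Directory $dir does not exist" >&2
        return 1
    fi

    # -maxdepth 1 skips subdirectories; -iname matches regardless of case.
    # find calls the script once per file and carries on if one call fails.
    find "$dir" -maxdepth 1 -type f \
        \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) \
        -exec "$script" {} \;
}
```

Something like ocr_dir ~/Desktop ~/bin/ocr-screenshot.sh would then process the lot. The PDFs land in whatever directory you run it from, and every image that is processed successfully ends up in the trash, so try it on a copy first.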