Dev

Anything dev-related; various posts should show below:

Bulk OCRing mixed content and exporting as PDF

This is written more as an aide-memoire to myself than anything else. It’s a process I’m currently using for bulk-processing a set of documents in various formats (MS Word, PPT, PDF, LibreOffice etc.): converting them all to PDF, running OCR on any embedded images, and then sticking the end result into Elasticsearch via Tika (that final step isn’t documented here; there is plenty of documentation for it elsewhere).
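As a rough illustration of the convert-then-OCR part of that pipeline, here is a minimal Python sketch. The tools are assumptions on my part, not necessarily what the post itself uses: LibreOffice's `soffice` in headless mode for the PDF conversion and `ocrmypdf` for the OCR pass. The function name `build_commands` and the extension list are hypothetical. It only builds the command lines, so nothing runs until you hand them to `subprocess`.

```python
from pathlib import Path
from typing import List

# Assumed set of office formats to convert; extend as needed.
OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".odt", ".odp"}


def build_commands(source: Path, out_dir: Path) -> List[List[str]]:
    """Return the commands that would turn `source` into an OCRed PDF.

    Assumes `soffice` (LibreOffice) and `ocrmypdf` are installed; these
    are illustrative tool choices, not confirmed by the original post.
    """
    suffix = source.suffix.lower()
    commands: List[List[str]] = []

    if suffix in OFFICE_EXTENSIONS:
        # Headless LibreOffice writes <stem>.pdf into out_dir.
        commands.append([
            "soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source),
        ])
        pdf = out_dir / (source.stem + ".pdf")
    elif suffix == ".pdf":
        pdf = source
    else:
        # Unknown format: nothing to do.
        return []

    # --skip-text leaves pages that already carry a text layer untouched.
    ocr_out = out_dir / (source.stem + ".ocr.pdf")
    commands.append(["ocrmypdf", "--skip-text", str(pdf), str(ocr_out)])
    return commands
```

To actually run it, loop over the returned lists and call `subprocess.run(cmd, check=True)` on each, so that a failed conversion stops the batch rather than silently producing a gap in the index.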

Making searchable screenshots

Note: this requires MacPorts. It’s 3am and you’re hunting for that graph you clipped a couple of weeks ago. The report is due in for 10 the next morning. You swore up and down you’d never do this again, but here we are: can’t find the file or the reference.

Memento project

Only because I keep forgetting where I’ve put the links: Memento project API details; Wiki entry on Memento project. Todo: make a small utility that neatly presents the API to the end user.
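As a starting point for that utility, here is a small Python sketch of how a Memento TimeGate request is put together under RFC 7089: the client asks a TimeGate for a resource and passes the desired point in time in an `Accept-Datetime` header. The function name `timegate_request` and the TimeGate base URL in the usage example are hypothetical placeholders; swap in whichever TimeGate the API details point to. The sketch only builds the URL and headers and does not send anything.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def timegate_request(timegate_base, target_url, when):
    """Build the URL and headers for a Memento TimeGate lookup.

    `timegate_base` is a placeholder for a real TimeGate endpoint.
    `when` must be a timezone-aware datetime.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    url = timegate_base.rstrip("/") + "/" + target_url
    return url, {"Accept-Datetime": accept}


if __name__ == "__main__":
    url, headers = timegate_request(
        "https://timegate.example/timegate",  # hypothetical endpoint
        "http://example.com/",
        datetime(2015, 1, 1, tzinfo=timezone.utc),
    )
    print(url)
    print(headers)
```

A TimeGate normally answers with a redirect to the closest archived copy, so the utility would mostly need to follow that redirect and show where it lands.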