Bulk OCRing mixed content and exporting as PDF

This is written more as an aide-memoire to myself than anything. It’s a process I’m currently using for bulk-processing a set of documents in various formats (MS Word, PPT, PDF, LibreOffice etc.), converting them all to PDF, running OCR on any embedded images and then sticking the end result into Elasticsearch via Tika (that final step isn’t documented here; there’s plenty of documentation elsewhere).
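As a rough sketch of the convert-then-OCR part of that process, the snippet below plans the external commands for each input file. The tool choice is an assumption, not necessarily what the full post uses: LibreOffice's `soffice --headless --convert-to pdf` for the office formats and `ocrmypdf --skip-text` for the OCR pass. The function names, the suffix list and the `.ocr.pdf` naming are all made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed set of office formats to push through LibreOffice first.
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx", ".odt", ".odp"}


def plan_commands(src: Path, out_dir: Path) -> list:
    """Return the external commands needed to turn src into an OCRed PDF.

    Returns an empty list for file types this sketch does not handle.
    """
    suffix = src.suffix.lower()
    pdf = out_dir / (src.stem + ".pdf")
    ocr_pdf = out_dir / (src.stem + ".ocr.pdf")  # hypothetical naming scheme
    commands = []
    if suffix in OFFICE_SUFFIXES:
        # LibreOffice writes <out_dir>/<stem>.pdf
        commands.append(
            ["soffice", "--headless", "--convert-to", "pdf",
             "--outdir", str(out_dir), str(src)]
        )
    elif suffix == ".pdf":
        pdf = src  # already a PDF, OCR it directly
    else:
        return []
    # --skip-text leaves pages that already carry a text layer alone.
    commands.append(["ocrmypdf", "--skip-text", str(pdf), str(ocr_pdf)])
    return commands


def run_all(in_dir: Path, out_dir: Path) -> None:
    """Run the planned commands for every file in in_dir, in name order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.iterdir()):
        for cmd in plan_commands(src, out_dir):
            subprocess.run(cmd, check=True)
```

Calling `run_all(Path("incoming"), Path("processed"))` would then leave one `.ocr.pdf` per handled input, ready to hand to Tika. Keeping the planning separate from the running makes it easy to print the commands for a dry run before committing to a long batch.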

Example of plotting ECG data using d3

See the UI here. Following on from here and here, this just puts together a couple of blocks from bl.ocks.org to plot data from the PhysioNet site. Read converting PhysioNet JSON to CSV to import other data sets.

Todo: d3 ECG

Another one for the todo list. Fancy plotting leads I/II/III out and making an interactive scroller for time vs vector / direction. Input data might be from here or here. Unfortunately both of these are for synthesising a single lead.
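For the vector / direction part, one possible reading is the frontal-plane axis from Einthoven's triangle. The sketch below assumes the idealised equilateral model (lead I at 0°, lead II at 60°) and that simultaneous samples of leads I and II are available. Lead III then follows from Einthoven's law, III = II - I. The function names are invented for illustration.

```python
import math


def lead_iii(lead_i: float, lead_ii: float) -> float:
    """Einthoven's law: lead III = lead II - lead I."""
    return lead_ii - lead_i


def frontal_axis_degrees(lead_i: float, lead_ii: float) -> float:
    """Direction of the frontal-plane vector, in degrees.

    Assumes the idealised Einthoven triangle: lead I measures the
    projection at 0 degrees and lead II the projection at 60 degrees.
    """
    x = lead_i
    # Solving II = x*cos(60) + y*sin(60) for y gives (2*II - I) / sqrt(3).
    y = (2.0 * lead_ii - lead_i) / math.sqrt(3.0)
    return math.degrees(math.atan2(y, x))


def axis_over_time(samples_i, samples_ii):
    """Per-sample axis for two equal-length sequences of lead I and II values."""
    return [frontal_axis_degrees(a, b) for a, b in zip(samples_i, samples_ii)]
```

The list from `axis_over_time` is the sort of series a scroller could plot against time. Samples where both leads are near zero give a meaningless angle, so a real plot would probably want to mask those.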

Making searchable screenshots

Note: this requires MacPorts. It’s 3am and you’re hunting for that graph you clipped a couple of weeks ago. The report is due in for 10 the next morning. You swore up and down you’d never do this again, but here we are: can’t find the file or the reference.