Getting copies of annual returns and company information from Companies House is easy. Searching the data in those returns isn’t quite so easy. CH use a PDF format (PDF/A, akin to fax) that ensures maximum compatibility.
$ echo "howdy" | cowsay
 _______
< howdy >
 -------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
I had a small project to display some simple stats for some static content sitting in an AWS S3 bucket. I could have forwarded everything to Elastic+Kibana and shown some fancy graphs and charts, but I was only being asked for what I could easily produce via AWStats.
For S3 logging, AWStats needs its LogFormat set up in the following manner:

%other %extra1 %time1 %host %logname %other %method %url %methodurl %code %other %extra2 %bytesd %other %extra3 %refererquot %uaquot %other %other %other %other %other %virtualname %other

Amazon’s documentation is available here
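To sanity-check how that LogFormat lines up with a real log line, here is a minimal Python sketch. It is not how AWStats itself parses logs. It only splits a line into space-separated fields, treating a [bracketed] timestamp or a "quoted" string as a single field, and pairs each field with the LogFormat token in the same position. The sample line is made up for illustration, and the helper names are hypothetical.

```python
import re

# LogFormat tokens copied from the text above, in order.
LOG_FORMAT = (
    "%other %extra1 %time1 %host %logname %other %method %url %methodurl "
    "%code %other %extra2 %bytesd %other %extra3 %refererquot %uaquot "
    "%other %other %other %other %other %virtualname %other"
).split()

# A field is a [bracketed] timestamp, a "quoted" string, or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def split_fields(line):
    """Split one access log line into its fields."""
    return TOKEN.findall(line)


def label_fields(line):
    """Pair each field with its LogFormat token, dropping the %other fields."""
    pairs = zip(LOG_FORMAT, split_fields(line))
    return [(name, value) for name, value in pairs if name != "%other"]


# Made-up sample line, for illustration only.
SAMPLE = (
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 requesterid '
    'REQID123 REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - '
    '1024 1024 12 11 "-" "curl/7.61.0" - hostid SigV4 '
    'ECDHE-RSA-AES128-GCM-SHA256 AuthHeader '
    'examplebucket.s3.amazonaws.com TLSv1.2'
)

if __name__ == "__main__":
    for name, value in label_fields(SAMPLE):
        print(f"{name:14} {value}")
```

If the number of fields in your own logs differs from the number of tokens in the LogFormat, that is a hint the log layout has changed and the LogFormat needs adjusting.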
In my case, I had data stored in a Realm file that I needed to re-export to JSON. First of all we’ll need Realm Studio. Install it, open the Realm file and then export the models, as shown:
This is written more as an aide-memoire for myself than anything. It’s a process I’m currently using for bulk-processing a set of documents of various forms (MS Word, PPT, PDF, LibreOffice etc), converting them all to PDF, running OCR on any embedded images and then sticking the end result into Elasticsearch via Tika (not documented here; there is plenty of documentation elsewhere on this final step).
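As a rough illustration of the convert-then-OCR part of that process, here is a minimal Python sketch that builds the commands for a single document. The excerpt does not name the tools involved, so the choice of LibreOffice's headless converter (soffice) and OCRmyPDF (ocrmypdf) is an assumption, as are the function name, the directory arguments and the list of file extensions.

```python
from pathlib import Path

# Assumed set of office formats; extend it to match your own documents.
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx", ".odt", ".odp"}


def plan(src, pdf_dir, ocr_dir):
    """Return the commands needed to turn one document into an OCRed PDF."""
    src, pdf_dir, ocr_dir = Path(src), Path(pdf_dir), Path(ocr_dir)
    suffix = src.suffix.lower()
    commands = []
    if suffix == ".pdf":
        pdf = src  # already a PDF, nothing to convert
    elif suffix in OFFICE_SUFFIXES:
        # Assumption: LibreOffice in headless mode handles the PDF conversion.
        commands.append(["soffice", "--headless", "--convert-to", "pdf",
                         "--outdir", str(pdf_dir), str(src)])
        pdf = pdf_dir / (src.stem + ".pdf")
    else:
        raise ValueError(f"unsupported document type: {src.name}")
    # Assumption: OCRmyPDF adds the text layer; --skip-text leaves pages
    # that already contain text untouched.
    commands.append(["ocrmypdf", "--skip-text", str(pdf),
                     str(ocr_dir / pdf.name)])
    return commands
```

Each returned command is a plain argument list, so it can be handed to subprocess.run(cmd, check=True) in a loop over the input directory. Keeping the planning separate from the execution makes it easy to print and inspect the commands before running a large batch.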