
Automating Text Recognition


This topic has been archived. This means that you cannot reply to this topic.
1 reply to this topic

#1
Amber8


    Newbie

  • Members
  • 5 posts
Hi everybody, I need to make a bunch of PDFs text-searchable. The Acrobat OCR function can't do it because the resolution is lower than the minimum required (144 dpi). What I started doing is saving the PDF pages as image files, increasing the resolution in an imaging package, then printing them to PDF again and running the OCR. Obviously very repetitive and boring. I can think of much better things to do on a Saturday night, LOL.
I was thinking of writing a script for it (using Python, since that's the only language I've played with in the past), but I was wondering whether some piece of code already exists to do this. I imagine it's a common problem, since there is a fair bit on the web talking about it, but I haven't been able to find automated code to do it. Or does anyone have any ideas whether another language might be a better match for this?

#2
azer24


    Newbie

  • Members
  • 1 post
Hi Amber8,

Have you tried using Advanced PDF Manager?
Code-wise, I have seen PDF Indexer for Joomla, written in PHP.
Or something like a PDF-to-Word converter may work.
Sorry, I can't post links yet, but I hope this helps!
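
For what it's worth, the manual routine from the first post (save the pages as images at a higher resolution, then OCR them back into PDF) could probably be scripted in Python along these lines. This is only a rough sketch under some assumptions that are not from the thread: it assumes the command-line tools `pdftoppm` (from Poppler) and `tesseract` are installed and on the PATH, it picks 300 dpi as a target simply because that is above the 144 dpi minimum mentioned, and the function names `render_command`, `ocr_command` and `convert` are made up for illustration. It writes one searchable PDF per page and leaves merging them back together out.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: 300 dpi, chosen only because it clears the 144 dpi minimum.
TARGET_DPI = 300


def render_command(pdf: Path, prefix: Path, dpi: int = TARGET_DPI) -> List[str]:
    """Build the pdftoppm call that renders every page of `pdf` as a PNG.

    pdftoppm names its output <prefix>-<page number>.png.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image: Path, out_base: Path, dpi: int = TARGET_DPI) -> List[str]:
    """Build the tesseract call that writes <out_base>.pdf with a text layer."""
    return ["tesseract", str(image), str(out_base), "--dpi", str(dpi), "pdf"]


def convert(pdf: Path, work_dir: Path) -> List[Path]:
    """Render `pdf` to page images, OCR each one, return the per-page PDFs."""
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / pdf.stem
    subprocess.run(render_command(pdf, prefix), check=True)

    outputs = []
    # Pick up the page images pdftoppm just wrote for this document.
    for image in sorted(work_dir.glob(pdf.stem + "-*.png")):
        out_base = image.with_suffix("")  # tesseract appends ".pdf" itself
        subprocess.run(ocr_command(image, out_base), check=True)
        outputs.append(image.with_suffix(".pdf"))
    return outputs
```

To run it over a whole folder, something like `for p in sorted(Path("scans").glob("*.pdf")): convert(p, Path("ocr_out") / p.stem)` should do, where `scans` and `ocr_out` are placeholder folder names. Keep in mind that rendering at a higher dpi only upsamples the original scan, much like the imaging-package step, so the OCR quality still depends on how legible the source is.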