

For Adaptive Compression, 300 dpi is recommended for grayscale or RGB input, or 600 dpi for black-and-white input. Pages scanned in 24-bit color at 300 dpi and 8-1/2-by-11 in. (21.59-by-27.94 cm) result in large images (25 MB) before compression. Your system may require 50 MB of virtual memory or more to scan the image.
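The 25 MB figure can be checked with a little arithmetic. The following is a minimal sketch, not part of any scanner software: the function name `uncompressed_scan_bytes` is hypothetical, and it assumes pixels are packed with no row padding or file headers.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw (uncompressed) size in bytes of a scanned page.

    Assumes tightly packed pixels: no row padding, no file header.
    """
    width_px = round(width_in * dpi)    # 8.5 in at 300 dpi -> 2550 px
    height_px = round(height_in * dpi)  # 11 in at 300 dpi -> 3300 px
    return width_px * height_px * bits_per_pixel // 8


# Letter-size page, 24-bit color, 300 dpi
color_bytes = uncompressed_scan_bytes(8.5, 11, 300, 24)
print(f"{color_bytes / 1_000_000:.1f} MB")  # prints "25.2 MB"

# Same page in 1-bit black and white at 600 dpi
bw_bytes = uncompressed_scan_bytes(8.5, 11, 600, 1)
print(f"{bw_bytes / 1_000_000:.1f} MB")  # prints "4.2 MB"
```

Even at twice the resolution, the black-and-white scan is roughly one sixth the size of the color scan before any compression is applied.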
At 150 dpi, OCR accuracy is slightly lower and more font-recognition errors occur; at 400 dpi and higher, processing slows and compressed pages are bigger. If a page has many unrecognized words or small text (9 points or smaller), try scanning at a higher resolution. Scan in black and white whenever possible. When Recognize Text Using OCR is disabled, the full 10-to-3000 dpi resolution range may be used, but the recommended resolution is 72 dpi or higher.
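One way to see why small text benefits from a higher resolution is to estimate how many pixels a line of text gets. This is a rough sketch only: it uses the nominal point size (1 point = 1/72 in.), not the actual height of the letterforms, and the helper name `text_height_px` is hypothetical.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(point_size, dpi):
    """Nominal height in pixels of text set at point_size when scanned at dpi."""
    return point_size / POINTS_PER_INCH * dpi


for dpi in (150, 300, 600):
    print(dpi, "dpi ->", text_height_px(9, dpi), "px")
# 150 dpi -> 18.75 px
# 300 dpi -> 37.5 px
# 600 dpi -> 75.0 px
```

At 150 dpi, 9-point text has fewer than 19 pixels of nominal height to work with, which is consistent with the lower accuracy described above.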
For most pages, black-and-white scanning at 300 dpi produces text best suited for conversion. If you select Searchable Image or ClearScan for PDF Output Style, an input resolution of 72 dpi or higher is required. Also, input resolution higher than 600 dpi is downsampled to 600 dpi or lower. To apply lossless compression to a scanned image, select one of these options under Optimization Options in the Optimize Scanned PDF dialog box: CCITT Group 4 or JBIG2 (Lossless). Lossless compression can only be applied to monochrome images. If this image is appended to a PDF document and you save the file using the Save option, the scanned image remains uncompressed. If you save the PDF using Save As, the scanned image may be compressed.
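The resolution limits for the Searchable Image and ClearScan output styles can be expressed as a small check. This is an illustrative sketch, not Acrobat's actual behavior: the function and constant names are hypothetical, and because the text says only "600 dpi or lower", the returned value should be read as an upper bound.

```python
MIN_OCR_DPI = 72   # minimum input resolution stated above
MAX_OCR_DPI = 600  # higher inputs are downsampled to this value or lower


def effective_ocr_dpi(input_dpi):
    """Return the highest resolution at which a scan would be processed.

    Raises ValueError if the input is below the required minimum.
    """
    if input_dpi < MIN_OCR_DPI:
        raise ValueError(
            f"{input_dpi} dpi is below the required minimum of {MIN_OCR_DPI} dpi"
        )
    # Anything above the cap is treated as downsampled to the cap.
    return min(input_dpi, MAX_OCR_DPI)


print(effective_ocr_dpi(300))   # 300
print(effective_ocr_dpi(1200))  # 600
```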

PDFs can be searchable (documents) or image-based (scans). If the PDF is searchable, you should be able to parse or extract the text directly from the PDF. If the PDF is image-based, you will need to run an OCR process on it to extract the text. The best method is to use a tool that distinguishes between image and document PDFs for you and applies OCR only when necessary. One such tool is the LEADTOOLS Document SDK. It lets you parse the text with only a few lines of code and has the SDK apply OCR intelligently when extracting the text. Here is some sample code using the NuGet package:

var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
using (var document = DocumentFactory.LoadFromFile("test.pdf", new LoadDocumentOptions()))
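The "apply OCR only when necessary" decision can also be sketched independently of any SDK. The Python example below is a simple heuristic and not how LEADTOOLS makes the determination: it assumes you have already extracted the text of each page with a PDF library of your choice, and both the function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return zero-based indexes of pages that likely have no usable text layer.

    page_texts: one entry per page, holding the text extracted directly
    from the PDF (an empty string or None for a scanned, image-only page).
    min_chars: assumed threshold below which a page is treated as image-based.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


# Page 0 has a text layer; pages 1 and 2 look like scans.
extracted = ["Chapter 1. Scanning documents to PDF", "", None]
print(pages_needing_ocr(extracted))  # [1, 2]
```

Only the pages returned by the function would then be sent through OCR; the rest can be parsed directly.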
