OCR
OCR for PDFs
To OCR all PDFs in a folder using Docker and ocrmypdf:
- Place your input files in a pdfs/ directory.
- Run the following shell script to OCR each file with English and German (eng+deu) language support:
#!/bin/bash
# Directory containing PDFs
PDF_DIR="./pdfs"
# Output directory
OUTPUT_DIR="./ocr_output"
# Create output directory if it doesn't exist
mkdir -p "$OUTPUT_DIR"
# Iterate over all PDF files in the directory
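for pdf_file in "$PDF_DIR"/*.pdf; do
    # Skip the unexpanded pattern when no PDFs match
    [ -e "$pdf_file" ] || continue
    # Get the base filename (without path)
    base_name=$(basename "$pdf_file")
    output_file="$OUTPUT_DIR/$base_name"
    echo "Processing: $pdf_file"
    # Run as the current user so output files are not owned by root;
    # the current directory is mounted at /mnt inside the container
    docker run --rm -u "$(id -u):$(id -g)" \
        -v "$(pwd)":/mnt jbarlow83/ocrmypdf \
        --language eng+deu \
        "/mnt/${pdf_file}" "/mnt/${output_file}" \
        && echo "Saved OCR'd file to: $output_file"
done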
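
The script relies on the current directory being mounted at /mnt, so it should be run from the folder that contains pdfs/. Every path passed to ocrmypdf is the host-relative path with /mnt/ in front. The sketch below illustrates that mapping on its own. The container_path function and the report.pdf filename are hypothetical names used only for illustration; they are not part of the script above or of ocrmypdf.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the original script):
# translate a path relative to the current directory into the path the
# container sees, assuming the current directory is mounted at /mnt.
container_path() {
    local rel="${1#./}"        # drop a leading "./" if present
    printf '/mnt/%s\n' "$rel"
}

container_path "./pdfs/report.pdf"        # prints /mnt/pdfs/report.pdf
container_path "ocr_output/report.pdf"    # prints /mnt/ocr_output/report.pdf
```

The original script skips this normalization and simply writes "/mnt/${pdf_file}", which yields paths like /mnt/./pdfs/report.pdf. Those resolve to the same file, so the helper is a readability aid and not a required fix.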