Scanned PDFs are large because they contain image data, not text. Each page is stored as a high-resolution photograph. A single color page scanned at 300 DPI is typically 8-10 MB (and about 25 MB if stored with no compression at all), while text-based PDFs are typically under 100 KB per page. DPI, color mode, and compression settings all have a large effect on file size.
Image Data vs. Text Data: The Fundamental Difference
The primary reason scanned PDFs are massive compared to regular PDFs is that they store image data instead of text data. When you scan a document, the scanner captures a photograph of each page, pixel by pixel.
Text-based PDFs store text as character codes that reference fonts, alongside vector drawing instructions, which is extremely space-efficient. The word "document" in a text PDF takes only a few bytes. The same word in a scanned PDF requires thousands of pixels to represent, each storing color information.
| Format Type | Data Storage Method | Typical Size (per page) |
|---|---|---|
| Text PDF | Characters, fonts, vectors | 20-100 KB |
| Scanned PDF (300 DPI, Grayscale) | Image pixels (bitmap) | 1-2 MB |
| Scanned PDF (300 DPI, Color) | Image pixels with RGB data | 8-10 MB |
| Scanned PDF (600 DPI, Color) | High-res image pixels | 30-35 MB |
DPI (Dots Per Inch) Impact on File Size
DPI determines how many pixels are captured per inch of the scanned document. Higher DPI means more detail but much larger files. The relationship is quadratic, not linear: doubling the DPI doubles both the width and the height in pixels, so it quadruples the file size.
Understanding DPI Calculations
For a standard US Letter page (8.5" × 11"):
| DPI Setting | Pixel Dimensions | Total Pixels | Approximate Size (Color) |
|---|---|---|---|
| 150 DPI | 1,275 × 1,650 | 2.1 million | 2-3 MB |
| 300 DPI | 2,550 × 3,300 | 8.4 million | 8-10 MB |
| 600 DPI | 5,100 × 6,600 | 33.7 million | 30-35 MB |
| 1200 DPI | 10,200 × 13,200 | 134.6 million | 120-140 MB |
Recommended DPI by use:
• Reading text documents: 150-200 DPI is sufficient
• General office documents: 300 DPI is the sweet spot
• Archival quality: 400-600 DPI for preservation
• Photos or detailed graphics: 600+ DPI only when necessary
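The pixel figures in the DPI table can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function names are made up for this example, and it assumes a US Letter page with no cropping or margins.

```python
# US Letter dimensions in inches, as used in the table above.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def pixel_dimensions(dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a US Letter page at the given DPI."""
    return round(PAGE_WIDTH_IN * dpi), round(PAGE_HEIGHT_IN * dpi)


def total_pixels(dpi: int) -> int:
    """Return the total pixel count for one scanned page."""
    width, height = pixel_dimensions(dpi)
    return width * height


for dpi in (150, 300, 600, 1200):
    width, height = pixel_dimensions(dpi)
    millions = total_pixels(dpi) / 1e6
    print(f"{dpi:>4} DPI: {width:,} x {height:,} = {millions:.1f} million pixels")

# Doubling the DPI doubles both dimensions, so the pixel count quadruples.
print(total_pixels(600) / total_pixels(300))
```

Because both dimensions scale with DPI, the pixel count (and with it the raw data size) grows with the square of the DPI setting.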
Color vs. Grayscale vs. Black & White
The color mode dramatically affects file size because it determines how much data is stored for each pixel.
Black & White (1-bit)
1 bit per pixel: Each pixel is either black or white. Smallest size, best for text-only documents. ~200-500 KB per page at 300 DPI.
Grayscale (8-bit)
8 bits per pixel: 256 shades of gray. Good for documents with photos or shading. ~1-2 MB per page at 300 DPI.
Color (24-bit RGB)
24 bits per pixel: 16.7 million colors (8 bits each for Red, Green, Blue). Largest size. ~8-10 MB per page at 300 DPI.
Key insight: Before compression, color images are 3× larger than grayscale (24 bits vs. 8 bits per pixel), and grayscale is 8× larger than black & white (8 bits vs. 1 bit). For text documents, black & white mode can reduce file sizes by about 95% with little or no loss of readability.
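The 3× and 8× ratios follow directly from the bits stored per pixel. Here is a minimal sketch of that calculation for a 300 DPI US Letter page; the helper name is hypothetical, and it ignores row padding and PDF container overhead. The raw numbers it prints are larger than the typical per-page sizes quoted above, presumably because real scanner output is already compressed to some degree.

```python
BITS_PER_PIXEL = {"bw": 1, "grayscale": 8, "color": 24}


def raw_page_bytes(width_px: int, height_px: int, mode: str) -> int:
    """Uncompressed image size in bytes, ignoring row padding and PDF overhead."""
    bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes


# US Letter at 300 DPI is 2,550 x 3,300 pixels.
sizes = {mode: raw_page_bytes(2550, 3300, mode) for mode in BITS_PER_PIXEL}
for mode, size in sizes.items():
    print(f"{mode:>9}: {size / 1e6:.2f} MB uncompressed")

print(sizes["color"] / sizes["grayscale"])  # ratio of color to grayscale
print(sizes["grayscale"] / sizes["bw"])     # ratio of grayscale to B&W
print(1 - sizes["bw"] / sizes["color"])     # fraction saved going from color to B&W
```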
Compression: Uncompressed vs. JPEG vs. CCITT
Compression algorithms reduce file size by encoding image data more efficiently. Different compression types work better for different content.
Common PDF Image Compression Methods
| Compression Type | Best For | Quality | Size Reduction |
|---|---|---|---|
| Uncompressed | Archival/maximum quality | Perfect (lossless) | None (baseline) |
| CCITT Group 4 | Black & white text | Perfect (lossless) | 90-95% reduction |
| ZIP/Flate | Grayscale documents | Perfect (lossless) | 30-60% reduction |
| JPEG (High Quality) | Color photos | Slight loss | 70-80% reduction |
| JPEG (Medium Quality) | Color documents | Noticeable loss | 85-90% reduction |
| JPEG (Low Quality) | Web preview only | Significant loss | 95%+ reduction |
Many scanning applications default to uncompressed or lightly compressed output, resulting in unnecessarily large files. Applying JPEG compression at 80-85% quality can reduce color scans by about 80% with minimal visible difference.
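To see lossless compression at work, the sketch below runs Python's standard zlib module (the same deflate algorithm behind PDF's ZIP/Flate filter) over a synthetic 1-bit page. The page is an assumption made for illustration: it is built from a repeating pattern and is far more regular than a real scan, so the reduction it prints is not representative of real documents. It only shows that the bytes shrink and come back unchanged.

```python
import zlib

ROW_BYTES = (2550 + 7) // 8  # one 1-bit row of a 300 DPI Letter page, 8 pixels per byte
ROWS = 3300

blank_row = b"\x00" * ROW_BYTES
# Hypothetical stand-in for a row that crosses a line of text: a repeating bit pattern.
text_row = (b"\x0f\xf0\x3c" * ROW_BYTES)[:ROW_BYTES]

# Synthetic page: bands of "text" rows separated by bands of blank rows.
page = b"".join(
    text_row if (row // 20) % 3 == 0 else blank_row
    for row in range(ROWS)
)

compressed = zlib.compress(page, 9)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(compressed) == page

print(f"raw: {len(page):,} bytes, Flate: {len(compressed):,} bytes")
print(f"reduction: {1 - len(compressed) / len(page):.1%}")
```

JPEG, by contrast, discards detail to reach its ratios, which is why it appears in the table with a quality cost.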
Multiple Pages Accumulate Size
Unlike text PDFs, where each page adds minimal size, scanned PDFs accumulate megabytes per page. A 100-page document scanned in color at 300 DPI with little or no compression can easily exceed 800 MB.
Estimated size = Pages × Per-Page Size
Examples:
• 50 pages × 8 MB (color, 300 DPI) = 400 MB
• 50 pages × 2 MB (grayscale, 300 DPI) = 100 MB
• 50 pages × 500 KB (B&W, 300 DPI) = 25 MB
• 50 pages × 100 KB (B&W compressed) = 5 MB
This is why scanning settings matter tremendously for multi-page documents. The difference between appropriate and excessive settings can approach 100× in file size: the examples above span 80×, from 5 MB to 400 MB.
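The estimate above is simple enough to script when comparing settings before a large scanning job. This is a minimal sketch; the per-page figures are the ones from the examples, and the function and dictionary names are made up for illustration.

```python
def estimated_size_mb(pages: int, per_page_mb: float) -> float:
    """Estimated document size in MB: pages multiplied by per-page size."""
    return pages * per_page_mb


# Per-page sizes in MB at 300 DPI, taken from the examples above.
PER_PAGE_MB = {
    "color": 8.0,
    "grayscale": 2.0,
    "bw": 0.5,
    "bw_compressed": 0.1,
}

estimates = {name: estimated_size_mb(50, mb) for name, mb in PER_PAGE_MB.items()}
for name, size in estimates.items():
    print(f"{name:>14}: {size:g} MB")

# Ratio between the largest and the smallest estimate.
print(round(estimates["color"] / estimates["bw_compressed"]))
```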
How to Reduce Scanned PDF File Size
1. Lower the DPI
• Change from 600 DPI to 300 DPI (75% reduction)
• For text-only: use 150-200 DPI (roughly 90% reduction compared with 600 DPI)
2. Switch to Grayscale or B&W
• Color to grayscale: 70% reduction
• Color to black & white: 95% reduction
• Use B&W for text documents
3. Apply JPEG Compression
• Use 80-85% quality for color/grayscale
• 70-80% reduction with minimal quality loss
• Use CCITT compression for B&W (automatic)
4. Add OCR with Text Layer
• OCR adds a searchable text layer at very little size cost
• The image resolution can then be lowered, which is where the savings come from
• Makes content searchable and copyable
5. Use PDF Compression Tools
• Adobe Acrobat: "Reduce File Size" feature
• Online tools: Smallpdf, iLovePDF, Reformatly
• Ghostscript: command-line compression
• Preview (Mac): Export with "Reduce File Size"
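For the Ghostscript route, a typical invocation uses the pdfwrite device with one of its built-in quality presets. The sketch below only builds and prints the command; the wrapper functions and file names are hypothetical, and it assumes the executable is called gs (on Windows it is usually gswin64c). Check the output visually before discarding the original, since the presets downsample and recompress images.

```python
import shutil
import subprocess

# Ghostscript's built-in pdfwrite quality presets, roughly from smallest to largest output.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src: str, dst: str, preset: str = "ebook") -> list[str]:
    """Build a Ghostscript command line that rewrites src as a smaller PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "ebook") -> None:
    """Run Ghostscript; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript executable 'gs' not found on PATH")
    subprocess.run(build_gs_command(src, dst, preset), check=True)


# Print the command without running it; both file names are placeholders.
print(" ".join(build_gs_command("scan.pdf", "scan-small.pdf")))
```

Calling compress_pdf("scan.pdf", "scan-small.pdf") would run the same command, writing to a new file and leaving the input untouched.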
Optimal Settings by Document Type
| Document Type | Recommended DPI | Color Mode | Compression |
|---|---|---|---|
| Text documents | 150-200 DPI | Black & White | CCITT Group 4 |
| Business documents | 300 DPI | Grayscale or B&W | JPEG 85% or CCITT |
| Documents with color logos | 300 DPI | Color | JPEG 80-85% |
| Photos or magazines | 300-400 DPI | Color | JPEG 85-90% |
| Archival/legal documents | 300-600 DPI | Grayscale or Color | Lossless (ZIP/Flate) |
OCR and Hybrid PDFs
Adding OCR (Optical Character Recognition) to scanned PDFs creates a hybrid document with both the scanned image and an invisible text layer. This makes PDFs searchable and can actually reduce file size in some cases.
Benefits of OCR for File Size
After running OCR, you can reduce the image quality/resolution while maintaining readability through the text layer. The text layer adds minimal size (similar to native PDFs), but allows you to use lower quality images as a "background."
For example: a 300 DPI color scan might be 8 MB per page. After OCR, you could reduce the image to 150 DPI at 70% JPEG quality (now 1 MB per page) while the invisible text layer maintains perfect searchability and copy-paste functionality.
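It helps to separate how much of that saving comes from resolution and how much from JPEG quality. The sketch below assumes file size scales with pixel count while the compression ratio stays the same, which is only an approximation; the function name is made up for this example.

```python
def scaled_size_mb(size_mb: float, old_dpi: int, new_dpi: int) -> float:
    """Scale a size by the change in pixel count, which goes with DPI squared."""
    return size_mb * (new_dpi / old_dpi) ** 2


# Halving the resolution of the 8 MB page, before any change in JPEG quality.
after_downsample = scaled_size_mb(8.0, 300, 150)
print(f"{after_downsample:g} MB after downsampling from 300 to 150 DPI")

# The same rule gives the reduction for going from 600 to 300 DPI.
print(f"{1 - scaled_size_mb(1.0, 600, 300):.0%} reduction from 600 to 300 DPI")
```

Under that assumption, downsampling alone accounts for most of the saving, and the rest of the way to about 1 MB would come from the lower JPEG quality setting.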
Frequently Asked Questions
Why is my 10-page scan 50 MB?
Fifty megabytes over 10 pages is about 5 MB per page, which points to color mode at 300 DPI or higher with little effective compression. (At 600 DPI color, pages run approximately 30-35 MB each, so a 10-page scan would be far larger than 50 MB.) Reduce to 300 DPI grayscale with JPEG compression to get under 1 MB per page.
Does reducing PDF size reduce quality?
It depends on the method. Lowering DPI or using lossy JPEG compression will reduce quality, but when done carefully, the quality loss is imperceptible for most uses. Going from 600 to 300 DPI is virtually unnoticeable on screen, and JPEG at 80-85% quality is very hard to tell apart from the original at normal viewing sizes. Lossless compression (CCITT for B&W, ZIP for grayscale) reduces size without any quality loss.
Can I compress a PDF without losing text readability?
Absolutely. For text documents, scan in black & white mode at 300 DPI with CCITT compression. This keeps text crisp while holding file size to 200-500 KB per page. CCITT compression is lossless, so the compression step itself adds no degradation beyond the conversion to black & white.
What's the best free tool to compress scanned PDFs?
For online tools, Reformatly, Smallpdf, and iLovePDF offer free PDF compression. On desktop, Adobe Acrobat's "Reduce File Size" feature is part of the paid Acrobat product rather than the free Acrobat Reader. For Mac users, Preview's Export function with "Reduce File Size" works well. For advanced users, Ghostscript (free, command-line) provides the most control.
Should I scan documents at maximum DPI for future-proofing?
Only for archival/preservation purposes. For everyday business documents, 300 DPI is plenty for future needs. Ultra-high DPI (1200+) creates enormous files (100+ MB per page) that are difficult to store and share. Even professional archives typically use 400-600 DPI as the maximum. Remember: as long as you keep the paper originals, you can always rescan if higher quality is needed later, but massive files are hard to manage now.