What is FEATHER?

Feather is a portable file format built on Apache Arrow's columnar memory specification, designed for fast reading and writing of data frames between Python (Pandas/Polars) and R. It writes Arrow's in-memory columnar layout to disk more or less directly, which enables memory-mapped access and, for uncompressed files, zero-copy reads. Feather V2 (2020) is the Arrow IPC file format, which broadens language support beyond Python and R. I/O is very fast, often 10-100x faster than CSV.

Feather is used for temporary data storage in data science workflows, sharing datasets between Python and R, caching intermediate results, and fast data frame serialization. It is popular with data scientists who use Pandas or Polars for quick saves during exploratory analysis. It is not intended for long-term archival (use Parquet instead), but it works well for fast iteration and language interoperability. It is also common in Jupyter notebooks for checkpointing work.

Did you know? Feather can read/write data frames 10-100x faster than CSV!

History

Wes McKinney (creator of Pandas) and Hadley Wickham (creator of tidyverse) collaborated to create Feather for seamless Python-R data exchange.

Key Milestones

  • 2016: Feather V1 announced
  • 2017: Pandas and R integration
  • 2020: Feather V2 based on Arrow IPC
  • 2021: Polars adoption
  • 2023: Widespread data science use
  • Present: Common choice for fast data frame I/O

Key Features

Core Capabilities

  • Lightning Fast: 10-100x faster than CSV
  • Zero-Copy: Memory-mapped reads
  • Columnar: Efficient storage
  • Python/R: Native support
  • Arrow-Based: Standard memory format
  • Compression: Optional LZ4/Zstd

Common Use Cases

  • Pandas: Fast DataFrame saves
  • R Integration: Python-R data exchange
  • Caching: Intermediate results
  • Notebooks: Jupyter checkpoints

Advantages

  • Extremely fast read/write
  • Zero-copy memory mapping
  • Strong Python/R compatibility
  • Arrow ecosystem integration
  • Simple API
  • Lightweight format
  • Preserves column data types across save and load

Disadvantages

  • Not for long-term archival
  • Limited compression vs Parquet
  • Binary format (not human-readable)
  • Less ecosystem support than Parquet
  • No schema evolution
  • Designed for temporary use

Technical Information

Format Specifications

Specification   Details
File Extension  .feather, .arrow
MIME Type       application/octet-stream
Version         V2 (Arrow IPC format)
Storage         Columnar
Compression     LZ4, Zstd (optional)
Base Format     Apache Arrow memory
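Because V2 files are Arrow IPC files, they begin with the magic bytes ARROW1, while legacy V1 files begin with FEA1. The sketch below uses only the standard library to tell the two apart; the function names sniff_feather_version and sniff_file are hypothetical and the returned labels are just illustrative strings.

```python
def sniff_feather_version(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header.startswith(b"ARROW1"):
        return "Feather V2 (Arrow IPC file)"
    if header.startswith(b"FEA1"):
        return "Feather V1"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_feather_version(fh.read(8))


# Synthetic headers, so the example runs without any Feather file on disk.
print(sniff_feather_version(b"ARROW1\x00\x00"))
print(sniff_feather_version(b"FEA1\x00\x00\x00\x00"))
print(sniff_feather_version(b"id,city,score\n"))
```

A check like this is only a quick sanity test; opening the file with an Arrow library is the reliable way to validate it.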

Common Tools

  • Python: Pandas, Polars, PyArrow
  • R: arrow package, feather package
  • Julia: Feather.jl, Arrow.jl
  • Processing: Apache Spark (Arrow integration)