Files-as-Filesystems for POSIX Shell Data Processing

Michael Greenberg

doi:10.1145/3477113.3487265

What is it about?

Unix and Unix-like systems run most of the internet; developers, analysts, and scientists rely on Unix tools to do their work every day. The Unix tools are very good at processing line-oriented data: tables, logs, comma-separated values, simple spreadsheets, and so on. In the past, most data was line-oriented. These days, though, a lot of data comes in so-called "semi-structured" formats, like JSON. JSON offers rich structure, such as nested fields, but it isn't line-oriented, and the Unix tools aren't very good at working with such formats. We built a tool called 'ffs' that maps a deeply nested JSON file to a system of directories and files. The Unix tools, and the Unix shell in particular, excel at processing these directory structures. By phrasing structured data in terms of the filesystem's existing concepts, we can use our favored, trusty Unix tools to work with modern data.
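To make the mapping concrete, here is a minimal Python sketch of the general idea: objects become directories, array elements become entries named by their index, and scalar values become file contents. It does not use ffs and is not ffs's implementation; it simply copies a parsed JSON value into a scratch directory. The function names (`materialize`, `list_files`), the sample document, and the naming choices are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def materialize(value, path: Path) -> None:
    """Write a parsed JSON value under `path` as files and directories.

    Illustrative assumptions (not taken from ffs):
      - objects become directories, one entry per key
      - arrays become directories, entries named "0", "1", ...
      - strings are written as-is; other scalars as their JSON text
    Keys containing "/" are not handled by this sketch.
    """
    if isinstance(value, dict):
        path.mkdir(parents=True, exist_ok=True)
        for key, child in value.items():
            materialize(child, path / key)
    elif isinstance(value, list):
        path.mkdir(parents=True, exist_ok=True)
        for index, child in enumerate(value):
            materialize(child, path / str(index))
    elif isinstance(value, str):
        path.write_text(value)
    else:
        # Numbers, booleans, and null keep their JSON spelling.
        path.write_text(json.dumps(value))


def list_files(root: Path) -> list:
    """Return the relative paths of all regular files under `root`, sorted."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Hypothetical sample document, used only for this demonstration.
    doc = json.loads(
        '{"name": "ffs", "tags": ["json", "shell"], "meta": {"year": 2021}}'
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "doc"
        materialize(doc, root)
        for rel in list_files(root):
            print(rel, "=", (root / rel).read_text())
```

Once the data is laid out this way, a nested field is just a path, so ordinary tools such as `ls`, `cat`, and `find` can reach it without any JSON-specific parsing.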


Why is it important?

The Unix shell is a great way to process data before analysis, and people continue to use it when processing line-oriented formats like comma-separated values (CSV). But there's a discontinuity: once some of your data comes in a modern format like JSON, you have to stop using the shell and start using some industrial-strength programming language. Our ffs tool smooths out this discontinuity, making it easy to explore and play with data in the early stages of analysis.

Perspectives

I like working in the shell, but many people are frustrated by how 'ancient' the shell feels. The problem isn't so much the shell itself as the shell ecosystem. I'm excited by the prospect of rehabilitating and rejuvenating the shell by making it easy for people to work in 'modern' ways with this powerful, venerable tool.
Michael Greenberg
Stevens Institute of Technology

This page is a summary of: Files-as-Filesystems for POSIX Shell Data Processing, October 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3477113.3487265.

Contributors

The following have contributed to this page

Michael Greenberg
Stevens Institute of Technology

Working with modern data formats using classic tools by mapping to Unix files and directories

Resources

ffs: the file filesystem

PLOS pre-recorded presentation

Demo

Open access PDF (author version)
