What is it about?
Unix and Unix-like systems run most of the internet; developers, analysts, and scientists rely on Unix tools to do their work every day. The Unix tools are very good at processing line-oriented data: tables, logs, comma-separated values, simple spreadsheets, and so on. In the past, most data was line-oriented. But these days, a lot of data comes in so-called "semi-structured" formats, like JSON. JSON offers rich structure, like nested fields, but it isn't line-oriented, and the Unix tools aren't very good at working with such formats. We built a tool called 'ffs' that can map a deeply nested JSON file to a system of directories and files. The Unix tools, and the Unix shell in particular, excel at processing these directory structures. By phrasing structured data in terms of the filesystem's existing concepts, we can use our favorite, trusty Unix tools to work with modern data.
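To make the mapping concrete, here is a minimal Python sketch of the idea. It is not ffs itself and does not reproduce how ffs works internally; it simply copies a parsed JSON document into a temporary directory, turning objects and arrays into directories and scalar values into files. The names `materialize` and `demo`, the sample document, and the choice to name array entries by their index are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def materialize(value, path):
    """Write a parsed JSON value at `path`.

    Objects and arrays become directories; scalars become small text files.
    Assumption for this sketch: object keys are valid file names (no "/").
    """
    if isinstance(value, dict):
        path.mkdir(parents=True, exist_ok=True)
        for key, child in value.items():
            materialize(child, path / key)
    elif isinstance(value, list):
        path.mkdir(parents=True, exist_ok=True)
        # Array elements are named by their position: 0, 1, 2, ...
        for index, child in enumerate(value):
            materialize(child, path / str(index))
    else:
        # Strings are written as-is; numbers, booleans, and null use JSON spelling.
        text = value if isinstance(value, str) else json.dumps(value)
        path.write_text(text + "\n")


def demo():
    """Materialize a small sample document and list the files it produces."""
    doc = json.loads(
        '{"name": "ffs", "tags": ["json", "shell"], "stats": {"stars": 42}}'
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "doc"
        materialize(doc, root)
        return sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )


if __name__ == "__main__":
    for name in demo():
        print(name)
```

Once the data is laid out this way, nested fields are just paths such as `stats/stars` or `tags/0`, so ordinary commands like `ls`, `cat`, and `grep -r` can explore them without a JSON-specific tool.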
Why is it important?
The Unix shell is a great way to process data before analysis, and people continue to use it when processing line-oriented formats like comma-separated values (CSV). But there's a discontinuity: once some of your data comes in a modern format like JSON, you have to stop using the shell and start using some industrial-strength programming language. Our ffs tool smooths out this discontinuity, making it easy to explore and play with data in the early stages of analysis.
Read the Original
This page is a summary of: Files-as-Filesystems for POSIX Shell Data Processing, October 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3477113.3487265.