this post was submitted on 23 Jul 2024

32 points (86.4% liked)

FAMF: Files As Metadata Format (prma.dev)

submitted 3 months ago by [email protected] to c/[email protected]

YAML and TOML suck. Long live the FAMF!

top 44 comments

[–] [email protected] 5 points 3 months ago (1 children)

thanks, i hate it

[–] [email protected] 2 points 3 months ago

Sure thing! Awesome!

[–] [email protected] 1 points 3 months ago (1 children)

I'm a bit skeptical about the performance penalty. I know there's a benchmark but I didn't see any details of what was actually benchmarked and where. Windows (AFAIK) still has notoriously slow directory traversal operations. God forbid you're using SSHFS or even NFS. I've seen things with hundreds of YAML nodes before.

Benchmarking this is also tricky because the OS file cache will almost certainly make the second time faster than the first (and probably by a lot).

Also just the usability... I think opening a file to change one value is extreme. You also still have the problem of documentation... Which sure you can solve by putting that in another file, but... You can also do that with just plain old JSON.

I think in the majority of languages, writing a library to process these files would also be more complicated than writing a JSON parser or using an existing library.

Also how do you handle trailing whitespace inserted by a text editor? Do you drop it? Keep it? It probably doesn't matter as long as the configuration is just for a particular program. The program just needs to document it... But then you've got ambiguities between programs that you just don't have to worry about with TOML or JSON.
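One possible policy for the trailing-whitespace question, as a minimal sketch: drop everything after the last non-whitespace character and keep the rest untouched. The function name `read_value` is made up for illustration, and this is only one of several defensible choices, not something the FAMF post prescribes.

```shell
# Hypothetical helper: prints the content of a value file without
# trailing spaces, tabs or newlines. Leading whitespace is kept.
read_value() {
  value=$(cat "$1")                  # $(...) strips trailing newlines only
  trailing=${value##*[![:space:]]}   # the run of whitespace at the end, if any
  printf '%s\n' "${value%"$trailing"}"
}

value_file=$(mktemp)
printf 'hello world  \t\n\n' > "$value_file"
read_value "$value_file"
```

Whatever rule a program picks, it has to be documented, which is exactly the ambiguity raised above.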

[–] [email protected] 1 points 3 months ago (1 children)

OK so, you are very much right. You should definitely benchmark it using a simulation of what your data might look like. It should not be that hard. Just make a script that creates a bunch of files similar to your data. About the trailing whitespace: when I am in a terminal I just use sed to remove the last '\n', in Rust I just use .trim(), and in Go I think there is strings.TrimSpace(). It is honestly not that hard. The data structure and parser are not formed the same way as JSON, where you have to parse the whole thing. So you don't have to. You just open the files you need and read their content. It is a bit more difficult at first since you can't just translate a whole struct directly, but it pays for itself when you want to migrate the data to a new format. So if your structure never changes, those formats are probably easier.
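A rough sketch of the kind of generator script suggested here. The directory layout, the key names (`title`, `created_at`, `status`) and the entry count are all assumptions chosen for illustration; a real benchmark should mirror the shape of your own data.

```shell
# Hypothetical generator: creates COUNT entry directories under ROOT,
# each holding three single-value files.
make_fake_entries() {
  root=$1; count=$2; i=1
  while [ "$i" -le "$count" ]; do
    mkdir -p "$root/entry-$i"
    printf 'Post number %s\n' "$i" > "$root/entry-$i/title"
    printf '2024-07-23\n' > "$root/entry-$i/created_at"
    printf 'draft\n' > "$root/entry-$i/status"
    i=$((i + 1))
  done
}

# Reads every value once and prints how many files were read.
read_all_entries() {
  n=0
  for f in "$1"/*/*; do
    value=$(cat "$f")   # the value is discarded; only the read matters here
    n=$((n + 1))
  done
  echo "$n"
}

bench_dir=$(mktemp -d)
make_fake_entries "$bench_dir" 200
read_all_entries "$bench_dir"
```

Wrapping the read step in your shell's `time` gives a first number, but freshly written files are usually still in the OS file cache, so that number is closer to the warm case than the cold one.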

[–] [email protected] 0 points 3 months ago (1 children)

You should definitely benchmark it using a simulation of what your data might look like. It should not be that hard. Just make a script that creates a bunch of files similar to your data.

Right, it's just kind of a thing to think about. If your program is something that might conceivably be used over sshfs (as an example) ... this is probably not a great option for your program's configuration.

The data structure and parser are not formed the same way as JSON, where you have to parse the whole thing. So you don’t have to. You just open the files you need and read their content. It is a bit more difficult at first since you can’t just translate a whole struct directly, but it pays for itself when you want to migrate the data to a new format. So if your structure never changes, those formats are probably easier.

Well a very common thing is to create a "config" object that lives in the long running process (and in some cases can be reloaded without restarting the program).

That model also saves you from unnecessary repeated IO operations (without one-off caching and reloading mechanisms) and allows you to centralize any validation (which also means you can report configuration errors at startup).
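The load-once model described here also works on top of a directory of files. A minimal sketch, assuming a hypothetical config directory with exactly two keys, `host` and `port`; the function and variable names are invented for the example.

```shell
# Hypothetical loader: reads the whole config directory once, validates it,
# and leaves the result in CFG_HOST / CFG_PORT for the rest of the program.
load_config() {
  dir=$1
  for key in host port; do
    if [ ! -f "$dir/$key" ]; then
      echo "config error: missing key '$key' in $dir" >&2
      return 1
    fi
  done
  CFG_HOST=$(cat "$dir/host")
  CFG_PORT=$(cat "$dir/port")
  case $CFG_PORT in
    ''|*[!0-9]*) echo "config error: port must be a number" >&2; return 1 ;;
  esac
}

conf=$(mktemp -d)
printf 'localhost\n' > "$conf/host"
printf '8080\n' > "$conf/port"
load_config "$conf" && echo "listening on $CFG_HOST:$CFG_PORT"
```

All validation lives in one place and runs before anything else, so a broken file is reported at startup instead of on first use.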

I do wish various formats were more "streaming" friendly, but configuration isn't really one of them.

In a lot of languages moving between formats is also fairly trivial because the XYZ markup parser parses things into an object map and the ZYK markup writer can write an object map into ZYK format.

Maybe I'm not understanding what you mean by migrating the data to a new format though.

[–] [email protected] 1 points 3 months ago

OK so, for example, if you have to change the structure of the configuration file in a statically typed language, you have to have two representations of the data: the old one and the new one. You first have to deserialize the data in the old format, then convert it to the new format, then replace the old files. The FAMF alternative allows you to just use copy, paste and delete to achieve the same goal. Please keep in mind that you can still build a configuration data structure that you keep in memory. It is just that the persisted information is spread across different files rather than kept in one file.
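To make the "copy, paste and delete" migration concrete, here is a sketch that renames one key in every entry. The layout (one directory per post) is an assumption, and `rename_key` is a hypothetical helper, not part of any FAMF tooling.

```shell
# Renames a key file in every entry directory directly under ROOT.
# Entries that do not have the old key are left alone.
rename_key() {
  root=$1; old=$2; new=$3
  for entry in "$root"/*/; do
    if [ -f "$entry$old" ]; then
      mv -- "$entry$old" "$entry$new"
    fi
  done
}

posts=$(mktemp -d)
mkdir -p "$posts/hello" "$posts/second"
printf '2024-07-01\n' > "$posts/hello/cre_at"
printf '2024-07-10\n' > "$posts/second/cre_at"
rename_key "$posts" cre_at created_at
cat "$posts/hello/created_at"
```

No old or new struct definition is involved; the trade-off is that nothing checks the result unless the program validates it on load.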

[–] [email protected] 10 points 3 months ago (1 children)

This post misses the entire point of JSON/TOML/YAML and the big advantage it has over databases: readability.

Using a file based approach sounds horrible. Context gets lost very easily, as I need to browse and match outputs of a ton of files to get the full picture, where the traditional methods allow me to see that nearly instantly.

I also chuckled at the exact, horribly confusing example you give: upd_at. A metadata file for an object that already inherently has that metadata. It's metadata on top of metadata, which makes it all the more confusing what the actual truth for the object is.

[–] [email protected] 1 points 3 months ago (1 children)

I know! right?

Some say that since you can use 'tree' and things like ranger to navigate the files, it should work alright. But I guess if you have one giant metadatafile for all the posts on your blog, it should be much easier to see the whole picture.

As for upd_at, it does not contain information about when the files have been edited, but when the content of the post was meaningfully edited.

So if, for example, I change the formatting of my times from RFC 3339 to another standard, it changes the file metadata, but it does not update the post content as far as the readers of the blog are concerned. But I get why you chuckled.

[–] [email protected] 1 points 3 months ago

Tip: find -type f | xargs head (but no it's not comfy)

but I don't think going to the "one giant metadatafile" argument helps; personally my attention starts splintering far sooner than that. Most of the time, if I'm looking at the metadata of an object, I'm not just looking at that single object, I'm reasoning about it in relation to other data points (maybe other objects in the same collection, maybe not). If at some point I want to shift my focus from created_at to updated_at or back, I need that transition to be as cheap as an eye saccade. So by splitting the data into multiple files you are sort of setting the "minimal tax" pretty high already.

That said, for simple projects where you want to have as few dependencies as possible, I think it's fine; it might or might not be better than raw-dogging your own format. I've actually implemented pretty much this format multiple times when I was coding predominantly in Bash. (Heck, e.g. my JATS framework is pretty much using FAMF for test run state 😄 .) Just be careful: creating / removing files and directories can be a pretty risky operation -- make a typo in (or fail refactoring) a shell variable and you might be just rm -rf'ing your own "$HOME". It might be one of the things you want to do less of, not more.
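One common guard against the empty-variable accident described here is the `${var:?message}` expansion, which makes a non-interactive shell stop with an error when the variable is unset or empty. A small sketch; `remove_key` is a hypothetical helper name.

```shell
# Deletes one key file from an entry directory.
# If either argument is unset or empty, the ${N:?...} expansion fails and
# the shell aborts before rm runs, so the path can never collapse to "/key".
remove_key() {
  entry_dir=${1:?entry directory is required}
  key=${2:?key name is required}
  rm -f -- "$entry_dir/$key"
}

entry=$(mktemp -d)
printf 'x\n' > "$entry/draft"
remove_key "$entry" draft
```

It does not protect against a wrong but non-empty path, so it reduces the risk rather than removing it.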

BTW, I chuckled because you turn from created_at to cre_at for no apparent reason. (I mean, if you like obscure variable names, fine by me, but then why would you call it created_at in the first file?)

BTWBTW, I love your site, I wish most of the web looked like that; the grey gives me a sort of nostalgia :D Also you reminded me that I should give Kagi a try...

[–] [email protected] 2 points 3 months ago

Fully committed to directory file structure. Except for value lists. Those are text files you have to parse anyway.

[–] [email protected] 1 points 3 months ago (1 children)

My biggest issue is with how spread out the information will be. You need something other than your standard file and directory explorers. Because you want to see and work with a view across multiple levels of directories and files and their content.

[–] [email protected] 2 points 3 months ago

Definitely. But you would need something other than those for working with 100 JSON files as well. The question is which kinds of extras you would like to have. You can go with jq and prettier syntax highlighting, or you can go with tree and cat (and dog). It is a matter of taste. But also, I am always right, because my mom told me I am special.
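The "view across multiple levels" asked for above can be approximated with find and cat alone. This is a sketch with a made-up function name, `show_tree`, and it assumes file names without newlines.

```shell
# Prints every key under a directory as "relative/path: value",
# sorted by path, so a whole tree can be read in one screen.
show_tree() {
  (
    cd "$1" || exit 1
    find . -type f | sort | while IFS= read -r f; do
      printf '%s: %s\n' "${f#./}" "$(cat "$f")"
    done
  )
}

blog=$(mktemp -d)
mkdir -p "$blog/hello"
printf 'Hello world\n' > "$blog/hello/title"
printf '2024-07-01\n' > "$blog/hello/created_at"
show_tree "$blog"
```

For the two files above this prints `hello/created_at: 2024-07-01` followed by `hello/title: Hello world`.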

[–] [email protected] 2 points 3 months ago (1 children)

You can easily parse this using awk, sed, fzf,

Well… I would know how to do it easily in C# or Nushell. But those tools? Maybe it's easy when you're already intuitively familiar with them. But line/string splitting seems anything but with complex utils like that with many params and a custom syntax.

[–] [email protected] 2 points 3 months ago* (last edited 3 months ago) (1 children)

That quote was in the context of simply separating values with newlines (and the list also included "your language’s split or lines function").

Technically you don't even need awk/sed/fzf, just a loop in bash doing read would allow you to parse the input one line at a time.

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
   echo "$line" # or whatever other operation
done < whateverfile

Also, those manpages are a lot less complex than the documentation for C# or Nushell (or bash itself), although maybe working with C#/nushell/bash is "easy when you’re already intuitively familiar with them". I think the point was precisely the fact that doing that is easy in many different contexts because it's a relatively simple way to separate values.

[–] [email protected] 1 points 3 months ago (1 children)

Yeah, I see they did mention "your language's functions". It's just, subjectively, reading awk and sed next to "easily" irritates me. Because I've never found it easy to get into those.

[–] [email protected] 1 points 3 months ago

Sure. You should use whatever you are comfortable with. That's the point. When you don't need special parsers or tools, you can more easily adapt your tooling for the job, because almost every language has tools to deal with files. (I assume there is some language that doesn't, who knows?)

[–] [email protected] 1 points 3 months ago

I still can't help but long for alternate file streams like NTFS has. It would be so nice to be able to store metadata about a file that the program that reads that file doesn't know how to parse without having to worry about the file and its metadata getting separated when one of the two is moved.

[–] [email protected] 2 points 3 months ago (1 children)

I like this ... a lot.

Is it new?

If there isn't even a todo task manager that handles notes this way, it is. Because man are there myriad implementations of that stuff.

[–] [email protected] 2 points 3 months ago (1 children)

And like you said: all tooling for files works for this .... For example I use F2 (highly recommended btw) for bulk editing filenames based on regex patterns. This could easily be used to edit metadata in bulk.

[–] [email protected] 2 points 3 months ago

Oh goody! F2 is great, but the developers are craaazy! They package a command-line Go application with npm!

I also like vimv and vidir for simpler stuff.

[–] [email protected] 2 points 3 months ago (2 children)

I think a clearer name for this would be "filesystem data structures", since the key idea is editing structured data through the filesystem. I can imagine a FUSE driver that can map many types of data to this structure.

[–] [email protected] 3 points 3 months ago* (last edited 3 months ago)

Yes... "metadata" is becoming an overused term. Not all data is metadata.

My first thought when I read the title was about those .nfo files used by Kodi/Jellyfin and other media centers to keep information related to the media files.

An alternative would be something like FADS (files as data structures) or something like that.

[–] [email protected] 3 points 3 months ago (1 children)

Yes. That is indeed a more interesting name. But think of the acronym.

FDS is not as easy to say as FAMF.
FAMF already has an Urban Dictionary entry.

[–] [email protected] 3 points 3 months ago

Lol your second point is irrefutable, I must concede to the choice of FAMF 😹

[–] [email protected] 3 points 3 months ago (1 children)

I actually quite like this idea.

You can take it a step further and use file extensions to determine the format. For example the parser would first search for title, and if it doesn't exist try title.md title.html etc and render the content appropriately.
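The lookup order suggested here could be sketched like this. The extension list is just the one from the comment, and `find_key_file` is a hypothetical name; how the content is then rendered is left out.

```shell
# Prints the path of the first existing variant of a key, trying the bare
# name first and then the .md and .html variants. Returns 1 if none exists.
find_key_file() {
  dir=$1; key=$2
  for candidate in "$key" "$key.md" "$key.html"; do
    if [ -f "$dir/$candidate" ]; then
      printf '%s\n' "$dir/$candidate"
      return 0
    fi
  done
  return 1
}

post=$(mktemp -d)
printf '# Hello\n' > "$post/title.md"
find_key_file "$post" title
```

The caller can then pick a renderer from the extension of the returned path.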

[–] [email protected] 2 points 3 months ago

That's a pretty cool idea!

[–] [email protected] 8 points 3 months ago (1 children)

It's a very interesting idea. I don't think I'll use it and I think the downsides outweigh the benefits but it is still an interesting idea.

In all of these cases, the answer is not TOML, YAML or JSON — or FAMF for what it’s worth. It is goddamn database.

I was about to boo and hiss, but if you mean something like sqlite as an application file format I'm more tempted to agree.

[–] [email protected] 1 points 3 months ago

Well, I mostly target the places where you don't programmatically generate millions of values: configurations, entry metadata, etc. Indeed, SQLite is much better when you have a massive amount of data and need a better base than a file system. But when that is not the case, a file system is more advanced than whatever tooling is behind TOML and YAML.

[–] [email protected] 10 points 3 months ago (2 children)

Not worse than YAML. 'course nothing is...

[–] [email protected] 1 points 3 months ago

/key/=/value

[–] [email protected] 4 points 3 months ago

FAMF definitely is. Just put YAML in there.

[–] [email protected] 23 points 3 months ago (1 children)

[–] [email protected] 4 points 3 months ago

Sure. Why not :))

[–] [email protected] 14 points 3 months ago (3 children)

Why waste the inodes?

[–] [email protected] 4 points 3 months ago (1 children)

That was my first reaction just by reading the title.

Mostly because I learned the hard way what inodes are.

[–] [email protected] 3 points 3 months ago (2 children)

Read the content. I address that issue.

[–] [email protected] 3 points 3 months ago* (last edited 3 months ago) (1 children)

For the record, you mention "the limitations of the number of inodes in Unix-like systems", but this is not a limit in Unix, but a limit in filesystem formats (which also extends to Windows and other systems).

So it depends more on the filesystem than on the OS. A single FAT32 directory can only hold about 65,535 entries (2^16 - 1), while both ext4 and NTFS can hold up to 4,294,967,295 files (2^32 - 1). With Btrfs that jumps to 18,446,744,073,709,551,615 (2^64 - 1).
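To get a feel for how many inodes a given tree actually uses, counting its files and directories is usually enough. A small sketch with a hypothetical helper name:

```shell
# Counts the files and directories under a tree, including the root itself.
# Each one occupies an inode (hard links would be counted more than once).
count_entries() {
  find "$1" | wc -l | tr -d ' '
}

tree_dir=$(mktemp -d)
mkdir -p "$tree_dir/post"
printf 'a\n' > "$tree_dir/post/title"
printf 'b\n' > "$tree_dir/post/created_at"
count_entries "$tree_dir"   # root + post + 2 files
```

On most systems `df -i <path>` then shows how many inodes the underlying filesystem has in total and how many are still free, although some filesystems report zeros there.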

[–] [email protected] 1 points 3 months ago

You are right. FAT32 is not recommended for implementing FAMF.

[–] [email protected] 1 points 3 months ago

I know, I read it because I wanted to know if it was addressed

[–] [email protected] 3 points 3 months ago (1 children)

What would you do with billions of inodes?

[–] [email protected] 2 points 3 months ago (1 children)

Run out, far more frequently than you would imagine.

[–] [email protected] 1 points 3 months ago

Well, if you have that many data entries, YAML and TOML are not that helpful either. They would present a different set of problems. You should use a database (perhaps SQLite) for that purpose.

[–] [email protected] 16 points 3 months ago (1 children)

This is also going to make some devs (me) convulse when a PR is like, "small config change. updated 29 files".

[–] [email protected] 1 points 3 months ago

I have one that has 69 (noice) files changed.