this post was submitted on 03 Sep 2024
26 points (100.0% liked)
Programming
17318 readers
33 users here now
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.
Hope you enjoy the instance!
Rules
Rules
- Follow the programming.dev instance rules
- Keep content related to programming in some way
- If you're posting long videos try to add in some form of tldr for those who don't want to watch videos
Wormhole
Follow the wormhole through a path of communities [email protected]
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Parquet is really used for big data batch data processing. It's columnar-based file format and is optimized for large, aggregation queries. It's non-human readable so you need a library like apache arrow to read/write to it.
I would use parquet in the following circumstances (or combination of circumstances):
Since the data is columnar-based, doing queries like
select sum(sales) from revenue
is much cheaper and faster if the underlying data is in parquet than csv.The big advantage of csv is that it's more portable. csv as a data file format has been around forever, so it is used in a lot of places where parquet can't be used.