this post was submitted on 02 Aug 2024
348 points (97.5% liked)

Science Memes

[–] [email protected] 43 points 3 months ago (21 children)

Is 600 MB a lot for pandas? Of course, CSV isn't really optimal, but I would've sworn pandas happily works with gigabytes of data.

[–] [email protected] 26 points 3 months ago* (last edited 3 months ago) (6 children)

What do you mean, not optimal? This is quite literally the most popular format for any serious data handling and exchange. One byte per separator and one per newline is all the overhead you need. It is not compressed, so you can stream it as well. If you don't need a tree structure, it is massively better than the others.
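To make the streaming point concrete, here is a minimal sketch using only Python's standard `csv` module. The function name `stream_column_sum` and the column names `id` and `value` are made up for illustration; the in-memory `io.StringIO` sample stands in for a large file on disk.

```python
import csv
import io

def stream_column_sum(lines, column):
    """Sum one numeric column while holding only one parsed row at a time."""
    reader = csv.DictReader(lines)  # first line is read as the header
    total = 0.0
    rows = 0
    for row in reader:  # rows are parsed lazily, one per iteration
        total += float(row[column])
        rows += 1
    return total, rows

# Hypothetical sample data; for a real file, pass open(path, newline="") instead.
sample = io.StringIO("id,value\n1,2.5\n2,4.0\n3,1.5\n")
total, rows = stream_column_sum(sample, "value")
print(total, rows)
```

Because the reader pulls one line at a time from whatever iterable it is given, memory use stays roughly flat no matter how long the file is.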

[–] [email protected] 14 points 3 months ago

I think portability and easy parsing are the only advantages of CSV. It's definitely good enough (maybe even the best) for small datasets, but if you have a lot of data you need a compressed binary format, something like Parquet.
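A rough way to see the size argument, sketched with the standard library only. This is not Parquet: it just compares the same floats written as text against a fixed-width binary packing via `struct`. Parquet adds a columnar layout and compression on top, which this sketch does not model. The function name `encoded_sizes` and the sample values are assumptions for illustration.

```python
import struct

def encoded_sizes(values):
    """Return byte sizes of the same floats as one-per-line text and as raw float64."""
    text_bytes = "\n".join(repr(v) for v in values).encode("ascii")
    binary = struct.pack(f"<{len(values)}d", *values)  # 8 bytes per value
    return {"text": len(text_bytes), "binary": len(binary)}

# Hypothetical data: floats that need many digits to round-trip as text.
values = [i / 7 for i in range(10_000)]
sizes = encoded_sizes(values)
print(sizes)
```

The binary size is always 8 bytes per value, while the text size depends on how many digits each number needs, so the gap varies with the data.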
