Which is more efficient for file comparison in Python: CSV to CSV or JSON to JSON?




I'm working on a Python project where a script requests some URLs, gets a JSON response, extracts some data from that response, and writes it to a file. The script runs on a daily schedule to create a day-specific file, and then compares the files generated on two consecutive days to publish the difference in the observed data.



So I want to know which file format would be best in terms of low memory usage and time efficiency when the comparison happens: a .csv-to-.csv comparison or a .json-to-.json comparison?
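For illustration, a minimal sketch of such a daily job (the URL, the field names, and the output layout are placeholders, not the actual design):

    import json
    import datetime
    import urllib.request

    # Hypothetical endpoint; substitute the real URLs being polled.
    URL = "https://example.com/api/data"

    def fetch_and_store():
        # Request the URL and parse the JSON response.
        with urllib.request.urlopen(URL) as resp:
            payload = json.load(resp)
        # Keep only the fields of interest (names are placeholders).
        records = [{"id": item["id"], "value": item["value"]}
                   for item in payload["items"]]
        # Write a day-specific file, e.g. 2024-01-01.json.
        filename = datetime.date.today().isoformat() + ".json"
        with open(filename, "w") as f:
            json.dump(records, f)

    if __name__ == "__main__":
        fetch_and_store()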





It depends on what will be compared. In terms of low memory usage, you could use the JSONL format if you're going to compare line by line (you wouldn't need to load the entire file, as a single valid JSON document requires, and it's easier to turn a JSON line into an object than a CSV line).
– Lucas Wieloch
24 mins ago
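For illustration, a minimal sketch of the line-by-line JSONL comparison the comment suggests, assuming records appear in the same order in both day files:

    import json
    from itertools import zip_longest

    def diff_jsonl(old_path, new_path):
        # Yield (old, new) record pairs that differ. Only two lines
        # are held in memory at a time; a missing line (one file is
        # longer than the other) is reported as None.
        with open(old_path) as old_f, open(new_path) as new_f:
            for old_line, new_line in zip_longest(old_f, new_f):
                old_rec = json.loads(old_line) if old_line else None
                new_rec = json.loads(new_line) if new_line else None
                if old_rec != new_rec:
                    yield old_rec, new_rec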




1 Answer



If you can easily put your data into a CSV file, go with CSV. You can compare the files row by row, so memory usage can be limited to two rows, and the semantics of CSV are much simpler than those of JSON.
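A minimal sketch of such a row-by-row comparison, assuming both files list their records in the same order:

    import csv
    from itertools import zip_longest

    def diff_csv(old_path, new_path):
        # Yield (old_row, new_row) pairs that differ; only two rows
        # are held in memory at a time. A missing row (one file is
        # longer than the other) shows up as None.
        with open(old_path, newline="") as old_f, \
             open(new_path, newline="") as new_f:
            for old_row, new_row in zip_longest(csv.reader(old_f),
                                                csv.reader(new_f)):
                if old_row != new_row:
                    yield old_row, new_row

For example, list(diff_csv("2024-01-01.csv", "2024-01-02.csv")) would collect every differing row between two day files.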



If you care about extremely efficient comparisons, or complex ones, put your data into an SQLite database (it comes bundled with Python) and build proper indexes. This requires an understanding of RDBMS basics, though.
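For illustration, a sketch of that approach (the table layout, column names, and file names are assumptions for the example, not a prescribed schema):

    import csv
    import sqlite3

    def load_day(conn, table, csv_path):
        # Load one day's CSV (two columns, no header row assumed)
        # into its own table; the PRIMARY KEY gives us an index.
        conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, value TEXT)")
        with open(csv_path, newline="") as f:
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", csv.reader(f))

    conn = sqlite3.connect(":memory:")  # or a file on disk
    load_day(conn, "yesterday", "2024-01-01.csv")
    load_day(conn, "today", "2024-01-02.csv")

    # Rows present today but absent yesterday; the index on id
    # keeps this lookup fast even for large files.
    added = conn.execute(
        "SELECT * FROM today WHERE id NOT IN (SELECT id FROM yesterday)"
    ).fetchall()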



If your data cannot be laid out as a CSV, e.g. because it's an arbitrary tree, then go with JSON. You will have to load the entire JSON files to do the comparison, unless you write very sophisticated code.
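A minimal sketch of that whole-file approach, assuming each file holds one top-level object keyed by record id:

    import json

    def diff_json(old_path, new_path):
        # Load both files entirely, then report every key whose
        # value was added, removed, or changed between the two days.
        with open(old_path) as f:
            old = json.load(f)
        with open(new_path) as f:
            new = json.load(f)
        return {key: (old.get(key), new.get(key))
                for key in old.keys() | new.keys()
                if old.get(key) != new.get(key)}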



All the memory considerations can be moot, though, if you're using a normal desktop/laptop, a non-micro AWS instance, or even something like an RPi 2/3, and your files are not many gigabytes, that is, they fit well into the available RAM. Loading a few (hundred) megabytes directly into memory and operating on them may then be the most efficient solution.



If you only have, e.g., 1000 entries in your files, don't bother with efficiency at all and write the solution you understand best. You can optimize it later if need be.






