I thought using Python’s gzip module was straightforward, but I/O throughput almost doubled once I wrapped the gzip file object in an extra io.BufferedWriter:
    import io
    import gzip
    import ujson

    def export_to_json_gz(schema):
        json_file = "test.json.gz"
        # BufferedWriter coalesces the many small writes before they
        # reach the gzip compressor.
        with io.BufferedWriter(gzip.open(temp_dir + json_file, 'wb')) as gzfile:
            for row in stream_table_data(schema):
                # gzip.open(..., 'wb') is a binary stream, so encode first
                gzfile.write(ujson.dumps(row, ensure_ascii=False).encode('utf-8'))
                gzfile.write(b'\n')
Now the bottleneck is ujson; how can I make that part faster?
🙂
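In case it helps frame the question, one idea I had was to batch the serialized rows in memory and write them in larger chunks, so the writer sees one big call instead of thousands of tiny ones. A rough sketch of what I mean (the batch size of 10000 is an arbitrary guess, and temp_dir and stream_table_data are the same names from my code above):

    def export_batched(schema, batch_size=10000):
        # Sketch only: collect a batch of JSON lines, then write them
        # with a single encode/write call per batch.
        with io.BufferedWriter(gzip.open(temp_dir + "test.json.gz", 'wb')) as gzfile:
            batch = []
            for row in stream_table_data(schema):
                batch.append(ujson.dumps(row, ensure_ascii=False))
                if len(batch) >= batch_size:
                    gzfile.write(('\n'.join(batch) + '\n').encode('utf-8'))
                    batch.clear()
            # flush any leftover rows from the final partial batch
            if batch:
                gzfile.write(('\n'.join(batch) + '\n').encode('utf-8'))

Would batching like this actually help, or is the per-row ujson.dumps call itself the real cost?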