Draft: Partition parquet dataset for sync with s5cmd

Closed Matthew K Defenderfer requested to merge partition-parquet-dataset into main
1 file changed: +2 -1
@@ -34,7 +34,8 @@ def main():
     ddf = dd.read_parquet(input_parquet)
     ddf = ddf.loc[ddf['path'].str.startswith(filter)].sort_values('path')
-    ddf = ddf.loc[~ddf['mode'].str.startswith('d')]
+    if 'mode' in ddf.columns:
+        ddf = ddf.loc[~ddf['mode'].str.startswith('d')]
     ddf['cmd'] = ddf['path'].map(lambda x: create_sync_cmd(x, filter=filter, dest=dest), meta=str)
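The guard in this hunk can be tried in isolation. The following is a minimal sketch, not the code from this merge request: it uses pandas rather than dask to stay small, the helper name `drop_directories` and the sample rows are hypothetical, and it assumes `mode` holds an `ls -l`-style permission string in which a leading `d` marks a directory.

```python
import pandas as pd


def drop_directories(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose 'mode' starts with 'd'; return df unchanged if 'mode' is absent.

    Hypothetical helper for illustration only; it is not part of the merge request.
    """
    if 'mode' in df.columns:
        # Assumption: a leading 'd' in the mode string marks a directory entry.
        df = df.loc[~df['mode'].str.startswith('d')]
    return df


# Hypothetical sample data: one input with a 'mode' column, one without.
with_mode = pd.DataFrame({
    'path': ['/data/a', '/data/a/file.txt'],
    'mode': ['drwxr-xr-x', '-rw-r--r--'],
})
without_mode = pd.DataFrame({'path': ['/data/a/file.txt']})

files_only = drop_directories(with_mode)     # directory row is filtered out
unchanged = drop_directories(without_mode)   # no 'mode' column, so no filtering
```

Without the column check, the second call would raise a `KeyError` on `df['mode']`, which appears to be the case the added `if` is meant to cover for inputs that lack a `mode` column.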