Add --no-clobber to hive conversion tools
Added a `--no-clobber` option to `convert_flat_to_hive` and `hivize`. The check is applied per tld + acq combination. For each combination, the tool looks in the hive directory for the cell directory that holds that tld and acq. If that cell directory contains any parquet files, all rows with that tld + acq combination are removed from the dataframe. If the dataframe is empty after this filtering, the tool exits without writing any data. Otherwise, it writes the remaining rows to the hive dataset as normal.
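The filtering step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `tld=<tld>/acq=<acq>` directory layout, the `*.parquet` glob, and the helper names `cell_has_parquet` and `drop_existing_cells` are all assumptions, and a plain list of dicts stands in for the dataframe to keep the sketch dependency-free.

```python
from pathlib import Path


def cell_has_parquet(hive_root: Path, tld: str, acq: str) -> bool:
    """Return True if the cell directory for this tld+acq holds any parquet file.

    The tld=<tld>/acq=<acq> layout is an assumption made for this sketch.
    """
    cell = hive_root / f"tld={tld}" / f"acq={acq}"
    return cell.is_dir() and any(cell.glob("*.parquet"))


def drop_existing_cells(rows: list[dict], hive_root: Path) -> list[dict]:
    """Drop every row whose tld+acq cell already contains parquet data."""
    # Check each distinct combination once rather than once per row.
    combos = {(row["tld"], row["acq"]) for row in rows}
    occupied = {combo for combo in combos if cell_has_parquet(hive_root, *combo)}
    return [row for row in rows if (row["tld"], row["acq"]) not in occupied]
```

With a helper like this, the no-clobber path reduces to calling `drop_existing_cells`, exiting early if the result is empty, and otherwise passing the remaining rows to the usual write step.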