This MR extends the previous churn algorithm and database to record the number of bytes affected by file creation, deletion, and modification, so that the impact of file volatility on storage can be measured directly. Doing so required changing how churn is calculated. Instead of the duplicate method described in !42 (merged), the policy dataframes are now merged and their size, access, and modification data are compared. Two measures of storage change are also calculated for modified files: the total size of the new versions of the files, and the net change in storage between the old and new versions.
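The merge-and-compare approach can be illustrated with a minimal pandas sketch. It is not the notebook's actual code: the function name `churn_bytes`, the column names `path`, `size`, and `modify`, and the rule that a file counts as modified when its size or modification time differs are all assumptions made for illustration. Access data is left out for brevity.

```python
import pandas as pd


def churn_bytes(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Summarize bytes created, deleted, and modified between two snapshots.

    Assumes (hypothetically) that each dataframe has the columns
    'path', 'size' (bytes), and 'modify' (modification timestamp).
    """
    # An outer merge on path keeps files present in either snapshot;
    # the indicator column records which side each row came from.
    merged = old.merge(
        new, on="path", how="outer", suffixes=("_old", "_new"), indicator=True
    )

    created = merged[merged["_merge"] == "right_only"]
    deleted = merged[merged["_merge"] == "left_only"]
    both = merged[merged["_merge"] == "both"]

    # Assumed rule: a file is modified if its size or modify time changed.
    modified = both[
        (both["size_old"] != both["size_new"])
        | (both["modify_old"] != both["modify_new"])
    ]

    return {
        "created_bytes": int(created["size_new"].sum()),
        "deleted_bytes": int(deleted["size_old"].sum()),
        # Total size of the new versions of modified files.
        "modified_bytes": int(modified["size_new"].sum()),
        # Net storage change between old and new versions.
        "modified_net_bytes": int(
            (modified["size_new"] - modified["size_old"]).sum()
        ),
    }


# Toy snapshots: 'a' is deleted, 'b' grows, 'c' is unchanged, 'd' is created.
old_snapshot = pd.DataFrame(
    {"path": ["a", "b", "c"], "size": [100, 200, 300], "modify": [1, 1, 1]}
)
new_snapshot = pd.DataFrame(
    {"path": ["b", "c", "d"], "size": [250, 300, 50], "modify": [2, 1, 3]}
)
result = churn_bytes(old_snapshot, new_snapshot)
```

Reporting both modified-file values separates how much data was rewritten from how much the storage footprint actually grew or shrank.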
churn-analysis.ipynb
notebook