
Modify churn structure to account for storage affected

This MR modifies the existing churn algorithm and database to record the number of bytes affected by file deletion, creation, and modification, so that we can see exactly how file volatility affects storage. This required changing how churn is calculated. Instead of the duplicate-marking method described in !42 (merged), the policy dataframes are now merged and their size, access, and modification data are compared. Additionally, two measures of storage change are calculated for modified files: the total size of the new versions of the files, and the net change in storage between the old and new versions.
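The merge-based classification can be sketched with pandas as follows. This is a minimal illustration, not the code from this MR: the column names `path`, `size`, `atime`, and `mtime`, the helper names `classify_churn` and `modified_storage`, and the rule that a changed `mtime` means "modified" while a changed `atime` alone means "accessed" are all assumptions. An outer merge with `indicator=True` separates deleted files (`left_only`), created files (`right_only`), and files present in both snapshots, whose old and new columns can then be compared side by side.

```python
import pandas as pd


def classify_churn(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Label each file as deleted, created, modified, accessed, or unchanged.

    Assumes (hypothetically) that each snapshot has one row per file with
    columns path, size, atime, and mtime.
    """
    # Outer merge on path; indicator=True adds a `_merge` column whose value
    # is "left_only", "right_only", or "both".
    merged = old.merge(new, on="path", how="outer",
                       suffixes=("_old", "_new"), indicator=True)

    in_both = merged["_merge"] == "both"
    # Assumed rule: a changed mtime means the file was modified ...
    modified = in_both & (merged["mtime_new"] != merged["mtime_old"])
    # ... and a changed atime with an unchanged mtime means it was only accessed.
    accessed = in_both & ~modified & (merged["atime_new"] != merged["atime_old"])

    merged["status"] = "unchanged"
    merged.loc[merged["_merge"] == "left_only", "status"] = "deleted"
    merged.loc[merged["_merge"] == "right_only", "status"] = "created"
    merged.loc[modified, "status"] = "modified"
    merged.loc[accessed, "status"] = "accessed"
    return merged


def modified_storage(merged: pd.DataFrame) -> tuple[float, float]:
    """Return (total size of new versions, net size change) for modified files."""
    mod = merged[merged["status"] == "modified"]
    total = mod["size_new"].sum()
    net = (mod["size_new"] - mod["size_old"]).sum()
    return float(total), float(net)


# Tiny made-up snapshots: a is deleted, b modified, c accessed, d unchanged, e created.
old = pd.DataFrame({"path": ["a", "b", "c", "d"], "size": [100, 200, 300, 400],
                    "atime": [1, 1, 1, 1], "mtime": [1, 1, 1, 1]})
new = pd.DataFrame({"path": ["b", "c", "d", "e"], "size": [250, 300, 400, 50],
                    "atime": [2, 2, 1, 2], "mtime": [2, 1, 1, 2]})

merged = classify_churn(old, new)
print(modified_storage(merged))  # (250.0, 50.0)
```

In this example file `b` grows from 200 to 250 bytes, so the total for modified files is 250 bytes while the net change is only 50 bytes, which is why both values are worth keeping.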

Major Changes

  1. The churn algorithm now uses dataframe merges instead of concatenating dataframes and marking duplicates
  2. The following were added to the churn database table:
     a. Fields for storage affected by file changes (total modified, net modified, total deleted, and total created)
     b. Fields for the number of files that were accessed but not churned, and for the total size of those files
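Once each file carries a churn label, the new table fields reduce to simple counts and sums. The sketch below assumes a hypothetical per-file frame with `status`, `size_old`, and `size_new` columns; the `ChurnRecord` dataclass, the `churn_record` helper, and their field names are illustrative and are not taken from the actual database schema.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ChurnRecord:
    """One row of a hypothetical churn table; field names are illustrative."""
    modified_files: int
    modified_bytes_total: float  # sum of new sizes of modified files
    modified_bytes_net: float    # sum of (new size - old size) over modified files
    deleted_files: int
    deleted_bytes: float         # sum of old sizes of deleted files
    created_files: int
    created_bytes: float         # sum of new sizes of created files
    accessed_files: int          # accessed but not churned
    accessed_bytes: float        # current sizes of accessed-only files


def churn_record(files: pd.DataFrame) -> ChurnRecord:
    """Aggregate per-file labels (columns status, size_old, size_new) into one record."""
    def with_status(status: str) -> pd.DataFrame:
        return files[files["status"] == status]

    mod = with_status("modified")
    deleted = with_status("deleted")
    created = with_status("created")
    accessed = with_status("accessed")
    return ChurnRecord(
        modified_files=len(mod),
        modified_bytes_total=float(mod["size_new"].sum()),
        modified_bytes_net=float((mod["size_new"] - mod["size_old"]).sum()),
        deleted_files=len(deleted),
        deleted_bytes=float(deleted["size_old"].sum()),
        created_files=len(created),
        created_bytes=float(created["size_new"].sum()),
        accessed_files=len(accessed),
        accessed_bytes=float(accessed["size_new"].sum()),
    )


# Made-up per-file labels; NaN marks a size that does not exist in that snapshot.
files = pd.DataFrame({
    "path": ["a", "b", "c", "d", "e"],
    "status": ["deleted", "modified", "accessed", "unchanged", "created"],
    "size_old": [100, 200, 300, 400, None],
    "size_new": [None, 250, 300, 400, 50],
})
record = churn_record(files)
print(record)
```

Keeping accessed-but-unchurned files as their own count and size lets read-only activity be plotted alongside churned storage without inflating the churn totals.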

Minor Changes

  1. Added example plots of churned storage and accessed files to the churn-analysis.ipynb notebook
