Storage on clusters is generally designed for speed, not for long-term capacity or data safety, so it should be treated as temporary space. Space is limited, and users are constantly generating new data. It is therefore important for all users to move data off the cluster as soon as possible, so that cluster storage remains available for other jobs.
This does not mean that you cannot leave data on the cluster for further analysis. However, all data generated on the cluster should be copied to another location as soon as it is generated, so that it is safe from disasters such as hardware failures or fires. Once the data is safely stored in two other locations where it is accessible for further analysis, it should be removed from the cluster.
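
The copy, verify, then remove workflow described above could be sketched in Python as follows. This is a minimal illustration, not a site-provided tool: the function names `sha256sum` and `archive_and_remove` are hypothetical, and the destinations are assumed to be directories reachable from the machine running the script (for example, mounted storage on two separate systems). It deletes the original only after at least two copies have been verified against the original's checksum.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Sequence


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_and_remove(source: Path, destinations: Sequence[Path]) -> bool:
    """Copy `source` into each destination directory and verify every copy.

    The original is deleted only if at least two copies match its checksum.
    Returns True if the original was removed, False otherwise.
    (Hypothetical helper; names and policy threshold are assumptions.)
    """
    expected = sha256sum(source)
    verified = 0
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and returns the new file's path
        copy = Path(shutil.copy2(source, dest_dir))
        if sha256sum(copy) == expected:
            verified += 1

    if verified >= 2:
        source.unlink()  # safe to free cluster space
        return True
    return False  # keep the original until two good copies exist
```

A call might look like `archive_and_remove(Path("results/run01.dat"), [Path("/mnt/lab_server/run01"), Path("/mnt/archive/run01")])`, where all three paths are placeholders. Comparing checksums rather than file sizes guards against a copy that was silently corrupted in transit, which matters because the original is deleted afterwards.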