A very large portion of the files stored on clients' disks is either duplicative or used only once. On examination, perhaps 50 percent of what's stored on a client's disk could be archived without loss of functionality. No modern operating system comes with a built-in utility that performs this analysis, although third-party products such as storage resource managers do exactly that. Large amounts of unused data can also hurt network performance when you deploy technologies such as snapshot backups, mirroring, and other data-duplication technologies, causing you to invest in more equipment and services than you really need.
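To see how much of a disk is duplicative, you can group files by a content hash. The following is a minimal sketch, not a substitute for a storage resource manager; the function name and the SHA-256-equals-duplicate assumption are mine, not from any particular product.

```python
# Sketch of a duplicate-file finder. Assumes two files with identical
# SHA-256 digests are duplicates (a safe bet in practice).
import hashlib
import os

def find_duplicates(root):
    """Group file paths under `root` by content hash; return only the
    groups with more than one member (i.e., the duplicate sets)."""
    by_hash = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so large files don't exhaust memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            by_hash.setdefault(h.hexdigest(), []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Summing the sizes of all but one file in each duplicate set gives a quick estimate of reclaimable space.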
The fact is that a major portion of all data is unused or, worse yet, junk: support files, historical data files, uninstallation files, temp files in caches, and other transitory data. A very large amount of data is stored in Windows local profiles. Storing profiles on local systems for a user who logged on once and will never be seen or heard from again can add a considerable quantity of data to your backups that is simply not needed. The problem is pervasive and a general one, affecting all modern operating systems.
A similar situation is encountered in enterprise messaging systems and database management systems, where a large percentage of the data is static; by that I mean that the data is both unchanging and never accessed by users. At least when examining individual folders and files in a file system you can look at the timestamps and determine when the data was created, last modified, and, most importantly, last accessed (Windows shows these in an object's property sheet). For transactional records such as e-mail messages and database rows, however, a last-accessed property often is not natively part of the database engine and needs to be programmed in. If attachments are stored as separate files, you can use their properties to analyze their usage. Still, you will find that most enterprise messaging systems keep a server-based data store and duplicate that store on a user's local hard drive (depending on settings).
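Those file-system timestamps are easy to query programmatically. Here is a rough sketch of a staleness report built on them; the function name and the 180-day threshold are illustrative, and note the caveat in the comment about volumes mounted with access-time updates disabled.

```python
# Sketch of a staleness report using filesystem timestamps. Assumes the
# volume actually records access times; many systems mount with noatime
# or relatime, which freezes or coarsens st_atime.
import os
import time

def stale_files(root, days=180):
    """Yield (path, days_since_access) for every file under `root`
    that has not been read in at least `days` days."""
    now = time.time()
    cutoff = now - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_atime < cutoff:
                yield path, int((now - st.st_atime) // 86400)
```

A report like this is a reasonable first pass at identifying archive candidates before investing in an SRM product.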
However you do it, it pays to analyze your clients' data set so that you can archive data that isn't being used, what we might call "static" data. Consider this situation: in your e-mail client or system you may have one or more data stores that are growing and changing constantly. If you never segment your data store and separate out older data, then every time you back up your store in a snapshot you are backing up the entire store. However, if you segment out messages that are older than 30 days, your backup job is much smaller. The goal here is to keep the older messages available for searching but out of the backup stream. As the data gets older still and is infrequently if ever accessed, the goal is to move that data further out, perhaps to a near-line storage medium (like tape) or to a permanent or semi-permanent medium like optical disk.
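The 30-day segmentation idea can be sketched in a few lines. This is a generic illustration on plain files, not the mechanism of any particular mail system (most have their own archiving features); the function name, directory layout, and use of modification time as the age criterion are my assumptions.

```python
# Sketch of age-based segmentation: move files whose modification time
# is older than `days` into a sibling archive directory, so the active
# store (and the backup job that snapshots it) stays small.
import os
import shutil
import time

def archive_old(store_dir, archive_dir, days=30):
    """Move files older than `days` from store_dir to archive_dir;
    return the list of new paths for the archived files."""
    os.makedirs(archive_dir, exist_ok=True)
    cutoff = time.time() - days * 86400
    moved = []
    for name in os.listdir(store_dir):
        src = os.path.join(store_dir, name)
        if os.path.isfile(src) and os.path.getmtime(src) < cutoff:
            dst = os.path.join(archive_dir, name)
            shutil.move(src, dst)
            moved.append(dst)
    return moved
```

The archive directory remains searchable on disk but can be excluded from the nightly backup job, which is exactly the split the paragraph above describes.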
Analyzing your organization's data access needs and patterns isn't just an intellectual exercise; it is an effective program for managing data that can substantially reduce your network system requirements, your disk storage needs, and your administrative overhead. The ultimate solution to this problem is to think about your data in terms of layers. Having performed an analysis using a Storage Resource Manager (SRM) or similar tool, you want to create a system like the ones Hierarchical Storage Management (HSM) products create, where you have active data storage, near-line storage, and archival storage, each supported by different technologies and managed by the software. In times of tight budgets this exercise can save your organization a considerable amount of money and time.
Barrie Sosinsky is president of the consulting company Sosinsky and Associates (Medfield, MA). He has written extensively on a variety of computer topics. His company specializes in custom software (database and Web related), training, and technical documentation.