Scott Galvin, Executive Vice President, Technical Solutions
Are you a saver? I don’t mean financially, but in terms of whether you save stuff. Is your philosophy along the lines of “I may find a need for that later, so I’d better hold on to it”? Or are you a “cleaner,” subscribing more to a “Don’t need it, get it out of here” mind-set? Does your philosophy carry over to the way you manage your data? While people are roughly split 50-50 between savers and cleaners, studies show unprecedented data growth across the board. Clearly, this is not entirely due to a philosophy of what to keep and what to throw away; it’s the natural order of technology to get bigger, faster, and to require more resources. So, if you are just following the natural flow of technology, why should you be concerned? Simply put, data sprawl is going to cost you. It’s going to cost you money to back it up. It’s going to cost you or your team production time locating needed information. It’s going to cost you flexibility in preparing for disaster recovery and business continuity, severely limiting your options and forcing you to contend with inflated fees and recovery times. Listed below are some of the usual suspects that cause your data to inflate unnecessarily, along with some ways to control the sprawl.
Home directories, old installation files, outdated file backups – Home directories often serve as the collection place for jokes, personal pictures and movies, old email, and various other data that is either non-business related or simply out of date. Old installation files and outdated backup files are frequently placed on the network in an effort to improve efficiency and safeguard changes being made, always with the best of intentions. However, when the new software version comes out or the upgrade is confirmed to be working, the admin rarely remembers to remove the previous files, leaving them out there indefinitely to eat up space unnecessarily. Suggestions: Place size quotas to limit the growth of home directories. Or, if that’s too extreme, institute a recurring review process in which you identify the biggest offenders and remind those users to remove the data they don’t truly need. A similar review process should be instituted for your installation file directories to remove outdated versions of software that will never be installed again. Finally, a good method for dealing with outdated file backups is to create the copy in a centralized “Marked for Deletion” folder rather than in the same directory as the application itself. This “Marked for Deletion” folder should serve as a repository for any file you suspect can be deleted but would like to hold a little longer, just to be sure. Review it every six months, and if the files within haven’t been requested, delete them as you see fit.
Place space quotas on home directories
Review home directory and install directory contents quarterly for content that can be deleted
Maintain and review a “Marked for Deletion” folder to test items before permanently deleting
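The review steps above lend themselves to light automation. As a minimal sketch (the share path, file names, and top-N cutoff here are hypothetical, not anything prescribed in this article), a short script can walk a home-directory tree and report the biggest offenders for follow-up:

```python
import tempfile
from pathlib import Path

def largest_files(root, top_n=10):
    """Walk a directory tree and return the top_n files by size, largest first."""
    sizes = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            sizes.append((path, path.stat().st_size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]

# Demo on a throwaway directory; real use would point at the home-directory share.
demo = Path(tempfile.mkdtemp())
(demo / "big.iso").write_bytes(b"x" * 5000)    # stand-in for an old install image
(demo / "small.txt").write_bytes(b"x" * 10)
report = largest_files(demo, top_n=5)
for path, size in report:
    print(f"{size:>10}  {path.name}")
```

Running a report like this quarterly gives you a concrete list of users and directories to follow up with, rather than relying on memory.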
Mailboxes – Without a concerted effort to remind people, many institutions would have 100% of their users treating Outlook as a file storage mechanism rather than the mail client it was designed to be. Even people who know better, who have heard this message before, will from time to time fall into the habit of keeping items there out of convenience. Indefinite retention of everything in the inbox, sent items, and even deleted items can cause a mailbox to bloat to 10GB or more. Suggestions: While a size quota on each mailbox is the optimal solution, email tends to be a sensitive area, and implementing such a measure can create an undue amount of turmoil. Educating users to make a habit of deleting all unneeded email is a must. Additionally, much like home directories, a review process to determine who is keeping larger items such as emailed pictures and graphics will have to be performed on a regular basis. One additional note: everyone should think through the effects of the AutoArchive function. Archiving your users’ email instead of having them evaluate what they actually need can, depending on where the archive file is stored, put the archived data at risk of not being backed up. This is the case if the archive is placed on a local hard drive; if you copy the archive to a home directory, you are merely transferring the problem to another location. A much more effective solution is to implement a true mail archival solution that retains copies of your email offsite, eliminating the need to keep all of this data in locations where it will be backed up.
Implement size quotas on mailboxes
Review mailboxes quarterly for content that can be deleted
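A full mailbox review needs to run against the mail server itself, but one piece is easy to script from the file side: finding stray AutoArchive files (.pst) sitting on network shares, where, as noted above, they quietly inflate backups or escape them entirely. This is a sketch under assumptions, with the paths, pattern, and size threshold all chosen for illustration:

```python
import tempfile
from pathlib import Path

def find_local_archives(root, pattern="*.pst", min_bytes=1_000_000_000):
    """Locate mail archive files under root that meet or exceed min_bytes."""
    hits = []
    for path in Path(root).rglob(pattern):
        if path.is_file() and path.stat().st_size >= min_bytes:
            hits.append((path, path.stat().st_size))
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Demo with tiny stand-in files and a lowered threshold; real use would
# scan the home-directory share with the 1GB default.
demo = Path(tempfile.mkdtemp())
(demo / "archive.pst").write_bytes(b"x" * 2048)
(demo / "notes.txt").write_bytes(b"x" * 4096)
found = find_local_archives(demo, min_bytes=1024)
```

Each hit is a conversation to have with that user, or a candidate for migration into a true offsite archival solution.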
Imaging – Not surprisingly, imaging is where I often see an institution’s data size spiral out of control. Not only do imaging solutions collect large amounts of data, measured in gigabytes and often terabytes, but they also accumulate huge numbers of individual files. It’s not uncommon to see an imaging solution with file counts in the millions. File count matters because many of the data recovery solutions on the market can be slowed just as badly by excessive numbers of files (each one has to be inspected before it can be restored) as by large data sizes. Many institutions have done away with their image archival processes. The “storage is cheap, so I’ll just keep it out there” mind-set has taken hold in the industry, and we have seen data sizes bloat accordingly. The hidden effect just coming to light is that while storage may be cheap, the services required to provide a realistic chance at disaster recovery, or even offsite storage, cannot be priced so inexpensively, severely limiting the financial institution’s options for a cost-effective solution. Suggestions: I recommend putting the archival process back into action. Burn the rarely accessed image files to multiple CDs stored in multiple locations. If you are not comfortable with that, I’ve seen institutions copy images to a USB hard drive and place it in the vault in addition to the CDs. The bottom line is to remove those images from your system once they are archived.
Start an archiving program in which image files over a certain age are archived off to CD or hard drive
Perform the archival process quarterly
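The age-based selection step can be sketched in a few lines. The directory names, file names, and one-year cutoff below are assumptions for illustration; the script simply moves anything not modified within the cutoff into a staging folder ready to be burned to CD or copied to a USB drive, after which the staged copies can be deleted from the system:

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

def stage_for_archive(source, staging, max_age_days=365):
    """Move files not modified within max_age_days into a staging folder.

    Note: files are flattened into one folder, so duplicate names would
    collide; a production version would preserve the directory structure.
    """
    cutoff = time.time() - max_age_days * 86400
    staging = Path(staging)
    staging.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in Path(source).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            dest = staging / path.name
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved

# Demo: one stale file (mtime pushed back two years) and one fresh file.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "staging"
(src / "old_check.tif").write_bytes(b"scan")
two_years_ago = time.time() - 2 * 365 * 86400
os.utime(src / "old_check.tif", (two_years_ago, two_years_ago))
(src / "new_check.tif").write_bytes(b"scan")
moved = stage_for_archive(src, dst, max_age_days=365)
```

Run quarterly, this keeps the live image store limited to recent, frequently accessed files while everything older moves to offline media.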
Virtualized infrastructure – This is by far the newest category we have added to our list of data sprawl areas. There are many advantages to incorporating virtualization into your environment. However, like most powerful technologies, it requires a proper amount of planning and discipline in execution. The most common data size issues arise when network administrators begin virtualizing their network without a thoroughly vetted plan. The next thing you know, you have a dedicated virtual server for every application on the network, no matter how small or insignificant. Beyond the obvious licensing issues, the repeated, unnecessary duplication of system files makes for an inflated footprint that renders both storage and management difficult and inefficient.
Take the time to properly lay out a strategic plan before beginning any virtualization project
Stick to the plan during implementation
Periodically review your servers and their roles to identify opportunities for consolidation
Evaluate all new additions to the network to determine whether the role you are creating can be combined with an existing server
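The periodic role review can start from a simple inventory. As a minimal sketch (the inventory format, server names, and roles here are all hypothetical), this flags single-role virtual servers grouped by role, so the reviewer can ask whether any of them could share a host instead of each duplicating a full set of system files:

```python
from collections import defaultdict

def consolidation_candidates(inventory):
    """Given {server: [roles...]}, return single-role servers grouped by role.

    Each entry is a prompt for the review: could these dedicated VMs be
    combined onto fewer hosts?
    """
    by_role = defaultdict(list)
    for server, roles in inventory.items():
        if len(roles) == 1:
            by_role[roles[0]].append(server)
    return dict(by_role)

# Demo with a hypothetical inventory.
inventory = {
    "vm-dc1": ["domain-controller"],
    "vm-print": ["print"],
    "vm-ts": ["time-sync"],
    "vm-app1": ["core-banking", "reporting"],
}
candidates = consolidation_candidates(inventory)
```

Here the print and time-sync servers would surface as review items; whether they can actually be combined still depends on load, licensing, and vendor support, which the script cannot judge.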
None of the concepts presented here are revolutionary; some, such as archival, are downright old school. They are all, however, solid practices that most of us, for one reason or another, have either never implemented or have gotten away from. The growth of your data is a problem that will continue to manifest itself in hidden costs and limited options. I contend that with a little evaluation and planning, this is an aspect of your network that can be improved significantly and will pay off for you in the long run.