Have you planned for a backhoe at a construction site six blocks over cutting your (only) Internet connection? How about a car accident that knocks down the utility pole outside your main office and severs the connection to your core provider? Have you checked that there are no water pipes in the ceiling above your communications closet? If you aren’t fully prepared for communications outages, you are not alone. While you cannot account for every contingency that might befall your bank or credit union, it certainly pays to prepare. During your next WAN infrastructure review, consider the following concepts to help build better resiliency into your WAN communications.
Have a Primary and Failover Site for Your WAN Connectivity
You should never put all your eggs in one basket, and a single hub through which all devices must connect is a single point of failure. In addition to your main office or operations center, consider upgrading a branch location to act as a backup communications hub for failover purposes. At a minimum, both your primary and backup locations should have connectivity to the Internet, to your WAN (MPLS, T1, Metro-E, etc.), and to your core provider.
Be mindful of the following considerations regarding your secondary/failover site:
- You should have a fully functional firewall protecting any Internet connections at your communications failover site. Similarly, if you plan to use VPN technology over inexpensive Internet connections as a secondary path for your WAN (branch) communications, make sure the appropriate firewalls or other devices are in place at every location to support this plan.
- Don’t forget about specialty communications equipment. If you have a separate appliance for FedLine access or a router for VPN connectivity back to your ATM provider, be sure to duplicate these devices at your secondary location.
- If two different connections (e.g., phone and WAN data) share the same physical medium or cable, you have concentrated your risk: a single severed line takes both connections down.
Automatic vs. Manually Assisted Failover
Now that we have covered which solutions you need and where to put them, let’s look at the technology that makes them work. There are two types of failover: automatic failover and manually assisted failover. While the natural first instinct is to choose automatic failover, it may be cost-prohibitive or may not be possible with your mix of technologies and vendors. Choosing the right option for your financial institution requires understanding how the two differ, so let’s look at each in turn.
Automatic Failover
As the name implies, automatic failover means that routing devices adjust routes and data flows on their own, based on conditions they detect on the network. For example, picture a financial institution with four branches, redundant connectivity at the main office, and a designated Disaster Recovery (DR) site. If Internet connectivity went down at the main office, the routing devices at the branches would detect the outage and automatically start sending Internet-bound traffic to the DR site. The branches keep working, sometimes nearly seamlessly, and the outage is limited to the main office.
Once the problem at the main office is resolved, the branches detect that their preferred path is available again and route Internet traffic back through the main office. This is ideal because the networking team does not have to change routes at every branch by hand, which minimizes downtime during failover and failback events.
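To make the preferred-path and failback behavior concrete, here is a minimal Python model of the decision a branch router makes. It is only a sketch under stated assumptions: real routers implement this with their own routing protocols and health tracking, and the `Uplink` class, the path names `main-office` and `dr-site`, and the priority values are all hypothetical. A lower priority number means a more preferred path, loosely similar to a route metric.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Uplink:
    """One upstream path a branch can use for Internet-bound traffic (hypothetical model)."""
    name: str
    priority: int  # lower number = more preferred
    healthy: bool = True


def select_uplink(uplinks: List[Uplink]) -> Optional[Uplink]:
    """Return the most-preferred healthy uplink, or None if every path is down."""
    candidates = [u for u in uplinks if u.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda u: u.priority)


def simulate(uplinks: List[Uplink], events: List[Tuple[str, bool]]) -> List[Optional[str]]:
    """Apply health-check results in order and record which uplink is chosen after each one."""
    by_name = {u.name: u for u in uplinks}
    chosen_history = []
    for name, healthy in events:
        by_name[name].healthy = healthy
        chosen = select_uplink(uplinks)
        chosen_history.append(chosen.name if chosen else None)
    return chosen_history


if __name__ == "__main__":
    uplinks = [Uplink("main-office", priority=10), Uplink("dr-site", priority=20)]
    # Main office Internet fails, then is restored.
    events = [("main-office", False), ("main-office", True)]
    print(simulate(uplinks, events))  # ['dr-site', 'main-office']
```

When the main office path fails, traffic shifts to the DR site; when it recovers, the preferred path wins again without anyone touching the branch configuration, which is the failback behavior described above.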
While this option is usually the fastest way to adapt to network outages, it requires significant setup, testing, and administration time. Additionally, every device involved must support the same protocols (for example, a shared dynamic routing protocol such as OSPF or BGP) to detect and adjust to changes in the environment.
Manually Assisted Failover
As mentioned above, automatic failover is not feasible in every situation, and in some scenarios administrators may want to retain manual control. One common reason to choose manual failover is that the institution hosts its own DR equipment. If you have built a hot DR site with equipment and connectivity mirroring your production environment, the last thing you want is to automatically fail all operations over to DR equipment because of a temporary glitch in one of your telco circuits. An automatic cutover may sound harmless, but it can leave you working with live data on two different systems, which is likely to end in a messy data merge, lost files, and end-user frustration.
When data and server resources are part of the failover, administrators may prefer to tightly control when to “flip the switch,” cutting over to DR resources and adjusting communication routes. This option may suit savvy administrators overseeing complex networks, but the additional control often comes at the expense of failover and failback speed.
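One way to avoid failing over on a temporary glitch is a hold-down timer: the monitor only recommends a cutover after a circuit has been down continuously for a set period, and an administrator still has to approve it. The Python below is a hypothetical illustration of that idea; the `FailoverMonitor` class, the 300-second threshold, and the timestamps are assumptions, not settings from any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverMonitor:
    """Recommends, but never performs, a DR cutover after a sustained outage (hypothetical sketch)."""
    hold_down_seconds: float = 300.0  # assumed threshold; tune for your own circuits
    down_since: Optional[float] = None
    recommended: bool = False
    cut_over: bool = False

    def observe(self, timestamp: float, circuit_up: bool) -> bool:
        """Record one health check and return True if a cutover should be recommended."""
        if circuit_up:
            # The circuit recovered, so a brief glitch resets the timer instead of triggering DR.
            self.down_since = None
            self.recommended = False
            return False
        if self.down_since is None:
            self.down_since = timestamp
        self.recommended = timestamp - self.down_since >= self.hold_down_seconds
        return self.recommended

    def confirm_cutover(self, operator: str) -> str:
        """An administrator explicitly flips the switch; nothing is cut over automatically."""
        if not self.recommended:
            raise RuntimeError("No sustained outage detected; cutover not recommended")
        self.cut_over = True
        return f"Cutover to DR approved by {operator}"


if __name__ == "__main__":
    monitor = FailoverMonitor(hold_down_seconds=300)
    # (seconds, circuit_up): a glitch at t=0 that clears by t=60, then an outage starting at t=100.
    checks = [(0, False), (60, True), (100, False), (250, False), (450, False)]
    print([monitor.observe(t, up) for t, up in checks])  # [False, False, False, False, True]
    print(monitor.confirm_cutover("on-call administrator"))
```

Nothing in the sketch reverses `cut_over` on its own, so failback is left to the administrator as well, mirroring the tighter control (and slower recovery) described above.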
A Backup Is Not a True Backup Until It Is Tested
Having a plan in place is a nice first step to build your redundancy and communications resilience, but the smallest of overlooked details can quickly derail your efforts. You wouldn’t trust your critical data backups without periodically testing restore capabilities, so why wouldn’t you test your communications backups?
Test your communications failover plans at least once a year to verify that your WAN resiliency works as intended. Thoroughly document not only what went right during the test, but also what went wrong and what adjustments were necessary. This documentation allows you to learn from mistakes and address any gaps in your plans. Auditors and examiners will also want to review this testing documentation, so aim for incremental improvements from year to year and from test to test.
Another backup need that financial institutions often overlook is the configuration of their routers and smart switches. Routing configurations can balloon in complexity over time as automatic failover is added and routing is optimized, and you do not want to lose all the hard work that went into building them because of failed hardware. Back up router and switch configurations after every configuration change to ensure the fastest recovery from failed equipment. If you are uncomfortable managing these backups on your own, there are network monitoring services that also automatically copy device configurations on a regular schedule.
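If you script these backups yourself, one simple approach is to save a new copy only when the configuration actually changes. The minimal Python sketch below assumes you have already retrieved the running configuration as text (through your vendor's CLI, API, or a monitoring service, which is outside this sketch); the `backup_config` function, the directory layout, the file-naming scheme, and the device name `branch1-router` are hypothetical.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Backup files look like 000001_20240101T000000Z.cfg (sequence number, then UTC timestamp).
BACKUP_PATTERN = "[0-9][0-9][0-9][0-9][0-9][0-9]_*.cfg"


def backup_config(device: str, config_text: str, backup_dir: Path) -> Optional[Path]:
    """Save config_text for device only if it differs from that device's latest backup.

    Returns the path of the new backup file, or None if the configuration is unchanged.
    """
    device_dir = backup_dir / device
    device_dir.mkdir(parents=True, exist_ok=True)
    new_bytes = config_text.encode("utf-8")

    # Zero-padded sequence numbers mean the last file in sorted order is the latest backup.
    existing = sorted(device_dir.glob(BACKUP_PATTERN))
    if existing and existing[-1].read_bytes() == new_bytes:
        return None  # nothing changed since the last backup

    next_seq = int(existing[-1].name.split("_", 1)[0]) + 1 if existing else 1
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = device_dir / f"{next_seq:06d}_{stamp}.cfg"
    path.write_bytes(new_bytes)
    return path


if __name__ == "__main__":
    backup_root = Path(tempfile.mkdtemp())
    running_config = "hostname branch1-router\n"  # stand-in for a config pulled from the device
    print(backup_config("branch1-router", running_config, backup_root))  # path of the new file
    print(backup_config("branch1-router", running_config, backup_root))  # None (unchanged)
```

Run it after each change or on a schedule; unchanged configurations are skipped, so each device's folder holds one file per actual revision, which makes it easy to restore a known-good configuration onto replacement hardware.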
Finding and configuring the right mix of technologies to keep your financial institution running can be a daunting task. If you would like some help figuring out how to navigate the different circuit and failover options available, then consider enlisting the help of technology experts. The right technology partner should be familiar with the unique needs of financial institutions to help you stay technically afloat without running afoul of regulatory requirements.