New Blog!

Hello everyone, as you know Double-Take is now a part of Vision Solutions Inc.

Please visit http://blog.visionsolutions.com/ to keep up-to-date on the same great information you are used to, like the new features in the Double-Take 5.3 product release!

Follow us on Twitter @VSIDTInfo

Facebook: www.facebook.com/VisionDoubleTake

Vision: 101 – Exchange 2010 Support in DTAW 5.3

First Customer Ship (FCS) for Double-Take Availability for Windows (DTAW) 5.3 is out, and with it there are many great feature sets that users can take advantage of. Over the next few weeks, I’ll be detailing some of these.

First up – one near and dear to my own heart as an Exchange Server subject matter expert – expanded support for Exchange Server 2010. Version 5.2 allowed for Full Server Fail Over support for single-server configurations of Exchange 2010, but didn’t permit the protection of DAG-enabled solutions. FCS 5.3 brings not only multi-server configuration support, but also DAG support for more flexibility in your organization.

Database Availability Groups (DAG) is a new technology in Exchange 2010 that replaces Windows Failover Clustering for local high-availability of Exchange mailbox services. Multiple copies of the mail databases are housed on multiple servers, allowing for server and storage failures to be overcome. We’ve been doing this for years with the GeoCluster tool-set, so we know it works well. It doesn’t, however, work great over a WAN link. To be clear here, DAG will function over a WAN, but without robust bandwidth, it will have issues.

DTAW is designed to run over a WAN without trouble, and to handle things like link hiccups and other normal interruptions between sites. It isn’t uncommon to see clients use Windows Clustering at their production sites, and DTAW to replicate and failover to another physical location when necessary. In much the same way, clients can now create DAG protection groups at their primary locations, and use DTAW to replicate the data and provide a failover pathway outside of DAG to a secondary physical location. This is done without interfering with how DAG operates on the production servers, allowing both technologies to operate side-by-side.

Of course, we also support protection of a single server to another single server, and multiple combinations of physical and virtual devices on either side. But the addition of DAG support allows us to resume our history of protection of production clustered Exchange Servers to a secondary site.

Vision 101: Backup hasn’t gone away

There’s been quite a lot of news these days about great ways to ramp up High Availability systems while ignoring and/or removing the Backup solution sets that are already in place. While we here at Vision Solutions, Inc. fully endorse the use of HA solution sets (and make market-leading HA products for Windows, Linux, AIX and AS/400), we believe that Backup is not a solution set to be cast out just yet. Today, we’ll look at the top two reasons why this is the case.

1 – Compliance issues. Legal departments and government regulations continue to require some method by which data is held outside of systems that end-users have direct access to. Traditionally this has meant tape backups that are held off-site, but modern backup solutions (such as Double-Take RecoverNow) can provide this user-shifted data storage on spinning disk instead. The goal of this type of backup is to make sure that if a user deletes data from a production machine (which would also delete it from the HA devices), there is still a copy of that data somewhere that can be used for discovery and compliance, held at various points-in-time.

2 – The “Whoops Factor.” Into every life, a little rain must fall. So too, into every data system, some bad data will fall. In order to protect against that bad data becoming the only data, you need a time-shifted version of the data held someplace safely. Backup tools allow you to keep this kind of time-shifted data, and if the backups are created and/or stored at another location, then even physical corruption can’t impact them. This gives you the ability to restore some or all information on-demand, in case your branch manager accidentally deletes the Users’ Home Directories store on your file server.

User- and time-shifting data aren’t the only reasons to keep up with Backup. There are many other reasons that your Backup tools should not be ignored or phased out. Quite the contrary, actually; Backup should be modernized right along with your HA solution sets, and used to meet the changing needs of your systems and users now and moving forward.

Are you ready for Cloud recovery?

There is a lot of hype around cloud at the moment, and IT departments within most financial organizations are justifiably skeptical about what they see as more marketing spin than substance.

With this in mind, it helps to ask a few key questions:

1. Can the cloud offering protect all of my servers and applications?

2. Can it protect the operating system and applications as well as the data?

3. Does it provide a mechanism to recover the data/servers without significant downtime?

4. Can I actually failover to the cloud and stay up and running?

5. Can I test the failover process to ensure the servers are recoverable?

6. Can I just pay for what I use or do I need dedicated servers in the cloud?

Looking at this list of questions, a great deal of education is still required about the potential that cloud computing offers for improving disaster recovery. Applying best-practice data recovery techniques is one route to ensuring that any implementation is a successful one.

When Cloud recovery makes sense

For smaller companies, where the cost of traditional DR has forced them to accept large RTO and RPO targets, a cloud-based approach can narrow this gap in an affordable way.

For larger organisations, the cloud can complement their existing continuity strategy: the number of machines that are protected can be increased, while it can also offer another location for data to be stored in the unlikely event of multiple sites being affected.

A cloud-based DR strategy can also overcome the issue with downtime experienced during the recovery phase. If you are putting full server instances into the cloud, then you can potentially run those workloads in the event of a disaster affecting the main site. This is the main difference between online storage being used for DR, and a full use of the cloud for recovery.

This approach does require that technical resources are available from the cloud provider, such as the ability to handle larger boot volumes, as well as the scale to cope with multiple workloads being booted and run at the same time.

However, it gets around the problem of lost time while data is recovered. The option to work with an up-to-date copy of data and applications shrinks the RTO, reducing the potential window of downtime before users are back online and productive.

Vision:101 – Throttling

Many clients are interested in replication across WAN links, but many also don’t want the Double-Take replication connection to use up all their bandwidth during the business day. In these instances, the throttling engine can help limit the amount of throughput that Double-Take products use to move data.

Throttling is available in many of the setup workflows for Double-Take Availability for Windows and RecoverNow. In addition, it can be accessed via the Replication Console, in the Connection Manager window, once a connection is established.

Throttling sets the maximum amount of bandwidth that can be used for data transmission, but not for command traffic. So you should set the throttle about 3% less than the actual bandwidth limit to allow for command traffic to occur without exceeding your expected limits. You can also schedule throttles to turn on and off by day of week and time of day. Very helpful if you want different throttles for the work day vs. the weekend.
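As a quick illustration of the headroom rule above, here is a minimal sketch that computes a throttle value from a bandwidth ceiling. The function name, the Kbps unit and the default of 3% are assumptions made for this example only; they are not part of any Double-Take interface, and the right headroom for your environment may differ.

```python
def throttle_limit_kbps(link_limit_kbps: float, command_headroom: float = 0.03) -> float:
    """Return a data throttle that leaves headroom for command traffic.

    link_limit_kbps  -- the most bandwidth replication should ever use (assumed Kbps)
    command_headroom -- fraction reserved for command traffic (about 3% per the text)
    """
    if link_limit_kbps <= 0:
        raise ValueError("link_limit_kbps must be positive")
    if not 0 <= command_headroom < 1:
        raise ValueError("command_headroom must be in the range [0, 1)")
    return link_limit_kbps * (1 - command_headroom)


if __name__ == "__main__":
    # Example: replication should never exceed 10,000 Kbps in total,
    # so the data throttle is set slightly below that ceiling.
    print(round(throttle_limit_kbps(10_000)))
```

The result is the value you would enter as the throttle, so that data plus command traffic together stay at or under the ceiling you actually care about.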

Here are some things you need to know about throttling:

- Schedules are always applied at source-server local time. So if you turn a throttle on at 3am, no matter where the target server is, the throttle will go into effect at 3am local time at the location of the source server.

- Schedules need to be turned both on and off. Otherwise, you will turn a throttle on and it will stay in effect forever.

- Throttling changes that are not applied using a schedule (i.e. fixed-bandwidth limits) take effect immediately. You do not need to disconnect/reconnect. Likewise, a scheduled throttle will turn on or off without re-mirroring.

- Schedules can be exported and imported between servers. This requires that you use the throttling settings in the Replication Console and the Connection Manager. Above the section where you can set the throttle times/days, you will see the export and import commands. Make sure you keep the first bullet point in mind (local time use) when importing.

- The more you throttle, the less data you can send over the life of the connection. Be careful not to throttle so much that you cannot send data before the queues fill up. This will help to avoid unnecessary re-mirror operations.
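Because schedules follow the source server's clock, it can help to work out what a scheduled time means at the other site before importing a schedule. The sketch below does that conversion with Python's standard library. The two UTC offsets are hypothetical sites chosen for illustration, and fixed offsets ignore daylight saving time, so treat the result as a rough check rather than a definitive answer.

```python
from datetime import datetime, timedelta, timezone


def throttle_start_at_target(start_local: datetime, target_tz: timezone) -> datetime:
    """Convert a source-local schedule time to the target site's wall clock."""
    if start_local.tzinfo is None:
        raise ValueError("start_local must carry the source server's UTC offset")
    return start_local.astimezone(target_tz)


# Hypothetical sites: source at UTC-5, target at UTC-8 (assumed, no DST handling).
SOURCE_TZ = timezone(timedelta(hours=-5))
TARGET_TZ = timezone(timedelta(hours=-8))

if __name__ == "__main__":
    # A throttle scheduled for 3am is 3am at the source server.
    start = datetime(2011, 3, 7, 3, 0, tzinfo=SOURCE_TZ)
    seen_at_target = throttle_start_at_target(start, TARGET_TZ)
    print(seen_at_target.strftime("%H:%M"))  # prints "00:00"
```

In this example the 3am throttle takes effect at midnight as seen from the target site, which is worth knowing if the schedule was designed around the target's business hours.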

Throttling can make life easier for you by limiting the amount of bandwidth Double-Take will use. As long as you have enough to move the data you generate each day, you can throttle as necessary to keep your other vital data flowing at peak efficiency.
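To sanity-check whether a throttle leaves enough room to move a day's worth of changes, a back-of-the-envelope calculation is often enough. The following is a rough sketch only: the function names, the Kbps and decimal-GB units, and the daily change figure are all assumptions for illustration, and it ignores compression, protocol overhead and bursts of activity.

```python
def daily_capacity_gb(throttle_kbps: float, hours: float = 24.0) -> float:
    """Approximate data volume (decimal GB) a throttle can move in `hours`."""
    bits = throttle_kbps * 1000 * hours * 3600  # Kbps -> bits over the window
    return bits / 8 / 1e9                       # bits -> bytes -> GB


def throttle_is_sufficient(daily_change_gb: float, throttle_kbps: float,
                           hours: float = 24.0) -> bool:
    """True if the throttle can, in principle, carry the daily change volume."""
    return daily_capacity_gb(throttle_kbps, hours) >= daily_change_gb


if __name__ == "__main__":
    # Hypothetical numbers: 20 GB of changes per day, throttle of 2,000 Kbps.
    print(round(daily_capacity_gb(2000), 1))   # capacity over 24 hours, in GB
    print(throttle_is_sufficient(20, 2000))    # enough for 20 GB/day?
    print(throttle_is_sufficient(25, 2000))    # enough for 25 GB/day?
```

If the capacity comes out close to your daily change volume, the throttle is probably too tight, since real traffic arrives in bursts and the queues have to absorb the difference.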

Cost Savings of Cloud Computing

Looking at the complexity and management resources involved in business continuity projects, the cloud offers a way to remove or reduce many of the potential costs associated with DR. Much of the spending on business continuity goes not to the specialized tools for replication and recovery but to the extra facilities and equipment required to make traditional continuity solutions work. This extra investment will also sit relatively idle most of the time.

Cloud computing in general offers a route around this problem: instead of having to select, implement and manage back-end IT infrastructure directly, the infrastructure or service is delivered through a Web portal. Data can be copied over to a cloud storage provider rather than sitting at a second site, and all that is paid for is the amount of resources consumed. Cloud services providers can also provide more than just online storage. The cloud can provide computing capacity as well as storage, so the data can be manipulated and processed remotely as well.

The other main benefit from the cloud is that the organization does not have to sign up to a long-term commitment. If a better deal is available from a different cloud provider then the organization should be able to migrate its data and resources over.

Costs of Cloud Computing

To ensure that the organization meets its RTO and RPO targets, the traditional approach to DR would require replacement equipment standing by at an off-site location with the necessary software and configuration to quickly transfer users and data. Disk-to-disk replication keeps the backup systems constantly updated, so why isn’t every server in the world protected? Usually the answer is cost, which can build up when you have a two-hour recovery target to meet and you have to take into account:

• The upfront investment

• Technical complexity

• Operational complexity and project management of a new data center

• Complicated projects sometimes fail

Vision:101 – Does Latency Matter?

Latency is a fact of networking life. Even signals traveling at the speed of electronic impulses will take some amount of time to get from one end of a link to another. Granted, the times are measured in milliseconds (ms), but it is still not instantaneous. When speaking in terms of replication between sites, latency can have an impact on your RPO, and on your ability to serve clients after a failover. As such, it’s something to be concerned about, even when you are well within your own metrics.

Link latency during replication impacts how fast data packages can be moved from a sending server to a receiving server. While the Vision Solutions products are designed to allow for high latency without causing errors, the longer it takes for each packet to reach the destination, the fewer packets per second we can transmit. This translates to less data moving over a given amount of time, and therefore increases Recovery Point Objective (RPO) metrics. While there is no way to tell how much latency will create what levels of RPO, you generally want to aim for somewhere under 130ms round-trip if you are concerned with RPO numbers. Once again, VSI products can handle much larger latency metrics with no issues for data integrity, but higher numbers will create larger RPO metrics.
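The relationship between latency and throughput can be illustrated with the general rule for a single TCP connection: it cannot move more than one window of data per round trip. The sketch below applies that rule. It is a generic networking bound, not a model of how any Vision Solutions product behaves, and the 64 KB window is an assumed figure for illustration; real stacks often scale the window much larger. It gives a ceiling on throughput, not a prediction of RPO.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection TCP throughput: window / round-trip time."""
    if window_bytes <= 0:
        raise ValueError("window_bytes must be positive")
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000) / 1e6


if __name__ == "__main__":
    window = 65_535  # assumed window size in bytes (no window scaling)
    for rtt in (20, 130, 300):
        ceiling = max_tcp_throughput_mbps(window, rtt)
        print(f"{rtt:>3} ms round trip -> at most {ceiling:.1f} Mbps")
```

The point of the exercise is the trend: with the window held constant, every increase in round-trip time lowers the ceiling on how much data can cross the link per second, regardless of how much raw bandwidth the link has.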

Once you fail over, latency can impact your end-users’ ability to access their systems. Higher latency means slower connectivity for end-users trying to use applications across the WAN link. If all users are distributed anyway, the impact will be less dramatic than if folks who are usually on a LAN suddenly have to go over a long-haul network. This might be acceptable if the target devices are only in use for a limited amount of time, but it is a good thing to explore and plan for. Some applications may not tolerate higher latency, so speak with your vendors to make sure the link between your users and their systems doesn’t exceed safety limits.

Latency is nothing to be afraid of. It happens to all networks, and only becomes a problem when it remains high for extended periods of time. As long as you can maintain a TCP/IP connection to a target site, we can replicate the data. How much we can move over a given period of time, and if your users will be able to effectively use the systems on failover, can be impacted by higher latency. So be aware of what your RPO metrics need to be, and ensure that you can cross whatever links are required to re-establish operations after failover.

Reducing DR Costs with Cloud Computing

When your applications, data or servers aren’t available due to a disaster or outage, business is slowed or stopped altogether. The discipline of business continuity helps organizations protect themselves against threats to their ability to work, from large disaster events such as a fire or flood, through to smaller IT problems such as lost data or a broken server.

The cloud is now being used to solve these problems around backup and recovery, while still keeping costs low enough for all companies to benefit. The main question that organizations are asking about the cloud is whether it is really a viable option for them, and whether it can fulfill its promises.

To answer this question, organizations have to look at the basics of disaster recovery (DR) and continuity planning. This involves evaluating three things:

• How current the organization’s data has to be during normal business processes, and how much data it can stand to lose. This is the recovery point objective (RPO).

• How quickly the organization wants to be back up and running after a disaster. This is the recovery time objective (RTO).

• How much control the organization has to retain over its data. Can company information leave the company’s direct control, or should it keep control internally? Can data leave the country or region boundaries?
