Abstract
Backup Storage Challenges
Beyond capacity constraints, many organizations cannot centralize their entire IT infrastructure, leaving remote sites either burdened with local IT staffing overhead or without sufficient protection, or any at all. Tape and other external media do not scale effectively across sites and require intervention from local or remote IT staff to ensure consistent backups and to accommodate growth. Organizations with site-to-site connections tend to back up remote sites to a main data center, saturating bandwidth and causing long backup and restore delays. Deduplication, which significantly reduces the amount of data that must be transmitted, can both avoid overloading corporate networks and shorten backup and restore times.
What is Deduplication?
Barracuda Deduplication
Rather than relying on post-process deduplication, Barracuda Backup uses inline deduplication (Figure 1). With inline deduplication, the appliance deduplicates data in a single step as it is ingested, eliminating the landing-space capacity required by slower two-step post-process deduplication. Barracuda inline deduplication helps organizations save money by removing the need for a larger disk array dedicated to holding ingested data before deduplication can begin. This method can also reduce the risk of lost data by accelerating time-to-backup processing and full replication, since data is queued for replication while the backup job is still being processed.
Deploying a Barracuda Backup appliance significantly improves DR readiness by reducing time to get data offsite through inline deduplication and instant replication. Because data is deduplicated inline, it can be ready for replication more quickly than if it had to be processed after the backup process fully completed. And because there is no need to ingest the entire data set prior to replication commencing, data can be moved offsite as it is backed up and deduplicated, providing faster offsite protection.
Comparing post-process and Barracuda inline deduplication:
- Cost: Because post-process deduplication requires a landing space to hold data before it can be deduplicated, a larger and more expensive device is often required, depending on the size of the data set. With the Barracuda inline solution, a larger device is not necessary.
- Time: Post-process deduplication must wait until the backup job is finished before it can start deduplicating and then replicate the data. Although this may appear to accelerate the backup process, the data is not yet fully protected, because the deduplication and replication processes have yet to complete. This serial, multi-stage activity can significantly delay full data protection compared to real-time inline deduplication with simultaneous replication.
- Risk: With post-process deduplication, time to DR readiness is extended, and any failure in the multi-step post-process activity can leave a corrupt data set. Should a network problem occur or a site go down, data can be lost even though the backup job is reported as complete.
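The timing difference above can be sketched in code. This is an illustrative model, not Barracuda's implementation: post-process deduplication must land and deduplicate the entire data set before replication can begin, while inline deduplication can queue each unique chunk for replication as it is ingested.

```python
# Conceptual sketch: when data becomes eligible for offsite replication under
# post-process vs. inline deduplication. Chunks are plain byte strings here.
import hashlib

def post_process(chunks):
    """Serial: land everything first, then deduplicate, then replicate."""
    landing = list(chunks)                      # step 1: full landing copy
    seen, unique = set(), []
    for c in landing:                           # step 2: deduplicate
        h = hashlib.sha1(c).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(c)
    return unique                               # step 3: replication starts only now

def inline(chunks):
    """Pipelined: each unique chunk is queued for replication as it is ingested."""
    seen = set()
    for c in chunks:
        h = hashlib.sha1(c).hexdigest()
        if h not in seen:
            seen.add(h)
            yield c                             # offsite copy can begin immediately

data = [b"a", b"b", b"a", b"c"]
assert list(inline(data)) == post_process(data)  # same unique set, available earlier
```

Both paths reduce the data to the same unique set; the difference is that the inline generator makes each unique chunk available for replication as soon as it is seen, rather than after the whole job lands.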
Barracuda’s Deduplication Methodology
Barracuda Backup applies deduplication in three stages:
- Source Deduplication, where local data is deduplicated at the source and sent to the Barracuda Backup appliance in deduplicated form, minimizing LAN bandwidth use and the amount of data sent from the source server to the appliance. Also called client-side deduplication.
- Target Deduplication, where data is deduplicated directly on the backup appliance across sources, minimizing the amount of data that needs to be cached and replicated.
- Global Deduplication, where data is deduplicated across all local servers that have been replicated to a central appliance or cloud.
Source:
Source deduplication is implemented through the Barracuda Backup Agent. During installation, a small database is created on the server to track data chunks, so only unique data seen by the agent is compressed and sent to the appliance for processing, reducing network traffic and shortening the backup window (Figure 2).

For VMware backups, Barracuda leverages VMware’s vStorage APIs for Data Protection (VADP) to back up virtual disks. With VADP, Barracuda can use Changed Block Tracking (CBT) to send only unique chunks to the Barracuda Backup appliance (Figure 3).
For Microsoft Hyper-V backups, the Barracuda Backup Agent reduces the backup window by deduplicating the VHD files on the host server to minimize the amount of data sent to the backup appliance.
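The agent-side flow described above can be sketched as follows. The class and method names are hypothetical, not the Barracuda Backup Agent's actual API; the sketch assumes a local hash database and shows only the send-or-skip decision.

```python
# Illustrative sketch of source (client-side) deduplication: a local database
# of chunk hashes decides whether to send the compressed chunk or only a
# small pointer over the LAN. Names are hypothetical.
import hashlib
import zlib

class SourceDedupAgent:
    def __init__(self):
        self.local_db = set()   # hashes of chunks already sent to the appliance

    def backup_chunk(self, chunk: bytes):
        """Return what crosses the LAN: a compressed chunk or a small pointer."""
        digest = hashlib.sha1(chunk).hexdigest()
        if digest in self.local_db:
            return ("pointer", digest)                  # duplicate: reference only
        self.local_db.add(digest)
        return ("data", digest, zlib.compress(chunk))   # unique: compress and send

agent = SourceDedupAgent()
first = agent.backup_chunk(b"block-1")
again = agent.backup_chunk(b"block-1")
assert first[0] == "data" and again[0] == "pointer"
```

The second backup of the same chunk transmits only a short pointer, which is what keeps LAN traffic and the backup window small.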
Target:
Target deduplication occurs on the Barracuda Backup appliance, eliminating redundancy across all local agents and minimizing the local cache and cloud capacity needed to store backups. Organizations backing up SAN or NAS filers (Figure 4), VMware, and file shares often cannot use source-based deduplication, leaving target deduplication as the primary methodology.
Global:
Global deduplication is implemented either on an appliance used as a central replication target or in the cloud, eliminating redundancy across backup appliances throughout a worldwide infrastructure and allowing organizations to reduce the capacity needed to store backup data in a compressed and encrypted state (Figure 5).

Barracuda Backup leverages three-stage deduplication for multiple data sources, as described in the following table:
| Data Source | Backup Method | Deduplication Method |
| --- | --- | --- |
| Microsoft Exchange Server | Barracuda Windows Backup Agent | Source, Target, and Global |
| Microsoft SQL Server | Barracuda Windows Backup Agent | Source, Target, and Global |
| Microsoft Hyper-V | Barracuda Windows Backup Agent | Source, Target, and Global |
| Microsoft Windows | Barracuda Windows Backup Agent | Source, Target, and Global |
| Microsoft Active Directory | Barracuda Windows Backup Agent | Source, Target, and Global |
| Microsoft SharePoint | Barracuda Windows Backup Agent | Source, Target, and Global |
| Lotus Domino Server | Windows Volume Shadow Copy Service (VSS) | Source, Target, and Global |
| Linux Systems | Barracuda Linux Backup Agent | Source, Target, and Global |
| Novell Open Enterprise Server 2 SP2.0+ | Barracuda Linux Backup Agent | Source, Target, and Global |
| Mac OS X | Barracuda Macintosh Backup Agent | Source, Target, and Global |
| Unix | CIFS/SSHFS | Target and Global |
| Network Addressable Storage | CIFS/Access Control Lists (ACL) | Target and Global |
| VMware Server and Guests | VADP with Changed Block Tracking (CBT) | Source, Target, and Global |
Barracuda Deduplication Implementation
The length of the data chunks used in deduplication is based on the type and size of the file. Each chunk is then given three identifiers (digital fingerprints): an MD5 sum, a SHA-1 hash, and the file size. These identifiers are stored in a database by the Barracuda Backup Agent running on the local server, as well as in a database on the local appliance. As the backup runs, each calculated hash value is compared against those of chunks already processed; if the value is unique, the chunk is transmitted to the appliance. For hash values already seen, only a small pointer is sent. Once the data is added to the local Barracuda Backup appliance, the hashes are compared again across all agents. If duplicate entries are found, the appliance stores a single copy of the data, notes that it has been backed up, and can restore it to any server requesting the hash.
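The three-part fingerprint and the appliance-side index described above can be sketched as follows. The data structures are illustrative only; the sketch shows how a composite key of MD5 sum, SHA-1 hash, and size lets the appliance keep one copy of a chunk seen by any agent.

```python
# Hedged sketch of the appliance-side deduplication index: chunks are keyed by
# (MD5, SHA-1, size), and only the first copy across all agents is stored.
import hashlib

appliance_index = {}   # fingerprint -> stored chunk (one copy across all agents)

def fingerprint(chunk: bytes):
    """Compute the three-part digital fingerprint described in the text."""
    return (hashlib.md5(chunk).hexdigest(),
            hashlib.sha1(chunk).hexdigest(),
            len(chunk))

def ingest(chunk: bytes):
    """Store the chunk once; later duplicates from any agent become references."""
    fp = fingerprint(chunk)
    if fp not in appliance_index:
        appliance_index[fp] = chunk
    return fp            # any server can restore by presenting this fingerprint

fp_a = ingest(b"shared data")
fp_b = ingest(b"shared data")   # e.g., the same chunk arriving from another agent
assert fp_a == fp_b and len(appliance_index) == 1
```

Using two independent hashes plus the size as a composite key makes an accidental collision between two different chunks vastly less likely than with a single hash alone.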
The following considerations help ensure optimal performance of the Barracuda Backup Agent:
Deduplication Database Sizing:
The local deduplication database occupies a small portion of the total file system and grows linearly with the total stored deduplicated data. When sizing, plan for roughly 1-3 GB of database per TB of stored deduplicated data; e.g., 2 TB of deduplicated data equates to an approximately 4 GB deduplication database.

Processor Utilization:
The Barracuda Backup Agent can increase processor usage during a backup, because it uses the client machine’s CPU for source deduplication and compression. The agent does not cap its use of the client machine’s processor resources during backup and restore.

Memory Utilization:
The Barracuda Backup Agent increases system memory usage during a backup. The agent uses the client machine’s memory to buffer backup data before compressing it and sending the hashes to the Barracuda Backup appliance. The agent uses up to 512 MB of memory during the backup process to store data chunks, allowing it to walk the file system quickly.

The Barracuda Solution vs. Other Deduplication Methodologies
Fixed Block vs. Variable Block
Fixed block deduplication is the simplest method: it examines chunks of a fixed, predefined size within the data set being backed up. Because the chunk size never changes, fixed block deduplication uses a limited amount of CPU and disk processing. However, data reduction is limited, since a predefined block boundary misses duplicate data in certain data sets compared to more advanced forms of deduplication.

Variable block, application-aware deduplication is an advanced method that examines the data set or application being backed up and increases or decreases the block size for optimal results. Because the chunk size changes based on the data, additional CPU and disk resources are needed, but data reduction is maximized. Barracuda Backup uses variable block deduplication for all three stages. Barracuda’s variable block deduplication analyzes the data type and chunk size, setting a block size that obtains the greatest level of deduplication without overtaxing CPU and disk processing.
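The fixed- versus variable-block distinction can be illustrated with a simple content-defined chunking scheme, a common way to implement variable-block deduplication. This is a generic sketch; Barracuda's actual chunking algorithm is not public, and the rolling-hash parameters here are arbitrary.

```python
# Sketch: fixed-offset chunking vs. content-defined (variable) chunking.

def fixed_chunks(data: bytes, size: int = 8):
    """Cut at fixed offsets: one inserted byte shifts every later boundary."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, mask: int = 0x0F):
    """Cut where a rolling value matches a pattern, so boundaries follow content
    and can realign after an insertion instead of shifting everything."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) ^ byte) & 0xFFFF
        if rolling & mask == mask:          # content-derived cut point
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = b"The quick brown fox jumps over the lazy dog"
# Both methods losslessly partition the data; they differ in where they cut.
assert b"".join(fixed_chunks(data)) == data
assert b"".join(variable_chunks(data)) == data
```

Fixed chunking is cheap because boundaries are pure arithmetic; content-defined chunking spends CPU on the rolling computation but keeps chunk boundaries attached to the data itself, which is what lets variable-block deduplication find duplicates that fixed blocks miss.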
Because Barracuda Backup is a dedicated hardware appliance, it can provide variable block deduplication without burdening production CPU and disk resources; the appliance’s underlying hardware and software are optimized for this chunking method to achieve maximum data ingest rates.