Application consistency
Appliance based replication
Array based replication
Consistency groups
Continuous data protection (CDP)
Continuous remote replication (CRR)
Crash consistency
Deduplication
Five 9s availability
Input/Outputs per second (IOPS)
Point-in-time data recovery
Redundant Array of Independent Nodes (RAIN)
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Replication
Snapshot
Write Splitter
Application consistency
Guarantees that the application's data is in a consistent
state at the point-in-time when it is replicated,
backed up or snapped.
The application is placed into a quiescent state, which
commits in-memory transactions and then halts writes to
the database and log files.
For example, Microsoft Volume Shadow Copy Service (VSS)
provides a framework for application-consistent backup,
snapshot and replication of Microsoft Exchange and SQL
Server. The alternative is crash consistency.
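The difference between the two kinds of copy can be illustrated with a minimal sketch. The ToyDatabase class and both copy functions below are hypothetical names invented for illustration; they do not represent VSS or any real product.

```python
class ToyDatabase:
    """Hypothetical application with in-memory transactions and an on-disk image."""

    def __init__(self):
        self.disk = {}      # data already committed to storage
        self.pending = {}   # in-memory transactions not yet committed
        self.quiesced = False

    def write(self, key, value):
        if self.quiesced:
            raise RuntimeError("writes are halted while quiesced")
        self.pending[key] = value

    def quiesce(self):
        # Commit in-memory transactions, then halt further writes.
        self.disk.update(self.pending)
        self.pending.clear()
        self.quiesced = True

    def resume(self):
        self.quiesced = False


def crash_consistent_copy(db):
    # Copies only what is on disk, as a power outage would leave it.
    return dict(db.disk)


def application_consistent_copy(db):
    # Quiesce first so that the copy includes committed in-memory work.
    db.quiesce()
    try:
        return dict(db.disk)
    finally:
        db.resume()


db = ToyDatabase()
db.write("order-1", "paid")
crash_copy = crash_consistent_copy(db)      # misses the pending write
app_copy = application_consistent_copy(db)  # includes it
```

In this sketch the crash consistent copy is still usable, but the application would have to run its own recovery to deal with the transaction that never reached disk.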
Appliance based replication
Replication is performed by an appliance connected to the
hosts, the SAN fabric or the storage array. Two options are
available:
Out-of-band
The appliance resides outside of the primary data path,
so an application's I/O does not flow through the
appliance. Implementing an out-of-band appliance delivers
replication without impacting an application's I/O
operations (i.e. performance and reliability).
In-band
The appliance resides in the primary data path, so an
application's I/O flows through the appliance.
Implementing an in-band appliance delivers replication
that can have an impact on an application's I/O
operations (i.e. performance and reliability).
Array based replication
Replication is performed within the storage processors of
the array; no additional hardware or host resources are
required.
Consistency groups
A collection of volumes that, when replicated, remain
in a consistent state with respect to each other. All
data is synchronised to exactly the same point-in-time.
For example, you can protect data from one or more
applications that use multiple volumes.
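As a rough illustration, a consistency group can be thought of as a wrapper that captures all member volumes at one logical instant. The Volume and ConsistencyGroup classes are hypothetical and keep everything in memory; a real array would have to hold or order incoming writes while the image is taken.

```python
class Volume:
    """Hypothetical in-memory volume: block number -> data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class ConsistencyGroup:
    def __init__(self, volumes):
        self.volumes = list(volumes)

    def snapshot(self):
        # Every member volume is copied in one step, so no write can
        # land between the per-volume copies in this single-threaded sketch.
        return {v.name: dict(v.blocks) for v in self.volumes}


data_vol, log_vol = Volume("data"), Volume("log")
group = ConsistencyGroup([data_vol, log_vol])

log_vol.write(0, "begin txn 1")
data_vol.write(0, "row A")
image = group.snapshot()
log_vol.write(1, "begin txn 2")  # arrives after the snapshot
```

The image holds the database and log volumes as they were at the same moment, which is what an application spanning several volumes needs in order to recover.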
Continuous data protection (CDP)
Automatically saves a local copy of every change made to
a volume, essentially capturing every version of the data
that the user saves and allowing the administrator to
restore data to any point-in-time.
Application consistency points can be scheduled
periodically to avoid having to recover from a crash
consistent image. The RPO is zero.
CDP differs from traditional backups in that there are no
backup schedules, and you do not have to specify the
point-in-time to which you would like to recover until
you are ready to perform a restore. Traditional backups
can only restore data to the point at which the backup
was taken.
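The idea of restoring to any point-in-time can be sketched as a journal that is replayed up to a chosen moment. The CdpJournal class is a hypothetical illustration; it assumes writes are recorded in non-decreasing timestamp order and uses small integers as stand-in timestamps.

```python
class CdpJournal:
    """Hypothetical journal that records every write with a timestamp."""

    def __init__(self):
        self.entries = []  # (timestamp, block, data), in arrival order

    def record(self, timestamp, block, data):
        self.entries.append((timestamp, block, data))

    def restore(self, point_in_time):
        # Replay every write up to and including the requested time.
        volume = {}
        for timestamp, block, data in self.entries:
            if timestamp > point_in_time:
                break
            volume[block] = data
        return volume


journal = CdpJournal()
journal.record(1, "blk0", "v1")
journal.record(2, "blk1", "v1")
journal.record(3, "blk0", "corrupt")

before_corruption = journal.restore(2)
latest = journal.restore(3)
```

The restore point is chosen only when the restore is requested, which mirrors the contrast with scheduled backups described above.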
Continuous remote replication (CRR)
Automatically saves a remote copy of every change made to
a volume, essentially capturing every version of the data
that the user saves and allowing the administrator to
restore data to any point-in-time.
Application consistency points can be scheduled
periodically to avoid having to recover from a crash
consistent image. The RPO is typically seconds or greater,
and CRR supports unlimited distances between storage
devices. Two options are available:
Continuous asynchronous
Each write transaction is acknowledged locally at
the source side and then sent to the target side.
The primary advantage of continuous asynchronous replication
is its ability to provide synchronous-like replication
without degrading the performance of host applications.
Near continuous snapshots
Transfers the data that has changed between one consistent
image of the storage subsystem and the next. The use
of high-frequency snapshots largely overcomes the
shortcoming of a snapshot not being up-to-date.
Compression and other bandwidth reduction technologies
can typically be applied, resulting in significant
bandwidth savings.
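The near continuous snapshot option amounts to sending only the difference between two consistent images. The sketch below is an assumption-laden illustration: the function names are hypothetical, images are plain dictionaries of block number to data, and no compression is applied.

```python
def changed_blocks(previous, current):
    """Return the blocks that differ between two consistent images."""
    changed = {b: d for b, d in current.items()
               if b not in previous or previous[b] != d}
    removed = [b for b in previous if b not in current]
    return changed, removed


def apply_delta(image, changed, removed):
    """Bring a remote copy of the previous image up to the current one."""
    result = dict(image)
    result.update(changed)
    for block in removed:
        del result[block]
    return result


snapshot_1 = {0: "a", 1: "b", 2: "c"}
snapshot_2 = {0: "a", 1: "B", 3: "d"}

changed, removed = changed_blocks(snapshot_1, snapshot_2)
remote_copy = apply_delta(snapshot_1, changed, removed)
```

Only two changed blocks and one removal cross the link here, rather than the whole image.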
Crash consistency
The application's data is not put into a consistent
state when it is replicated, backed up or snapped.
The data is in the same state as if there had been a
power outage, hardware failure or software crash. Most
applications have a built-in crash recovery mechanism
that allows them to recover from a crash consistent copy
of their data. The alternative is application consistency.
Deduplication
Enterprise data is highly redundant, with identical
files and sub-file data segments stored within systems.
Deduplication solutions assign each data segment a unique
ID, based on its content, which is used to compare it
with other data segments that have already been backed
up. Only new, unique data segments are stored. Deduplication
typically occurs across sites and servers, hence the term
'global deduplication'.
Deduplication can occur at the data source or the backup
target. With source-based deduplication, data is deduplicated
as the backup process begins and before the data is
sent over the network. This provides the benefit of
shorter backup windows and lowered bandwidth requirements,
making it ideal for remote or WAN-based backup, VMware,
large file servers, and other environments where the
backup process is hampered by network or other resource
bottlenecks.
For target deduplication, the main challenge being addressed
is the growth of back-end storage. The backup application
sends data to the target storage device and the data
is deduplicated at the device, either immediately or
at a scheduled time. It is found in virtual tape libraries
(VTLs) and LAN backup-to-disk appliances or platforms, and
provides the benefit of plug and play with existing backup
applications. Unlike source-based deduplication, it does
not remove bottlenecks in getting the data to the backup
storage device.
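The content-based ID scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the DedupStore class is hypothetical, segments are fixed-size, and SHA-256 is used as the content ID, whereas real products may use variable-size segments and other hash functions.

```python
import hashlib


class DedupStore:
    """Hypothetical store that keeps one copy of each unique segment."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.segments = {}  # content ID -> segment bytes

    def backup(self, data):
        # Split the data into segments and store only unseen ones.
        recipe = []
        for i in range(0, len(data), self.segment_size):
            segment = data[i:i + self.segment_size]
            segment_id = hashlib.sha256(segment).hexdigest()
            self.segments.setdefault(segment_id, segment)
            recipe.append(segment_id)
        return recipe  # ordered list of IDs needed to rebuild the data

    def restore(self, recipe):
        return b"".join(self.segments[segment_id] for segment_id in recipe)


store = DedupStore(segment_size=4)
recipe_a = store.backup(b"AAAABBBBAAAA")
recipe_b = store.backup(b"BBBBCCCC")
```

Five segments are backed up in total, but only three unique segments are stored. With source-based deduplication the hashing would run before the data is sent over the network; with target deduplication it would run at the backup device.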
Five 9s availability
99.999% system availability, equivalent to an average of
5.26 minutes of unplanned downtime per year. Nowadays
this level of enterprise-class availability is required
for most critical business data.
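The 5.26 minute figure can be checked with a short calculation. The function name is hypothetical, and the sketch assumes an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(availability_percent):
    """Unplanned downtime allowed per year at a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
three_nines = downtime_minutes_per_year(99.9)
```

Rounded to two decimal places, five_nines comes to 5.26 minutes, while three 9s allows roughly 526 minutes, or close to nine hours, per year.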
Input/Outputs per second (IOPS)
The total number of reads (typically around 70%) and
writes (typically around 30%) per second provided by
a disk system.
As a rough guide, Fibre Channel/SAS disks can provide
twice the IOPS of SATA disks, and Flash disks can provide
thirty times the IOPS of Fibre Channel/SAS disks.
Point-in-time data recovery
Journals all data changes to a dedicated volume, allowing
recovery to any point-in-time. For example, if a volume
becomes corrupt, it can be recovered to a point just before
the corruption occurred. See also CDP.
Redundant Array of Independent Nodes (RAIN)
RAIN works in a similar fashion to RAID to deliver
high availability, but rather than protecting against
disk failure it protects against server failure. It
uses a grid architecture, which allows for online expansion
for increased scalability.
Recovery Point Objective (RPO)
The amount of data, measured in time, that an organisation
defines as acceptable to lose in the event of a disaster.
For example, an RPO of 2 hours requires that data can be
restored to a point-in-time no earlier than 2 hours before
the disaster occurred.
The RPO, in conjunction with the RTO, is the basis on which
a business continuity strategy is developed.
Recovery Time Objective (RTO)
The length of time, as defined by an organisation, within
which a business process must be restored after a disaster.
For example, an RTO of 2 hours requires systems to be back
up, running and accessible within 2 hours of the disaster
occurring.
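The two objectives can be expressed as simple time comparisons. The function names and the dates below are hypothetical values chosen only to illustrate the 2 hour examples.

```python
from datetime import datetime, timedelta


def meets_rpo(disaster_time, last_recovery_point, rpo):
    # Data written after the last recovery point is lost.
    return disaster_time - last_recovery_point <= rpo


def meets_rto(disaster_time, service_restored_time, rto):
    # The business process is unavailable until service is restored.
    return service_restored_time - disaster_time <= rto


disaster = datetime(2024, 1, 1, 12, 0)
rpo = rto = timedelta(hours=2)

rpo_ok = meets_rpo(disaster, datetime(2024, 1, 1, 10, 30), rpo)      # 1.5 h of data lost
rpo_missed = meets_rpo(disaster, datetime(2024, 1, 1, 9, 0), rpo)    # 3 h of data lost
rto_ok = meets_rto(disaster, datetime(2024, 1, 1, 13, 45), rto)      # back in 1 h 45 min
rto_missed = meets_rto(disaster, datetime(2024, 1, 1, 15, 0), rto)   # back in 3 h
```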
Replication
The process of copying or mirroring data from one storage
device to another, either within the same storage array or
to a different array located locally or remotely. Replication
typically protects only the most recent copy of the data,
so if that copy becomes corrupted it will simply "protect"
the corrupt data. CDP
will protect against the effects of data corruption
by allowing a restore to a previous, uncorrupted version.
Two options are available:
Synchronous
Guarantees zero data loss by mirroring writes
to a secondary storage device. A write is not considered
complete until it has been acknowledged by both storage
devices. Performance drops in proportion to distance as
latency increases, so it is only suitable when the distance
between storage devices is limited (100 km or less). The RPO
is zero.
Asynchronous
The write is considered complete as soon as the primary
storage device acknowledges it. The secondary storage
device is updated, but lags behind the primary. Performance
is not impacted, so it supports unlimited distances
between storage devices. The RPO
is typically 30 minutes or greater.
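The difference in when a write is acknowledged can be sketched as below. The ReplicatedVolume class is hypothetical and ignores real network latency; it only shows which writes would be at risk if the primary were lost.

```python
from collections import deque


class ReplicatedVolume:
    """Hypothetical primary/secondary pair in one process."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.in_flight = deque()  # acknowledged locally, not yet on the secondary

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Not acknowledged until the secondary also holds the write.
            self.secondary[block] = data
        else:
            # Acknowledged immediately; the secondary catches up later.
            self.in_flight.append((block, data))
        return "ack"

    def drain(self):
        # Ship queued writes to the secondary in their original order.
        while self.in_flight:
            block, data = self.in_flight.popleft()
            self.secondary[block] = data


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write(0, "a")

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write(0, "a")
async_vol.write(1, "b")
at_risk = len(async_vol.in_flight)  # writes lost if the primary failed now
async_vol.drain()
```

In the synchronous case nothing is ever at risk, which corresponds to an RPO of zero; in the asynchronous case the queue length reflects how far the secondary lags behind.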
Snapshot
A copy of a set of files and directories as they were
at a particular point-in-time. Snapshots can be mounted
read-only or read-write, can be used to instantly restore
the current data to a given point-in-time, and can be
used for parallel processing such as accelerated backups,
reporting and testing. Two options are available:
Logical view
Maintains a log of changes and combines the production
volume with these changes to create a logical point-in-time
volume. It takes seconds to create and requires
significantly less space than a clone.
Clone
A physically independent full copy of the production
volume. It can take a considerable amount of time to
create initially and requires the same space as the
production volume.
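The space and time trade-off between the two options can be illustrated with a copy-on-write sketch. This is one possible way to realise a logical view, assumed here for illustration; the class names are hypothetical and the source does not say which mechanism a given product uses.

```python
class LogicalSnapshot:
    """Hypothetical logical view: saved old blocks plus the live volume."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # original contents of blocks changed since creation

    def preserve(self, block, old_data):
        # Keep only the first (oldest) version of each changed block.
        self.saved.setdefault(block, old_data)

    def read(self, block):
        if block in self.saved:
            return self.saved[block]
        return self.volume.blocks.get(block)


class ProductionVolume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def write(self, block, data):
        # Save the old contents for every snapshot before overwriting.
        for snap in self.snapshots:
            snap.preserve(block, self.blocks.get(block))
        self.blocks[block] = data

    def logical_snapshot(self):
        snap = LogicalSnapshot(self)  # instant: nothing is copied yet
        self.snapshots.append(snap)
        return snap

    def clone(self):
        return dict(self.blocks)  # full copy: same space as the volume


volume = ProductionVolume({0: "a", 1: "b"})
snap = volume.logical_snapshot()
full_copy = volume.clone()
volume.write(0, "A")
volume.write(0, "AA")
```

After two overwrites of block 0 the logical view still reads the original data while holding a single saved block, whereas the clone holds every block whether or not it changed.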
Write Splitter
Replicates data to a secondary storage device by intercepting
application writes. Options include:
Host operating system
Requires an agent to be installed on each server and
therefore has a small impact on CPU utilisation on the
host.
Intelligent Fabric
Provided within the FC switch by vendors such as Brocade
and Cisco, so it has no impact on host performance and
does not require the installation of a host agent.
Storage array
Provided within the array's storage processor, so it has
no impact on host performance and does not require the
installation of a host agent.
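Wherever it runs, a write splitter sits in the write path and forwards a copy of each write. The sketch below is a hypothetical in-memory illustration; the Device and WriteSplitter classes are invented names and do not model any vendor's host agent, switch or array.

```python
class Device:
    """Hypothetical storage device: block number -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class WriteSplitter:
    """Intercepts application writes and sends each one to both devices."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block, data):
        self.primary.write(block, data)
        self.secondary.write(block, data)  # the split copy


primary, secondary = Device(), Device()
splitter = WriteSplitter(primary, secondary)

# The application writes through the splitter instead of the device.
splitter.write(7, "payload")
```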