Data Protection and Data Security: Together at Last?

Anyone who has been around the corporate data center for a couple of decades has probably grown accustomed to seeing separate disciplines and/or departments for data protection/disaster recovery planning and information security.  Such a distinction has deep historical roots, but one must wonder whether it still makes any sense.

Data protection is part of disaster recovery planning (or business continuity planning if you prefer), which is a set of strategies and processes for preventing avoidable "disasters" (unplanned interruption events) and for minimizing the impact of disasters that cannot be prevented.  Data protection is central to DR because, aside from personnel, data is a unique corporate asset that cannot be replaced.  The only way to protect data is a strategy of redundancy:  make a copy and store the copy sufficiently distant from the original so that the same disaster event cannot destroy both the original and the copy.


In addition to disaster avoidance and data protection, a good DR capability also includes provisions for application, network and user recoveries, plus processes for testing, training and change management.  DMI provides a data protection/disaster recovery planning course and certification (Certified Data Protection Specialist or CDPS), by the way, if you are assigned the planning task and need some guidance.

Information security planning is very similar to DR planning.  Structurally, it aims to protect mission critical business processes and data assets, but it uses a number of interlocking strategies that are unique to security. 


Infosec has developed its own vocabulary and its own set of strategies for securing applications, networks, facility perimeters and endpoints, and, of course, data assets.  These strategies are supplemented by processes for active monitoring and periodic review to ensure that security provisions are keeping data private.

There is usually very little communication between the DR folks and the Infosec folks, except when DR needs to be concerned about recovering data that may be encrypted, or gaining access to an application or set of infrastructure in an emergency that is otherwise locked down by security's access control systems.  Conversely, the Infosec folks may only interact with the DR/data protection folks to ensure that continuous data protection capabilities are being deployed and leveraged to enable quick restore following a malware attack or a ransomware attack by "rewinding" data to a point before the attack occurred.
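The "rewind" idea can be illustrated with a minimal sketch: given a set of recovery points, restore from the newest one that predates the detected attack. The snapshot timestamps below are purely hypothetical.

```python
from datetime import datetime

# Minimal sketch of the CDP "rewind": pick the newest recovery point that
# predates the detected attack. Snapshot timestamps are hypothetical.
snapshots = [
    datetime(2017, 9, 1, 2, 0),
    datetime(2017, 9, 1, 8, 0),
    datetime(2017, 9, 1, 14, 0),
]
attack_detected = datetime(2017, 9, 1, 13, 30)

candidates = [s for s in snapshots if s < attack_detected]
restore_point = max(candidates) if candidates else None
print(restore_point)  # the 08:00 snapshot -- last clean copy before the attack
```

Note that the 14:00 snapshot is skipped even though it is newer, because it was taken after the attack and may contain encrypted or corrupted data.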

Both disciplines have much to learn from each other. DR, for example, has already flirted with nutty quantitative techniques for matching protection services to specific data given the threats to the organization, business unit, or infrastructure.  These quantitative methods, Single Loss Expectancy and Annual Loss Expectancy, were silly on their face and have been mostly abandoned by DR planners today.  The key problem with such techniques is that they require planners to have meaningful data regarding the probabilities of threat potentials being realized.  We have over 100 years of hurricane tracking data, but no one knew for sure when or where a hurricane was going to strike the US mainland in 2017.
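For readers unfamiliar with the techniques being criticized, here is what SLE and ALE actually compute. The formulas are the standard ones; the dollar figures are hypothetical, and, as the article notes, the weak link is the annualized rate of occurrence, which planners rarely know with any confidence.

```python
# Classic risk-quantification formulas the article critiques.
# All input values below are hypothetical illustrations.

def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = asset value x exposure factor (fraction of asset lost per incident)."""
    return asset_value * exposure_factor

def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x annualized rate of occurrence (expected incidents per year)."""
    return sle * aro

sle = single_loss_expectancy(asset_value=1_000_000, exposure_factor=0.4)
ale = annual_loss_expectancy(sle, aro=0.1)  # "one incident per decade" -- a guess
print(sle, ale)
```

The math is trivial; the problem, as noted above, is that the ARO input is a guess dressed up as a measurement.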

Security is moving down this path at present.  Attack surface reduction modeling techniques are the same sort of quasi-scientific, quantitative-sounding methodologies as ALE and SLE in the DR world.  Some view them as an improvement over the threat/cost modeling used by many Infosec practitioners in the 1990s, but not by much.  Back then, we were told that the cost to protect should not be significantly greater than the cost to bad guys of circumventing the protection.  Only, the relationship was asymmetrical:  the bad guys incurred little to no expense to test the security of their targets or to defeat the measures being taken to keep them out.

There is much more to this story, but DMI members who are interested should probably take the DMI workshop for Certified Data Security Specialists (CDSS) to get more information.

Bottom line:  DR and Infosec should be working together going forward in all aspects of data protection planning.  Moreover, both DR and Infosec ought to be subsumed under the rubric of cognitive data management in the future, since both data protection and data privacy/security are actually best delivered as services associated with data based on granular, business-savvy policies.

A Zettabyte Apocalypse?

Trends in data growth are downright scary.  Per Barry M. Ferrite, as well as leading IT analysts, data is on pace to grow from current levels to more than 60 zettabytes (ZBs) by 2020 and to more than 163 ZBs by 2025.  Driving this data growth are three trends:  the digitization of information formerly stored in analog formats (the so-called digital democracy), the mobile commerce phenomenon, and the Internet of Things.


This tsunami of new data will challenge large organizations and those that are data intensive.  One prominent cloud architect has noted that the current manufacturing output of the disk industry in terms of capacity is around 780 exabytes per year.  The flash industry produces approximately 500 exabytes of capacity annually.  Even with forecasted capacity improvements, the world will still confront a severe capacity shortage by 2020. 
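A back-of-envelope calculation using the figures quoted above shows the scale of the gap. This is deliberately simplistic: it ignores the installed base, media retirement, and year-over-year manufacturing growth, and it assumes roughly three years of output at current rates.

```python
# Rough capacity-gap arithmetic using the figures quoted in the text.
# Assumptions: ~3 years of manufacturing at current rates, no growth.
EB_PER_ZB = 1000

disk_eb_per_year = 780    # disk industry output, per the cloud architect cited
flash_eb_per_year = 500   # flash industry output
annual_output_zb = (disk_eb_per_year + flash_eb_per_year) / EB_PER_ZB

projected_demand_zb = 60  # projected total data by 2020, per the article
years = 3

total_output_zb = annual_output_zb * years
shortfall_zb = projected_demand_zb - total_output_zb
print(f"~{total_output_zb:.2f} ZB manufactured vs {projected_demand_zb} ZB of data")
```

Even under these generous assumptions, manufactured capacity covers only a few zettabytes against tens of zettabytes of projected data, which is why archiving and tape come up repeatedly below.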

Tape could help fill the void, with demonstrations of over 300 TB per cartridge coming from IBM and tape manufacturers.  How soon tape drives and cartridges supporting these capacities will come to market remains to be seen.


Teaching Storage Fundamentals? Why Not Make It Fun?

DMI has begun using an avatar, Barry M. Ferrite, "your trusted storage AI", to provide entertaining and informative public service announcements about storage technology and data management.  This follows a series of "edutainment" videos we made in 2012-2013 to talk about the state of storage industry infighting at that time. 

Each episode of Storage Wars was a mash-up of Star Wars and Annoying Orange. For their "historical value," here is our version of Storage Wars -- Episodes IV, V and VI (labeled Storage Wars, Storage Wars 2 and Storage Wars 3 for YouTube).





Hope you enjoy the trip down memory lane.  DMI will be creating more edutainment videos in the future to teach storage fundamentals.

Why Not Used Gear?

Barry M. Ferrite responded last May to inquiries from many DMI members regarding how to bend the cost curve of storage, which currently accounts for between 30 and 70 cents of every dollar spent annually on IT hardware.  He talked about the secondary market, a place where you could buy used hardware at a fraction of the price of new gear and build out your capacity without breaking the bank.

Barry introduced us to ASCDI, an organization for secondary market equipment sales that imposes a code of ethics on members to ensure that consumers get the products they were promised and in good working order.  Have a listen.



DMI thanks Joe Marion of ASCDI for offering his perspective for this video.

Barry M. Ferrite Talks Tape

Tape?  What's that?

Scary as it seems, this is actually not an uncommon question from novice IT personnel, especially those who have been taught their trade at schools offered by hypervisor computing vendors or flash technology companies.  Yet, tape storage is coming back into vogue in industrial clouds, large data centers and certain vertical industry markets.

Barry M. Ferrite, our trusted storage AI, offered this public service announcement on LTO-7 tape about a year ago to help acquaint newbies with the merits of tape technology.  He will likely revisit this subject shortly with the release of LTO-8.



Don't count out tape as part of your storage infrastructure.


Slow VMs? Adaptive Parallel I/O Tech May Be the Solution

A little over a year ago, DataCore Software's late Chief Scientist, Ziya Aral, released a groundbreaking piece of technology he called adaptive parallel I/O that showed the way to alleviate RAW I/O congestion causing applications, especially virtual machines running in hypervisor environments, to run slowly.  

Demonstrations of the effectiveness of adaptive parallel I/O in reducing latency and boosting the performance of VMs exposed the silliness of arguments by leading hypervisor vendors that slow storage was to blame for poor VM performance.  Storage was not the problem; the decreasing rate at which I/Os could be placed onto the I/O bus (RAW I/O speed) was the problem.

The trouble is that hypervisor vendors really do not seem to want to place blame where it belongs -- with hypervisors and how they use logical cores in multi-core processors.  In earlier times, the error of such an assertion (that storage was responsible for application performance) could be shown just by looking at queue depths on the hosting server.  If the queue depth was deep, then slow storage I/O was to blame.  Conversely, if queue depths were shallow, as they typically are in the hypervisor computing settings we've seen, then the problem lay elsewhere.
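The queue-depth rule of thumb can be expressed as a simple decision aid. The threshold below is a hypothetical illustration; in practice you would compare against a measured average queue depth (e.g., the avgqu-sz column reported by iostat on Linux) and tune the cutoff for the device and workload.

```python
# The queue-depth diagnostic described above, as a simple decision rule.
# The threshold value is a hypothetical illustration, not a standard.

def diagnose(avg_queue_depth: float, deep_threshold: float = 8.0) -> str:
    """Deep I/O queues implicate slow storage; shallow queues point elsewhere
    (e.g., the rate at which the hypervisor can issue I/Os to the bus)."""
    if avg_queue_depth >= deep_threshold:
        return "storage-bound: device cannot drain requests fast enough"
    return "not storage-bound: look upstream (RAW I/O issue rate, core scheduling)"

print(diagnose(32.0))
print(diagnose(1.5))
```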

Aral and DataCore showed that RAW I/O speeds were to blame and they provided a software shim that converts unused logical CPU cores into a parallel I/O processing engine to resolve the problem.  Here is our avatar, Barry M. Ferrite, reviewing the technology in its early days -- at about the same time as Star Wars Episode VII was about to be released.



Since the initial release of Adaptive Parallel I/O technology, DataCore has steadily improved its results as measured by the Storage Performance Council, reaching millions of I/Os per second in SPC benchmarks...on commodity servers from Lenovo and other manufacturers.

So, why isn't adaptive parallel I/O part of software-defined storage?

Barry M. Ferrite Warns of Z-Pocalypse, Recommends Archiving

In a couple of public service announcements made last year, Barry M. Ferrite, DMI's "trusted storage AI," warned of a coming Z-Pocalypse (zettabyte apocalypse).  Archiving is the only solution for dealing with the data deluge. 

These PSAs provided some "edutainment" to help folks get started with their archive planning.  We hope it helps...



Continuing on this message, Barry returned in the next PSA with this additional information...



Amusing but serious stuff.  We hope to add more guidance from Barry in the future on the topics of archive and data management.

Introducing Barry M. Ferrite, DMI's Trusted Storage AI

Now a little over a year old, the Data Management Institute's "trusted storage AI" (artificial intelligence) is one Barry M. Ferrite.  Here is his first appearance, with many more to come.



We look forward to Barry's occasional public service announcements on all things storage.

Surveying the Data Management Vendor Market: Methodology
The data management market today comprises many products and technologies, but comparatively few that include all of the components and constructs enumerated above.  To demonstrate the differences, we surveyed the offerings of vendors that frequently appear in trade press accounts, industry analyst reports and web searches.  Our list originally included the following companies: 
  • Avere Systems*
  • Axaem
  • CTERA*
  • Clarity NOW Data Frameworks
  • Cloudian HyperStore
  • Cohesity*
  • Egnyte
  • ElastiFile*
  • Gray Meta Platform
  • IBM*
  • Komprise
  • Nasuni*
  • Panzura*
  • Primary Data*
  • QStar Technologies*
  • Qubix
  • Seven10
  • SGL
  • ShinyDocs
  • StarFish Global
  • StorageDNA*
  • STRONGBOX Data Solutions*
  • SwiftStack Object Storage*
  • Talon*
  • Tarmin*
  • Varonis
  • Versity Storage Manager 
Only a subset of these firms (denoted with asterisks) responded to our requests for interviews, which we submitted by email either to the point of contact identified on their websites or in press releases.  After scheduling interviews, we invited respondents to provide us with their “standard analyst or customer product pitch” – usually delivered as a presentation across a web-based platform – and we followed up with questions to enable comparisons of the products with each other.  We wrote up our notes from each interview and submitted them to the vendor to ensure that we had not misconstrued or misunderstood their products.  These interviews were updated for accuracy when comments were received back from the respondents.  Following are those discussions.
What is Cognitive Data Management?

Ideally, a data management solution will provide a means to monitor data itself – the status of data as reflected in its metadata – since this is how data is instrumented for management in the first place.  Metadata can provide insights into data ownership at the application, user, server, and business process level.  It also provides information about data access and update frequency and physical location.



A real data management solution will offer a robust mechanism for consolidating and indexing this file metadata into a unified or global namespace construct.  This provides uniform access to file listings to all authorized users (machine and human) and a location where policies for managing data over time can be readily applied.
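A minimal sketch of that consolidation step might walk one or more filesystems and index, per file, the metadata attributes that lifecycle policies are later applied against. This is an illustrative toy, not any vendor's implementation; the field names are invented.

```python
import os
import time

# Sketch of metadata consolidation for a global namespace: walk one or more
# filesystem roots and index each file's ownership, size, and last-access
# time -- the attributes that data management policies act upon.

def index_namespace(roots):
    index = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is inaccessible; skip it
                index[path] = {
                    "owner_uid": st.st_uid,
                    "size_bytes": st.st_size,
                    "last_access": st.st_atime,
                    "days_since_access": (time.time() - st.st_atime) / 86400,
                }
    return index
```

A real global namespace would of course scale this across NAS heads, object stores and clouds, and keep the index current via change notification rather than periodic rescans.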

That suggests a second function of a comprehensive or real data management solution.  It must provide a mechanism for creating management policies and for assigning those policies to specific data to manage it through its useful life.  

A data management policy may offer simplistic directions.  For example, it may specify that when accesses to the data fall to zero for thirty days, the data should be migrated off expensive high-performance storage to a less expensive, lower-performance storage target.  However, data management policies can also define more complex interrelationships between data, or they may define specific and granular service changes that are to be applied at different times in the data lifecycle.  Initially, for example, data may require continuous data protection in the form of a snapshot every few seconds or minutes in order to capture rapidly accruing changes.  Over time, however, as update frequency slows, the protective services assigned to the data may also need to change – from continuous data protection snapshots to nightly backups, for example.  Such granular service changes may also be defined in a policy.
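The tiering-and-protection logic just described can be sketched as a rule table keyed on days since last access. The tier names, service names and thresholds below are illustrative assumptions, not drawn from any product.

```python
# Sketch of the lifecycle policy described above: as access frequency falls,
# both the storage tier and the protection service assigned to the data
# change. Tier names and thresholds are illustrative.

POLICY = [
    # (min days since last access, storage tier, protection service)
    (90, "archive", "none (immutable archive copy retained)"),
    (30, "capacity tier", "nightly backup"),
    (0,  "performance tier", "CDP snapshot every few minutes"),
]

def apply_policy(days_since_access: int):
    """Return the (tier, protection) pair for the first matching rule."""
    for threshold, tier, protection in POLICY:
        if days_since_access >= threshold:
            return tier, protection

print(apply_policy(2))   # ('performance tier', 'CDP snapshot every few minutes')
print(apply_policy(45))  # ('capacity tier', 'nightly backup')
```

Because the rules are checked from coldest to hottest, each file matches exactly one tier; a real policy engine would add conditions on ownership, business process and compliance class as well.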

The policy management framework provides a means to define and use the information from a global namespace to meet the changing storage resource requirements and storage service requirements (protection, preservation and privacy are defined as discrete services) of the data itself.  The work of provisioning storage resources and services to data, however, anticipates two additional components of a data management solution.

In addition to a policy management framework and global namespace, a true data management solution requires a storage resource management component and a storage services component.  The storage resource management component inventories and tracks the status of the storage that may be used to provide hosting for data.  This component monitors the responsiveness of the storage resource to access requests as well as its current capacity usage.  It also tracks the performance of various paths to the storage component via networks, interconnects, or fabrics.  

The storage services management component performs roughly the same work as the storage resource manager, but with respect to storage services for protection, preservation and privacy.  This management engine identifies all service providers, whether they are software providers operated on dedicated storage controllers, or as part of a software-defined storage stack operated on a server, or as stand-alone third party software products.  The service manager identifies the load on each provider to ensure that no one provider is overloaded with too many service requests.
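The load-balancing decision the service manager makes can be reduced to a least-loaded selection over the known providers. Provider names and load counts here are hypothetical.

```python
# Sketch of the service manager's routing decision: track outstanding
# service requests per provider and send new work to the least-loaded one.
# Provider names and counts are hypothetical.

providers = {"array-controller-1": 12, "sds-stack-a": 4, "backup-srv-3": 9}

def least_loaded(load_by_provider):
    """Return the provider currently handling the fewest requests."""
    return min(load_by_provider, key=load_by_provider.get)

target = least_loaded(providers)
print(target)  # sds-stack-a
providers[target] += 1  # account for the newly assigned request
```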

Together with the policy management framework and global namespace, storage resource and storage service managers provide all of the information required by decision-makers to select the appropriate resources and services to provision to the appropriate data at the appropriate time in fulfillment of policy requirements.  That is an intelligent data management service – with a human decision-maker providing the “intelligence” to apply the policy and provision resources and services to data.

However, given the amount of data in even a small-to-medium-sized business computing environment, human decision-makers may be overwhelmed by the sheer volume of data management work that is required.  For this reason, cognitive computing has found its way into the ideal data management solution.  

A cognitive computing engine – whether in the form of an algorithm, a Boolean logic tree, or an artificial intelligence construct – supplements manual methods of data management and makes possible the efficient handling of extremely large and diverse data management workloads.  This cognitive engine is the centerpiece of “cognitive data management” and is rapidly becoming the sine qua non of contemporary data management technology and a key differentiator between data management solutions in the market.