Posts

Interesting Infographic on LTO Tape

From Spectra Logic, this infographic tracks the evolution of LTO Ultrium Tape through its many generations.  Check it out:

LTO Infographic from Spectra Logic

The company is also preparing a webinar for 2 November 2017 on the Future of Tape Technology.  Register HERE if you want to attend.

What Data, Exactly, are We Legally Required to Retain?

What are your data preservation requirements?  If you are setting up an archive, chances are you need to find out.  The challenge is that finding a reliable list of legal and regulatory requirements for data preservation sounds a lot easier than it is.

Sungard, a purveyor of hot site services a decade or so back, had someone who was dedicated to maintaining a list of regulatory requirements for data preservation. Circa 2008, the list looked like this:

[Image: Sungard's circa-2008 list of regulatory data preservation requirements]

In that year, analysts were projecting that some $70B had been spent on regulatory compliance, mostly on the use of consulting services to identify relevant laws and regulations and to establish retention policies.  At the time, the big problem confronting firms was that they were discovering a need to dip into the till again to develop compliant deletion policies.

Alas, the list has not been kept up to date since I last checked, and finding a coherent compilation of data preservation requirements via Internet search engines is a pain.  The concept of data preservation to satisfy regulatory requirements is conflated with lots and lots of rants from folks who, rightly or wrongly, believe that their government, internet service provider or telco is collecting information about them and preserving it for use against them at some future date.

Clearly, different market verticals have different data retention/preservation requirements.  There are also state and national rules and regulations to consider, especially in Europe where the movement to enable on-request identity erasure from corporate and governmental databases has gathered steam.

Watch this space to learn about additional post-2008 retention and deletion rules as we uncover them.  And if you or your business are required to retain certain types of data because of a regulation or law, please use the comment section to let us know.  We hope to have a full listing of all regulations and legal requirements related to data preservation and deletion for use by DMI members and visitors.

Thanks.

 

Looking for Data Management Tools that Work: Watch this Space

Data management has always labored under the perception that it is just too difficult a task to take on.  Face it: there is a lot of data recorded on storage media in most firms.  It mostly consists of files created by users or applications that made no effort to identify the contents of the file in an objectively intelligible way.

Some of this data may have importance or value, but much does not.  So, just beginning the data management exercise -- or one of the subordinate data management tasks like developing an information security strategy, a data protection strategy or an archive strategy -- first requires the segregation of data into classes:  what's important, what's required to be retained in accordance with assorted laws or regulations (and do you even know which regs or laws are applicable to you?), what needs to be retained and for how long, and so on.

Sorting through the storage "junk drawer" is considered a laborious task that absolutely no one wants to be assigned.  And, assuming you do manage to sort your existing data, it is never enough.  There is another wave of data coming behind the one that created the mess you already have.  Talk about the Myth of Sisyphus.

What?  You are still reading.  Are you nuts?

Of course, everyone is hoping that data management will get easier, that wizards of automation will devise tools to help corral and segregate all the bits.

Some offer a rip and replace strategy:  rip out your existing file system and replace it with object storage.  With object storage, all of your data is wrapped into a database construct that is rich with metadata.  Sounds like just the thing, but it is a strategy that is easiest to deploy in a "greenfield" situation -- not one that is readily deployed after years of amassing undifferentiated data.
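
To make the metadata point concrete, here is a minimal sketch in Python, using Amazon S3 via boto3 purely as an example of an object interface; the bucket name, object key, file name and metadata fields are all hypothetical, and other object stores offer similar "put" operations.

import boto3

s3 = boto3.client("s3")

# Store the file as an object, wrapping it with descriptive metadata
# that travels with the data and can drive policy decisions later.
with open("q3-forecast.xlsx", "rb") as f:        # hypothetical file
    s3.put_object(
        Bucket="corp-archive",                   # hypothetical bucket
        Key="accounting/2017/q3-forecast.xlsx",  # hypothetical key
        Body=f,
        Metadata={
            "department": "accounting",
            "retention-years": "7",
            "classification": "internal",
        },
    )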

Another strategy is to deduplicate everything.  That is, use software or hardware data reduction to squeeze more anonymous bits into a fixed amount of storage space.  This may fix the capacity issue associated with the data explosion...but only temporarily.

Another strategy is to find all files that haven't been accessed in 30, 60 or 90 days, then just export those files into a cheap storage repository somewhere.  If any of the data is ever needed again -- say, for legal discovery -- just provide a copy of this junk drawer, whether on premises or in a cloud, and let someone else sort through it all.
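
For illustration, here is a minimal sketch of that age-based sweep in Python; the source directory, archive path and 90-day threshold are assumptions, not recommendations, and it simply relocates the junk drawer rather than organizing it.

import os
import shutil
import time

SOURCE_DIR = "/data/shared"         # hypothetical source volume
ARCHIVE_DIR = "/mnt/cheap-archive"  # hypothetical low-cost repository
MAX_AGE_DAYS = 90                   # assumed threshold; 30/60/90 are all common

cutoff = time.time() - MAX_AGE_DAYS * 86400

for root, _dirs, files in os.walk(SOURCE_DIR):
    for name in files:
        path = os.path.join(root, name)
        # st_atime is the last-access time; filesystems mounted with
        # noatime/relatime make this signal less reliable.
        if os.stat(path).st_atime < cutoff:
            dest = os.path.join(ARCHIVE_DIR, os.path.relpath(path, SOURCE_DIR))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(path, dest)  # export to the cheap "junk drawer"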

Bottom line:  just getting data into a manageable state is a pain.  Needed are tools that can apply policies to data automatically, based on metadata.  At a minimum, we should have automated tools to identify duplicates and dreck so they can be deleted, and other tools that can place the remaining data into a low-cost archive for later re-reference.  This isn't perfect, but it is possible with what we have today.
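
As a sketch of what the first pass of such a tool might look like, the following Python fragment flags byte-identical files by content hash so a policy can delete or collapse them before the remainder is archived; the scan directory is hypothetical, and real products layer far more metadata-driven policy on top.

import hashlib
import os
from collections import defaultdict

SCAN_DIR = "/data/shared"  # hypothetical directory to scan

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Group files by content hash; any group with more than one member is a
# set of byte-identical duplicates -- candidates for deletion or collapse.
by_hash = defaultdict(list)
for root, _dirs, files in os.walk(SCAN_DIR):
    for name in files:
        path = os.path.join(root, name)
        by_hash[file_digest(path)].append(path)

for digest, paths in by_hash.items():
    if len(paths) > 1:
        print(digest, paths)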

Going forward, we need to set up a strategy for marking files in a more intelligent way.  That may involve adding a step to the workflow in which the file creator adds keywords and tags to files when saving them -- a step that can't be bypassed by the user!  Virtually every productivity app has the capability for the user to enter granular descriptions of files, and some actually save this data about the data to a metadata construct appropriate for the file system or object model used to format the data itself.

If that seems too "brute force," another option is to mark the files transparently as they are saved.  Link file classification to the identity of the user who created the file, based on a user ID or login.  If the user works in accounting, treat all of his or her output as accounting data and apply a policy appropriate to accounting data.  That can be done by referencing an access control system like Active Directory to identify the department-qua-subnetwork in which the user works.
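
A minimal sketch of that idea on a Linux file server might look like the Python fragment below, which stamps each file with a department tag derived from its owner; the user-to-department table is a stand-in for a real Active Directory or LDAP lookup, and the extended-attribute name is an assumption.

import os
import pwd

# Stand-in for an Active Directory/LDAP query that resolves a user to a
# department; in practice this would come from the directory service.
USER_TO_DEPT = {"jsmith": "accounting", "mdoe": "engineering"}

ATTR_NAME = b"user.department"  # assumed extended-attribute name
DATA_DIR = "/data/shared"       # hypothetical share to classify

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        owner = pwd.getpwuid(os.stat(path).st_uid).pw_name
        dept = USER_TO_DEPT.get(owner)
        if dept:
            # Store the classification with the file itself (Linux xattrs;
            # requires a filesystem that supports user.* attributes) so a
            # downstream policy engine can act on it.
            os.setxattr(path, ATTR_NAME, dept.encode())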

Another approach might be to tag the data based on the workstation used to create the file.  Microsoft opened up its File Classification Infrastructure a few years ago.  That's the thing that shows attributes for files when you right click the file name:  HIDDEN, SECURE, ARCHIVE, etc.  With FCI opened up for user modification, each PC in the shop can be customized with additional attributes (like ACCOUNTING) that will be stored with data created on that workstation. 

Whether you mark the file by user role or by workstation/department, it isn't as precise as manually entering granular metadata for every file that is created.  Nor will it be as effective as, say, deploying an object storage solution and manually migrating files into that object storage system while editing the metadata of each file.  You will get a lot of "false positives," and this will reduce the efficiency of your storage or your archive or whatever.

Unfortunately, information on data management tools is difficult to come by.  As reported in another blog post, doing an internet search for data management solutions yields a bunch of stuff that really has nothing to do with the metadata-based application of storage policy to files and objects.  Many of the tools are bridges to cloud services, or they are backup software tools whose vendors are trying to teach them some new tricks, like archive.  Others are just a wholesale effort by the vendor to grab you by your data, figuring that your hearts and minds will follow.

We believe that cognitive data management is the future.  Take tools for storage resource management and monitoring, for storage service management and monitoring, and for global namespace creation and monitoring, then integrate the information contained in all three (all of which is being updated continuously) so that the right data is stored on the right storage and receives the right services (privacy, protection and preservation) based on a policy created by business and technology users who are in a position to know what the data is and how it needs to be handled.
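
To make the idea concrete, here is a toy sketch, entirely hypothetical, of the kind of policy table such a tool might evaluate: each business-defined data class maps to a storage target and to the privacy, protection and preservation services it should receive.

from dataclasses import dataclass

@dataclass
class Policy:
    storage_tier: str      # where the data should live
    encrypt: bool          # privacy
    replicate: bool        # protection
    retention_years: int   # preservation

# Hypothetical policy table authored by business and technology users.
POLICIES = {
    "accounting":  Policy("object-archive", encrypt=True,  replicate=True,  retention_years=7),
    "engineering": Policy("nas-tier2",      encrypt=False, replicate=True,  retention_years=3),
    "scratch":     Policy("nas-tier3",      encrypt=False, replicate=False, retention_years=0),
}

def policy_for(data_class: str) -> Policy:
    """Return the policy for a data class, defaulting to scratch handling."""
    return POLICIES.get(data_class, POLICIES["scratch"])

# A cognitive data management engine would join this table with live data
# from SRM, service monitoring and namespace tools, then move or re-tier
# files whose placement no longer matches their policy.
print(policy_for("accounting"))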

Such cognitive data management tools are only now beginning to appear in the market.  Watch this space for the latest information on what the developers are coming up with to simplify data management.

Tape and Clouds? New Besties in the Storage Realm

With data growth measured in the tens of zettabytes, the combined capacities of disk and flash storage will be insufficient to store all the bits.  That's where tape comes in, with its long runway of capacity growth milestones on the horizon.  

Even the cloudies, who are among the first to encounter the data deluge, get the need for tape.  IBM told us some stories about tape in the cloud when we went to Tucson, AZ recently to hear about their LTO-8 tape drive announcement.

Thanks to Calline Sanchez, Lee Jesionowski, Tony Pearson and Ed Childers for their time and their insights.

 

Archive is the Killer App for Tape

During our recent visit to IBM in Tucson, AZ, we were honored to meet with tape experts Lee Jesionowski, Calline Sanchez, Tony Pearson and Ed Childers about tape futures and the drivers behind the current renaissance of the technology.  One message that came through loud and clear was that current concerns about malware, ransomware and unauthorized disclosures of private data have built a fire under planners to consider ways to secure their data better.  That includes the use of tape.

Between the natural air gap provided by tape and the pervasive data encryption service delivered on tape drives from IBM and other vendors, tape rules when it comes to security.

Thanks to IBM for having us out to the Executive Briefing Center and good luck with today's announcement of LTO-8 technology.

 

Security May Be a Big Driver of Tape Adoption, Says IBM

During our recent visit to IBM's Executive Briefing Center in Tucson, AZ, we had a chance to chat with IBM big brains about the future of tape technology and the drivers of the current tape renaissance.  Here's what they said about the role of security concerns in tape adoption.

[Embedded video: IBM on the role of security in tape adoption]

Thanks to IBM for having DMI out for the briefing, and especially to Calline Sanchez, Tony Pearson, Lee Jesionowski and Ed Childers.

IBM Announces LTO-8 Tape Drive

At IBM's invitation, the Data Management Institute traveled to the Executive Briefing Center in Tucson, AZ a couple of weeks ago to shoot a video blog with IBM smart folks, Calline Sanchez, Lee Jesionowski, Tony Pearson and Ed Childers.  The subject was tape, and we received a preview of Big Blue's forthcoming announcement of supporting technology for LTO Ultrium tape Generation 8.

While LTO-8 is not yet available, that hasn't stopped IBM from getting ready with a new drive and support for the new standard in its Spectrum Archive software and various storage enclosures.  Here is the full interview:

[Embedded video: full interview with IBM]

Thanks to IBM for having us out for this advanced briefing.

Data Protection and Data Security: Together at Last?

Anyone who has been around the corporate data center for a couple of decades has probably grown accustomed to seeing separate disciplines and/or departments for data protection/disaster recovery planning and information security.  Such a distinction has deep historical roots, but one must wonder whether it still makes any sense.

Data protection is part of disaster recovery planning (or business continuity planning if you prefer), which is a set of strategies and processes for preventing avoidable "disasters" (unplanned interruption events) and for minimizing the impact of disasters that cannot be prevented.  Data protection is central to DR because, aside from personnel, data is a unique corporate asset that cannot be replaced.  The only way to protect data is a strategy of redundancy:  make a copy and store the copy sufficiently distant from the original so that the same disaster event cannot destroy both the original and the copy.

 

In addition to disaster avoidance and data protection, a good DR capability also includes provisions for application, network and user recoveries, plus processes for testing, training and change management.  DMI provides a data protection/disaster recovery planning course and certification (Certified Data Protection Specialist or CDPS), by the way, if you are assigned the planning task and need some guidance.

Information security planning is very similar to DR planning.  Structurally, it aims to protect mission critical business processes and data assets, but it uses a number of interlocking strategies that are unique to security. 

 

Infosec has developed its own vocabulary and its own set of strategies for securing applications, networks, facility perimeters and endpoints, and, of course, data assets.  Then, these strategies are supplemented by processes for active monitoring and periodic review to ensure that security provisions are keeping data private.

There is usually very little communication between the DR folks and the Infosec folks, except when DR needs to be concerned about recovering data that may be encrypted, or gaining access to an application or set of infrastructure in an emergency that is otherwise locked down by security's access control systems.  Conversely, the Infosec folks may only interact with the DR/data protection folks to ensure that continuous data protection capabilities are being deployed and leveraged to enable quick restore following a malware attack or a ransomware attack by "rewinding" data to a point before the attack occurred.

Both disciplines have much to learn from each other. DR, for example, has already flirted with nutty quantitative techniques for matching protection services to specific data given the threats to the organization, business unit, or infrastructure.  These quantitative methods, Single Loss Expectancy and Annual Loss Expectancy, were silly on their face and have been mostly abandoned by DR planners today.  The key problem with such techniques is that they require planners to have meaningful data regarding the probabilities of threat potentials being realized.  We have over 100 years of hurricane tracking data, but no one knew for sure when or where a hurricane was going to strike the US mainland in 2017.
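
For readers who never ran into them, the textbook formulas are simple: Single Loss Expectancy is asset value times exposure factor, and Annual Loss Expectancy is SLE times an annualized rate of occurrence.  The sketch below uses made-up numbers, and the made-up rate of occurrence is precisely the problem: nobody has a defensible value for it.

def single_loss_expectancy(asset_value, exposure_factor):
    """SLE = asset value x fraction of the asset lost in one incident."""
    return asset_value * exposure_factor

def annual_loss_expectancy(sle, annualized_rate_of_occurrence):
    """ALE = SLE x expected number of incidents per year."""
    return sle * annualized_rate_of_occurrence

# Illustrative numbers only: a $2M data center, 40% damaged in one flood,
# with a guessed rate of one flood every ten years (ARO = 0.1).
sle = single_loss_expectancy(2_000_000, 0.40)  # $800,000 per incident
ale = annual_loss_expectancy(sle, 0.1)         # $80,000 per year
print(sle, ale)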

Security is moving down this path at present.  Attack surface reduction modeling techniques are the same sort of quasi-scientific, quantitative-sounding methodologies as ALE and SLE in the DR world.  Some view them as an improvement over the threat/cost modeling that was used by many Infosec practitioners in the 1990s, but not by much.  Back then, we were told that the cost to protect should not be significantly greater than the cost to bad guys to circumvent the protection.  Only, the relationship was asymmetrical:  the bad guys incurred little to no expense to test the security of their targets or to defeat the measures that were being taken to keep them out.

There is much more to this story, but DMI members who are interested should probably take the DMI workshop for Certified Data Security Specialists (CDSS) to get more information.

Bottom line:  DR and Infosec should be working together going forward in all aspects of data protection planning.  Moreover, both DR and Infosec ought to be subsumed under the rubric of cognitive data management in the future, since both data protection and data privacy/security are actually best delivered as services associated with data based on granular, business-savvy policies.

A Zettabyte Apocalypse?

Trends in data growth are downright scary.  Per Barry M. Ferrite, as well as leading IT analysts, data is on pace to grow from current levels to more than 60 zettabytes (ZBs) by 2020 and to more than 163 ZBs by 2025.  Driving this data growth are three trends:  the digitization of information formerly stored in analog formats (the so-called digital democracy), the mobile commerce phenomenon, and the Internet of Things.

 

This tsunami of new data will challenge large organizations and those that are data intensive.  One prominent cloud architect has noted that the current manufacturing output of the disk industry in terms of capacity is around 780 exabytes per year.  The flash industry produces approximately 500 exabytes of capacity annually.  Even with forecasted capacity improvements, the world will still confront a severe capacity shortage by 2020. 
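
As a back-of-the-envelope illustration using the figures above (which are themselves estimates, and not every byte created has to land on newly manufactured media), the arithmetic looks roughly like this:

# Manufactured capacity quoted above, in exabytes per year.
disk_eb_per_year = 780
flash_eb_per_year = 500
combined_zb_per_year = (disk_eb_per_year + flash_eb_per_year) / 1000  # ~1.28 ZB/yr

# Projected data by 2020, in zettabytes, per the forecasts cited earlier.
projected_data_zb = 60
years_until_2020 = 3  # roughly 2017 through 2020

manufactured_zb = combined_zb_per_year * years_until_2020  # ~3.8 ZB
shortfall_zb = projected_data_zb - manufactured_zb         # tens of ZB short

print(f"{combined_zb_per_year:.2f} ZB/yr of disk+flash; "
      f"{manufactured_zb:.1f} ZB by 2020 vs {projected_data_zb} ZB of data "
      f"(shortfall ~{shortfall_zb:.0f} ZB)")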

Tape could help fill the void, with demonstrations of over 300 TB per cartridge coming from IBM and tape manufacturers.  How soon tape drives and cartridges supporting these capacities will come to market remains to be seen.

 

Teaching Storage Fundamentals? Why Not Make It Fun?

DMI has begun using an avatar, Barry M. Ferrite, "your trusted storage AI", to provide entertaining and informative public service announcements about storage technology and data management.  This follows a series of "edutainment" videos we made in 2012-2013 to talk about the state of storage industry infighting at that time. 

Each episode of Storage Wars was a mash-up of Star Wars and Annoying Orange. For their "historical value," here is our version of Storage Wars -- Episodes IV, V and VI (labeled Storage Wars, Storage Wars 2 and Storage Wars 3 on YouTube).

[Embedded videos: Storage Wars, Storage Wars 2 and Storage Wars 3]

Hope you enjoy the trip down memory lane.  DMI will be creating more edutainment videos in the future to teach storage fundamentals.