IT planners increasingly see data archiving as a strategic imperative, for several reasons:
- The growth of data is not being matched by a corresponding growth in disk or flash capacity. If analyst projections are correct, the combined output of the disk and flash industries will be sufficient to store only two percent of the nearly 60 zettabytes of new data expected by 2020. The only way to cope with this deluge will be to migrate less frequently accessed and updated data to an archival repository, either on premises or in a cloud.
- The Internet of Things (IoT) and Big Data Analytics (BDA) are behind much of this zettabyte-scale growth. Much of this data has ephemeral value: it is used to derive business value only within seconds, minutes or days of collection. For longer-term trend analysis, however, IoT and BDA data must be retained in an archive and recalled periodically for use in analytics processes.
- Finally, regulations governing data preservation are increasing in number and scope worldwide. In many vertical markets, much of the data that companies originate will need to be preserved in storage long after its useful life has ended.
These and other issues are pushing archive into the must-have category. The conundrum confronting planners is how to build an archive strategy that is cost-effective, that scales readily to the capacities required now and in the future, and that is easy to use and highly manageable, preferably with the assistance of cognitive, automated tools.
This is the topic of the Certified Data Archive Professional (CDAP) training program. This event is the fourth in a series of data management workshops, each spanning nearly six hours, that provide not only the latest guidance on technologies and best practices but also briefings from some of the industry's top thought leaders in data management. At the conclusion of the event, trainees receive a free certification from the Data Management Institute attesting to their knowledge and skills in data archiving.
The web-based workshop proceeds in six segments:
Segment 1 - THE CHALLENGES OF ARCHIVE
This segment describes the drivers behind the current surge of interest in data archive and defines the parameters for developing an effective archive strategy. It also presents a data management perspective on using policy-based automation to migrate data into an archive platform, and assesses the role of cloud service providers in the archive space.
Segment 2 - CHARACTERIZING DATA FOR ARCHIVE
Most of today's data is stored as "unstructured data" in various formats, using a combination of file systems and object systems. A unified archive approach must be able to handle this diversity of file structures to create a holistic methodology for placing data in, and retrieving it from, an archive. Decisions need to be made early on about which data requires archiving and the format in which it will be stored. Object storage vendors talk a good game about their metadata-rich approach to data characterization, but file systems remain the dominant model for storing unstructured bits. Some form of normalization is required, and this segment discusses some of the options.
Segment 3 - DEFINING THE ARCHIVE PLATFORM
Traditionally, archive has been the domain of specialized software vendors who provide tools for selecting and migrating data into a pre-engineered archive platform, usually leaving behind a stub in the active file system namespace with a "forwarding address" pointing to the location of the data in the archive. With the hyperscale growth of storage, both in the data center and in storage service clouds, it is fair to ask whether such a strategy remains viable going forward. This segment looks at physical storage types and seeks to improve our understanding of storage architectures and options so that planners can make informed design choices about archive platform components.
Segment 4 - OPERATING THE ARCHIVE
Successful archiving involves a combination of data copy, data migration, and data deletion. Each of these processes, while deceptively simple at first glance, is fraught with complexities and decision points that an archive strategy must address and resolve. Some are engineering concerns: what is the most efficient way to get data into an archive, which link types should be used, and so on. Others have to do with the legal or regulatory parameters governing access control, privacy requirements, retention timeframes, and the like. This segment looks at the central activities of archiving to help define the software functionality required for successful data preservation.
Segment 5 - ARCHIVE AS A SERVICE
This segment examines the increasing role of cloud service providers in the archive space. It looks at the archive services offered by prominent cloud vendors and identifies the trade-offs and advantages of integrating clouds into your archive strategy.
Segment 6 - ARCHIVE IS PART OF COGNITIVE DATA MANAGEMENT
Data archive continues to be treated as a rarefied discipline with its own hardware and software components and its own processes and practices. In fact, it is a sub-domain of data management. Archiving is the allocation of data preservation services to data assets, based on an understanding of those assets and their preservation requirements. That allocation is codified in a general data management policy that defines not only the archive/preservation requirements of the data but also its privacy/security and protection/disaster recovery needs. Given the amount of data to be administered, cognitive computing is being enlisted to make data management holistic and comprehensive. This segment looks at archive within the context of the cognitive data management domain.
Trainees who attend the workshop will receive a certificate from the Data Management Institute designating them as Certified Data Archive Professionals (CDAP) and will be invited to join the Institute's Data Archive Community of Interest, where they can access other workshops and additional certification training. DMI has certified more than 80,000 professionals since 2003.
Links to the online course materials, hosted by Virtualization Review, will be provided shortly. Watch this space.