HPC & Large Data Protection and Movement

Handling the Big Data Challenge!
Protect and Move Very High Data Volumes and Billions of Files

Data volumes stored in HPC scientific research labs and enterprise research centers are growing at over 30% per year. Vastly increased compute performance is pushing storage systems to their limits.

Data storage has become a key component in HPC architecture choices.

Data movement between compute, middle and archive storage tiers is becoming a genuine technological and economic challenge.

Video Testimonial: DiRAC Memory Intensive Service, Durham University

View this 4-minute testimonial from DiRAC describing how the Miria for Archiving solution meets universities' requirements for archiving research projects to tape.

For more details, please read the full article on our blog.

Miria for Archiving is incredibly powerful and feature-rich. We’re very impressed so far. Archiving data flow performance on Lustre file systems is running at full tape speeds, which is perfect.

Dr Alastair Basden, Technical Lead for the DiRAC Memory Intensive Service, Durham University

Advances in HPC compute and storage workloads continue to drive data management challenges; as new storage technologies are leveraged comes the challenges of multiple storage silos, heterogeneous file systems and long-term storage requirements, this drives the need for reliable and efficient data orchestration tools.

Laurence Horrocks-Barlow, Technical Director, OCF - Atempo, UK HPC Partner

  • Strength in numbers: parallel file systems

  • Parallel file systems in general, and Lustre-based file systems in particular, are the mainstay of HPC file storage architectures (see: www.top500.org). Lustre delivers end-to-end data throughput that can exceed 10 GB/sec and scales to many thousands of clients. This makes it ideal for HPC clusters and high-volume data environments.
  • To keep pace with data growth while managing budgets and offering a shared high-performance computing platform to multiple groups of users, research labs organize their HPC storage in tiers. Each tier has a dedicated purpose as well as distinct hardware, capacity and I/O characteristics. With the potential to rapidly generate many petabytes of data, users need to manage data generated at the burst or scratch tiers. The middle and archive tiers are where data needs to be moved, managed and protected.
    • Home Space: this middle storage tier is typically used by research teams to document processes, prepare code, write articles, manage their application source code, etc.
    • Archive Space: storage used to preserve data and information for the long term. It provides high-volume capacity that can be organized into different levels of service, such as private archives or shared data archives. Depending on the research data and the domain of activity, this archive can be on-premises or cloud-based, private or shared, or a mix.
  • Over and above the data generated on Lustre HPC architectures, other components also need protecting:
    • Applications required for data computation
    • Lustre MDS and MDT components, which are critical to managing and recovering file system data
Exploding Data Volumes
With large data volumes, duplicating or replicating data takes time, and there is often a bottleneck somewhere. Parallelizing data flow is key. Miria Data Movers can scale to fit any infrastructure, matching data throughput to the maximum available bandwidth.
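To illustrate the general idea of parallelizing a data flow, here is a minimal Python sketch that copies a folder tree with a pool of worker threads. It is not Miria's implementation; the function name `copy_tree_parallel` and the default of 8 workers are assumptions made purely for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src: Path, dst: Path, workers: int = 8) -> int:
    """Copy every file under src to dst using a pool of worker threads.

    Hypothetical helper for illustration only. Returns the number of
    files copied; the folder layout under src is preserved under dst.
    """
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path: Path) -> None:
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also carries over timestamps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any copy error is raised here
        list(pool.map(copy_one, files))
    return len(files)
```

Adding workers only helps until the source, the target or the network becomes the bottleneck, which is why throughput has to be matched to the available bandwidth.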
Storage and vendor agnostic
Storage can be disk, object storage, optical disk, tape or cloud, or any combination of these, and can be based on one or more storage technologies. Data movement and protection are shaped by available budgets and performance needs. Miria offers a very broad storage compatibility list, with deep integration guaranteeing optimal performance.
Facing proprietary backup formats
Atempo's Miria means you won’t have to cope with proprietary formats. Only open formats are deployed. Maintain the original folder structure of your data in native format on disk, and benefit from the widely used TAR and LTFS formats on tape. File access for recovery is rapid and straightforward. Miria is also an ideal tool for long-term retention, archive and data sharing with third parties.
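The practical benefit of an open format such as TAR is that any standard tool can read the data back. The sketch below uses only Python's standard `tarfile` module to archive a folder with its structure intact and then read a single file back by path. It says nothing about how Miria writes to tape; the helper names `archive_folder` and `restore_one` are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_folder(folder: Path, archive: Path) -> None:
    """Write folder and all its sub-folders into an uncompressed TAR file."""
    with tarfile.open(archive, "w") as tar:
        # arcname stores paths relative to the folder's own name
        tar.add(folder, arcname=folder.name)


def restore_one(archive: Path, member: str) -> bytes:
    """Read one file back from the archive by its stored path."""
    with tarfile.open(archive, "r") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:  # member exists but is not a regular file
            raise FileNotFoundError(member)
        return extracted.read()
```

For example, after `archive_folder(Path("project"), Path("project.tar"))`, a file stored as `project/data/run1.txt` can be read with `restore_one(Path("project.tar"), "project/data/run1.txt")`, or with any other TAR-aware tool.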
Speed and scalability
Our customers use Miria to handle extremely large files as well as hundreds of millions of small files which transit between petabyte-level storage spaces. Thanks to tight storage integration, Miria can leverage snapshots and FastScan technologies with selected vendors. The large volumes of data are transferred by scalable farms of Miria Data Movers, each capable of moving several GB/s. To increase performance, simply add a data mover. Scalability is no longer a concern. All data movements are restricted to authorized storage spaces and are logged.
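As a rough sizing aid, the number of data movers needed for a given volume and time window can be estimated with simple arithmetic. The Python sketch below assumes throughput scales linearly with the number of movers and that storage and network can keep up; the function name `movers_needed` and the 2 GB/s per-mover figure in the example are illustrative assumptions, not Atempo specifications.

```python
import math


def movers_needed(volume_tb: float, window_hours: float, mover_gb_per_s: float) -> int:
    """Smallest number of data movers that moves volume_tb within window_hours.

    Assumes linear scaling and no storage or network bottleneck.
    """
    volume_gb = volume_tb * 1000  # decimal units: 1 TB = 1000 GB
    required_gb_per_s = volume_gb / (window_hours * 3600)
    return max(1, math.ceil(required_gb_per_s / mover_gb_per_s))


# Example: 1 PB (1000 TB) in 24 hours at an assumed 2 GB/s per mover
# needs about 11.6 GB/s in aggregate, so 6 movers.
print(movers_needed(1000, 24, 2.0))
```

In practice the achievable rate also depends on file sizes, since hundreds of millions of small files behave very differently from a few very large ones.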
Data Recovery
You just lost something from storage. Fortunately, Miria has kept duplicate copies, complete with the full folder and sub-folder structure. Recovery can be automated, or you can browse the storage manually to get the file(s) you need. To retrieve an archived project or folder, use metadata searches or browse to find the relevant version of your assets, then restore them where you need them.
Integrating object storage
In the past, large volumes were often stored on tape. Now, when it comes to making full and incremental backups of large amounts of data, tape is challenged by object storage. Object storage can swiftly restore any version of a file. Write speeds are managed by deploying the right number of Miria Data Movers, making the combination a powerful and scalable backup and movement solution for Big Data.
