Saturday 24 February 2018

4 steps to implementing high-performance computing for big data processing

If your company needs high-performance computing for its big data, an in-house operation might work best. Here’s what you need to know, including how high-performance computing and Hadoop differ.

In the big data world, not every company needs high-performance computing (HPC), but nearly all that work with big data have adopted Hadoop-style analytics computing.

HPC and Hadoop can be hard to tell apart because Hadoop analytics jobs can run on HPC gear, although not vice versa. Both use parallel processing, but in a Hadoop analytics environment, data is stored on commodity hardware and distributed across multiple nodes of that hardware. In HPC, where data files are much larger, data storage is centralized. Because of the sheer size of its files, HPC also requires more expensive networking, such as InfiniBand, to deliver the high throughput and low latency that processing them demands.
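To see why throughput matters at these file sizes, here is a rough back-of-the-envelope sketch in Python. The 10 TB file size is a hypothetical figure, not one from this article, and the link speeds are nominal line rates (1 Gbit/s Ethernet and 100 Gbit/s, the rate of an InfiniBand EDR 4x link). Real transfers will be slower once protocol overhead, latency, and storage limits are counted.

```python
# Rough estimate of how long it takes to move one large file over a link.
# All figures below are illustrative assumptions, not measurements.

def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time: total bits divided by the nominal line rate."""
    return size_bytes * 8 / link_bits_per_second


FILE_SIZE_BYTES = 10 * 10**12   # hypothetical 10 TB data file
GIGABIT_ETHERNET = 10**9        # 1 Gbit/s nominal
INFINIBAND_EDR = 100 * 10**9    # 100 Gbit/s nominal (EDR 4x)

for name, rate in [("1 Gbit/s Ethernet", GIGABIT_ETHERNET),
                   ("100 Gbit/s InfiniBand", INFINIBAND_EDR)]:
    seconds = transfer_seconds(FILE_SIZE_BYTES, rate)
    print(f"{name}: {seconds:,.0f} s ({seconds / 3600:.1f} h)")
```

Under these assumptions the same file needs roughly 22 hours on the slower link and about 13 minutes on the faster one, which is the gap the paragraph above is pointing at.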

The message for company CIOs is clear: if you can avoid HPC and just use Hadoop for your analytics, do it. It is cheaper, easier for your staff to run, and might even be able to run in the cloud, where someone else (such as a third-party vendor) can operate it for you.

Unfortunately, being an all-Hadoop shop is not possible for the many life sciences, weather, pharmaceutical, mining, medical, government, and academic companies and institutions that require HPC for processing. Because file sizes are large and processing needs are extreme, standard network communications and cloud connections aren’t viable alternatives, either.

In short, HPC is a perfect example of a big data platform that is best run in-house in a data center. Because of this, the challenge becomes: how do you (and your staff) ensure that the very expensive hardware you invest in is in the best shape to do the job you need it to do?

“This is a challenge that many companies that must use HPC for their big data processing face,” said Alex Lesser, chief strategy officer at PSCC Labs, a big data Hadoop and HPC platform provider. “Most of these companies have a history of supporting a traditional IT infrastructure. They are comfortable getting out of this mindset to tackle a Hadoop analytics computing environment themselves because it uses commodity hardware they are already familiar with, but when it comes to HPC, the response is often ‘let the vendor take care of it.’”

If a move to HPC seems right for your company, here are four steps to take:

1. Confirm that you have high-level support for HPC

Upper management and the board don’t have to be HPC gurus, but their understanding and support should never be presumed. Both groups should have enough understanding about HPC and what it can do for your company to be unequivocally in support of the sizable hardware, software and training investments you are likely to make. This means that they must be educated on two fronts: 1) what HPC is, and why it is different from plain old analytics and needs special hardware and software; 2) why it is necessary for the company to use HPC versus plain old analytics in order to meet its business objectives. Both of these educational efforts should be undertaken by the CIO or the CDO.

The post 4 steps to implementing high-performance computing for big data processing appeared first on Statii News.



source http://news.statii.co.uk/4-steps-to-implementing-high-performance-computing-for-big-data-processing/
