Sunday 3 November 2013

When worlds collide

We know that mainframes are rock-solid workhorses that ensure the banks and insurance companies and airlines and pretty much every other large organization get their work done correctly and swiftly. And we know that access to mainframes has been extended outside the world of green screens to anyone on a browser with the proper authorization. And we also know that the line between cloud computing and distributed computing with a mainframe at its heart has become very blurred. But the latest big thing is Big Data – and that seems like a different world.

The term Big Data refers to huge amounts (exabytes) of data, often unstructured, that can originate from a variety of sources – such as cameras, weather satellites, credit card machines, barcode readers, the Internet of Things, anything! This Big Data usually sits on Linux or Windows boxes, and some of the early developers of the technology were Google, Amazon, and Facebook. The data can be stored in HBase, a non-relational, distributed database written in Java, and the underlying file system is the Hadoop Distributed File System (HDFS). Processing is handled by MapReduce: a map step works across the data in parallel, producing key/value pairs, and a reduce step then aggregates those pairs into results.
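
To make MapReduce a little more concrete, here's the classic word-count example written against the Hadoop Java API: the map step emits a (word, 1) pair for every word it sees, and the reduce step sums the counts for each word. It's just a sketch of the pattern – nothing mainframe-specific – with the input and output HDFS directories passed in as arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: emit a (word, 1) pair for every word in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: sum the counts that arrive for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0] is the HDFS input directory, args[1] the (new) output directory.
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```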

So how do these two worlds come together? For a start, a lot of what you need for Big Data is Open Source and comes from the Apache Software Foundation. IBM is a member of the foundation and has a number of products that extend Big Data's functionality, including InfoSphere BigInsights, DataStage, Streams, and Guardium. There's Big SQL, which arrived with BigInsights V2.1, and the spreadsheet-like BigSheets.

If you want to run Big Data – Hadoop – on your mainframe, you’ll need to do it in a Linux partition (Linux on System z). But IBM isn’t the only mainframe software vendor that’s getting in on the act. We’ve recently heard from BMC, Syncsort, Compuware, and Informatica about their products.

BMC has extended its Control-M automated mainframe job scheduler with Control-M for Hadoop. The product enables the creation and management of Hadoop workflows in an automated environment and is aimed at Hadoop application developers and enterprise IT administrators who are using Hadoop as part of their production workload.

Syncsort has Hadoop Connectivity, which stops Hadoop from becoming just another silo within an enterprise by making it easy to get data into and out of Hadoop. The product provides: native connectivity to all major data sources and targets; native mainframe connectivity and support for EBCDIC/ASCII, VSAM, packed decimal (COMP-3), and more; heterogeneous database access on Hadoop; direct I/O access for faster data transfers; and high-performance compression.
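
To give a flavour of what that mainframe data conversion involves – though not, of course, how Syncsort actually implements it – here's a minimal Java sketch that decodes a COMP-3 (packed decimal) field, with two BCD digits per byte and the final nibble holding the sign, into a value a Hadoop job could work with.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class Comp3Decoder {

  // Decode a COMP-3 (packed decimal) field: two BCD digits per byte,
  // with the final low nibble holding the sign (0xC/0xF positive, 0xD/0xB negative).
  public static BigDecimal decode(byte[] packed, int scale) {
    StringBuilder digits = new StringBuilder();
    for (int i = 0; i < packed.length; i++) {
      int high = (packed[i] & 0xF0) >>> 4;
      int low = packed[i] & 0x0F;
      digits.append(high);
      if (i < packed.length - 1) {
        digits.append(low);
      } else if (low == 0x0D || low == 0x0B) {
        digits.insert(0, '-');   // last nibble is the sign, not a digit
      }
    }
    return new BigDecimal(new BigInteger(digits.toString()), scale);
  }

  public static void main(String[] args) {
    byte[] comp3 = {0x12, 0x34, 0x5C};     // +12345 packed into three bytes
    System.out.println(decode(comp3, 2));  // prints 123.45 (two assumed decimal places)
  }
}
```

Real connectors also have to cope with EBCDIC-to-ASCII character conversion, VSAM record structures, and far more exotic COBOL copybook layouts – which is exactly why off-the-shelf connectivity is attractive.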

Compuware has extended its Application Performance Management (APM) software with Compuware APM for Big Data. This, they claim, allows organizations to tame Big Data applications, eliminating inefficiencies and rapidly identifying and resolving problems. Using PurePath Technology, it provides visibility into Hadoop and NoSQL applications. Organizations, they say, use Compuware APM for Big Data to reduce costs, analyse issues, and ensure optimal efficiency from their Big Data investments.

Informatica PowerExchange for Hadoop provides native high-performance connectivity to the Hadoop Distributed File System (HDFS). It enables organizations to take advantage of Hadoop’s storage and processing power using their existing IT infrastructure and resources. PowerExchange for Hadoop can bring any and all enterprise data into Hadoop for data integration and processing. Fully integrated with Informatica PowerCenter, it moves data into and out of Hadoop in batch or real time using universal connectivity to all data, including mainframe, databases, and applications, both on-premises and in the cloud. Informatica PowerCenter Big Data Edition is, they claim, highly scalable, high-performance enterprise data integration software that works with both Hadoop and traditional data management infrastructures.
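
For a sense of what "native connectivity to HDFS" means at the lowest level, here's a minimal sketch using the standard Hadoop FileSystem Java API to write a file into a cluster. The NameNode address and file path are made up for illustration; a product like PowerExchange is a packaged offering that builds on this kind of HDFS access, adding the conversion, scheduling, and batch or real-time movement described above.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address - point this at your own cluster.
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/landing/customers.txt");

    // Write one record; an integration product does this at scale, in parallel,
    // with data conversion, scheduling, and monitoring wrapped around it.
    try (OutputStream out = fs.create(target, true)) {
      out.write("00012345,SMITH,NEW YORK\n".getBytes(StandardCharsets.US_ASCII));
    }
    fs.close();
  }
}
```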

Clearly, these two worlds have done more than collide – we're beginning to see them integrate, with software from a number of vendors helping users through the process. And as users, we get the best of both worlds!
