Friday, April 27, 2012

Fast data hits the big data fast lane

This guest post comes courtesy of Tony Baer's OnStrategies blog. Tony is senior analyst at Ovum.

By Tony Baer

Of the 3 “V’s” of Big Data – volume, variety, velocity (we’d add “Value” as the 4th V) – velocity has been the unsung ‘V.’ With the spotlight on Hadoop, the popular image of Big Data is large petabyte data stores of unstructured data (which are the first two V’s). While Big Data has been thought of as large stores of data at rest, it can also be about data in motion.

“Fast Data” refers to processes that require lower latencies than would otherwise be possible with optimized disk-based storage. Fast Data is not a single technology, but a spectrum of approaches that process data that might or might not be stored. It could encompass event processing, in-memory databases, or hybrid data stores that optimize cache with disk.
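To make "data in motion" concrete, here is a minimal sketch of event processing over a stream that is never stored in full. It is an illustration only, not how any particular CEP product works: the function name `detect_spikes`, the five-second window, and the 5 percent threshold are all assumptions chosen for the example.

```python
from collections import deque


def detect_spikes(ticks, window_seconds=5.0, threshold=1.05):
    """Return the (timestamp, price) ticks whose price exceeds the average
    of the preceding window by more than `threshold` (hypothetical rule)."""
    window = deque()  # (timestamp, price) pairs still inside the window
    total = 0.0       # running sum of the prices in `window`
    alerts = []
    for timestamp, price in ticks:
        # Evict ticks that have aged out of the time window.
        while window and window[0][0] <= timestamp - window_seconds:
            total -= window.popleft()[1]
        # Compare the new tick with the average of what came before it.
        if window and price > threshold * (total / len(window)):
            alerts.append((timestamp, price))
        window.append((timestamp, price))
        total += price
    return alerts


# Illustrative feed: only the jump to 110.0 stands out from its window.
feed = [(0, 100.0), (1, 100.0), (2, 101.0), (3, 110.0), (4, 100.0)]
alerts = detect_spikes(feed)
```

Only the ticks inside the window are held in memory, so the decision is made as each event arrives rather than after a batch load.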

Fast Data is nothing new, but because of the cost of memory, it was traditionally restricted to a handful of extremely high-value use cases. For instance:
  • Wall Street firms routinely analyze live market feeds, and in many cases, run sophisticated complex event processing (CEP) programs on event streams (often in real time) to make operational decisions.
  • Telcos have handled such data in optimizing network operations while leading logistics firms have used CEP to optimize their transport networks.

  • In-memory databases, used as a faster alternative to disk, have similarly been around for well over a decade, having been employed for program stock trading, telecommunications equipment, airline schedulers, and large destination online retail (e.g., Amazon).
  • Hybrid in-memory and disk systems have also become commonplace, especially among data warehousing systems (e.g., Teradata, Kognitio), and more recently among the emergent class of advanced SQL analytic platforms (e.g., Greenplum, Teradata Aster, IBM Netezza, HP Vertica, ParAccel) that employ smart caching in conjunction with a number of other bells and whistles to juice SQL performance and scaling (e.g., flatter indexes, extensive use of various data compression schemes, columnar table structures, etc.).

Many of these systems are in turn packaged as appliances that come with specially tuned, high-performance backplanes and direct attached disk.

Finally, caching is hardly unknown to the database world. Hot spots of data that are frequently accessed are often placed in cache, as are snapshots of database configurations that are often stored to support restore processes, and so on.
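The hot-spot idea can be sketched as a small least-recently-used (LRU) cache sitting in front of a slower store. This is a generic illustration, not the caching scheme of any database named above; the class name `HotSpotCache` is hypothetical, and a plain dictionary stands in for disk.

```python
from collections import OrderedDict


class HotSpotCache:
    """Keep the most recently used records in memory; fall back to the
    slower backing store (a stand-in for disk) on a miss."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity
        self.cache = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store[key]  # the slow path
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the coldest entry
        return value


disk = {"a": 1, "b": 2, "c": 3}
cache = HotSpotCache(disk, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
```

In this run the second read of "a" is served from memory, while "b" is evicted to make room for "c" and has to be fetched again.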

So what’s changed?


The usual factors: the same data explosion that created the urgency for Big Data is also generating demand for making the data instantly actionable. Bandwidth, commodity hardware and, of course, declining memory prices, are further forcing the issue: Fast Data is no longer limited to specialized, premium use cases for enterprises with infinite budgets.

Not surprisingly, pure in-memory databases are now going mainstream. Oracle and SAP have both chosen in-memory as one of the next places to establish competitive stakes: SAP HANA vs. Oracle Exalytics.

Both Oracle and SAP for now are targeting analytic processing, including OLAP (by raising the size limits on OLAP cubes) and more complex, multi-stage analytic problems that traditionally would have required batch runs (such as multivariate pricing) or would not have been run at all (too complex, too much delay).

More to the point, SAP is counting on HANA as a major pillar of its stretch goal to become the #2 database player by 2015, which means expanding HANA’s target to include next generation enterprise transactional applications with embedded analytics.

Potential use cases for Fast Data could encompass:
  • A homeland security agency monitoring the borders, which must parse, decipher, and act on complex occurrences in real time to prevent suspicious people from entering the country
  • Capital markets trading firms requiring real-time analytics and sophisticated event processing to conduct algorithmic or high-frequency trades
  • Entities managing smart infrastructure, which must digest torrents of sensor data to make real-time decisions that optimize use of transportation or public utility infrastructure
  • B2B consumer products firms monitoring social networks, which may need to respond in real time to sudden swings in customer sentiment
For such organizations, Fast Data is no longer a luxury, but a necessity.

More specialized use cases are similarly emerging now that the core in-memory technology is becoming more affordable. YarcData, a startup from venerable HPC player Cray Computer, is targeting graph data, which represents data with many-to-many relationships. Graph computing is extremely process-intensive, and as such, has traditionally been run in batch when involving Internet-size sets of data. YarcData adopts a classic hybrid approach that pipelines computations in memory while persisting data to disk. YarcData is the tip of the iceberg – we expect to see more specialized applications that use hybrid caching to combine speed with scale.

Memory’s not the new disk


The movement – or tiering – of data to faster or slower media is also nothing new. What is new is that data in memory may no longer be such a transient thing, and if memory is relied upon for in situ processing of data in motion or rapid processing of data at rest, memory cannot simply be treated as the new disk. Excluding specialized forms of memory such as ROM, by nature anything that’s solid state is volatile: there goes your power… and there goes your data.

Not surprisingly, in-memory systems such as HANA still replicate to disk to reduce volatility. For conventional disk data stores that increasingly leverage memory, Storage Switzerland’s George Crump makes the case that caching practices must become smarter to avoid misses (where data gets mistakenly swapped out).
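One common way to pair in-memory speed with durability is to append every write to a log on disk before acknowledging it, then replay that log after a restart. The sketch below illustrates that general pattern only; it is not a description of how HANA or any other product persists data, and the class `DurableMemoryStore` and helper `demo_recovery` are hypothetical names.

```python
import json
import os
import tempfile


class DurableMemoryStore:
    """Serve reads from memory, but log every write to disk first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild memory from whatever the log holds
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Write to the log and force it to disk before updating memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data[key]  # served entirely from memory

    def close(self):
        self.log.close()


def demo_recovery():
    """Simulate losing memory by discarding the store, then reloading it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.log")
        store = DurableMemoryStore(path)
        store.put("price", 101.5)
        store.put("price", 102.0)
        store.close()  # stands in for "there goes your power"
        recovered = DurableMemoryStore(path)
        value = recovered.get("price")
        recovered.close()
        return value
```

The trade-off is visible in `put`: each write now waits on disk, which is part of why memory cannot simply be treated as the new disk.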

There are also balance-of-system considerations: memory may be fast, but is its speed well matched with that of the processor? Solid state may overcome the I/O issues associated with disk, but the system may still be vulnerable to coupling issues if processors get bottlenecked or MapReduce jobs are not optimized.

Declining memory prices are putting Fast Data on the fast lane to mainstream. But even as the technology becomes affordable, we’re still early in the learning curve for how to design for it.


Thursday, April 26, 2012

Case study: Strategic approach to disaster recovery and data lifecycle management pays off for Australia's SAI Global

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: VMware.

The latest BriefingsDirect case study discussion focuses on how business standards and compliance services provider SAI Global is benefiting from a strategic view of IT enabled disaster recovery (DR).

Learn here how SAI Global has brought advanced backup and DR best practices into play for its users and customers. Examine too how this has not only provided business continuity assurance, but it has also provided beneficial data lifecycle management and virtualization efficiency improvement.

Mark Iveli, IT System Engineer at SAI Global, based in Sydney, Australia, details how standardizing DR has helped improve many aspects of SAI Global’s business reliability. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Iveli: When we started to get into DR, we handled it from an IT point of view and it was very much like an iceberg. We looked at the technology and said, "This is what we need from a technology point of view." As we started to get further into the journey, we realized that there was so much more that we were overlooking.

We were working with the businesses to go through what they had, what they didn’t have, what we needed from them to make sure that we could deliver what they needed. Then we started to realize it was a bigger project.

The initiative for DR started about 18 months ago with our board, and it was a directive to improve the way we had been doing things. That meant a complete review of our processes and documentation.

We had a number of business units that all had different strategies for their disaster recovery, and different timings and mechanisms to report on it.

Through the use of VMware Site Recovery Manager (SRM) in the DR project, we've been able to centralize all of the DR processes, provide consistent reporting, and be able to schedule these business units to do all of their testing in parallel with each other.

So we can make a DR session, so to speak, within the business and just run through the process for them and give them their reports at the end of it.

We've installed SRM 4.1 and our installation was handled by an outsource company, VCPro. They were engaged with us to do the installation and help us get the design right from a technical point of view.

Trying to make it a daily operational activity is where the biggest challenge is, because the implementation was done in a project methodology. Handing it across to the operational teams to make it a daily operation, or a daily task, is where we're seeing some challenges.

I'm a systems engineer with SAI Global, and I've been with the company for three years. When the DR project started to gather some momentum, I asked to be a significant part of the project. I got the nod and was seconded to the DR project team because of my knowledge of VMware.

That’s what my role is now -- keeping the SRM environment tuned and in line with what the business needs. That’s where we're at with SRM.

Complete review

The first 12 months of this journey so far have been all about cleaning up, getting our documentation up to spec, and making sure that every business unit understood and was able to articulate their environments well. Then, we brought all that together so that we could ask what technology is going to encapsulate all of these processes and documentation to deliver what the business needs, which is our recovery point objective (RPO) and our recovery time objective (RTO).

SAI Global is an umbrella company. We have three to four main areas of interest. The first one, which we're probably most well-known for, is our Five Ticks brand, and that’s the ASIS standards. The publication, the collection, the customization to your business is all done through our publishing section of the business.

That then flows into an assurance side of the business, which goes out and does auditing, training, and certification against the standards that we sell.

We continue to buy new companies, and part of the acquisition trail that we have been on has been to buy some compliance businesses. That’s where we provide governance risk and compliance services through the use of Board Manager, GRC Manager, Cintellate, and in the U.S., Integrity 360.

Finally, last year, we acquired a company that deals solely in property settlement, and they're quite a significant section of the business that deals a lot with banks and conveyancing firms in handling property settlements.

So we're a little bit diverse. All three of those business sections have their own IT requirements.

Gardner: Like many businesses, your brand is super important. The trust associated with your performance is something you will take seriously. So DR, backup and recovery, business continuity, are top-line issues for you.

Is there anything about what you've been doing as a company that you think makes DR specifically important for you?

Iveli: From SAI Global’s point of view, because of what we do, especially around the property settlement and interactions with the banks, DR is critical for us.

Our publishing business feels that their website needs to be available five nines. When we showed them what DR is capable of doing, they really jumped on board and supported it. They put DR as high importance for them.
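For readers unfamiliar with the term, "five nines" means 99.999 percent availability. The arithmetic behind it can be sketched as follows; the function name `allowed_downtime_minutes` is hypothetical, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = allowed_downtime_minutes(99.999)   # a little over 5 minutes a year
three_nines = allowed_downtime_minutes(99.9)    # a little under 9 hours a year
```

At five nines, a single unplanned outage of more than about five minutes uses up the whole year's allowance.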

As far as businesses go, everyone needs to be planning for this. I read an article recently where something like 85 percent of businesses in the Asia-Pacific region don’t have a proper DR strategy in place. With the events that have happened here in Australia recently with the floods, and when you look at the New Zealand earthquakes and that sort of stuff, you wonder where the businesses are putting DR and how much importance they've got on it. It’s probably only going to take a significant event before they change their minds.

Gardner: I was intrigued, Mark, when you said what DR is capable of doing. Do you feel that there is a misperception, perhaps an under-appreciation of what DR is?

Process in place

Iveli: The larger DR whole was just that these business units had a process in place, but it was an older process and a lot of the process was designed around a physical environment.

With SAI Global being almost 100 percent virtual, moving them into a virtual space opened their minds up to what was possible. So when we can sit down with the business units and say, "We're going to do this DR test," they ask if it will impact production. No, it won’t. How is it happening? "Well, we are going to do this, this, and this in the background. And you will actually have access to your application the way it is today, it’s just going to be isolated and fenced off."

They say, "This is what we've been waiting for." We can actually do this sort of stuff. They're starting to see and ask, "Can we use this to test the next version of the applications and can we test this to kind of map out our upgrade path?"

We're starting to move now into a slightly different world, but it has been the catalyst of DR that’s enabled them to start thinking in these new ways, which they weren’t able to do before.

Gardner: So being able to completely switch over and recover with very little interruption in terms of the testing, with very little downtime or loss, the opportunity then is to say, "What else can we do with this capability?"

Iveli: Absolutely. With this new process, we've taken the approach of baby steps, and we're just looking to get some operational maturity into the environment first, before we start to push the boundaries and do things like disaster avoidance.

Having the ability to just bring these environments across in a state that’s identical to production is eye-opening for them. Where the business wants to take it is the next challenge, and that’s probably how do we take our DR plan to version 2.0.

We need to start to work with the likes of VMware and ask what our options are now. We have this in place, people are liking it, but they want to take it into a more highly available solution. What do we do next? Use vCloud Director? Do we need to get our sites in an active/active pairing?

However, whatever the next technology step is for us, that’s where the business are now starting to think ahead. That’s nice from an alignment point of view.

Gardner: Those DR maturation approaches put you in a position to further leverage virtualization. Is there sort of a virtuous adoption pattern, when you combine modern DR with widespread virtualization?

Iveli: Because all of a sudden, your machines are just a file on a data store somewhere, now you can move these things around. As the physical technologies continue to advance -- the speed of our networks, the speed of the storage environments, metro clustering, long haul replication -- these technologies are allowing businesses to think outside of the box and look at ways in which they can provide faster recovery, higher availability, more elastic environments.

You're not pinned down to just one data center in Sydney. You could have a data center in Sydney and a data center in New Zealand, for instance, and we can keep both of those sites online and in sync. That’s a couple of years down the track for our business, but that’s a possibility somehow through the use of more virtualization technology.

Gardner: Any advice for those listening in who are beginning their journey? For those folks that are recognizing the risks and seeing these larger benefits, these more strategic benefits, how would you encourage them to begin their journey, what advice might you offer?

Iveli: The advice would be to get hired guns in. With DR, you're not going to be able to do everything yourself. So spend a little bit more money and make sure that you get some consultants in like VCPro. Without these guys, we probably would have struggled a little bit just making sure that our design was right. These guys ensured that we had best practice in our designs.

Before you get into DR, do your homework. Make sure that your production environment is pristine. Clean it up. Make sure that you don’t have anything in there that’s wasting your resources.

Come around with a strong business case for DR. Make sure that you've got everybody on board and you have the support of the business.

When you get into DR, make sure that you secure dedicated resources for it. Don't just rely on people coming in and out of the project. Make sure that you can lead people to the resource and you make sure that they are fully engaged in the design aspects and the implementation aspects.

And as you progress with DR, incorporate it as early as you can into your everyday IT operation. We're seeing that, because we held it back from our operations, just handing it over and having them manage the hardware and the ESX and the logical layers, the environment, they were struggling just to get their head around it and what was what, where should this go, where should that go.

And once it’s in place, celebrate. It can be a long haul. It can be quite a trying time. So when you finally get it done, make sure that you celebrate it.

Gardner: And perhaps a higher degree of peace of mind that goes with that.

Iveli: Well, you'll find out when you get through it, how much easier this is making your life, how much better you can sleep at night.