Thursday, August 16, 2012

Columbia Sportswear extends deep server virtualization to improved ERP operations, disaster recovery efficiencies

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: VMware.

The latest BriefingsDirect end-user case study examines how outerwear and sportswear maker and distributor Columbia Sportswear has used virtualization to significantly improve its business operations.

We’ll see how Columbia Sportswear’s use of deep virtualization assisted in rationalizing its platforms and data center, as well as led to benefits in their enterprise resource planning (ERP) implementation. We’ll also learn how virtualizing mission-critical applications formed a foundation for improved disaster recovery (DR) best practices.

Stay with us now to learn more about how better systems make for better applications that deliver better business results with Michael Leeper, Senior Manager of IT Engineering at Columbia Sportswear, and Suzan Frye, Manager of Systems Engineering at Columbia Sportswear, in Portland, Oregon. The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: Tell me a little bit about how you got into virtualization. What were some of the requirements that you needed to fulfill at the data center level?

Leeper: Pre-2009, we'd experimented with virtualization. It had been one of those things that I had my teams working on, mostly so we could tell my boss that we were doing it, but there wasn’t a significant focus on it. It was a nice toy to play with in the corner and it helped us in some small areas, but there were no big wins there.

Columbia Sportswear is the worldwide leader in apparel and accessories. We sell primarily outerwear and sportswear products, and a little bit of footwear, globally. We have about 4,000 employees and 50-some physical locations, not counting retail, around the world. The products are primarily manufactured in Asia, with sales distribution happening in both Europe and the United States.

My teams out of the U.S. manage our global footprint, and we are the sole source of IT support globally from here.

In mid-2009, the board of directors at Columbia decided that we, as a company, needed a much stronger DR plan. That included the construction of a new data center for us to house our production environments offsite.

As we were working through the requirements of that project with my teams, it became pretty clear for us that virtualization was the way we were going to make that happen. For various reasons, we set off on this path of virtualization for our primary data center, as we were working through issues surrounding multiple data centers and DR processes.

Our technologies weren't based on the physical world any more. We were finding more issues in physical than we were in virtual. So we started down this path to virtualize our entire production world. By that point, mid-2010 had come around, and we were ready to go. We had built our DR stack that virtualized our primary data centers taking us to the 80 percent to 90 percent virtual machine (VM) rate.

Extremely successful


We were extremely successful in that process. We were able to move our primary data center over a couple of weekends with very little downtime to the end users, and that was all built on VMware technology.

About a week after we had finished that project, I got a call from our CIO, who said he had purchased a new ERP system, and Columbia was going to start down the path of a fully new ERP implementation.

I was being asked at that time what platform we should run it on, and we had a clean slate to look everywhere we could to find our favorite, what we felt was the safest and most stable platform to run the crown jewels of the company, which is ERP. For us that was going to be the SAP stack.

So it wasn't a hard decision to virtualize ERP for us. We were 90 percent virtual anyway. That’s what we were good at, and that’s where our teams were staffed and skilled. What we did was design the platform that we felt was going to meet our corporate standards and really meet our goals. For us that was running ERP on VMware.

Gardner: It sounds as if you had a good rationale for moving into a highly virtualized environment, but that then it made it easier for you to do other things.

Leeper: There are a couple of things there. Specifically in the migration to virtualization, we knew we were going to have to go through the effort of moving operating systems from one site to another. We determined that we could do that once on the physical side, relatively easily, and probably the same amount of effort as doing it once by converting physical to virtual.

The problem was that the next time we wanted to move services back from one facility to another in the physical world, we're going to have to do that work again. In the virtual space, we never had to do it again.

To make the teams go through the effort of virtualizing a server to then move it to another data center, all we need to do is do the work once. For my engineers, any time we get them to do the mundane stuff once, it's better than doing it multiple times. So we got that effort taken care of in that early phase of the project to virtualize our environments.

For the ERP platform specifically, this was a net new implementation. We were converting from a JD Edwards environment running on IBM big iron to a brand-new SAP stack. We didn’t have anything to migrate. This was really built from scratch.

So we didn’t have to worry about a lot of the legacy configurations or legacy environments that may have been there for us. We got to build it new. And by that point in our journey, virtualized was the only way for us to do it. That’s what we do, it’s how we do it, and that's what we’re good at.

Across the board


Gardner: I saw some statistics that you went from 25 percent to 75 percent virtualization in about eight months, which is really impressive. How did you get the pace and what was important in keeping that pace going?

Frye: The only way we could do it was with virtualization, and using the efficiencies we gained. We centrally manage all of IT and engineering globally out of our headquarters in Portland. When we were given the initial project to move our data center and not only move our data center but provide DR services as well, it was a really easy sell to the business.

We could go to the business and explain to them the benefits of virtualization and what it would mean for their application. They wouldn’t have to rebuild and they wouldn’t have to bring in the vendor or any consultants. We can just take their systems, virtualize them, move them to our new data center, and then provide that automatic DR with Site Recovery Manager (SRM).

We had nine months to move our data center and we basically were all hands on deck: everybody on the server engineering team, and the storage and networking teams as well. And we had executive support and sponsorship. It was very easy for us to go to the business, market virtualization to them, and start down that path where we were socializing the idea. A lot of people, of course, were dragging their feet a little bit. We all know that story.

But once they realized that we could move their application, bring it back up, and then move it between data centers almost seamlessly, it was an instant win for us. We went from that 20 percent to 30 percent virtualization. We had about 75 percent when we were in the middle of our DR project, and today we’re actually at around 93 percent.

I think it surprises people that we have a "virtualize first" strategy today. Now it’s assumed that your system will be virtual, and then you get all the benefits, the flexibility, the portability, the optimization, and the efficiencies that come with it.

But like most companies, we had to start with some of our lower tier or lower service-level agreement (SLA) systems, our development systems, and start working with the business on getting them to understand some of the benefits that they could gain by working with virtual systems.

Performance is there

Again people are always surprised. Will you have SQL virtualized? Do you have SAP virtualized? And the answer is yes, today we do, and the performance is there, the optimization is there, and that flexibility is there.

If you’re just starting out today, my advice would be to go ahead and start small. Give the business what they want, do it right, and give it the resources it needs to have. Don’t under-promise, over-deliver, and let the business start seeing the efficiencies that they can realize, and some of those hidden efficiencies as well.

We can support DR testing. We can support almost instant data refreshes, cloning, and snapping, so their upgrades are more seamless, and they have an easier back-out plan.

From an engineering and development perspective, we're giving them technologies that they could only dream of four or five years ago. And it’s really benefited the business in that we’re auto-provisioning. We’re provisioning in minutes versus days. We’re granting resources when needed.

It’s a more dynamic process for the business, and we’re really seeing that people are saying, "You’re not just a cost center anymore. You’re enabling us, you’re helping us to do what we need to do and basically doing it on-demand." So our team has really started shining these last few years, especially because of our high virtualization percentage.

Leeper: For a company that's looking to move to this virtualization space, they’ve got to get some wins. You’ve got to tackle some environments or some projects that you can be successful at, and hopefully by partnering with some business users and business owners who are willing to take a little bit of a chance.

If you set off trying to truly attack an entire data center virtualization project, you’re probably not going to be really successful at it. There are a lot of ways that the business, application vendors, and various things can throw some roadblocks in this.

Once you start chipping away at a couple of them and get above the easy stuff, go find one that maybe on paper is a little difficult, but go get that one done. Then you can very quickly point back to success on that piece and start working your way through the rest of them.

Frye: As we were rolling out on some of our Tier 1 mission-critical applications, it was decided by the business that they wanted to test DR. They were going down the path of doing that the old-fashioned way by backing up databases, restoring databases, and taking weeks to do that, days and weeks.

We said, "We think we have a better way with SRM and our replication technologies. We have that data here. Why don't you let us clone that data and stand it up for you?" Literally, within 10 seconds, they had a replica of their data.

So we were enabling them to do their DR testing with SRM, on demand, when they wanted to do that, as well as giving them the benefit of doing the faster cloning and data refreshes. That was just a day-to-day, operational activity that they had no idea we could do for them.

It goes back to working with business and letting them know what you can do. From a day-to-day, practical perspective that was one of our biggest wins. It's going to specific business units and application owners and saying, "We think we have a better way. What do you think about this?" Once they got their hands on it, just looking at their faces was really a good moment for us.

Gardner: Where do you go next with your virtualization payoff?

Private cloud

Leeper: We consider ourselves as having a private cloud on-site. My team will probably start laughing at me for using that term, but we do believe we have a very flexible and dynamic environment to deploy on premises, based on business requests, and we're pretty proud of that. It works pretty well for us.

Where we go next is all over the place. One of the things we're pretty happy about is the fact that we can think about things a little differently now than probably a lot of our peers, because of how migratory our workloads can be, given the virtualization.

We started looking into things like hybrid cloud approaches and the idea of maybe moving some of our workloads out of our premises, our own data facilities, to a cloud provider somewhere else.

For us, that's not necessarily the discussion around the classic public cloud strategies for scalability and some of those things. For us, it's a temporary space at times, if we are, say, moving an office, we want to be able to provide zero downtime, and we have physical equipment on-premises.

It would be nice to be able to shut down their physical equipment, move their data, move their workloads up to a temporary spot for four or five weeks, and then bring it back at some point, and let users never see an outage while they are working from home or on the road.

There are some interesting scenarios around significant DR for us and locations where we don't have real-time DR set up. For instance, we were looking into some issues in Japan, when Japan unfortunately a year or so ago was dealing with the earthquake and the tsunami fallout in power.

We were looking at how we can possibly move our data out of the country for a period of time, while the infrastructure was stabilizing, specifically power, and then maybe bring it back when things settle down again.

Unfortunately we weren't quite virtual on the edge yet there, but today we think that's something we could do. Thinking about how and where we move data to be at the right place at the right time is where we think the next big win is for us.

Then, we get into the application profiles that users are asking for and their ability to spin up environments very quickly to just test something. It lets us get out of having IT as being the roadblock to innovation. A lot of times the business or part of our innovation teams come up with some idea on a concept, an application, or whatever it is. They don't have to wait for IT to fulfill their needs. The environments are right there for them.

So I challenge the teams routinely to think a little bit differently about how we've done things in the past, because our architecture is dramatically different than it was even two years ago.

Wednesday, August 15, 2012

ServiceMesh Agility Platform 8.0 aims to help enterprises rein in 'shadow IT' and enforce governance over rogue cloud usage

Cloud management and services orchestration platform provider ServiceMesh recently delivered Agility Platform 8.0, a major upgrade with features to help better govern and manage private, public, and hybrid cloud usage.

The platform provides Global 2000 enterprises with a consolidated platform for the consistent management, governance, orchestration and delivery of cloud applications, platforms and services. The need for control over application services -- without squelching the innovation of self-provisioned benefits -- has become acute for many organizations. Managing services by each cloud, SaaS provider or on-premises platform is complex, expensive and unwieldy.

And so ServiceMesh has identified the governance and policy-enabled orchestration of ecosystem-wide services as a crucial, burgeoning requirement for agile businesses, said Chairman and CEO Eric Pulier. "This is a policy-centric approach ... You need to gain a holistic view of applications," he said.

Agility Platform 8.0, which is delivered as an on-premises virtual appliance, allows companies to leverage services in an on-demand, self-service IT service management (ITSM) operating model. The platform remains independent of the cloud or enterprise applications and services. APIs are available for developers so that new services can leverage Agility right away, even as it supports legacy and existing hybrid-delivered services, said Pulier.

The result is to compress IT service delivery times, lower IT operating costs, and increase investments in IT innovation, said ServiceMesh, a venture-backed start-up in Santa Monica, CA. Commonwealth Bank of Australia is using ServiceMesh to improve its services management.

ServiceMesh has a bold vision of enterprise agility via holistic services orchestration capabilities that manage both on-premises and cloud-based services, with automation of service lifecycles through policy-based definitions and enforcement.

Enterprise customers today are clearly seeking solutions to the dual challenges of making their current IT organizations more responsive to business change, while also ensuring that business users will not get around internal IT resource constraints and delays by selecting an unauthorized external cloud provider’s self-service, pay-as-you-go IT resources. So-called shadow IT deployment of services muddies the water, especially around control and security. BYOD is another complicating factor for more and more organizations.

What's more, governance, risk and compliance (GRC) requirements are also demanding the types of centrally managed solutions from Agility Platform 8.0, said Pulier. Services management policies can vary from department to department, region to region, even as an enterprise wants to standardize on cloud or SaaS applications. Automated orchestration and events processing logic allows for such complexity of services delivery, while banking on the efficiency and cost-savings of consolidated services origins.

Accelerate adoption


The ServiceMesh platform allows organizations to accelerate the adoption of cloud services across the enterprise and move business applications into the cloud with complete governance and control, said Pulier. The Agility Platform automates the deployment and management of cloud applications and platforms and ensures the portability of these services throughout their lifecycle, independent of the underlying private, public or hybrid cloud environment.

I have certainly seen many ways emerge in the market to try and solve the services management complexity equation, and they vary from VDI, to app stores, to SOA registries, to SOA ESBs, to PPM and extended configuration management databases (CMDBs).

Pulier says the ServiceMesh architected platform provides "a better source of truth" than these other approaches about services across their full lifecycle, and across vast IT infrastructure heterogeneity. "It's more than a catalog, and federates back to the CMDB and other management capabilities," he said.

"You need a holistic view of the problem, and to provide a platform for the business, not just the IT department," he said. This approach "creates infrastructure- and cloud-independent applications management," said Pulier.

ServiceMesh is targeting its platform at both enterprises and cloud services providers. Expect more news on the channel at VMworld later this month. While the ServiceMesh platform is on-premises now, it may also be deployed at the cloud provider layer, and many of its capabilities can also be delivered as a service.

More specifically, Agility Platform 8.0 leverages an extensible policy engine that enables the creation and enforcement of an unlimited range of custom policies. Among the features ServiceMesh offers are:
  • Wizard-based capabilities to discover and automatically import existing virtual machines (VMs) deployed from other third-party provisioning tools in either private or public cloud environments. Upon VM import, the platform enforces user specified policies on those VMs to ensure the desired governance, security and control. VMs can then be published through a service catalog.
  • Capabilities to monitor cloud-provider performance and adherence to SLAs, and to compare different cloud services, measuring a range of different cloud-provider operational parameters, such as average VM provisioning time, number of failed or degraded instances, maximum number of concurrent provisioning requests executed and others.

    More enterprises are realizing that they must evolve toward a self-service, on-demand IT operating model to increase their ability to innovate and address new market opportunities quickly.


  • Support for hybrid cloud strategies by enabling workload portability across a broad range of heterogeneous private and public cloud technologies. The latest release extends these capabilities with support for Microsoft System Center Virtual Machine Manager 2012 and Microsoft Hyper-V.
  • Improved extensible policy-based governance controls with new policy types to govern the sharing of pay-as-you-go IT resources within large corporate settings, including new options to control IT resource scheduling, sharing, leasing and chargeback.
  • A cloud-native architecture that dynamically scales to meet system demand, using only the amount of resources needed to rapidly execute provisioning requests, orchestrate auto-scaling operations, and perform other management functions.
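
The provider-comparison metrics named in the list above (average VM provisioning time, number of failed or degraded instances) can be illustrated with a minimal sketch. This is not the Agility Platform's API: the `ProvisionRecord` fields and the `summarize_provider` function are hypothetical names, assumed here only to show how such figures might be derived from a log of provisioning requests.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProvisionRecord:
    # Hypothetical log record; the field names are illustrative assumptions.
    provider: str
    provision_seconds: float
    status: str  # "ok", "degraded", or "failed"


def summarize_provider(records, provider):
    """Summarize provisioning behavior for one provider, or None if no data."""
    rows = [r for r in records if r.provider == provider]
    if not rows:
        return None
    # Average only over successful requests so failures do not skew the timing.
    ok_times = [r.provision_seconds for r in rows if r.status == "ok"]
    return {
        "requests": len(rows),
        "avg_provision_seconds": mean(ok_times) if ok_times else None,
        "failed_or_degraded": sum(1 for r in rows if r.status != "ok"),
    }


# Made-up sample data, purely for illustration.
records = [
    ProvisionRecord("cloud-a", 60.0, "ok"),
    ProvisionRecord("cloud-a", 90.0, "ok"),
    ProvisionRecord("cloud-a", 300.0, "failed"),
    ProvisionRecord("cloud-b", 45.0, "ok"),
]
summary = summarize_provider(records, "cloud-a")
```

Running the same summary for each provider gives a like-for-like basis for the kind of SLA comparison described; a real deployment would draw these records from the platform's own monitoring data rather than a hand-built list.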

Tuesday, August 14, 2012

Raf Los: Extend IT security to its mature state of enterprise resiliency

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: HP.

This edition of the HP Discover Performance podcast series examines the future of security in the enterprise. As the need to protect and manage assets extends beyond the four walls of the organization -- with the adoption of cloud and mobile -- how should the thinking about traditional security adjust?

I recently had a chance to sit down with Raf Los of HP Software to gather his perspective on the answer to that question. Los has an interesting personal perspective on the concept of “enterprise resiliency,” which I initially heard about through his blog, Following the White Rabbit. He's also on Twitter at @wh1t3rabbit. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Join me as we unpack with Los the concept of enterprise resiliency as the natural maturity direction for IT security in general.

Here are some excerpts from our discussion:
Gardner: Tell me a little bit about your vision. We all understand security and why it’s important, but you've developed, I think, an expanded category for security.

Los: Security, over the years, has evolved from an absolute concept of a binary decision: is it secure or is it not? As we move forward, I believe very strongly that what we’re evolving into is, as we’ve heard people talk about, risk management.

Risk management starts to include things that are beyond the security borders. As I talked to customers out here, I was having an "aha" moment. A little while ago, at one of our converged cloud chats, we were talking about how things fail. Everything fails at some point, and chaos takes over.

So rather than talking about security, which is a set of absolutes or a concrete topic, and boxing ourselves into threats from a security perspective, the evolution of that goes into enterprise resiliency. What that means is that it’s a combination of recoverability, security, performance, and all the other things that bring together a well-oiled business that can let you take a shot to the gut, get back up, and keep going.

A lot of the CISOs nowadays are set up to fail by their organizations. It’s a non-winning position, because you're put into a position where the board of directors, if you’re lucky, or your CTO or your CIO asks, "How much money do you need to secure this organization?"

That's horrible, and no matter what you say, you lose. If you say nothing, you lose. If you have $10 million, a billion dollars, there's no amount of money you can spend to make your company completely secure.

Acceptable risk

So what are you aiming for? You're aiming for a level of acceptable risk. Well, acceptable risk of what and how and how much you’re aiming for. It’s not just acceptable risk. We’re looking at acceptable risk from a security perspective, but we need to incorporate the fact that we're going to get owned.

We need to get out of our ivory towers and we need to start thinking about the fact that attacks happen and insiders happen. There are things that are going to transpire that are beyond our control and things that we cannot plan for.

Technology will fail. People and processes will fail. Our own technologies, our own minds will fail us. Our best friends will fail us. People get tempted. It's human nature that the weakest element will always be a human being, and there's no patch for that.

So how do we move and get back to business as usual? How do we get back to being a resilient business? That’s a cool concept -- that I have enterprise resiliency.

Gardner: This makes great sense to me, because we’ve been talking, over the past several years, about how security needs to be applied to different parts of the organization holistically and needs to be thought of in advance, be built in, and become part of a lifecycle.

But it makes double sense to me to expand the purview of security. It really is in making sure that there's performance resiliency, failover resiliency, backup and recovery resiliency, and data backup and duplication resiliency. So why not look at it through the resiliency lens? It makes a great deal of sense.

Los: Absolutely, and that’s exactly where this is coming from. I’ve actually given a series of talks and called it the introduction of Chief Chaos Officer. It’s not an actual role you’re going to see on monster.com, but it’s just a concept. It’s kind of like the aging Killcraft, a Chaos Monkey thing from Netflix.

Can you, as an organization, get comfortable with the fact that things will fail? In the talk that I gave, it comes from the perspective of you’ve got a lot of great security technology. You've probably got full disk encryption. You back up. You have firewalls, redundant networks, and all these things that you do.

You have procedures that you’re supposed to follow in the red book, a big red binder that sits on your incident response handler's desk, and you have all these things that are supposed to be followed.

Your people are trained, and your developers are supposedly writing better source code. These are all things that we can test through penetration testing, which means on Sunday between 7:00 p.m. and Monday 3:00 a.m. on the following four IPs, but only when we’re ready.

No patch for the human

And it’s like, okay, we've tested ourselves, we’re confident that we’re secure. I'm making kind of a scrunchy face, because that’s not really what this means. I've worked with folks who are red-team testers. I've yet to meet a red team that's failed, because, as I said, there's no patch for the human.

When you can’t penetrate a system or an organization via a new Zero-day, you'll walk in through the front door by walking and carrying flowers from the CEO's wife or something, and you'll own the organization that way.

But the question isn’t whether you'll be owned or not. What happens next is the big question, and it encompasses things like how good your PR strategy is. Do you have all the legal pieces in place? When your backup system fails or your entire data center gets wiped out by Hurricane Katrina, in a worst-case scenario, do you just sort of throw up your hands and go, "Well, that stinks"? "Well, we were in the cloud." Oh, your cloud just got wiped out. Now what?

Gardner: I've been speaking with a number of folks lately who hold the opinion that, at least for small-to-medium sized businesses (SMBs), going to the cloud can improve their security and resiliency sufficiently to make it a no-brainer. For enterprises, it might be a longer haul and there might be more complications and issues to manage.

Do you agree that the SMB can outsource some of this resiliency to the cloud provider, who needs to do it and has the resources and experience to do it better than the SMBs do?

Los: There's a number of SMBs that can greatly benefit from the fact that good security talent is expensive and good security talent that can actually work towards a more resilient, more secure enterprise is very difficult to come by. It’s becoming scarce.

So small companies do the best they can with what they have their hands on. And there's certainly a ton of benefit to be gained from going to a shared model like a cloud. Does it raise the bar for everybody? I can’t say yes. On the whole, do I believe it raises the bar? Absolutely. Let's take the angle of threat intelligence.

I'm a small entity with five IP addresses on the Internet. How do I know what bad guys look like? If I have my five IP addresses in a public cloud some place, that public cloud is attacked billions of times a day and probably subscribes to numerous threat-intelligence services. They know exactly what to look for. And if they don’t, they can find out pretty quickly. They probably have a ton of resources from the security perspective.

Do I think it’s better? Absolutely. SMBs have a lot to gain by taking that step. You have to be intelligent about it. You can’t just say, "I'm going to move to the cloud and I'll be secure." Let’s be realistic about it. Get a partner that will get you there. Do due diligence on the partner that you’re choosing to work with. You still can’t run into the water with your eyes closed, but I think there's a lot of benefit to be had, absolutely.

Gardner: And as we’re learning more about the HP Converged Cloud, it’s a cloud of clouds. You have hybrid delivery. You might have a variety of sources for applications and services. You might have data in a variety of sources across a variety of organizations, running from on-premises to managed hosting to multiple cloud and SaaS providers.

Is there a way that, in addition to the security that's going on within those organizations, you can add more security at that converged cloud layer, particularly when you’re converging network storage, workload provisioning, governance, and so forth? What’s the add-on value that the HP Converged Cloud can bring resiliency-wise?

Choice, consistency, confidence

Los: Our Converged Cloud strategy focuses on three very simple words: choice, consistency, and confidence. We’re focusing on consistency and confidence here and perhaps a little bit of choice as well.

What we’re saying is that because we focus on OpenStack, because we’ve chosen to build our platform completely on OpenStack, and because we’re building across a single model, a single way of operating, as [HP CEO] Meg Whitman said at Discover in June, you can build a single security operating model and implement it across your private, public, and hybrid models.

I don’t think it’s realistic to say every company will have a public cloud-only presence, just as I don’t think it’s realistic to say companies won’t have a public cloud presence. Most organizations will be a combination of on-premise IT, private cloud, virtual private cloud, and public cloud, all of that somehow sharing space and workload, bursting out to each other when necessary.

As I said, systems fail, clouds fail, everything fails. So when we think about what happens when things fail, and we’ve had this on our converged cloud chat, you have to start architecting for failure and resiliency.

Because of this architecture that we’ve had, if you choose to get one other partner to back up what you have with us, pick a partner that's got the same OpenStack platform and the same models. It’s not going to be hard. There are lots of them out there.

OpenStack is a big platform. You should be able to build once, package once, deploy many times. This saves on manpower, on cost, and on having to redevelop the security wheel over and over and over again. That provides unbelievable amounts of flexibility of what you can do with your enterprise.

When one cloud, or connectivity to one cloud, fails, or maybe doesn't fail but you get attacked in one position, you can bring up other capacity to compensate for that. That's where the true value of cloud comes in. It’s elastic computing. It’s not a marketing buzzword.
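As a rough illustration of that failover idea, here is a minimal sketch, not HP's implementation. The `Cloud` record, the provider names, and the capacity units are all hypothetical; the only point is that when every cloud exposes the same operating model, the same placement call can land on a partner if the primary becomes unavailable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cloud:
    name: str             # hypothetical provider label
    capacity: int         # workload units this cloud can host (assumed figure)
    healthy: bool = True  # False when the cloud has failed or is under attack


def place_workloads(demand: int, clouds: list[Cloud]) -> dict[str, int]:
    """Spread `demand` units over the healthy clouds, in list order."""
    placement: dict[str, int] = {}
    remaining = demand
    for cloud in clouds:
        if not cloud.healthy or remaining == 0:
            continue
        units = min(cloud.capacity, remaining)
        placement[cloud.name] = units
        remaining -= units
    if remaining:
        raise RuntimeError(f"{remaining} units could not be placed")
    return placement


clouds = [Cloud("primary", 100), Cloud("partner", 100)]
before = place_workloads(80, clouds)  # everything fits on the primary
clouds[0].healthy = False             # primary fails or is attacked
after = place_workloads(80, clouds)   # the same call now lands on the partner
```

The placement logic never changes between the two calls; only the health flag does, which is the practical payoff of building once against a common platform.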

Gardner: We’re seeing a lot of highly virtualized environments. We’re talking about virtualized server instances, workloads, networks, and storage. Disaster recovery (DR) technologies have evolved to the point where we're mirroring and moving entire data centers virtually from one location to another, if there's a resiliency issue like a natural disaster or a security or cyber attack that impacts an electric grid or something along those lines.

Is there a sort of tipping point that we’re at, when it comes to higher levels of virtualization, faster DR, and de-duplication that reduces the amount that needs to be moved in these instances, that gives us this higher level of security, simply because of the mobility that we can now exercise for vast amounts of data and applications?

Los: I believe so. Do I have an answer for that that’s clear and crisp? No, I don’t know, and I saw a lot of that fantastic stuff. One of the things that caught my attention is we’ve broken the 100-terabyte-an-hour backup barrier. That blows my mind. I used to work in IT when we were lucky to get 100 gigs an hour and I remember 100 megabytes an hour being a challenge on those giant DLT tapes sometimes over networks.

The idea is that we can take an entire cloud and, because of data de-duplication, because of the way we move workloads and policies all in one fell swoop, and because of the way we package things once and move them as a model rather than everything together, moving metadata rather than the actual data, we have the ability to move things.

One thing that everybody needs to think about is what is this doing for our bandwidth requirements. Bandwidth is a silent thing nobody really thinks about. I've had this discussion with our networking folks. People are building clouds all over the place now and that's great, but it’s really easy to get out to a vendor, to get out to a public cloud or whatever, amass an absolute metric ton of data, and then say, "I want to move." How are you going to take your data from there to there? That’s a big question.
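To put a number on that bandwidth question, here is a back-of-the-envelope sketch. The 100 TB volume, the link speeds, and the 80 percent usable-bandwidth figure are illustrative assumptions, not figures from the discussion.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move `terabytes` of data over a link of `link_gbps`.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second).
    `efficiency` is an assumed fraction of the link that is actually usable.
    """
    bits = terabytes * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


# Compare the same (assumed) 100 TB data set over three link speeds.
for gbps in (1, 10, 100):
    print(f"100 TB over {gbps:>3} Gbps: {transfer_hours(100, gbps):7.1f} hours")
```

Under these assumptions, 100 TB over a 1 Gbps link takes roughly 278 hours, more than 11 days, which is why the exit path deserves attention before the data piles up.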

You need to do your homework ahead of time, make sure you know what you’re getting into, and make sure you know what technologies are being supported.
Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: HP.

Monday, August 13, 2012

Ocean Observatories Initiative: Cloud and Big Data come together to give scientists unprecedented access to essential climate insights

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: VMware.

A fascinating global ocean studies initiative helps best define some of the IT superlatives around big data, cloud computing, and middleware integration capabilities.

The Ocean Observatories Initiative (OOI) and its accompanying Cyberinfrastructure Program aim to provide an unprecedented ability to study the Earth's oceans and climate using myriad distributed data centers and literally oceans' worth of data.

The scale and impact of the science's importance is closely followed by the magnitude of the computer science needed to make that data accessible and actionable by scientists. In a sense, the OOI and its infrastructure program, a major undertaking by the National Science Foundation, are constructing a big data-scale programmable and integratable cloud fabric for oceanography.

We’ve gathered three leaders to explain the OOI and how the Cyberinfrastructure Program may not only solve this set of data and compute problems, but perhaps establish a path to how future massive data and analysis problems are solved.

Here to share their story on OOI are:
  • Matthew Arrott, Project Manager at the OOI Cyberinfrastructure. Matthew's career spans more than 20 years in design leadership and engineering management for software and network systems. He’s held leadership positions at Currenex, DreamWorks SKG, Autodesk, and the National Center for Supercomputing Applications. His most recent work has been with the University of California as e-Science Program Manager while focusing on delivering the OOI Cyberinfrastructure capabilities.
  • Michael Meisinger, Managing Systems Architect for the Ocean Observatories Initiative Cyberinfrastructure. Since 2007, Michael has been employed by the University of California, San Diego. He leads a team of systems architects on the OOI Project. Prior to UC San Diego, Michael was a lead developer in an Internet startup, developing a platform for automated customer interactions and data analysis. Michael holds a master's degree in computer science from the Technical University of Munich and will soon complete a PhD in formal services-oriented computing and distributed systems architecture.
The discussion is moderated by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:
Meisinger: The Ocean Observatories Initiative is a large, US National Science Foundation project intended to build a platform for ocean sciences with an operational life span of 30 years.

It comprises a construction period of five years and will integrate a large number of resources and assets. These range from typical oceanographic assets, like instruments that are mounted on buoys deployed in the ocean, to networking infrastructure on the cyberinfrastructure side. It also includes a large number of sophisticated software systems.

I'm the managing architect for the cyberinfrastructure, so I'm primarily concerned with the interfaces to the oceanographic infrastructure, including data interfaces and networking interfaces, and then, primarily, the design of the network, hardware, and software system that comprises the cyberinfrastructure.

OOI’s goals include serving the science and education communities with their needs for receiving, analyzing, and manipulating ocean sciences and environmental data. This will have a large impact on the science community and the overall public, as a whole, because ocean sciences data is very important in understanding the changes and processes of the earth, the environment, and the climate as a whole.

Ocean sciences, as a discipline, hasn't yet received as much infrastructure and central attention as other communities. So the OOI is a very important initiative to bring this to the community. It has an almost $400 million construction budget, and an annual operations budget of $70 million for a planned lifetime of 25 to 30 years.

Gardner: What are the big hurdles here in terms of compute requirements? What makes this so challenging?

Arrott: It has a number of key aspects that we had to address. It's best to start at the top of the functional requirements, which is to provide interactive mission planning and control of the overall instrumentation on the 65 independent platforms that are deployed throughout the ocean.

The issue there is how to provide a standard command-and-control infrastructure over a core set of 800 instruments, about 50 different classes of instrumentation, as well as be able to deploy -- over the 30-year lifecycle -- new instrumentation brought to us by different scientific communities for experimentation.

The next is that the mission planning and control is meant to be interactive and respond to emergent changes. So we needed an event-response infrastructure that allowed us to operate on scales from microseconds to hours in being able to detect and respond to the changes. We needed an ability to move computing throughout the network to deal with the different latency requirements that were needed for the event-response analysis.

Finally, we have computational nodes all the way down in the ocean, as well as on the shore stations, that are accepting or acquiring the data coming off the network. And we're distributing that data in real time to anyone who wants to listen to the signals to develop their own sense-and-response mechanisms, whether they're in the cloud, in their local institutions, or on their laptop.

Domain of control

The fundamental challenge was the ability to create a domain of control over instrumentation that is deployed by operators and for processing and data distribution to be agile in its deployment anywhere in the global network.
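The requirement to move computing through the network to meet different latency needs can be pictured with a small sketch. This is an illustration only, not the OOI's scheduler: the node names, latencies, and costs are made-up assumptions, and the rule shown (cheapest node that still meets the latency budget) is just one plausible policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str             # hypothetical location label
    round_trip_ms: float  # assumed latency between the instrument and this node
    cost_per_hour: float  # assumed relative cost of running a process there


def choose_node(nodes: list[Node], latency_budget_ms: float) -> Node:
    """Return the cheapest node that still meets the latency budget."""
    eligible = [n for n in nodes if n.round_trip_ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("no node meets the latency budget")
    return min(eligible, key=lambda n: n.cost_per_hour)


nodes = [
    Node("seafloor-node", 0.5, 9.0),
    Node("shore-station", 20.0, 3.0),
    Node("public-cloud", 120.0, 1.0),
]
fast = choose_node(nodes, latency_budget_ms=1.0)       # tight loop stays near the instrument
slow = choose_node(nodes, latency_budget_ms=60_000.0)  # slow analysis can run anywhere
```

A tight event-response loop is pinned close to the instrument, while analysis that tolerates minutes of delay drifts to the cheapest capacity available.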

Gardner: Why is this a good time to try to solve this from a software distribution and data distribution perspective?

Richardson: It's the scale that's changed the architecture and deployment patterns that people have been using for these applications.

We can see that the OOI project is essentially bringing the science needed to collaborate between vast numbers of sensors and signals and a comparatively smaller number of scientists, research institutions, and scientific applications. It does analytics in a way similar to how Facebook combines what people say, what pictures they post, and what music they listen to with everybody’s friends, and then allows an application to be attached to that.

So it’s a huge technology challenge that would have been simply infeasible 12 years ago in the year 2000, when we thought things were big, but they were not. Now, when we talk about big data being masses of terabytes and petabytes that need to be analyzed all the time, then we’re starting to glimpse what's possible with the technology that’s been created in the last 10 years.

If we had been talking about this 12 years ago, in the year 2000, we would have been talking about companies like Google and Yahoo, which we would now consider to be of only moderate scale.

Since then, many companies have appeared. For example, Facebook, which has many hundreds of millions of users connecting throughout the world, shares vast amounts of data all the time.

In addition to that, many of these companies have brought out essentially a platform capability, whereby others, such as Zynga, in the case of Facebook, can create applications that run inside these networks -- social networks in the case of Facebook.

Arrott: The challenge goes beyond just the big data challenge. It also now introduces, as Alexis said, the concept of the instrument as an equal partner with the human in the participation in the network.

So you now have to think about what it means to have a device that’s acting like a human in the network, and the notion that the instrument is, in fact, owned by someone and must be governed by someone, which is not the case with the human, because the human governs themselves. So it represents the notion of an autonomous agent in the network, as well as that agent having a notion of control that has to stay on the network.

Gardner: I’d like to try to explain for our audience a bit more about what is going on here. We understand that we have a tremendous diversity of sensors gathering in real-time a tremendous scale of data. But we’re also talking about automating the gathering and distribution of that data to a variety of applications.

Numerical framework

We’re talking about having applications within this fabric, so that the output is not necessarily data, but is a computational numerical framework that’s then distributed. So there's a lot of data, a lot of logic, and a lot of scale. Can one of you help step me through it all a bit more to understand the architecture of what’s being conducted here?

Meisinger: The challenge, as you mentioned, is very heterogeneous. We deal with various classes of sensors, classes of data, classes of users, or even communities of users, and with classes of technological problems and solution spaces.

So the architecture is based on a tiered, or layered, model with the most invariant things at the bottom: things that shouldn’t change over the 30-year lifetime and that deserve the highest level of attention.

Then, we go into our more specialized layered architecture where we try to find optimal solutions using today’s technologies for high-speed messaging, big data, and so on. Then, we go into specialized solutions for specific groups of users and specific sensors that are there as last-mile technologies to integrate them into the system.

So you basically see an onion-layer model of the architecture, with specialization on the outside. Then as you go toward the core, you approach the invariants of the system.

This architecture is based on defining a common interaction format. It’s based on defining a common data format. Our architecture is strongly communication-oriented, service-oriented, message-oriented, and federated.

As Matthew mentioned, it’s an important means to have the individual resources, agents, provide their own policies, not having a central bottleneck in the system or central governing entity in the system that defines policies.

Strongly federated


Arrott: Think of it as four core layers. There is the underlying network resource management layer. We talk about agents. They supply that capability to any process in the system, and we create devices that process.

The next layer up is the data layer, and the data layer consists of two core parts. One is the distribution system that allows for data to be moved in real-time from the source to the interested parties. It’s fundamentally a publish-subscribe (pub-sub) model. We're currently using point-to-point as well as topic-based subscriptions, but we're quickly moving toward content-based routing, which is based more on the selector that is provided by the consumer to direct traffic toward them.
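The difference between topic-based subscriptions and content-based routing can be sketched in a few lines. This in-memory `Broker` class is a hypothetical stand-in, not the OOI's distribution system or any real product's API; the topic name and message fields are assumptions chosen for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Message = dict[str, Any]
Handler = Callable[[Message], None]
Selector = Callable[[Message], bool]


class Broker:
    """In-memory stand-in for a pub-sub distribution system (illustrative only)."""

    def __init__(self) -> None:
        self._by_topic: dict[str, list[Handler]] = defaultdict(list)
        self._by_content: list[tuple[Selector, Handler]] = []

    def subscribe_topic(self, topic: str, handler: Handler) -> None:
        # Topic-based: the consumer names a stream and gets everything on it.
        self._by_topic[topic].append(handler)

    def subscribe_content(self, selector: Selector, handler: Handler) -> None:
        # Content-based: the consumer supplies a selector over the message itself.
        self._by_content.append((selector, handler))

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._by_topic[topic]:
            handler(message)
        for selector, handler in self._by_content:
            if selector(message):
                handler(message)


broker = Broker()
all_temps: list[Message] = []
warm_only: list[Message] = []
broker.subscribe_topic("buoy.temperature", all_temps.append)
broker.subscribe_content(lambda m: m.get("celsius", 0) > 15, warm_only.append)

broker.publish("buoy.temperature", {"buoy": "A", "celsius": 12.0})
broker.publish("buoy.temperature", {"buoy": "B", "celsius": 18.5})
```

The topic subscriber receives both readings, while the content-based subscriber receives only the one its selector accepts, so the filtering moves from the consumer into the routing layer.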

The other part of the data layer is the traditional harvesting or retrieval of data from historical repositories.

The next layer up is the analytic layer. It looks a lot like the device layer, but is focused on the management of processes that are using the big data and responding to new arrival of data in the network or change in data in the network. Finally, there is the fourth layer, which is the mission planning and control layer, which we’ll talk about later.

Gardner: Alexis, when you saw the problem that needed to be solved here, you had a lot of experience with advanced message queuing protocol (AMQP). Why did this problem seem to be the right fit for that particular technology, RabbitMQ, and a messaging infrastructure in general?

Richardson: What Matthew and Michael have described can be broken down into three fundamental pieces of technology.

Lot of chatter

Number one, you have a lot of chatter coming from these devices -- machines, people, and other kinds of processes -- and that needs to get to the right place. It's being chattered or twittered away and possibly at high rates and high frequencies. It needs to get to just the set of receivers following that stream, very similar to how we understand distribution to our computers. So you need what’s called pub-sub, which is a fundamental technology.

In addition, that data needs to be stored somewhere. People need to go back and audit it, to pull it out of the archive and replay it, or view it again. So you need some form of storage and reliability built into your messaging network.

Finally, you need the ability to attach applications that will be written by autonomous groups, scientists, and other people who don’t necessarily talk to one another, may choose different programming languages, and may be deploying their applications, as Matthew said, on their own servers, on multiple different clouds that they are choosing, through what you would like to be a common platform. So you need this to be done in a standard way.

AMQP is unique in bringing together pub-sub with reliable messaging with standards, so that this can happen. That is precisely why AMQP is important. It's like HTTP, or SMTP for email, but it’s aimed at messaging: publish-subscribe and reliable message delivery in a standard way. And RabbitMQ is one of the first implementations, and that’s how we ended up working with the OOI team -- because RabbitMQ provides these and does it well.
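The combination of pub-sub and reliable delivery described above can be illustrated without a broker at all. The sketch below is a toy model, not RabbitMQ code: `topic_matches` follows the AMQP 0-9-1 topic-exchange convention in which routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words, while the `Queue` class, the routing keys, and the message bodies are hypothetical, and the acknowledgment handling is greatly simplified.

```python
from __future__ import annotations


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Topic-exchange style match: '*' is one word, '#' is zero or more words."""

    def match(p: list[str], k: list[str]) -> bool:
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' either absorbs nothing, or absorbs one word and tries again.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


class Queue:
    """Toy queue that keeps each message until the consumer acknowledges it."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self.pending: dict[int, str] = {}  # delivery tag -> message body
        self._next_tag = 0

    def offer(self, routing_key: str, body: str) -> None:
        if topic_matches(self.pattern, routing_key):
            self._next_tag += 1
            self.pending[self._next_tag] = body

    def ack(self, tag: int) -> None:
        self.pending.pop(tag, None)


archive = Queue("buoy.#")       # follows everything published under buoy
alerts = Queue("buoy.*.alert")  # follows only alert messages, one word deep
for key, body in [("buoy.17.temperature", "12.4"), ("buoy.17.alert", "tilt")]:
    archive.offer(key, body)
    alerts.offer(key, body)
alerts.ack(1)  # the alert consumer confirms it handled its message
```

Unacknowledged messages stay in `archive.pending`, which is the property that lets a consumer go back and replay them. In a real deployment the broker and a client library would take over both jobs.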

Gardner: I’d also like to go back to the project itself, and give our listeners a sense of what this can accomplish. I’ve heard it described as "the Hubble Telescope of oceans."

It's the notion that we're providing capabilities that do not currently exist for oceanographers.

Let’s go back to the oceanography and the climate science. What can we accomplish with this, when this data is delivered in the fashion we’ve been discussing, where the programmability is there, where certain scientists can interact with these sensors and data, ask it to do things, and then get that information back in a format that’s not raw, but is in fact actionable intelligence?

Matthew, what could possibly happen in terms of the change in our understanding of the oceans from this type of undertaking?

Meisinger: The primary mission of our project is to provide this platform, the space telescope in the ocean. And it’s not a single telescope. In our case, it's a set of 65 buoys, locations in the ocean, and even a cable that runs 1,000 miles along the seafloor of the Pacific Northwest and provides 10-gigabit Ethernet connectivity and high power to the instruments.

It’s a model where scientists have to compete. They have to compete for a slot on that infrastructure. They'll have to apply for grants and they'll have to reserve the spot, so that they can accomplish the best scientific discoveries out of that system.

It’s kind of the analogy of the space telescope that will bring ocean scientists to the next level. This is our large platform, our large infrastructure, that lets the best scientists develop and research toward the best results. That’s the fascination that I see as part of this project.

Arrott: The way to think about this can be summed up as continual presence in the oceans at multiple scales through multiple perspectives.

The scope of the OOI is such that it is considered to be observing the ocean at multiple scales -- coastal, regional, and global. It is an expandable model. One of the largest classes of applications that we’ll attach to the network is modeling, in particular nowcast and forecast modeling.

Happening at scale

Once you have that ability to actually model the oceans and predict where it’s going, you can use that to refocus the instrumentation on emergent events. It's this ability to have long-term presence in the ocean, and the ability to refocus the instrumentation on emergent events, that really represents the revolutionary change in the formation of this infrastructure.

Gardner: Is this in some ways taking the weather of the oceans?

Arrott: There's a movement to instrument the Earth, so that we can understand from observation, as opposed to speculation, what the Earth is actually doing, and from a notion of climate and climate change, what we might be doing to the Earth as participants on it.

The weather community, because of the commercial demand for weather data, has been well in advance of the other environmental sciences in this regard. What you'll find is that OOI is just one of several ongoing initiatives to do exactly what weather has done.

Science more mature


Gardner: How is it that cloud computing is being brought to bear, making this productive, and perhaps even ahead of where the whole weather and predicting weather has been?

Richardson: Happily, that’s an easy one. Imagine if a person or scientist wanted to process very quickly a large amount of data that’s come from the oceans to build a picture of the climate, the ocean, or anything to do with the coastal properties of the North American coast. They might need to borrow 10,000 or 20,000 machines for an hour, and they might need to have a vast amount of data readily accessible to those machines.

In the cloud, you can do that, and with big data technologies today, that is a realistic proposition. It was not five to 10 years ago. It’s that simple.

Obviously, you need to have the technologies, like this messaging that we talked about, to get that data to those machines so it can be processed. But the cloud is really there to bring it all together and to make it seem to the application owner like something that’s just ready for them to acquire, and when they don’t need it anymore, they can put it back and someone else can use it.

Gardner: How are cloud models enabling this at an unprecedented scale, but also at an efficient cost?

Meisinger: It does enable computing at unprecedented scale. A lot of the earth's environment is changing. Assume that you’re interested in tracking the effect of a hurricane somewhere in the ocean and you’re interested in computing a very complex numerical model that provides certain predictions about currents and other variables of the ocean. You want to do that when the hurricane occurs and you want to do it quickly. Part of the strategy is to enable quick computation on demand.

The OOI architecture, in particular, its common execution infrastructure subsystem, is built in order to enable this access to computation and big data very quickly. You want to be able to make use of execution provider’s infrastructure as a service very quickly to run your own models with the infrastructure that the OOI provides.

Then, there are other users that want to do things more regularly, and they might have their own hardware. They might run their own clusters, but in order to be interoperable, and in order to have excess overflow capabilities, it’s very important to have cloud infrastructure as a means of making the system more homogenous.

So the cloud is a way of abstracting compute resources of the various participants of the system, be they commercial or academic cloud computing providers or institutions that provide their own clusters as cloud systems, and they all form a large compute network, a compute fabric, so that they can run the computation in a predictable way, but also then in a very episodic way.
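The overflow pattern described here, where regular work runs on an institution's own cluster and episodic peaks spill into cloud capacity, might look like the following sketch. It is not the OOI's common execution infrastructure: the provider names, core counts, and prices are assumptions, and "cheapest capacity first" is only one possible policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Provider:
    name: str                   # hypothetical provider label
    free_cores: int             # assumed spare capacity
    price_per_core_hour: float  # 0.0 for hardware the institution already owns


def schedule(cores_needed: int, providers: list[Provider]) -> list[tuple[str, int]]:
    """Fill the cheapest capacity first; any overflow spills to the next provider."""
    plan: list[tuple[str, int]] = []
    remaining = cores_needed
    for provider in sorted(providers, key=lambda p: p.price_per_core_hour):
        if remaining == 0:
            break
        take = min(provider.free_cores, remaining)
        if take:
            plan.append((provider.name, take))
            remaining -= take
    if remaining:
        raise RuntimeError(f"{remaining} cores unavailable")
    return plan


providers = [
    Provider("commercial-cloud", 50_000, 0.10),
    Provider("campus-cluster", 512, 0.0),
    Provider("academic-cloud", 2_000, 0.03),
]
routine = schedule(400, providers)       # regular work fits on the local cluster
hurricane = schedule(10_000, providers)  # an episodic peak spills outward
```

Because every provider is described the same way, the scheduler does not care whether capacity is a campus cluster or a commercial cloud, which is the abstraction the paragraph above is pointing at.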

Cloud as enabler


I really see that the cloud paradigm is one of the enablers of doing this very efficiently, and it enables us as a software infrastructure project to develop the systems, the architecture, to actually manage this computation from a system’s point of view in a central way.

Gardner: Alexis, because of AMQP and the VMware cloud application platform, it seems to me that you’ve been able to shop around for cloud resources, using the marketplace, because you’ve allowed for interoperability among and between platforms, applications, tools, and frameworks.

Is it the case that leveraging AMQP has given you the opportunity to go to where the compute resources are available at the lowest cost when that’s in your best interest?

Richardson: The dividend of interoperability for the end-user and the end-customer in this platform environment is ultimately portability -- portability through being able to choose where your application will run.

Michael described it very well. A hurricane is coming. Do you want to use the machines provided by the cloud provider here for this price? Do you want to use your own servers? Maybe your neighboring data center has servers available to you, provided those are visible and provided there is this fundamental interoperability through cloud platforms of the type that we are investing in. Then, you will be able to have that choice. And that lets you make these decisions in a way that you could not do before.

Gardner: It’s been mentioned by Alexis and others that this has some common features to Twitter or Facebook.

We think of the social environment because of the scale, complexity, and the use of cloud models. But we’re doing far more advanced computational activities here. This is not simply a display of 140 characters, based on a very rudimentary search, for example. These are at the high performance computing (HPC) level, supercomputer-level types of requests and analysis.

So are we combining the best of a social fabric approach and the architecture behind that to what we’ve been traditionally exposed to in high-performance computing and supercomputing, and what does that mean for the future?

Meisinger: This is the direction in which the future will evolve, and it’s the combination of proven patterns of interaction that are emerging out of how humans interact applied to high-performance computing. Providing a strong platform or a strong technological footprint that’s not specific to any technology is a great benefit to the community out there.

Providing a reference architecture and a reference implementation that can solve these problems, a social network for sensor networks and for device computation, will be a pattern that other interested participants can leverage, either by participating in the system directly or indirectly, by taking that pattern and the technologies that come with it and bringing them to the next level in the future. Developing it as one large, coherent project really yields a technology stack and architecture that will carry us far into the future.

Arrott: All the incremental change that we're introducing is taking the concepts of Facebook and of Twitter and the notion of Dropbox, which is the ability to move a file to a shared place so someone else can pick it up later, which was really not possible long ago. I had to set up an FTP server or put up an HTTP server to accomplish that.

Sharing processes

What we are now adding to the mix is not sharing just artifacts, but we’re actually sharing processes with one another, and then specifically sharing instrumentation. I can say to you, "Here, have a look through my telescope." You can move it around and focus it.

Basically, we introduced the concept of artifacts or information resources, as well as the concept of a taskable resource, and what we’re adding to the set of things that can be shared is taskable resources.

Meisinger: This pattern is very applicable, and it’s not that frequent that a research and construction project of that size has an ability to provide an end-to-end technology solution to this challenge of big data combined with real-time analysis and real-time command and control of the infrastructure.

What I see this evolving into is, first of all, that you can take the solutions built in this project and apply them to other communities that are in need of such a solution. But then it could go further. Why not combine these communities into a larger system? Why not federate or connect all these communities into a larger infrastructure that is all based on common ideas and common standards, and that still enables open participation?

It’s a platform where you can plug in your own system or subsystem that you can then make available to whoever is connected to that platform, whoever you trust. So it can evolve into a large ecosystem, and that does not have to happen under the umbrella of one organization such as OOI.

Larger ecosystem

It can happen to a larger ecosystem of connected computing based on your own policies, your own technologies, your own standards, but where everyone shares a common piece of the same idea and can take whatever they want and not consume what they’re not interested in.

You may also be interested in: