Tuesday, December 16, 2008

MapReduce-scale analytics change BI game as enterprises need to mine ever-expanding data sets

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Read a full transcript of the discussion.

Internet-scale data sets and Web-scale analytics have placed a different set of requirements on software infrastructure and data processing techniques. More types of companies and organizations are seeking new inferences and insights across a variety of massive data sets -- some into the petabyte scale.

How can all this data be sifted and analyzed quickly, and how can we deliver the results to an inclusive class of business-focused users? Following the lead of such Web-scale innovators as Google, and by leveraging the performance of parallel computing on industry-standard hardware, such companies as Greenplum are now focusing on how MapReduce approaches are changing business intelligence (BI) and the data-management game.

BI has become a killer application over the past few years, and we're now extending that beyond enterprise-class computing into cloud-class computing. The amount of data and content -- and the need for innovative analytics from across the Internet -- is still growing rapidly, even though we have harsh economic times.

To provide an in-depth look at how parallelism, modern data infrastructure, and MapReduce technologies come together in the new age, BriefingsDirect's Dana Gardner recently spoke with Tim O’Reilly, CEO and founder of O’Reilly Media and blogger; Jim Kobielus, senior analyst at Forrester Research, and Scott Yara, president and co-founder at Greenplum.

Here are some excerpts:
Kobielus: A number of things are happening ... and the trend continues to grow. In terms of the data sets, it’s becoming ever more massive for analytics. It’s equivalent to Moore’s Law, in the sense that every several years, the size of the average data warehouse or data mart grows by an order of magnitude.

Why are data warehouses bulking up so rapidly? One key thing is that organizations, especially in tough times when they're trying to cut costs, continue to consolidate a lot of disparate data sets into fewer data centers, onto fewer servers, and into fewer data warehouses that become ever-more important for their BI and advanced analytics.

What we're seeing is that more data warehouses are becoming enterprise data warehouses and are becoming multi-domain and multi-subject. You used to have tactical data marts, one for your customer data, one for your product data, one for your finance data, and so forth. Now, the enterprise data warehouse is becoming the be all and end all -- one hub for all of those sets.

Also, the data warehouse is becoming more than a data warehouse. It's becoming a full-fledged content warehouse, not just structured relational data, but unstructured and semi-structured data -- from XML, from your enterprise content management system, from the Web, from various formats.

O'Reilly: In the first age of computing, business models were dominated by hardware. In the second age, they were dominated by software. What started to happen in the 1990s ... open source started to create new business models around data, and, in particular, around network applications that built huge data sets through user participation. That’s the essence of what I call Web 2.0.

Look at Google. It's a BI company, based on massive data sets, where, first of all, they are spidering all the activity off of the Web, and that’s one layer. Then, they do this detailed analysis of the link structure of that Web, and that’s another layer. Then, they start saying, "Well, what else can we find?" They start looking at click stream data. They start looking at browsing history, and where people go afterward. Think of all the data. Then, they deliver service against that.

That’s the essence of Web 2.0, building a massive data set, doing real-time analytics against it, and then figuring out what services you can deliver. What’s happening today is that movement is transferring from the consumer Web into business.

... When we think about where this is going, we first have to understand that everybody is connected all the time via applications, and this is accelerating, for example, via mobile. The need for real-time analytics against massive data sets is universal. ... This is a real frontier of competitive advantage. You look at the way that new technologies are being explored by startups. So many of the advantages are in data.

Yara: We're now entering this new cycle, where companies are going to be defined by their ability to capture and make use of the data and the user contributions that are coming from their customers and community. That is really being able to make parallel computing a reality.

... If you look at running applications on a much cheaper and much more efficient set of commodity systems and consolidating applications through virtualization, that would be a really compelling thing, and we've seen a multi-billion dollar industry born of that.

... We're talking about using parallel computing techniques, open-source software, and commodity hardware. It’s literally a 10- to 100-fold improvement in price performance. When the cost of data analysis comes down 10 to 100 times, that’s when new things become possible.

... Business is now driven by Web 2.0, by the success of Google, and by their own use and actions of the Web realizing how important data is to their own businesses. That’s become a very big driver, because it turns out that parallel computing, combined with commodity hardware, is a very disruptive platform for doing large-scale data analysis. ... Google has become a thought leader in how to do this, and there are a lot of companies creating technologies and models that are emblematic of that.
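For readers who have not seen the MapReduce pattern the panelists keep referring to, here is a minimal single-machine sketch in Python. It is not Greenplum's or Google's implementation. The clickstream records, the function names, and the use of a thread pool to stand in for a cluster of commodity machines are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical clickstream records: (user_id, page). Invented for illustration.
CLICKS = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/pricing"),
    ("u3", "/home"), ("u2", "/pricing"), ("u3", "/docs"),
]


def split(records, n):
    """Deal records round-robin into n partitions (one per worker)."""
    return [records[i::n] for i in range(n)]


def map_phase(partition):
    """Emit a (page, 1) pair for every click in one partition."""
    return [(page, 1) for _user, page in partition]


def shuffle(mapped_partitions):
    """Group the emitted values by key across all partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def page_counts(records, workers=3):
    """Count clicks per page using map, shuffle, and reduce steps."""
    partitions = split(records, workers)
    # Each partition is mapped independently; a thread pool stands in
    # for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, partitions))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(page_counts(CLICKS))
```

The point of the pattern is that the map step touches each partition independently, so adding machines adds capacity. Only the shuffle step needs to move data between workers.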

Kobielus: ... Power users are the ones who are going to do the bulk of the BI and analytics application development in this new paradigm. This will mean that for the traditional high priesthood of data modelers and developers and data mining specialists, more and more of this development will be offloaded from them, so they can do more sophisticated statistical analysis. ... The front office is the actual end user.

O'Reilly: ... The breakthroughs are coming from the ability of people to discern meaning in data. That meaning sometimes is very difficult to extract, but the more data you have, the better you can be at it. ... Getting more tools for handling larger and more complex data sets, and in particular, being able to mix data sets, is critical. ... That fits with this idea of crossing data sets being one of the new competencies that people are going to have to get better at.

Kobielus: Traditionally, data warehouses existed to provide you with perfect hindsight on the customer -- historical data, massive historical data, hopefully on the customer, and that 360 degree view of everything about the customer and everything they have ever done in the past, back to the dawn of recorded time.

Now, it’s coming down to managing that customer relationship and evolving and growing with that relationship. You have to have not so much a past or historical view, but a future view on that customer. You need to know that customer and where they are going better than they know themselves. ... That’s where the killer app of the online recommendation engine becomes critical.

Feed all [possible data and content] into a recommendation engine, which is a predictive-analytics model running inside the data warehouse. That can optimize that customer’s interaction at every touch point. Let’s say they're dealing with a call-center person live. The call-center person knows exactly how the world looks to that customer right now and has a really good sense for what that customer might need now or might need in three months, six months, or a year, in terms of new services or products, because other customers like them are doing similar things.
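The "other customers like them" idea can be sketched in a few lines of Python. This is one simple similarity-based approach, not the model Kobielus has in mind or any vendor's engine. The customers, products, and function names are hypothetical.

```python
# Hypothetical purchase history: customer -> set of products they already have.
PURCHASES = {
    "alice": {"broadband", "mobile"},
    "bob": {"broadband", "mobile", "tv"},
    "carol": {"broadband", "tv"},
    "dave": {"mobile"},
}


def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(customer, purchases, top_n=3):
    """Rank products the customer lacks by how similar their owners are."""
    owned = purchases[customer]
    scores = {}
    for other, items in purchases.items():
        if other == customer:
            continue
        similarity = jaccard(owned, items)
        if similarity == 0:
            continue  # customers with nothing in common carry no signal
        for item in items - owned:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _score in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("dave", PURCHASES))
```

A production engine would fold in far more signals (call-center history, clickstream, content), but the shape is the same: score what similar customers have that this one does not.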

Yara: ... You're going to see lots of cases where for traditional businesses that are selling services and products to other businesses, the aggregation of data is going to be interesting and relevant. At the same time, you have companies where even the internal analysis of their data is something they haven’t been able to do before.

... These companies actually have access to amazing amounts of information about the customers and businesses. They are saying, "Why can’t we, at the point of interaction -- like eBay, Amazon, or some of these recommendation engines -- start to take some of this aggregate information and turn it into improving businesses in the way that the Web companies have done so successfully?" That’s going to be true for B2C businesses, as well as for B2B companies.

We're just at the beginning of that. That’s fundamentally what’s so exciting about Greenplum and where we're headed.

Monday, December 15, 2008

IT systems analytics become more crucial as cloud and SaaS adoption raises complexity bar

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. More related podcasts. Sponsor: LogLogic.

Read a full transcript of the discussion.

Software-as-a-service (SaaS) and cloud computing are changing the nature of IT systems' performance requirements and heightening expectations for end users from online applications and services.

Increasingly, an extended level of visibility, management, and performance will apply to those serving up applications as services, regardless of their hosting origins or models. The more the apps and services fulfill a need, the more the users will expect even better results and performance.

In other words, the more these organizations succeed, the more they need to scale, leverage virtualization and cloud infrastructure methods, embark on service-oriented architecture (SOA), and then keep all the trains running fast and on time. Using the latest tools and analytics -- the equivalent of business intelligence (BI) for IT -- on the systems and across the gathering complexity becomes essential.

To learn more about how systems log tools and analysis are aiding providers of cloud and SaaS, I recently spoke with fellow blogger Phil Wainewright, an independent analyst and director at Procullux Ventures, and SaaS blogger at ZDNet and ebizQ, as well as with Jian Zhen, senior director of product management at LogLogic.

Here are some excerpts:
One thing that's happening is that the SaaS infrastructure is getting more complicated, because more choice is emerging. In the past people might have gone to one or two SaaS vendors in very isolated environments or isolated use cases. What we're now finding is that people are aggregating different SaaS services. ... We're actually looking at different layers of not just SaaS, but also platform as a service (PaaS), which are customizable applications, rather than the more packaged applications that we saw in the first generation of SaaS. We're seeing more utility and cloud platforms and a whole range of options in between.

That means people are really using different resources and having to keep tabs on all those different resources. Where in the past, all of an IT organization's resources were under its own control, they now have to operate in this more open environment, where trust and visibility as to what's going on are major factors.

If you're going to take advantage of SaaS properly, then you need to move to more of a SOA internally. That makes it easier to start to aggregate or integrate these different mashups, these different services. At the end of the day, the end users aren't going to be bothered whether the application is delivered from the enhanced data center or from a third-party provider outside the firewall, as long as it works and gives them the business results they're looking for.

You have to worry not only about who is accessing the information within your company firewall, but now you have all this data that's sitting outside of the firewall in another environment. That could be a PaaS, as Phil said, it could be a SaaS, an application that's sitting out there. How do you control that access? How do you monitor that access? That's one of the key issues that IT has to worry about.

Obviously, there are data governance issues and activity monitoring issues. Now, from a performance and operational perspective, you have to worry about whether your systems are performing. Are these applications that I am renting, or platforms or utilities I am renting, performing to my spec? How do I ensure that the service providers can give me the SLAs that I need?

... What SaaS providers have been learning is that they need to get better at giving more information to their customers about what is going wrong when the service is not up or the service is not performing as expected. The SaaS industry is still learning about that. So, there is that element on that side.

On the IT side, the IT people have spent too much time worrying about reasons why they didn't want to deal with SaaS or cloud providers. They've been dealing with issues like what if it does go down, or how can I trust the security? Yes, it does go down sometimes, but it's up 99.7 percent of the time or 99.9 percent of the time, which is better than most organizations can afford to do with their own services.
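To put those uptime figures in concrete terms, the short Python calculation below converts an availability percentage into hours of downtime per year. The function name is ours, and the calculation assumes a 365-day year with downtime spread over the whole calendar, which real SLAs often define differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a service may be down at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.7, 99.9):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% availability allows about {hours:.1f} hours of downtime a year")
```

Under these assumptions, 99.7 percent works out to roughly 26 hours a year and 99.9 percent to just under 9 hours, so the gap between the two figures is wider than it sounds.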

Let's shift the emphasis from, "It's broken, so I won't use it," to a more mature attitude, which says, "It will be up most of the time, but when it does break, how do I make sure that I remain accountable, as the IT manager, the IT director, or the CIO? How do I remain accountable for those services to my organization, and how do I make sure that I can pinpoint the cause of the problem, and get it rectified as quickly as possible?"

One of the great quotes that we recently got from a customer is, "You can outsource responsibility, but not accountability." So, it fits right into what Phil was saying about being accountable and about your own environment.

The requirement to comply with government regulations and industry mandates really doesn't change all that much, just because of SaaS or because a company is going into the cloud. What it means is that the end users are still responsible for complying with Sarbanes-Oxley (SOX), Payment Card Industry (PCI) standards, the Health Insurance Portability and Accountability Act (HIPAA), and other regulations. It also means that these customers will expect the same type of reports that they get out of their own systems.

BI for IT, or IT intelligence, as I have used the term before, is really about getting more information out of the IT infrastructure; whether it's internal IT infrastructure or external IT infrastructure, such as the cloud.

Traditionally, administrators have always used logs as one of the tools to help them analyze and understand the infrastructure, both from a security and operational perspective. For example, one of the recent reports from Price Waterhouse, I believe, says that the number one method for identifying security incidents and operational problems is through logs.

We can provide them that information, both from an internal and external perspective. We work with a lot of service providers, as you know, companies like SAVVIS, VeriSign, Verizon Business Services, to provide the tools for them to analyze service provider infrastructures as well.

A lot of that information can be gathered into a central location, correlated, and presented as business intelligence or business activity monitoring for the IT infrastructure.

Increasingly, it comes back to IT accountability. If your service provider does go down, and if the logs show that the performance was degrading gradually over a period of time, then you should have known that. You should have been doing the analysis over time, so that you were ahead of that curve and were able to challenge the provider before the system went down.
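As a rough illustration of getting "ahead of that curve," the Python sketch below fits a straight line to daily average response times and flags a sustained upward trend. It is a simple least-squares check, not LogLogic's method. The sample figures, the function names, and the 5 ms-per-day threshold are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def is_degrading(daily_latency_ms, threshold_ms_per_day=5.0):
    """True if response time is rising faster than the chosen threshold."""
    return slope(daily_latency_ms) > threshold_ms_per_day


# Hypothetical daily average response times (ms) extracted from provider logs.
steady = [120, 118, 122, 119, 121, 120, 119]
creeping = [120, 128, 135, 149, 160, 171, 185]

if __name__ == "__main__":
    print("steady degrading?", is_degrading(steady))
    print("creeping degrading?", is_degrading(creeping))
```

A check like this, run on a schedule against centrally collected logs, gives the customer evidence to raise with the provider before an outage rather than after.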

If it's a good provider, which comes back to the question you asked, then the provider should be on top of that before the customer finds out. Increasingly, we'll see the quality of reporting that providers are doing to customers go up dramatically. The best providers will understand that the more visibility and transparency they provide the customers about the quality of service they are delivering, the more confidence and trust their customers will have in that service.

Sunday, December 14, 2008

BriefingsDirect analysts handicap large IT vendors on how cloud trend impacts them

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Charter Sponsor: Active Endpoints.

Read a full transcript of the discussion.

Special offer: Download a free, supported 30-day trial of Active Endpoints' ActiveVOS at www.activevos.com/insight.

Welcome to the latest BriefingsDirect Insights Edition, Vol. 34, a periodic discussion and dissection of software, services, service-oriented architecture (SOA), and compute cloud-related news and events, with a panel of IT analysts and guests.

In this episode, recorded Nov. 21, our experts focus on the impact that cloud computing will have on the large, established IT vendors. We really are only beginning to understand how the IT services delivery, data management, and economic models of cloud computing will impact the market. If this shift is as large and inevitable as many of us think, the impact on the current IT business landscape will also be large. Some will do well, and some will not. All, I expect, will need to adapt, and the shifts are certainly exacerbated by the deepening global recession.

Please join noted IT industry analysts and experts Jim Kobielus, senior analyst at Forrester Research; Tony Baer, senior analyst at Ovum; Brad Shimmin, principal analyst at Current Analysis, and Joe McKendrick, independent analyst and prolific blogger on ZDNet and ebizQ. Our discussion is produced and moderated by me, Dana Gardner.

Here are some excerpts:
Baer: In terms of who is best positioned for all this, I think it's a little too early to tell, because most of the large vendors are only just starting to put their feet in the water. Obviously, IBM, HP, and Microsoft are making moves. SAP has actually had a couple of stumbles on the way there. Oracle has sort of a sitting-on-the-fence strategy.

If we are going to talk about who has consistently positioned themselves as being the poster child, it has been Marc Benioff over at Salesforce.com, where they have evolved from a customer relationship management (CRM) application that you access on demand to expand towards Platform as a Service (PaaS).

Gardner: Who can get kicked in the teeth by this thing?

Baer: Well, Microsoft clearly could get kicked in the teeth, and that's obviously why they've come out with their resource strategy and with their various live-office strategies. Microsoft clearly has the most to lose, because they've been very identified with the rich client.

Gardner: Yet Microsoft has an opportunity to shoot for the moon. They have all the essential pieces. They have a very difficult transformation to make in terms of their business. They have a lot of cash in the bank, and we're in a transformational period.

I think Microsoft has an opportunity to make an offer that developers can't resist -- and probably no one else is in a position to do it -- which is to say, "We will have at least one of the top three clouds. We're going to give you the tools and give you simplicity that Joe the plumber can develop, and we're going to make sure that you have a huge audience of both consumers and businesses that we're going to line up for you."

McKendrick: They've already made a lot of moves in this direction: Software plus Services, the Live offerings. They're already positioning a lot of their product line. They work with Amazon and have offerings through the Amazon service as well. Microsoft gets into everything. Wherever you look, in the enterprise or in computing, they have some kind of offering there. Sometimes, the things don't take off for a while. They sit and bide their time, and eventually it takes hold.

... Thinking about Microsoft plus Yahoo, it makes really good sense for them both to be a real powerhouse together in cloud computing. Earlier, I stressed that the providers who dominate the cloud world will be those that focus on extreme scalability -- scale out, shared nothing, massively parallel processing -- and on being able to sift and analyze petabyte upon petabyte of data from all over, especially from the Web 2.0 world, especially clickstream information, and so forth.

Gardner: On the other hand, for those not shooting for the whole package, is this going to democratize IT?

Shimmin: When you look at the strategy that vendors like IBM and Cisco have, and that Sun will have, in terms of how they're rolling out anything that's in the cloud -- whether it's PaaS, infrastructure as a service, or SaaS -- they all seem to be doing two things.

One is that they are taking some point solutions that they are going direct with, like IBM with Bluehouse, for example. Secondly, they are going after an independent software vendor (ISV) market. They want to empower folks like amazon.com, Panorama, Pervasive, Peer1, Mosso, Akamai, Boomi, and all those guys. They're really looking to empower them to go out and deliver services.

What these companies are doing is allowing this broader feel, allowing this channel of service providers to exist, using their software and their services, and, in some cases, their actual data-center resources.

Gardner: I think you're saying that the organization that can provide the best ecology of partners and provide the best environment to thrive for many other players will do best, whereas, in the past, it seemed that, as an IT vendor, having the most installed base and the most lock-in offered the path to who did best.

Shimmin: Exactly.

Kobielus: I think you hit the nail on the head, Dana, when you pointed out that success in the emerging cloud arena depends on having a very broad and deep ecology of partners. I see the partner ecosystem as the new platform for cloud computing, being able to put together a group of partners that provide various differentiated features and services within an overall cloud-computing environment.

Then, the hub partner, as it were, provides some core, enabling infrastructure that binds them all together. Core infrastructures such as, for example, a core analytic environment or distributed data-warehousing environment that manages all of the structured, unstructured, and semi-structured data, manages all of the very compute-intensive analytical workloads, CPUs, and other resources that many or all of the partner solutions can tap into -- a basic utility computing environment.

Shimmin: When you look at a company like Microsoft, they seem to be slow to market, and then, once they enter the market, they go really, really fast. They seem to be going really, really fast at the moment with two things, because they have both. They have the infrastructure and they also have the apps. They're going to have both paths.

They have the Azure platform, which is truly a PaaS offering that you use to build your own applications. So it's a layer above the Amazon EC2 infrastructure as a service.

Then they have the full-on SaaS-type products with Microsoft Online Services, which has in it almost the entirety of their collaboration software. So, they have actually sort of leapfrogged IBM Bluehouse a little bit with that.

The point is that these vendors are really looking at their portfolios and seeing which ones fit either of those two models. They're not committing to one or the other, Dana. They're really trying to tackle both ends [the infrastructure and the apps].

McKendrick: Just about every small ISV coming on the market now is offering a SaaS model. This is the way to go with the emerging smaller software-development companies.

For the larger developers, ISVs that are already well-established, it's now another delivery mechanism, another channel to reach their customer base. There are a lot of efficiencies. When you have a cloud model or are working with a cloud model, you don't have to worry about making sure all your customers receive the latest upgrade or deal with problems customers may be having with conflicting software. It's all done once. You do the upgrade once, test it, ensure the quality, deliver it, and it's all done in one location. It makes their job a lot easier.

Kobielus: What the whole trend toward SOA started was the gradual dissolution or deconstruction of the underlying platforms, as you mentioned -- OSs, development environments, and the declarative programming languages. This is all buggy-whip territory now in terms of what large and small software vendors are developing to. Pretty much everyone is now developing to a virtualized SOA, cloud environment.

Most of the large and small vendors that I talk to ... are really looking at more of a flex-sourcing approach to delivering solutions to market. ... Most of the vendors that I talk to now have three broad go-to-market delivery approaches for flexible delivery of applications or of solutions. They have the everything-as-a-service approach, the appliance approach, and the packaged, licensed software approach.

If you look at cloud computing as a Venn diagram, with many smaller bubbles within it, one of the hugest bubbles is this notion of flexible packaging and sourcing of solution functionality.

The "Chinese Wall" between internal hosting and external hosting is dissolving, as more and more organizations say, "You know what? We want to do data warehousing. We'll license software from vendor X. We might also use their hosted offerings for these particular data marts. We also might go with an appliance from them, for either our data warehousing hub, a particular operational data store, or another deployment wall where the appliance form factor makes most sense."

Baer: The vision of SOA is that it runs both ways. You publish services and you consume services. ... Through SOA, perhaps companies can look at increasing capacity or tapping into capacity as needed in a grid-like fashion, either with each other, or with a provider out there such as Amazon or IBM.

Shimmin: When you look at companies like IBM and Microsoft -- Microsoft with their Software plus Services, and IBM with their Foundation Start Appliance, coupled with their Bluehouse software as a service, coupled with their on-premises collaboration software -- you're talking about a solution that spans those three delivery mechanisms.

The pressure I'm talking about that is making that so for the enterprise buyer is that you don't want to have a full SaaS deployment, and you don't want to have a full appliance deployment. When you consider issues like ownership of data, privacy of that data or SOAs, even transaction volumes, there are facets of your enterprise application that are best suited to running in an appliance, in your data center, or in the cloud.

So, these vendors we are talking about here clearly recognize that need, and are trying to re-architect their software so it can run across those three channels in different ways.

McKendrick: The underlying architecture that a lot of vendors are moving toward to enable that degree of flexible deployment across different form factors -- hosted service, appliance, and packaged, licensed software -- is the notion of shared nothing, massively parallel processing for extreme scale-out capabilities and extreme scale-up as well.

In a federated model, where you have different clusters that can be internal, external, or in combinations specialized to particular roles within the application environment, some might be optimized for data warehousing, some might be optimized for business-process management and workflow, and others might be optimized for the upfront delivery, Web 2.0, REST, and all that. But, having shared nothing, massively parallel processing, with a federated middleware fabric in an SOA context, is where everybody is moving their platform and strategy.

Gardner: How about Amazon? That would be in my thinking a pretty good candidate for prom queen right now. Perhaps there will be some polygamy at the prom, because Amazon could team up potentially with say an Oracle and a Salesforce. Can you imagine such a pairing?

Kobielus: Yeah, because Oracle, a couple of months ago, announced that you can now take your existing Oracle database licenses and you can move them to the Amazon EC2 cloud and the Amazon storage service. So, to a degree, that partnership foreshadows possibly a larger relationship between those two companies going forward.

I think it's really an interesting pairing of Oracle plus Amazon. Once again, I always have to hit the analytics thing on the head, because I think database analytics or cloud-scalable analytics is going to be a key differentiator for most application vendors.

Shimmin: With regards to Microsoft's channel, as you and Jim were saying, Microsoft is definitely going to be the queen bee and they are definitely going to make it beneficial to this channel to work with them in their cloud initiatives. At the same time, it's also Microsoft's greatest risk.

When you look at their PaaS with Azure, that makes sense for the channel, because how the channel differentiates is by the services they provide their customers directly, and that comes from developing code. But, when you talk about Microsoft's online services, Office Live, and those things, they are in a very precarious predicament of undercutting the values that their channel partners provide.

They're literally saying, "Hey, why do you need a channel partner for the SMB market, just come right to us and give us your credit card, which you can do for a certain number of dollars a month, and you are running."

Gardner: Right, so perhaps Microsoft has the golden opportunity but the transition is perilous, and execution has to be perfect. Just as we had back in the "anti" days, when all of the Unix vendors got together and created what they called the "anti-Microsoft coalition," all these other cloud providers, ISVs, developers, and all the PaaS people are going to get together and try to provide more of a marketplace, in order to if not staunch Microsoft, at least create that democratic approach to cloud.

Shimmin: I can’t believe I'm saying this -- Microsoft has really done something spectacular here, because it all comes back to the developer. What the developer does drives what software you run on the server, in many cases. What Microsoft has done with the Software-plus-Services program initiative, right now, today, using the 3.5 .NET framework in Windows 2008, you can write code that can be dropped in the cloud or on the desktop automatically. You can just write a rule that says, "If I reach a certain service level agreement (SLA), just kick this piece of code to the cloud."

Gardner: So Microsoft and not the business becomes the arbiter.

Shimmin: Exactly.