Monday, November 9, 2009

Part 3 of 4: Web data services--Here's why text-based content access and management plays a crucial role in real-time BI

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. View a full transcript or download a copy. Learn more. Sponsor: Kapow Technologies.

Text-based content and information from across the Web are growing in importance to businesses. The need to analyze web-based text in real time is rising to the level of importance that structured data held just a few years ago.

Indeed, for businesses looking to do even more commerce and community building across the Web, text access and analytics form a new mother lode of valuable insights to mine.

As the recession forces businesses to identify and evaluate new revenue sources, they need to capture such web data services so that their business intelligence (BI) works better, deeper, and faster.

In this podcast discussion, Part 3 of a series on web data services for BI, we discuss how an ecology of providers and a variety of content and data types come together in several use-case scenarios.

In Part 1 of our series we discussed how external data has grown in both volume and importance across the Internet, social networks, portals, and applications. In Part 2, we dug even deeper into how to make the most of web data services for BI, along with the need to share the inferences from those web data services quickly and easily.

Our panel now looks specifically at how near-real-time text analytics fills out a framework of web data services that can form a whole greater than the sum of its parts, bringing about a whole new generation of BI benefits and payoffs.

To help explain the benefits of text analytics and their context in web data services, we're joined by Seth Grimes, principal consultant at Alta Plana Corp., and Stefan Andreasen, co-founder and chief technology officer at Kapow Technologies. The discussion is moderated by me, Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:
Grimes: "Noise free" is an interesting and difficult concept when you're dealing with text, because text is just a form of human communication. Whether it's written materials, or spoken materials that have been transcribed into text, human communications are incredibly chaotic ... and they are full of "noise." So really getting to something that's noise-free is very ambitious.

... It's become an imperative to try to deal with the great volume of text -- the fire hose, as you said -- of information that's coming out. And, it's coming out in many, many different languages, not just in English, but in other languages. It's coming out 24 hours a day, 7 days a week -- not only when your business analysts are working during your business day. People are posting stuff on the web at all hours. They are sending email at all hours.

... There are hundreds of millions of people worldwide who are on the Internet, using email, and so on. There are probably even more people who are using cell phones, text messaging, and other forms of communication.

If you want to keep up, if you want to do what business analysts have been referring to as a 360-degree analysis of information, you've got to have automated technologies to do it. You simply can't cope with the flood of information without them.

Fortunately, the software is now up to the job in the text analytics world. It's up to the job of making sense of the huge flood of information from all kinds of diverse sources, high volume, 24 hours a day. We're in a good place nowadays to try to make something of it with these technologies.

Andreasen: ... There is also a huge amount of what I call "deep web," very valuable information that you have to get to in some other way. That's where we come in and allow you to build robots that can go to the deep web and extract information.

... Eliminating noise is getting rid of all this stuff around the article that is really irrelevant, so you get better results.

The other thing around noise-free is the structure. ... The key here is to get noise-free data and to get full data. It's not only to go to the deep web, but also get access to the data in a noise-free way, and in at least a semi-structured way, so that you can do better text analysis, because text analysis is extremely dependent on the quality of data.
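
To make that extraction step concrete, here is a minimal, hypothetical sketch of what "noise-free, semi-structured" access can mean in practice: parse a page, drop navigation and ad markup, and emit a title-plus-body record. It uses only the Python standard library; the tag choices and field names are illustrative assumptions, not Kapow's actual product API.

```python
from html.parser import HTMLParser

# Tags whose contents are usually page chrome or ads rather than article text.
NOISE_TAGS = {"script", "style", "nav", "aside", "footer", "iframe"}

class ArticleExtractor(HTMLParser):
    """Collect the headline and body text, skipping noisy regions of the page."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0
        self.in_title = False
        self.title_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag == "h1":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag == "h1":
            self.in_title = False

    def handle_data(self, data):
        if self.noise_depth:
            return  # inside an ad or navigation block: drop it
        text = data.strip()
        if text:
            (self.title_parts if self.in_title else self.body_parts).append(text)

def extract_article(html: str) -> dict:
    """Return a semi-structured record: title plus noise-reduced body text."""
    parser = ArticleExtractor()
    parser.feed(html)
    return {"title": " ".join(parser.title_parts),
            "body": " ".join(parser.body_parts)}

if __name__ == "__main__":
    page = ("<html><nav>Home | Login | Ads</nav>"
            "<h1>Quarterly Results</h1><p>Revenue rose 12 percent.</p>"
            "<footer>Copyright 2009</footer></html>")
    print(extract_article(page))
    # {'title': 'Quarterly Results', 'body': 'Revenue rose 12 percent.'}
```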

Grimes: ... [There are] many different use-cases for text analytics. This is not only on the Web, but within the enterprise as well, and crossing the boundary between the Web and the inside of the enterprise.

Those use-cases can be the early warning of a swine flu epidemic or other medical issues. You can be sure that there is text analytics going on with Twitter and other instant-messaging streams and forums to try to detect what's going on.

... You also have brand and reputation management. If someone has started posting something very negative about your company or your products, then you want to detect that really quickly. You want early warning, so that you can react to it really quickly.

We have some great challenges out there, but ... we have great technologies to respond to those challenges.

We have a great use case in the intelligence world. That's one of the earliest adopters of text analytics technology. The idea is that if you are going to do something to prevent a terrorist attack, you need to detect and respond to the signals that are out there, that something is pending really quickly, and you have to have a high degree of certainty that you're looking at the right thing and that you're going to react appropriately.

... Text analytics actually predates BI. The basic approaches to analyzing textual sources were defined in the late '50s. Actually, there is a paper from an IBM researcher from 1958 that defines BI as the analysis of textual sources.

...[Now] we want to take a subset of all of the information that's out there in the so-called digital universe and bring in only what's relevant to our business problems at hand. Having the infrastructure in place to do that is a very important aspect here.

Once we have that information in hand, we want to analyze it. We want to do what's called information extraction, entity extraction. We want to identify the names of people, geographical location, companies, products, and so on. We want to look for pattern-based entities like dates, telephone numbers, addresses. And, we want to be able to extract that information from the textual sources.
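
The pattern-based side of that extraction lends itself to a simple illustration (recognizing names of people and companies generally takes trained models and dictionaries). Here is a hedged Python sketch that pulls dates and U.S.-style phone numbers out of raw text with regular expressions; the patterns are deliberately simplified assumptions, not a production extractor.

```python
import re

# Deliberately simplified patterns; real extractors cover many more
# formats, locales, and edge cases.
DATE_PATTERN = re.compile(
    r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}"                        # 11/9/2009
    r"|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"[a-z]*\.? \d{1,2}, \d{4})\b")                        # November 10, 2009
PHONE_PATTERN = re.compile(r"\b(?:\+?1[-. ])?\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

def extract_entities(text: str) -> dict:
    """Pull pattern-based entities (dates, phone numbers) out of raw text."""
    return {"dates": DATE_PATTERN.findall(text),
            "phones": PHONE_PATTERN.findall(text)}

if __name__ == "__main__":
    sample = ("The deadline of 11/9/2009 was confirmed on November 10, 2009; "
              "call 555-867-5309 with questions.")
    print(extract_entities(sample))
    # {'dates': ['11/9/2009', 'November 10, 2009'], 'phones': ['555-867-5309']}
```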

Suitable technologies

All of this sounds very scientific and perhaps abstruse -- and it is. But, the good message here is one that I have said already. There are now very good technologies that are suitable for use by business analysts, by people who aren't wearing those white lab coats and all of that kind of stuff. The technologies that are available now focus on usability by people who have business problems to solve and who are not going to spend the time learning the complexities of the algorithms that underlie them.

Andreasen: ... Any BI or any text analysis is no better than the data sources behind it. There are four extremely important parameters for the data sources. One is that you have the right data sources.

There are so many examples of people making these kinds of BI applications, text analytics applications, while settling for second-tier data sources, because those are the only ones they have. This is one area where Kapow Technologies comes in. We help you get exactly the right data sources you want.

The other thing that's very important is that you have a full picture of the data. If you have data sources that are relevant from all kinds of verticals, all kinds of media, and so on, you really have to be sure you have full coverage of those data sources. Getting full coverage of data sources is another thing that we help with.

Noise-free data

We already talked about the importance of noise-free data to ensure that when you extract data from your data source, you get rid of the advertisements and you try to get the major information in there, because it's very valuable in your text analysis.

Of course, the last thing is the timeliness of the data. We all know that people who do stock research get real-time quotes. They get them for a reason: the fresher the quotes are, the more surely they can look into the crystal ball and predict the next few seconds.

The world is really changing around us. Companies need to look into a crystal ball that is nearer and nearer term. Predicting what happens in two years doesn't really matter anymore. You need to know what's happening tomorrow.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. View a full transcript or download a copy. Learn more. Sponsor: Kapow Technologies.

Thursday, November 5, 2009

Role of governance plumbed in Nov. 10 webinar on managing hybrid and cloud computing types

I'll be joining John Favazza, vice president of research and development at WebLayers, on Nov. 10 for a webinar on the critical role of governance in managing hybrid cloud computing environments.

The free, live webinar begins at 2 p.m. ET. Register at https://www2.gotomeeting.com/register/695643130. [Disclosure: WebLayers is a sponsor of BriefingsDirect podcasts.]

Titled "How Governance Gets You More Mileage from Your Hybrid Computing Environment,” the webinar targets enterprise IT managers, architects and developers interested in governance for infrastructures that include hybrids of cloud computing, software as a service (saaS) and service-oriented architectures (SOA). There will be plenty of opportunity to ask questions and join the discussion.

Organizations are looking for more consistency across IT-enabled enterprise activities, and they are finding competitive differentiation in being able to manage their processes more effectively. That benefit, however, requires the ability to govern across different types of systems, infrastructure, and application delivery models. Enforcing policies and implementing comprehensive governance enhance business modeling, deeper services orientation, process refinement, and general business innovation.

Increasingly, governance of hybrid computing environments establishes the ground rules under which business activities and processes -- supported by multiple and increasingly diverse infrastructure models -- operate.

Developing and maintaining governance also fosters collaboration between architects, those building processes and solutions for companies, and those operating the infrastructure -- be it supported within the enterprise or outside. It also sets up multi-party business processes, across company boundaries, with coordinated partners.

Cambridge, Mass.-based WebLayers provides a design-time governance platform that helps centralize policy management across multiple IT domains -- from SOA through mainframe and cloud implementations. Such governance clearly works to reduce the costs of managing and scaling such environments, individually and in combination.

In the webinar, we'll look at how structured policies, including extensions across industry standards, speed governance implementation and enforcement -- from design time through ongoing deployment and growth.
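
For a flavor of what design-time policy enforcement can look like, here is a generic, hypothetical Python sketch, not WebLayers' actual product behavior: each service definition is validated against codified rules before it is allowed into the build. The rule names and descriptor fields are illustrative assumptions.

```python
# A generic, hypothetical design-time policy check. The rule names and the
# service-descriptor fields are illustrative assumptions, not a product model.
POLICIES = [
    ("require-https", lambda svc: svc.get("endpoint", "").startswith("https://")),
    ("require-owner", lambda svc: bool(svc.get("owner"))),
    ("require-version", lambda svc: "version" in svc),
]

def check_service(svc: dict) -> list:
    """Return the names of the policies this service definition violates."""
    return [name for name, rule in POLICIES if not rule(svc)]

if __name__ == "__main__":
    service = {"name": "orders", "endpoint": "http://internal/orders"}
    violations = check_service(service)
    if violations:
        # A governed build pipeline would reject the artifact at this point.
        print(f"'{service['name']}' blocked at design time: {violations}")
```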

So join Favazza and me at 2 p.m. ET on Nov. 10 by registering at https://www2.gotomeeting.com/register/695643130.

Wednesday, November 4, 2009

HP takes converged infrastructure a notch higher with new data warehouse appliance

Hewlett-Packard (HP) on Wednesday announced new products, solutions, and services that leave the technology packaging to HP, so users don't have to do it themselves.

HP Neoview Advantage, HP Converged Infrastructure Architecture, and HP Converged Infrastructure Consulting Services are designed to help organizations drive business and technology innovations at lower total cost via lower total hassle. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

HP’s measured focus

HP isn’t just betting on a market whim. Recent market research it supported reveals that more than 90 percent of senior business decision makers believe business cycles will continue to be unpredictable for the next few years — and 80 percent recognize they need to be far more flexible in how they leverage technology for business.

The same old IT song and dance doesn't seem to be what these businesses are seeking. Nearly 85 percent of those surveyed cited innovation as critical to success, and 71 percent said they would sanction more technology investments -- if they could see how those investments met their organization’s time-to-market and business opportunity needs.

Cost nowadays is about a lot more than the rack and license. The fuller picture of labor, customization, integration, shared-services support, data-use tweaking, and inevitable unforeseen gotchas needs to be managed in unison, if that desired agility is to be affordable (and sanctioned by the bean counters).

HP said its new offerings deliver three key advantages:
  • Improved competitiveness and risk mitigation through business data management, information governance, and business analytics

  • Faster time to revenue for new goods and services

  • The ability to return to peak form after being compressed or stretched.
The Neoview advantage

First up is HP Neoview Advantage, the new release of the HP Neoview enterprise data warehouse platform, which aims to help organizations respond to business events more quickly by supporting real-time insight and decision-making.

HP calls the performance, capacity, footprint, and manageability improvements dramatic, and says the platform also reduces total cost of ownership (TCO) through industry-standard components and pre-built, pre-tested configurations optimized for warehousing.

HP Neoview Advantage and last year's Exadata product (produced in partnership with Oracle) seem to be aimed at different segments. Currently, HP Neoview Advantage is a "very high end database," whereas Exadata is designed for "medium to large enterprises," and does not scale to the Neoview level, said Deb Nelson, senior vice president, Marketing, HP Enterprise Business.

A converged infrastructure

Next up is the HP Converged Infrastructure architecture. As HP describes it, the architecture adjusts to meet changing business needs and attacks what HP calls “IT sprawl,” which it points to as the key culprit in tying up technology budgets in maintenance spending that could otherwise fund innovation.

HP touts several key benefits of this new architecture: first, the ability to deploy application environments on the fly through shared-service management, followed closely by lower network costs and less complexity. The new architecture is optimized through virtual resource pools, and it improves energy integration and effectiveness across the data center by tapping into data center smart grid technology.

Finally, HP is offering Converged Infrastructure Consulting Services that aim to help customers transition from isolated product-centric technologies to a more flexible converged infrastructure. The new services leverage HP’s experience in shared services, cloud computing, and data center transformation projects to let customers design, test and implement scalable infrastructures.

Overall, typical savings of 30 percent in total costs can be achieved by implementing Data Center Smart Grid technologies and solutions, said HP.

With these moves to converged infrastructure, HP is filling out where others are newly treading. Cisco and EMC this week announced packaging partnerships that seek to deliver similar convergence benefits to the market.

"It's about experience, not an experiment," said Nelson.

BriefingsDirect contributor Jennifer LeClaire provided editorial assistance and research on this post.

Tuesday, November 3, 2009

Aster Data architects application logic with data for speeded-up analytics processing en masse

In real estate, the mantra is "location, location, location." The same could be said for the juxtaposition of application logic and data. With enterprise data growing at an explosive rate, keeping applications separate from the mountains of data they rely on has meant massive data movement, increasing latency and restricting timely analysis.

Aster Data, which provides massively parallel processing (MPP) data management, has tackled the location problem head-on with the announcement this week of Aster Data Version 4.0 (along with Aster nCluster System 4.0), a massively parallel application-data server that allows companies to embed applications inside an MPP data warehouse. This is designed to speed the processing of terabytes to petabytes of data.

The latest offering from the San Carlos, Calif., company fully parallelizes both data and a wide variety of analytics applications in one system. This provides faster analysis for such data-heavy applications as real-time fraud detection, customer behavior modeling, merchandising optimization, affinity marketing, trending and simulations, trading surveillance, and customer calling patterns.

While both data and applications reside in the same system, they are independent of one another, but both execute as "first-class citizens" with their respective data and application management services.

Resource sharing

The Aster Data Application Server is responsible for managing and coordinating activities and resource sharing in the cluster. It also acts as a host for the application processing and data inside the cluster. In its role as data host, it manages incremental scaling, fault tolerance and heterogeneous hardware for application processing.

Aster Data Version 4.0 provides application portability, which allows companies to take their existing Java, C, C++, C#, .NET, Perl and Python applications, MapReduce-enable them and push them down into the data.
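
Aster's actual interface for this is SQL/MapReduce, but the underlying idea can be sketched in a few lines. The toy Python below is an illustration under stated assumptions, not Aster's API: a user-defined function ships to each data partition, runs in parallel where the data lives, and only the small per-partition results travel back to be combined.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def map_partition(rows):
    """Runs where each partition lives: count fraud-flagged events per user."""
    counts = Counter()
    for user_id, flagged in rows:
        if flagged:
            counts[user_id] += 1
    return counts

def reduce_counts(partials):
    """Combine the small per-partition results on the coordinator."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Each inner list stands in for a partition stored on a separate node.
    partitions = [
        [("alice", True), ("bob", False), ("alice", True)],
        [("bob", True), ("carol", True), ("alice", False)],
    ]
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    print(reduce_counts(partials))
    # Counter({'alice': 2, 'bob': 1, 'carol': 1})
```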

Dynamic Workload Management (WLM) helps support hundreds of concurrent mixed workloads that can span interactive and batch data queries, as well as application execution. It includes granular, rule-based prioritization of workloads, plus dynamic allocation and re-allocation of resources.
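
As a rough illustration of rule-based prioritization (purely a sketch; Aster has not published WLM internals at this level of detail), incoming queries can be classified by rule and dispatched from a priority queue:

```python
import heapq
import itertools

# Illustrative rules only: interactive queries outrank trusted batch jobs,
# which outrank everything else. Lower number means higher priority.
RULES = [
    (lambda q: q["type"] == "interactive", 0),
    (lambda q: q["type"] == "batch" and q["user"] == "etl", 1),
    (lambda q: True, 2),
]

def priority_of(query: dict) -> int:
    return next(prio for rule, prio in RULES if rule(query))

queue = []
tiebreak = itertools.count()  # preserves FIFO order within a priority level

def submit(query: dict):
    heapq.heappush(queue, (priority_of(query), next(tiebreak), query))

submit({"type": "batch", "user": "etl", "sql": "LOAD batch feed ..."})
submit({"type": "interactive", "user": "analyst", "sql": "SELECT ..."})
while queue:
    _, _, query = heapq.heappop(queue)
    print("dispatch:", query["sql"])  # the interactive SELECT runs first
```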

Other features include:
  • Trickle feeds for granular data loading and interactive queries with millisecond response times

  • New online partition splitting capabilities to allow infinite cost-effective scaling

  • Dual-stage query optimizer, which ensures peak performance across hundreds to thousands of CPU cores

  • Integrations with leading business intelligence (BI) tools and Hadoop.
More companies want to bring more data to bear on more BI problems. While Aster's benefits and value may be applied to high-end and esoteric analytics uses now, I fully expect that these data-intense architectures will find more uses. The price, too, is dropping, making such systems more affordable.

Many of the core users of high-end analytics are also moving on, architecture-wise. Systems designed five or more years ago will not meet the needs of even a few years from now.

What's really cool about Aster Data's approach is that existing analytics apps, and the languages and query semantics most familiar to users, can be carried over to the new systems and architectures.

I suppose we should also expect more of these analytics engines to become available as services, aka cloud services. That would allow joins of more data sets, and then massive analytics applications could open up even more BI cans of worms.

Survey: Virtualization and physical infrastructures need to be managed in tandem

If your company uses test and development infrastructures as a proving ground for shared services, virtualization and private cloud environments, you’re not alone. More companies are moving in that direction, according to a Taneja Group survey.

Yet underlying the use of the newer infrastructure approaches lies a budding challenge. The recent Taneja Group survey of senior IT managers working on test/dev infrastructures at North American firms found that 72 percent of respondents said virtualization on its own doesn’t address their most important test/dev infrastructure challenges. Some 55 percent rate managing both virtual and physical resources as having a high or medium impact on their success. The market is clearly looking for ways to bridge this gap.

Sharing physical and virtual infrastructures

Despite the confusion in the market about the economics of the various flavors of cloud computing, Dave Bartoletti, a senior analyst and consultant at Taneja Group, says one thing is clear: Enterprises are comfortable with, and actively sharing, both physical and virtual infrastructures internally.

“This survey reaffirms that shared infrastructure is common in test/dev environments and also reveals it’s increasingly being deployed for production workloads,” Bartoletti says. “Virtualization is seen as a key enabling technology. But on its own it does not address the most important operational and management challenges in a shared infrastructure.”

Noteworthy is the fact that 92 percent of test/dev operations are using shared infrastructures, and companies are making significant investments in infrastructure-sharing initiatives to address the operational and budgetary challenges. Half the survey respondents are funding projects in 2009. Another 66 percent of respondents will have funded a project started by the end of 2010.

The survey reveals most firms are turning to private cloud infrastructures to support test/dev projects, and that shared infrastructures are beginning to bridge the gap between pre-production and production silos. A full 30 percent are sharing resource pools between both test/dev and production applications. This indicates a rising comfort level with sharing infrastructure within IT departments.

Virtualization’s cost and control issues

Although 89 percent of respondents use virtualization for test/dev, more than half have virtualized less than 25 percent of their servers. That’s because virtualization adds several layers of control and cost issues that need to be addressed by sharing, process, workflow, and other management capabilities in order to fully integrate, and get the most from, both virtual and physical infrastructures.

“Test/dev environments are one of the most logical places for organizations to begin implementing private clouds and prove the benefits of a more elastic, self-service, pay-per-use service delivery model,” says Martin Harris, director of product management at Platform Computing. “We’ve certainly seen this trend among our own customers and have found that additional management tools enabling private clouds are required to effectively improve business service levels and address cost-cutting initiatives.” [Disclosure: Platform Computing is a sponsor of BriefingsDirect podcasts.]

Despite the heavy internal investments, however, 82 percent of respondents are not using hosted environments outside their own firewalls. The top barriers to adoption: Lack of control and immature technology.

BriefingsDirect contributor Jennifer LeClaire provided editorial assistance and research on this post.