Wednesday, May 16, 2012

Searching for data scientists as a service

This guest post comes courtesy of Tony Baer's OnStrategies blog. Tony is senior analyst at Ovum.

By Tony Baer

It’s no secret that rocket … err … data scientists are in short supply. The explosion of data and the corresponding explosion of tools, along with the knock-on impacts of Moore’s and Metcalfe’s laws, mean that there is more data, more connections, and more technology to process it than ever. At last year’s Hadoop World, there was a feeding frenzy for data scientists, which only barely exceeded demand for the more technically oriented data architects. In English, that means:

1. Potential MacArthur Grant recipients who have a passion and insight for data, the mathematical and statistical prowess for ginning up the algorithms, and the artistry for painting the picture that all that data leads to. That’s what we mean by data scientists.

2. People who understand the platform side of Big Data, a.k.a. data architects or data engineers.

The data architect side will be the more straightforward nut to crack. Understanding big data platforms (Hadoop, MongoDB, Riak) and emerging Advanced SQL offerings (Exadata, Netezza, Greenplum, Vertica, and a bunch of recent upstarts like Calpont) is a technical skill that can be taught with well-defined courses. The laws of supply and demand will solve this one – just as they did when the dot com bubble created demand for Java programmers back in 1999.

Behind all the noise for Hadoop programmers, there’s a similar but quieter desperate rush to recruit data scientists. While some data scientists call data scientist a buzzword, the need is real.

However, data science will be a tougher nut to crack. It’s all about connecting the dots, which is not as easy as it sounds. The V’s of big data (volume, variety, velocity, and value) require someone who discovers insights from data; traditionally, that role was performed by the data miner. But data miners dealt with better-bounded problems and well-bounded (and known) data sets that made the problem more two-dimensional.

The variety of Big Data – in form and in sources – introduces an element of the unknown. Deciphering Big Data requires a mix of investigative savvy, communications skills, creativity/artistry, and the ability to think counter-intuitively. And don’t forget it all comes atop a foundation of a solid statistical and machine learning background plus technical knowledge of the tools and programming languages of the trade.

Sometimes it seems like we’re looking for Albert Einstein or somebody smarter.

Nature abhors a vacuum

As nature abhors a vacuum, there’s also a rush not only to define what a data scientist is, but to develop programs that could somehow teach it, software packages that to some extent package it, and otherwise throw it all into a meat … err, the free market. EMC and other vendors are stepping up to the plate to offer training, not just on platforms, but for data science. Kaggle offers an innovative cloud-based, crowdsourced approach to data science, making available a predictive modeling platform and then staging sponsored 24-hour competitions for moonlighting data scientists to devise the best solutions to particular problems (redolent of the Netflix $1 million prize to devise a smarter algorithm for predicting viewer preferences).

With data science talent scarce, we’d expect that consulting firms would buy up talent that could then be “rented” to multiple clients. Excluding a few offshore firms, few systems integrators (SIs) have yet stepped up to the plate to roll out formal big data practices (the logical place where data scientists would reside), but we expect that to change soon.

Opera Solutions, which has been in the game of predictive analytics consulting since 2004, is taking the next step down the packaging route. Having raised $84 million in Series A funding last year, the company has staffed up to nearly 200 data scientists, making it one of the largest assemblages of genius this side of Google. Opera’s predictive analytics solutions are designed for a variety of platforms, including SQL and Hadoop, and today they join the SAP Sapphire announcement stream with a release of their offering on the HANA in-memory database. Andrew Brust provides a good drilldown on the details of this announcement.

From SAP’s standpoint, Opera’s predictive analytics solutions are a logical fit for HANA as they involve the kinds of complex problems (e.g., a computation triggers other computations) that their new in-memory database platform was designed for.

There’s too much value at stake to expect that Opera will remain the only large aggregation of data scientists for hire. But ironically, the barriers to entry will keep the competition narrow and highly concentrated. Of course, with market demand, there will inevitably be a watering down of the definition of data scientists so that more companies can claim they’ve got one… or many.

The laws of supply and demand will kick in for data scientists, but the ramp-up of supply won’t be as quick as that for the more platform-oriented data architect or engineer. Of necessity, that supply of data scientists will have to be augmented by software that automates the interpretation of machine learning, but there’s only so far that you can program creativity and counter-intuitive insight into a machine.


Tuesday, May 15, 2012

MuleSoft suite of tools eases way for SaaS integration in the cloud

MuleSoft this week launched Mule iON SaaS Edition, providing a broad set of new tools and services for swift software-as-a-service (SaaS) integration in the cloud, and lowering the barrier to SaaS adoption for SaaS providers and developers.

The Mule iON integration platform as a service (iPaaS) connects across cloud-based applications and also connects SaaS to on-premise applications. MuleSoft's Anypoint technology for on-demand API connectivity eliminates the need for copious custom point-to-point code, said MuleSoft. [Disclosure: MuleSoft is a sponsor of BriefingsDirect podcasts.]

In recent commentary, Ross Mason, founder and CTO of MuleSoft, said, "The world today is moving at lightning speed to SaaS and cloud applications, and the idea of gaining competitive advantage through legacy enterprise applications is no longer relevant."

I agree. Key differentiators now lie less in building applications than in the effective composition of services. Cloud and SaaS providers need to give their clients better means to leverage APIs and craft business processes across both enterprise and multiple SaaS provider boundaries. Rationalizing this stew of cloud services is the new integration nut to crack.

The problem is, what type of platform and organization can fulfill the role of cloud services orchestration hub? The role may not fit well for any one SaaS provider, nor for any single enterprise or cadre of enterprises. For the time being, a best-of-breed platform and supporting ecosystem must evolve, and then the market will decide who or what will be the acceptable hub mechanisms.

And the market for cloud integration technologies is clearly heating up. Also this week, FuseSource announced at CamelOne in Boston the general availability of its Fuse ESB Enterprise 7.0 and Fuse MQ Enterprise 7.0 products. These platforms enable "Integration Everywhere," says FuseSource, with modular, open source products based on Apache Software Foundation projects. [Disclosure: FuseSource is a sponsor of BriefingsDirect podcasts.]

QuickStart Plan


Integration platform provider MuleSoft also unveiled on Monday a new QuickStart Plan for fast-growth SaaS vendors and systems integrators (SIs) that enables them to build their own revenue-generating integration apps on the Mule iON cloud platform in just a few days. Pricing for Mule iON SaaS Edition is on a per-month, volume-of-use basis rather than being based on connectivity, which encourages more connections over time.

In other integration news, SAP today said it plans to offer its own cloud-based integration technology, and also plans to enable its ecosystem of partners, including solutions from MuleSoft.

New features available with Mule iON SaaS Edition, which is available now, include:
  • Graphical data mapping and transformation capabilities enable SaaS vendors and SIs to build and deploy integration apps without writing custom code by using the Mule Studio drag-and-drop interface.
  • Cloud Connector ToolKit creates new cloud connectors in Mule Studio for any public or private Web API.
  • Customer self-service portals allow customers to independently manage integrations, minimizing dependency on developers and reducing support calls.
  • SaaS Operations Center provides complete visibility into end user environments with a multi-tenant portal to monitor, manage and maintain integration apps, including:

    • Operational dashboards: deliver better customer support with live integration status and performance metrics.
    • Real-time notifications: meet availability requirements and improve service level agreements (SLAs) with immediate notifications for events or performance issues as they occur.
    • Proactive alerts: reduce support calls by proactively monitoring and addressing issues before they impact customers.

In addition, Mule iON SaaS Edition introduces a gallery of over 20 packaged integration apps and more than 100 Cloud Connectors for the most common integration use cases.

Opportunities for everyone

Ovum's Carter Lusher sees opportunities for everyone involved:
The dark side of SaaS and Cloud is that while they are relatively easy to procure and deploy, it is difficult to integrate them with existing enterprise applications and other SaaS offerings. What makes integration even more challenging is the proliferation of SaaS deployed within an organisation as line-of-business managers procure point solutions to their specific needs that really should be integrated with other systems in order to maximize value and manageability.
This becomes a challenge for IT and the vendors who are faced with a plethora of public and private APIs that require brute force to integrate. Integration is expensive, with estimates of $8 of integration work for every $1 of SaaS subscription or software license.
For SaaS and traditional enterprise applications, MuleSoft’s Mule iON SaaS Edition offers the ability to create pre-packaged integration modules that will give them a compelling story during the sales cycle without dramatically increasing costs or long-term maintenance. For example, HR talent management SaaS vendor PeopleMatter used Mule iON to create a new hire onboard module that connects with ADP payroll processing through ADP’s private APIs.
For systems integrators, Mule iON SaaS Edition offers the ability to create reusable connectors for a variety of horizontal and industry-specific applications and SaaS. This not only reduces the cost of integrations, which can be a competitive advantage in a sales cycle, but also gives the SI the opportunity to sell more value-added consulting as the focus of sales discussion moves away from brute force integration to maximizing the business value of enterprise applications or SaaS.

In other news, MuleSoft announced a record quarter in Q1 2012, achieving a 109 percent increase in bookings year over year, the privately held San Francisco company said. The growth was driven by new customer wins among major companies and by key SaaS vendor partnerships; those added in Q1 include Avalara and Zuora. Additionally, the company reported a strong customer renewal rate of 95 percent.
