Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services.

The re-emergence of time-series

Researchers begin to scale up pattern recognition, machine-learning, and data management tools.

My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional forays into machine-learning (clustering, classification, anomalies). More recently, I’ve been closely following the emergence of…
Read Full Post | Comment |

An update on in-memory data management

In-memory data management brings data close to the computation.

By Ben Lorica and Roger Magoulas

We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have emerged, some expected, others surprising. Performance improves as you put data as close to the computation as…
Read Full Post | Comment: 1 |

Need speed for big data? Think in-memory data management

We're launching an investigation into in-memory data technologies.

By Ben Lorica and Roger Magoulas

In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention…
Read Full Post | Comments: 14 |

Seven reasons why I like Spark

Spark is becoming a key part of a big data toolkit.

A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data toolkit. Here’s why: Hadoop integration: Spark can work with files stored in…
Read Full Post | Comments: 2 |

Active Facebook users by region: November, 2010

With Facebook unveiling an integrated messaging system for its more than 500 million users, I decided to update a few charts that break down its users by region.

Read Full Post | Comments: 4 |

Hiring trends among the major platform players

The battle for the Internet's points of control requires amassing talent.

Consistent with the recent flurry of articles about hiring wars, many platform companies have increased their number of job postings. Winning the battle for the Internet's points of control requires amassing talent.

Read Full Post | Comments Off |

Windows Mobile apps are more expensive than iPhone apps

The mean app price in the Windows market is nearly twice that of the App Store.

The Windows Marketplace for Mobile now has about 1,400 apps spread across 16 categories. In this short post I'll provide some basic statistics and compare it with the granddaddy of app stores: the U.S. iTunes store.

Read Full Post | Comments: 30 |

Crowdsourcing specific microtasks

Since the first-ever Mechanical Turk meetup a year ago, there has been an explosion in crowdsourcing services and a well-attended conference in San Francisco. I remain enthusiastic about crowdsourcing, but the number of companies has me worried about the quality of work. Fortunately, specialization is already occurring, so for particular tasks there are companies out there ready to provide high-quality service….

Read Full Post | Comments: 2 |

Amazon's cloud platform still the largest, but others are closing the gap

Measured in terms of (U.S.) job postings, Amazon's Cloud Computing platform is still larger than Google's App Engine. What's interesting is that the gap has closed over the past year.

Read Full Post | Comments: 5 |

The number of Hadoop jobs continues to rise

While still a small fraction of data management job postings, the number of job posts that mention "hadoop" continues to grow steadily. Year-over-year, there were 300% more such job posts in the first seven months of 2010 than in the same period in 2009. The fraction of "hadoop" jobs posted by California companies remains high, but is definitely lower than it was last year.

Read Full Post | Comment: 1 |