Ben Lorica
Ben Lorica is a Senior Analyst in the Research Group at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services. At O'Reilly, Ben works on custom research and consulting projects, open source data warehousing and analytics.
Wed
Jul 1
2009
The US Online Job Market Was (still) Down Big In June 2009
by Ben Lorica | @dliman | comments: 1
Updating my post from early June, the U.S. online job market still hasn't shown signs of recovering from steady declines that began in September of last year. Compared to the same period last year, there were 50% fewer job postings in June 2009.
An alternate view highlights the start of the downward trend, as well as the smaller-than-expected seasonal bounce from Dec-08 to Jan/Feb 2009. In a normal year, the number of postings declines in December (as employers table job searches until after the holidays) and recovers sharply the following Jan/Feb. While job postings did bounce back in Jan/Feb 2009, the seasonal bump was less than half of what occurred in previous years.
No geographic region has been exempt from the downturn in online job postings. There have been sharp declines in all states, ranging from -59% in DE, WY, and MN, to -38% in MD, OK, and VA.

In closing, we still haven't detected the green shoots that some forecasters have been crowing about over the last few months. If one were to take an optimistic perspective, the worst year-over-year decline occurred in April. OTOH, we are still staring at a 50% decline in June 2009. So while we may have hit the bottom in April, we need a few strong(er) months before we can comfortably announce the arrival of green shoots.
In partnership with SimplyHired and Greenplum, we maintain a data warehouse that contains most U.S. online job postings dating back to mid-2005. Data for this post runs through 6/28/2009.
tags: big data, economy, jobs
Fri
Jun 19
2009
Facebook Adds Millions of Users in Asia
by Ben Lorica | @dliman | comments: 1
Since my previous post on Facebook users by country, the company has grown rapidly in Asia. Over the last 12 weeks, Facebook grew 90% in Asia, going from 11.4 to 21.7 million active users. With a market penetration of only 0.6% in Asia, Facebook has barely scratched the surface in the region.
The company also gained 11.3M users in Europe (up 19%) and 14.7M users in North America (up 21%) over the last 12 weeks. On a year-over-year basis, Facebook grew 194% (adding close to 150 million active users worldwide) from Jun/2008 to Jun/2009.
For more details, you can view regional numbers below:
tags: facebook, hard numbers, platforms, research, social networking
Thu
Jun 11
2009
Mechanical Turk Best Practices
by Ben Lorica | @dliman | comments: 8
Last night, Dolores Labs hosted what was billed as the first-ever Mechanical Turk meetup, and I was fortunate enough to have been able to squeeze into what turned out to be a great series of presentations. While Amazon was the pioneer and remains the largest provider in the space, other services like Dolores Labs and Nathan Eagle's txteagle have emerged to expand the pool of users and turks.
In the past, we've turned to Dolores Labs when we needed (machine-learning) training sets and were unable to quickly find reliable ones. To increase the quality of the output we receive from turks, we try to get multiple turks to perform an individual task and aggregate their work into a single answer. (We jokingly refer to this as the wisdom of micro-crowds.) Working on problems quite different from the ones we tackle, the first set of speakers presented research results confirming that this form of aggregation actually works. Rion Snow of Stanford's AI Lab presented results that suggest that for a large set of tasks, the aggregate work of 4-6 turks compares favorably to the work of a single (domain) expert. Working primarily in the area of NLP and computational linguistics, Bob Carpenter of alias-i presented similar results when evaluating turk-generated training sets against gold-standard ones. (It's hard enough when turks disagree, but as Bob Carpenter highlighted, disagreements among experts make it difficult to arrive at a gold standard.) Bob has found that in certain situations an iterative approach works best ("code-a-little", "learn-a-little") and tools that allow you to start suggesting "answers" to a new set of turks would help immensely. Coincidentally, one of the speakers presented a toolkit that allows users to do just that: Greg Little's TurKit is a JavaScript API for running iterative tasks in mechanical turk.
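To make the "micro-crowd" idea concrete, here is a minimal sketch of one simple way to aggregate several turks' answers to a single task: plain majority vote. This is an assumption for illustration only; the post does not say which aggregation method we or the speakers used, and the function name `aggregate_labels` and the sample answers are made up.

```python
from collections import Counter


def aggregate_labels(labels):
    """Return the majority label and the share of workers who chose it.

    Ties go to the label that appears first in the input, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)


# Hypothetical example: five turks label the sentiment of one sentence.
answers = ["positive", "positive", "negative", "positive", "neutral"]
label, agreement = aggregate_labels(answers)
print(label, agreement)
```

The agreement share is a cheap signal for which tasks deserve a second look: items where the winning label barely edges out the alternatives are good candidates for extra turks or expert review.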
Another set of speakers talked about the emergence of mechanical turks as a research tool. Social scientists Aaron Shaw and John Horton spoke favorably of their experience using turks for research experiments in economics and paired surveys. Among other things, they've conducted studies on the turk labor market by testing demand for tasks of varying difficulty (something Bob Carpenter also talked about), and by evaluating demand for follow-on tasks at lower wages. Alexander Sorokin of UIUC presented work on using turks to annotate training sets for computer vision and robotics. For those interested in using turks to annotate images, Alex has a toolkit ready to go.
For most users of mechanical turk (us included), it has become an API call that fits smoothly within their workflow. (Or as someone at the meetup wryly suggested, turk is a Remote Person Call.) The last pair of speakers, Lilly Irani and Six Silberman, reminded us that behind mechanical turk lie thousands of workers ("the crowd in the cloud") working without (health care) benefits, oftentimes at extremely low hourly wages. Irani and Silberman suggested that rather than abstracting mechanical turk services as mere API calls, users should start thinking of the plight of the turks ("Mechanical Turk Bill of Rights") behind the service. As a first step they have released a Firefox plugin that aims to narrow the information asymmetry between turks (those performing tasks) and requesters (those posting tasks). While requesters can see ratings for turks, requesters aren't rated: Turkopticon lets turks rate requesters. They need more turks to download and start using Turkopticon, so if you know any mechanical turks please encourage them to do so.
According to Amazon representatives in the audience, a majority of turks are in the U.S. That may change in the future, once Amazon is able to get approval for other payment systems. Because of the possibility of money-laundering, services like AMT are subject to strict KYC controls.
tags: big data, machine learning, mechanical turk, meetup
Wed
Jun 3
2009
The Economic Crisis and the US Online Job Market
by Ben Lorica | @dliman | comments: 10
In my previous post, I noted that despite the large decline in total number of job postings, the number of Hadoop/MapReduce job postings increased by 49%. What is the current state of the online job market? The financial crisis that began in the Fall of 2008 has had a lasting negative effect on the U.S. online job market. Since late 2008, there have been significantly fewer jobs posted online.
Using data from SimplyHired and a few charts, I'll quickly highlight the impact of the global economic crisis on the U.S. online job market. To quantify the sudden drop in U.S. online job postings, I calculated the average number of job posts per day:
The number of posts declined 49% from Jan/May 2008 to Jan/May 2009. While there has been a downward trend since April 2008, the financial crisis in September 2008 marked the start of even larger reductions. In particular, the relatively small number of job postings in Nov/Dec 2008 has carried over into the first five months of 2009. The sharp seasonal rebound that occurs in Jan/Feb of each year was practically non-existent in 2009. While some forecasters are seeing signs of a recovery, at least through the first five months of 2009, we haven't detected "green shoots" in the U.S. online job market.
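For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of the arithmetic: average the daily posting counts over a date range, then compute the percent change between two periods. The function names and the tiny data set are invented placeholders, not figures from the SimplyHired warehouse.

```python
from datetime import date


def average_per_day(daily_counts, start, end):
    """Average postings per day over the inclusive range [start, end].

    Only days present in daily_counts are averaged; missing days are
    not treated as zero.
    """
    selected = [n for day, n in daily_counts.items() if start <= day <= end]
    if not selected:
        raise ValueError("no data in range")
    return sum(selected) / len(selected)


def percent_change(old, new):
    """Percent change from old to new (negative means a decline)."""
    return (new - old) / old * 100.0


# Invented placeholder counts, two days per year.
daily_counts = {
    date(2008, 1, 1): 200, date(2008, 1, 2): 220,
    date(2009, 1, 1): 100, date(2009, 1, 2): 110,
}
avg_2008 = average_per_day(daily_counts, date(2008, 1, 1), date(2008, 1, 31))
avg_2009 = average_per_day(daily_counts, date(2009, 1, 1), date(2009, 1, 31))
print(percent_change(avg_2008, avg_2009))
```

Comparing the same calendar months across years, as the post does, keeps the seasonal December dip and Jan/Feb rebound from distorting the year-over-year figure.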
tags: big data, jobs
Mon
Jun 1
2009
Most Hadoop Jobs Are In California
by Ben Lorica | @dliman | comments: 3
Given the recent buzz surrounding Hadoop and MapReduce, I was curious if employers were beginning to mention either term in their job postings. Fortunately I have access to a massive job data warehouse dating back to mid-2005. In partnership with SimplyHired and Greenplum, we maintain a data warehouse that contains most of the online job postings in the U.S.
While the percentage of job postings that mention either Hadoop or MapReduce remains minuscule, the number of such postings is growing steadily:
The number of Hadoop/MapReduce job postings (during the Feb/Apr 2009 period) grew 49% compared to 2008. In contrast, the tough economic environment has translated to significantly fewer job postings: the total number of online job postings declined 40% during the same period.
How mainstream is Hadoop? While researching our report on Big Data, we talked to a (database) vendor who jokingly claimed that nobody outside of the West & East coast cared about Hadoop. Analysis of recent job postings seems to support that perspective. During the three most recent months, employers in 18 states posted Hadoop/MapReduce jobs online, but 60% of those were in California. The top 5 states (CA, MD, NY, MA, WA) accounted for 87% of the Hadoop/MapReduce job postings:
Looking at the same period last year, 72% of the job postings were in California, and the top 5 states (CA, WA, TX, PA, VA) accounted for 79%.
Given the presence of large (Google, Yahoo!, Facebook) and small companies (Cloudera, Greenplum, Aster, ...) who are leaders in the use of Hadoop/MapReduce, it's no surprise that at this early stage, a large share of jobs are in California. While the share of California job postings remains high (60%), it's down from 72% last year. As mentioned above, the percentage of job postings that mention either Hadoop or MapReduce remains minuscule, so I caution against reading too much into the geographic distributions. Nevertheless, it's clear that California employers are expressing interest in Hadoop skills ahead of their peers in other states.
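The state-level percentages quoted in this post are simple shares of a total. As a rough sketch of that calculation, the snippet below ranks states by posting count and sums the share held by the top N. The state labels and counts are made-up placeholders, and the helper names are my own.

```python
def state_shares(counts):
    """Return (state, share) pairs sorted from largest to smallest share."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no postings to summarize")
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(state, n / total) for state, n in ranked]


def top_n_share(counts, n):
    """Combined share of postings held by the n largest states."""
    return sum(share for _, share in state_shares(counts)[:n])


# Made-up placeholder counts for four unnamed states.
postings_by_state = {"A": 4, "B": 2, "C": 1, "D": 1}
print(state_shares(postings_by_state)[0])
print(top_n_share(postings_by_state, 2))
```

With counts this small, a handful of postings can move a state in or out of the top five, which is the reason for the caution above about over-reading the geographic distribution.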
tags: big data, hadoop, jobs
Mon
May 18
2009
Being a Suggested User Leads to Thousands of Twitter Followers
by Ben Lorica | @dliman | comments: 4
Ever since Twitter started suggesting accounts to new users, it was clear that those on the suggested users list were gaining thousands of followers. Setting aside the fact that number of followers is a poor gauge of influence (see our Twitter report for details), I wanted to know how many followers a suggested account gains by appearing on the list.
I took the set of accounts that were added to the suggested users list during the last two months, recorded their number of followers the day before they made the list (Initial # of followers), and tracked what happened a week, 2 weeks, and a month later. From an initial set of just over a hundred accounts, I was able to gather sufficient data (using Twitterholic and Twittercounter) on 80+ suggested users.
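The bookkeeping described above can be sketched in a few lines: store a follower count per checkpoint for each account, subtract the initial count, and summarize across accounts. The account names, checkpoint labels, and numbers below are invented for illustration, and using the median as the summary is my choice here, not something stated in the post.

```python
from statistics import median


def follower_gains(snapshots):
    """Followers gained at each checkpoint relative to the 'initial' count."""
    base = snapshots["initial"]
    return {k: v - base for k, v in snapshots.items() if k != "initial"}


def median_gain(accounts, checkpoint):
    """Median gain across accounts that have data for the given checkpoint."""
    gains = [
        follower_gains(s)[checkpoint]
        for s in accounts.values()
        if checkpoint in s
    ]
    return median(gains)


# Invented snapshots for three hypothetical accounts.
accounts = {
    "acct_a": {"initial": 1000, "1 week": 6000, "1 month": 21000},
    "acct_b": {"initial": 500, "1 week": 3500, "1 month": 12500},
    "acct_c": {"initial": 200, "1 week": 4200},  # no 1-month data yet
}
print(median_gain(accounts, "1 week"))
print(median_gain(accounts, "1 month"))
```

Skipping accounts that lack a checkpoint mirrors the situation in the post, where only a subset of the suggested accounts had enough history to track.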
tags: hard numbers, twitter
Wed
May 13
2009
2 Years Later, the Facebook App Platform is Still Thriving
by Ben Lorica | @dliman | comments: 8
In a few weeks, the Facebook application platform will mark its second anniversary. While it garnered lots of press coverage in the months after it launched, the arrival of the iTunes app store shifted attention away from Facebook's vibrant ecosystem. The media glow is understandable: among other things, the younger iTunes platform is adding apps at a much faster rate than Facebook or Myspace.
Games account for a sizable chunk of app revenues on all three platforms and recent stories suggest that 2009 has been a great year for developers. The substantial revenue generated by popular Facebook (and Myspace) apps has been the subject of articles in VentureBeat, TechCrunch, and Inside Facebook. There have also been recent estimates for the revenue generated by iPhone apps (see here and here). Game developers in particular are benefiting from having a multitude of platforms: Games are the largest iTunes category, and the second largest category on both Facebook and Myspace. In addition, 4 of the top 10 most successful Facebook app providers are Game developers.
tags: facebook, iphone, myspace, platform, platforms, social media
Fri
May 8
2009
Up Close with an Enigma
by Ben Lorica | @dliman | comments: 6
At last month's RSA conference in San Francisco, I stumbled upon a vintage 1944 model of the German cryptographic machine, popularly known as the Enigma. This particular machine was owned by the National Cryptologic Museum, and was part of a larger booth hosted by the National Security Agency. The staff at the exhibit were quite friendly and it didn't take much to convince someone from the NSA to talk on-camera about the Enigma. (I did decide to submit the video to the NSA public affairs office for final review.) Reading through the accompanying historical pamphlet and listening to NSA staffers, I developed a better appreciation for the contributions made by Polish authorities (and mathematicians) towards breaking what was then the most important cryptographic machine in the world.
Also from RSA 2009:
tags: history, mashup, oauth, security
Mon
May 4
2009
Big Data: SSD's, R, and Linked Data Streams
by Ben Lorica | @dliman | comments: 2
The Solid State Storage Revolution: If you haven't seen it, I recommend you watch Andy Bechtolsheim's keynote at the recent Mysqlconf. We covered SSD's in our just-published report on Big Data management technologies. Since then, we've gotten additional signals from our network of alpha geeks and our interest in them remains high.
R and Linked Data Streams: I had a chance to visit with Dataspora founder and blogger Mike Driscoll, an enthusiastic advocate for the use of the open source statistical computing language, R. After founding and leading online retailer CustomInk.com, Mike went back to grad school and earned a doctorate in Bioinformatics. He has applied data analysis and programming in a variety of domains including retail, biotech, academia, and government projects.
Having been an avid user of S/S-Plus in the 1990's, I seamlessly switched over to R in the early 2000's. To this day, I consider the S/S-Plus user manuals to be the best reference and introductory books on the R programming language. (Mike wholeheartedly agrees.) R has been popular in the statistics community for many years, but I've been noticing that its visualization and analytic capabilities are attracting interest from developers. Moreover, recent efforts by the R community to improve its ability to scale to large data sets (see brief update from Jay Emerson) will strengthen R's place in the Big Data stack.
tags: analytics, big data, r, ssd, statistics, video
Fri
May 1
2009
The iTunes App Store and One-hit Wonders
by Ben Lorica | @dliman | comments: 0
Thousands of sellers created the 40,000 apps that have appeared in the U.S. iTunes app store. Measured in terms of apps per seller, developer and vendor engagement has gotten stronger over time:
The above average (mean) is somewhat misleading: 52% of sellers have produced just one app, and 80% have released 3 or fewer. Certain types of apps (e.g. electronic books) are easier to create, thus inflating the overall average number of apps per seller. The disparity in complexity across categories is captured in the chart below. Aside from Books, Travel and Education apps also tend to be easy to develop and launch. (Note: Some apps are listed in more than one category.) The number of apps per seller also depends on whether one is interested in Paid or Free apps.
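To see why a mean can mislead for a distribution like this, here is a small sketch that compares the mean with the median and with the share of sellers at or below a threshold. The list of per-seller app counts is a made-up toy example, not App Store data, and `summarize` is a hypothetical helper.

```python
from statistics import mean, median


def summarize(apps_per_seller):
    """Mean, median, and low-count shares for a list of per-seller app counts."""
    n = len(apps_per_seller)
    if n == 0:
        raise ValueError("need at least one seller")
    return {
        "mean": mean(apps_per_seller),
        "median": median(apps_per_seller),
        "share_one_app": sum(1 for a in apps_per_seller if a == 1) / n,
        "share_three_or_fewer": sum(1 for a in apps_per_seller if a <= 3) / n,
    }


# Toy data: most sellers have one or two apps, one seller has 80.
toy_counts = [1, 1, 1, 1, 1, 2, 2, 3, 8, 80]
print(summarize(toy_counts))
```

In the toy data a single prolific seller pulls the mean far above what a typical seller produces, while the median and the low-count shares stay close to the typical case.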
tags: iphone, mobile, platform, slides
Recent Posts
- How Big Data Impacts Analytics on April 28, 2009
- Active Facebook Users By Country on April 19, 2009
- Waiting for the Billionth Download on April 16, 2009
- Big Data: Technologies and Techniques for Large-Scale Data on March 23, 2009
- Celebrities Embrace Twitter (and vice-versa) on March 17, 2009
- Facebook is Growing Fast in Asia, Europe, and the Middle East on March 5, 2009
- The Fastest-Growing Category in the iTunes App Store: Books on March 3, 2009
- Parallel Computing on Late-Night TV on February 11, 2009
- The Incubation Period for iPhone Apps is Declining on January 9, 2009
- Facebook Growth Regions and Gender Split on December 4, 2008