Ben Lorica

Ben Lorica is a Senior Analyst in the Research Group at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, and Statistical Analysis in a variety of settings, including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services. At O'Reilly, Ben works on custom research and consulting projects, open source data warehousing, and analytics.

 

Wed

Jul 1
2009

The US Online Job Market Was (still) Down Big In June 2009

by Ben Lorica @dliman, comments: 1

To update my post from early June: the U.S. online job market still hasn't shown signs of recovering from the steady declines that began in September of last year. Compared to the same period last year, there were 50% fewer job postings in June 2009.

[figure]

An alternate view highlights the start of the downward trend, as well as the smaller-than-expected seasonal bounce from Dec-08 to Jan/Feb 2009. In a normal year, the number of postings declines in December (as employers table job searches until after the holidays) and recovers sharply the following Jan/Feb. While job postings did bounce back in Jan/Feb 2009, the seasonal bump was less than half of what occurred in previous years.

[figure]

No geographic region has been exempt from the downturn in online job postings. There have been sharp declines in all states, ranging from -59% in DE, WY, and MN to -38% in MD, OK, and VA.

[figure]

In closing, we still haven't detected the green shoots that some forecasters have been crowing about over the last few months. On an optimistic reading, the worst year-over-year decline occurred in April. On the other hand, we are still staring at a 50% decline in June 2009. So while we may have hit the bottom in April, we need a few strong(er) months before we can comfortably announce the arrival of green shoots.

(†) In partnership with SimplyHired and Greenplum, we maintain a data warehouse that contains most U.S. online job postings dating back to mid-2005. Data for this post runs through 6/28/2009.

tags: big data, economy, jobs

 

Fri

Jun 19
2009

Facebook Adds Millions of Users in Asia

by Ben Lorica @dliman, comments: 1

Since my previous post on Facebook users by country, the company has grown rapidly in Asia. Over the last 12 weeks, Facebook grew 90% in Asia, going from 11.4 to 21.7 million active users. With a market penetration of only 0.6% in Asia, Facebook has barely scratched the surface in the region.

[figure]

The company also gained 11.3M users in Europe (up 19%) and 14.7M users in North America (up 21%) over the last 12 weeks. On a year-over-year basis, Facebook grew 194% (adding close to 150 million active users worldwide) from Jun/2008 to Jun/2009.

For more details, you can view regional numbers below:

 

Thu

Jun 11
2009

Mechanical Turk Best Practices

by Ben Lorica @dliman, comments: 8

Last night, Dolores Labs hosted what was billed as the first-ever Mechanical Turk meetup, and I was fortunate to squeeze into what turned out to be a great series of presentations. While Amazon was the pioneer and remains the largest provider in the space, other services like Dolores Labs and Nathan Eagle's txteagle have emerged to expand the pool of users and turks.

In the past, we've turned to Dolores Labs when we needed (machine-learning) training sets and were unable to quickly find reliable ones. To increase the quality of the output we receive from turks, we try to get multiple turks to perform an individual task and aggregate their work into a single answer. (We jokingly refer to this as the wisdom of micro-crowds.) Working on problems quite different from the ones we tackle, the first set of speakers presented research results confirming that this form of aggregation actually works. Rion Snow of Stanford's AI Lab presented results suggesting that for a large set of tasks, the aggregate work of 4-6 turks compares favorably to the work of a single (domain) expert. Working primarily in the area of NLP and computational linguistics, Bob Carpenter of alias-i presented similar results when evaluating turk-generated training sets against gold-standard ones. (It's hard enough when turks disagree, but as Bob Carpenter highlighted, disagreements among experts make it difficult to arrive at a gold standard.) Bob has found that in certain situations an iterative approach works best ("code-a-little", "learn-a-little") and that tools that allow you to start suggesting "answers" to a new set of turks would help immensely. Coincidentally, one of the speakers presented a toolkit that allows users to do just that: Greg Little's TurKit is a JavaScript API for running iterative tasks on Mechanical Turk.
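
The aggregation idea above (several turks per task, collapsed into one answer) can be sketched as a plain majority vote. This is only an illustration under stated assumptions, not the method used by Dolores Labs or by any of the speakers: the function name `aggregate_labels`, the tuple format, and the sample judgments are all hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_labels(judgments):
    """Collapse several turk judgments per task into one answer by majority vote.

    `judgments` is a list of (task_id, worker_id, label) tuples (hypothetical format).
    Returns {task_id: (winning_label, share_of_votes_for_that_label)}.
    """
    votes = defaultdict(list)
    for task_id, _worker_id, label in judgments:
        votes[task_id].append(label)

    results = {}
    for task_id, labels in votes.items():
        # most_common(1) gives the label with the highest count;
        # ties go to whichever label was seen first.
        label, count = Counter(labels).most_common(1)[0]
        results[task_id] = (label, count / len(labels))
    return results

# Made-up judgments: five turks on one task, four on another.
judgments = [
    ("img-1", "w1", "cat"), ("img-1", "w2", "cat"), ("img-1", "w3", "dog"),
    ("img-1", "w4", "cat"), ("img-1", "w5", "cat"),
    ("img-2", "w1", "dog"), ("img-2", "w2", "cat"), ("img-2", "w3", "dog"),
    ("img-2", "w4", "dog"),
]
consensus = aggregate_labels(judgments)
```

The vote share returned alongside each label is one simple way to flag tasks where turks disagree and a further round of judgments might be worthwhile.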

Another set of speakers talked about the emergence of mechanical turks as a research tool. Social scientists Aaron Shaw and John Horton spoke favorably of their experience using turks for research experiments in economics and paired surveys. Among other things, they've conducted studies on the turk labor market by testing demand for tasks of varying difficulty (something Bob Carpenter also talked about), and by evaluating demand for follow-on tasks at lower wages. Alexander Sorokin of UIUC presented work on using turks to annotate training sets for computer vision and robotics. For those interested in using turks to annotate images, Alex has a toolkit ready to go.

For most users of Mechanical Turk (us included), it has become an API call that fits smoothly within their workflow. (Or as someone at the meetup wryly suggested, turk is a Remote Person Call.) The last pair of speakers, Lilly Irani and Six Silberman, reminded us that behind Mechanical Turk lie thousands of workers ("the crowd in the cloud") working without (health care) benefits, oftentimes at extremely low hourly wages. Irani and Silberman suggested that rather than abstracting Mechanical Turk services as mere API calls, users should start thinking of the plight of the turks ("Mechanical Turk Bill of Rights") behind the service. As a first step, they have released a Firefox plugin that aims to narrow the information asymmetry between turks (those performing tasks) and requesters (those posting tasks). While requesters can see ratings for turks, requesters aren't rated: Turkopticon lets turks rate requesters. They need more turks to download and start using Turkopticon, so if you know any mechanical turks, please encourage them to do so.

(†) According to Amazon representatives in the audience, a majority of turks are in the U.S. That may change in the future, once Amazon is able to get approval for other payment systems. Because of the possibility of money laundering, services like AMT are subject to strict know-your-customer (KYC) controls.

 

Wed

Jun 3
2009

The Economic Crisis and the US Online Job Market

by Ben Lorica @dliman, comments: 10

In my previous post, I noted that despite the large decline in the total number of job postings, the number of Hadoop/MapReduce job postings increased by 49%. What is the current state of the online job market? The financial crisis that began in the fall of 2008 has had a lasting negative effect on the U.S. online job market. Since late 2008, there have been significantly fewer jobs posted online.

Using data from SimplyHired and a few charts, I'll quickly highlight the impact of the global economic crisis on the U.S. online job market. To quantify the sudden drop in U.S. online job postings, I calculated the average number of job posts per day:

[figure]

The number of posts declined 49% from Jan/May 2008 to Jan/May 2009. While there has been a downward trend since April 2008, the financial crisis in September 2008 marked the start of even larger reductions. In particular, the relatively small number of job postings in Nov/Dec 2008 has carried over into the first five months of 2009. The sharp seasonal rebound that occurs in Jan/Feb of each year was practically non-existent in 2009. While some forecasters are seeing signs of a recovery, at least through the first five months of 2009, we haven't detected "green shoots" in the U.S. online job market.
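
For readers who want to reproduce this kind of comparison on their own data, the calculation (average posts per day over a period, then the year-over-year change) might look like the sketch below. The function names, the input format, and the toy counts are assumptions for illustration; they are not the actual queries run against the job data warehouse, and the numbers are made up.

```python
from datetime import date

def average_posts_per_day(daily_counts, year, months):
    """Mean number of postings per day over the given months of one year.

    `daily_counts` maps datetime.date -> number of postings (hypothetical input).
    """
    values = [n for d, n in daily_counts.items()
              if d.year == year and d.month in months]
    return sum(values) / len(values)

def pct_change(current, previous):
    """Percentage change from `previous` to `current`."""
    return 100.0 * (current - previous) / previous

# Made-up toy data: three sampled days from each year.
daily_counts = {
    date(2008, 1, 15): 1000, date(2008, 3, 15): 1200, date(2008, 5, 15): 800,
    date(2009, 1, 15): 500, date(2009, 3, 15): 550, date(2009, 5, 15): 450,
}
jan_may = {1, 2, 3, 4, 5}
avg_2008 = average_posts_per_day(daily_counts, 2008, jan_may)
avg_2009 = average_posts_per_day(daily_counts, 2009, jan_may)
change = pct_change(avg_2009, avg_2008)
```

Averaging per day rather than summing per period keeps the comparison fair when the two periods contain a different number of days with data.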

(continue reading)

tags: big data, jobs

 

Mon

Jun 1
2009

Most Hadoop Jobs Are In California

by Ben Lorica @dliman, comments: 3

Given the recent buzz surrounding Hadoop and MapReduce, I was curious if employers were beginning to mention either term in their job postings. Fortunately I have access to a massive job data warehouse dating back to mid-2005. In partnership with SimplyHired and Greenplum, we maintain a data warehouse that contains most of the online job postings in the U.S.

While the percentage of job postings that mention either Hadoop or MapReduce remains minuscule, the number of such postings is growing steadily:

[figure]

The number of Hadoop/MapReduce job postings during the Feb/Apr 2009 period grew 49% compared to the same period in 2008. In contrast, the tough economic environment has translated to significantly fewer job postings overall: the total number of online job postings declined 40% during the same period.

How mainstream is Hadoop? While researching our report on Big Data, we talked to a (database) vendor who jokingly claimed that nobody outside the West and East coasts cared about Hadoop. Analysis of recent job postings seems to support that perspective. During the three most recent months, employers in 18 states posted Hadoop/MapReduce jobs online, but 60% of those postings were in California. The top 5 states (CA, MD, NY, MA, WA) accounted for 87% of the Hadoop/MapReduce job postings:

[figure]

Looking at the same period last year, 72% of the job postings were in California, and the top 5 states (CA, WA, TX, PA, VA) accounted for 79%.

Given the presence of large (Google, Yahoo!, Facebook) and small companies (Cloudera, Greenplum, Aster, ...) that are leaders in the use of Hadoop/MapReduce, it's no surprise that at this early stage, a large share of jobs are in California. While the share of California job postings remains high (60%), it's down from 72% last year. As mentioned above, the percentage of job postings that mention either Hadoop or MapReduce remains minuscule, so I caution against reading too much into the geographic distributions. Nevertheless, it's clear that California employers are expressing interest in Hadoop skills ahead of their peers in other states.
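
The state-level shares quoted in this post are simple proportions of a count. A minimal sketch of that arithmetic follows, assuming each posting has already been tagged with a state code; the function `state_shares` and the ten sample postings are hypothetical and do not reflect the real data.

```python
from collections import Counter

def state_shares(posting_states, top_n=5):
    """Return each top state's share of postings and the combined top-N share.

    `posting_states` is a list with one state code per job posting (hypothetical input).
    """
    counts = Counter(posting_states)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    shares = {state: n / total for state, n in top}
    combined = sum(n for _state, n in top) / total
    return shares, combined

# Made-up sample: ten postings across four states.
postings = ["CA"] * 6 + ["NY"] * 2 + ["WA"] + ["TX"]
shares, top2_share = state_shares(postings, top_n=2)
```

With counts this small, a handful of postings can move a state's share by several points, which is the reason for the caution about over-reading the geographic distribution.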

tags: big data, hadoop, jobs

 

Mon

May 18
2009

Being a Suggested User Leads to Thousands of Twitter Followers

by Ben Lorica @dliman, comments: 4

Ever since Twitter started suggesting accounts to new users, it has been clear that those on the suggested users list were gaining thousands of followers. Setting aside the fact that the number of followers is a poor gauge of influence (see our Twitter report for details), I wanted to know how many followers a suggested account gains by appearing on the list.

I took the set of accounts that were added to the suggested users list during the last two months, recorded their number of followers the day before they made the list (Initial # of followers), and tracked what happened a week, 2 weeks, and a month later. From an initial set of just over a hundred accounts, I was able to gather sufficient data (using Twitterholic and Twittercounter) on 80+ suggested users.

[figure]

(continue reading)

tags: hard numbers, twitter

 

Wed

May 13
2009

2 Years Later, the Facebook App Platform is Still Thriving

by Ben Lorica @dliman, comments: 8

In a few weeks, the Facebook application platform will mark its second anniversary. While it garnered lots of press coverage in the months after it launched, the arrival of the iTunes app store shifted attention away from Facebook's vibrant ecosystem. The media glow is understandable: among other things, the younger iTunes platform is adding apps at a much faster rate than Facebook or Myspace.

[figure]

Games account for a sizable chunk of app revenues on all three platforms, and recent stories suggest that 2009 has been a great year for developers. The substantial revenue generated by popular Facebook (and Myspace) apps has been the subject of articles in VentureBeat, TechCrunch, and Inside Facebook. There have also been recent estimates of the revenue generated by iPhone apps. Game developers in particular are benefiting from having a multitude of platforms: Games are the largest iTunes category, and the second largest category on both Facebook and Myspace. In addition, 4 of the top 10 most successful Facebook app providers are Game developers.

(continue reading)

 

Fri

May 8
2009

Up Close with an Enigma

by Ben Lorica @dliman, comments: 6

At last month's RSA conference in San Francisco, I stumbled upon a vintage 1944 model of the German cryptographic machine popularly known as the Enigma. This particular machine was owned by the National Cryptologic Museum, and was part of a larger booth hosted by the National Security Agency. The staff at the exhibit were quite friendly and it didn't take much to convince someone from the NSA to talk on-camera about the Enigma. (I did decide to submit the video to the NSA public affairs office for final review.) Reading through the accompanying historical pamphlet and listening to NSA staffers, I developed a better appreciation for the contributions made by Polish authorities (and mathematicians) towards breaking what was then the most important cryptographic machine in the world.

Also from RSA 2009:

  • Making Mashups Safe(r) with MashSSL: Of the ten presentations at the inaugural RSA Innovation Sandbox, I thought the most intriguing technology came from SafeMashups (a startup out of UT San Antonio). They use SSL certificates and handshakes as the foundation for a scalable trust infrastructure.

tags: history, mashup, oauth, security

 

Mon

May 4
2009

Big Data: SSD's, R, and Linked Data Streams

by Ben Lorica @dliman, comments: 2

The Solid State Storage Revolution: If you haven't seen it, I recommend you watch Andy Bechtolsheim's keynote at the recent Mysqlconf. We covered SSD's in our just-published report on Big Data management technologies. Since then, we've gotten additional signals from our network of alpha geeks and our interest in them remains high.

R and Linked Data Streams: I had a chance to visit with Dataspora founder and blogger Mike Driscoll, an enthusiastic advocate for the use of the open source statistical computing language, R. After founding and leading online retailer CustomInk.com, Mike went back to grad school and earned a doctorate in Bioinformatics. He has applied data analysis and programming in a variety of domains including retail, biotech, academia, and government projects.

Having been an avid user of S/S-Plus in the 1990's, I seamlessly switched over to R in the early 2000's. To this day, I consider the S/S-Plus user manuals to be the best reference and introductory books on the R programming language. (Mike wholeheartedly agrees.) R has been popular in the statistics community for many years, but I've been noticing that its visualization and analytic capabilities are attracting interest from developers. Moreover, recent efforts by the R community to improve its ability to scale to large data sets (see brief update from Jay Emerson) will strengthen R's place in the Big Data stack.

(continue reading)

 

Fri

May 1
2009

The iTunes App Store and One-hit Wonders

by Ben Lorica @dliman, comments: 0

Thousands of sellers created the 40,000 apps that have appeared in the U.S. iTunes app store. Measured in terms of apps per seller, developer and vendor engagement has gotten stronger over time:

[figure]

The average (mean) shown above is somewhat misleading: 52% of sellers have produced just one app, and 80% have released 3 or fewer. Certain types of apps (e.g. electronic books) are easier to create, thus inflating the overall average number of apps per seller. The disparity in complexity across categories is captured in the chart below. Aside from Books, Travel and Education apps also tend to be easy to develop and launch. (Note: Some apps are listed in more than one category.) The number of apps per seller also depends on whether one is interested in Paid or Free apps.

(continue reading)

tags: iphone, mobile, platform, slides

     
