"strata session sc 2013" entries

Your analytics talent pool is not made up of misanthropes

Tips for interacting with analytics colleagues

To quote Pride and Prejudice, businesses have for many years “labored under the misapprehension” that their analytics talent was made up of misanthropes with neither the will nor the ability to communicate or work with others on strategic or creative business problems. These employees were meant to be kept in the basement out of sight, fed bad pizza, and pumped for spreadsheets to be interpreted in the sunny offices aboveground.

This perception is changing in industry as the big data phenomenon has elevated data science to a C-level priority. Suddenly folks once stereotyped by characters like Milton in Office Space are now “sexy.” The truth is there have always been well-rounded, articulate, friendly analytics professionals (they may just like Battlestar more than you), and now that analytics is an essential business function, personalities of all types are being attracted to practice the discipline.

Read more…

Communicating data clearly

Preview of Strata Santa Clara 2013 Session

The 2013 Strata Conference in Santa Clara, CA will be my fifth Strata conference. As always, I’m excited to join so many leaders in the data and data viz communities, and I’m honored that I’ll be speaking there.

I will be presenting my tutorial “Communicating Data Clearly” at 9 AM on Tuesday, February 26. This talk will cover methods and principles for creating effective graphs that are clear and accurate and that make the data easier to understand. It will also emphasize how to avoid common graphical mistakes. To preview a few of the topics I will cover, and to provide some information for those who cannot attend, I will now link to some of the blog posts I’ve written for Forbes. I was invited to blog for Forbes at a New York Strata Conference in 2011, so my relationships with Forbes and Strata are intertwined.

Read more…

Public health case study: Tracking zombies and vampires using social media

Preview of Strata Santa Clara 2013 Session

Towards the end of 2012, a battle that pitted state against state, father against son, wife against Bunco group, and dog against cat finally reached a truce, spawned by the treaty we all sign every four years known as the presidential election. Now that the death match between red and blue states has finally faded from our televisions and Twitter feeds, we can focus on the real issues of the day.

For longer than Romney’s bid for the White House, there has been a war going on in America, an undeath match of sorts between zombies and vampires. Like a flu pandemic sweeping the nation, the undead have been infiltrating every aspect of our lives. What traditionally was only a mild outbreak in October has turned into a year-round epidemic that our society cannot seem to shake.

Read more…

Privacy in the Online Ecosystem: Obligations and Best Practices Are Evolving

Preview of upcoming session at Strata Santa Clara

At the end of 2012, the Federal Trade Commission (“FTC”) hosted the public workshop, “The Big Picture – Comprehensive Online Data Collection,” which focused on privacy concerns relating to the comprehensive collection of consumer online data by Internet service providers (“ISPs”), operating systems, browsers, search engines, and social media. During the workshop, panelists debated the impact of service providers’ ability to collect data about computer and device users across unaffiliated websites, including when some entities have no direct relationship with such users.

As one example of the issues raised by the panelists, Professor Neil Richards, from the Washington University in St. Louis School of Law, stated that, despite its benefits, comprehensive data collection infringes on the concept of “intellectual privacy,” which is predicated on consumers’ ability to freely search, interact, and express themselves online. Professor Richards also stated that comprehensive data collection is creating a transformational power shift in which businesses can effectively persuade consumers based on their knowledge of consumer preferences. Yet, according to Professor Richards, few consumers actually understand “the basis of the bargain,” or the extent to which their information is being collected.

Read more…

Building recommendation platforms with Hadoop

Preview of upcoming session at the Strata Conference

Recommendations are making their way into more and more products. Larger datasets are significantly improving those recommendations, and Hadoop is increasingly being used to build recommendation platforms. Examples include product recommendations, merchant recommendations, content recommendations, social recommendations, query recommendations, and display and search ads.

With the number of options available to users ever increasing, customers’ attention spans are shrinking at a very fast pace. Customers are getting used to seeing their best choices right in front of them at any given moment. In such a scenario, we see recommendations powering more and more product features and driving user interaction, so companies are looking for more ways to target customers precisely and at the right time. This brings big data into the picture. Succeeding with data, and building new markets or changing existing ones, is the game being played in many high-stakes scenarios. Some companies have found a way to build their own big data recommendation and machine learning platform, giving them an edge in bringing better products to market ever faster. Hence, there is a strong case for treating recommendations and machine learning on big data as a platform within a company, rather than as a black box that magically produces the right results. The platform also lets us build other features, such as fraud detection, spam detection, and content enrichment and serving, which makes it viable in the long run. It is not just about recommendations.

Read more…

Strata Conference in Santa Clara 2013 Startup Showcase

We asked the Startup Showcase judges three questions about the big data industry.

The Startup Showcase returns to Strata this month, with 10 startup finalists pitching our panel of judges. We’ve assembled an enviable (and somewhat intimidating) lineup of experts to help narrow down the field.

In the interest of giving our finalists a head start, we asked the judges three questions about the big data industry.

Read more…

Just the basics: refreshingly void of any semblance of big data

Strata Santa Clara session preview on core data science skills

The McKinsey Global Institute forecasts a shortage of over 140,000 data scientists in the U.S. by 2018. I forecast a shortage of 140,000 people to explain to their respective hiring managers that “make it Hadoop” is not an appropriate articulation of what these people can or should do. If big data is the new bubble, then here’s to the prolonged correct data recession that hopefully follows.

Correct data? Such skills used to be called unsexy names like statistics or scientific experiments, but we now prefer to spice up the job titles (and salaries!) a bit and brand ourselves as data scientists, data storytellers, data prophets, or—if my next promotion comes through—Lord High Chancellor of Data, appointed by the Sovereign on the advice of the Prime Minister to oversee Her Majesty’s Terabytes. Modesty, it sometimes feels, is low on the burgeoning list of big data skills.

Read more…

Design matters more than math

Design compels. Math is proof. Both sides will defend their domains at Strata's next Great Debate.

At Strata Santa Clara later this month, we’re reprising what has become a tradition: Great Debates. These Oxford-style debates pit two teams against one another to argue a hot topic in the fields of big data, ubiquitous computing, and emerging interfaces.

Part of the fun is the scoring: attendees vote on whether they agree with the proposition before the debaters speak, and after both sides have said their piece, the audience votes again. Whoever moves the needle wins.
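The scoring rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post says just that whoever "moves the needle" wins, so the `Poll` structure, the `debate_winner` function, and the choice to measure the swing as a change in the share of the audience agreeing are all hypothetical, not the organizers' actual tallying method.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    """Audience tally for one round of voting (hypothetical structure)."""
    agree: int
    disagree: int

    def share_agreeing(self) -> float:
        """Fraction of voters who agree with the proposition."""
        total = self.agree + self.disagree
        if total == 0:
            raise ValueError("poll has no votes")
        return self.agree / total


def debate_winner(before: Poll, after: Poll) -> str:
    """Return the side that moved the audience toward its position.

    Assumption: the "needle" is the share of voters agreeing with the
    proposition, compared between the first and second vote.
    """
    swing = after.share_agreeing() - before.share_agreeing()
    if swing > 0:
        return "for"
    if swing < 0:
        return "against"
    return "tie"
```

Under this reading, a team can win without holding the majority. With `Poll(60, 40)` before and `Poll(55, 45)` after, most of the audience still agrees with the proposition, but the swing is negative, so the "against" team takes the debate.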

This year’s proposition — that design matters more than math — is sure to inspire some vigorous discussion. The argument for math is pretty strong. Math is proof. Given enough data — and today, we have plenty — we can know. “The right information in the right place just changes your life,” said Stewart Brand. Properly harnessed, the power of data analysis and modeling can fix cities, predict epidemics, and revitalize education. Abused, it can invade our lives, undermine economies, and steal elections. Surely the algorithms of big data matter!

But your life won’t change by itself. Bruce Mau defines design as “the human capacity to plan and produce desired outcomes.” Math informs; design compels. Without design, math can’t do its thing. Poorly designed experiments collect the wrong data. And if the data can’t be understood and acted upon, it may as well not have been crunched in the first place.

This is the question we’ll be putting to our debaters: Which matters more? A well-designed collection of flawed information — or an opaque, hard-to-parse, but unerringly accurate model? From mobile handsets to social policy, we need both good math and good design. Which is more critical?

Read more…

Tax season + your identity = bucket loads of easy money for fraudsters

Take control of your identity and make sure your electronic footprint works for you.

Recently, the Wall Street Journal published an article discussing the explosion of tax-identity theft, which has ballooned to 1.1 million cases in 2011 from 51,700 in 2008. The Wall Street Journal mentioned that the Treasury Inspector General for Tax Administration reported an additional 1.5 million potentially fraudulent 2011 tax refunds totaling in excess of $5.2 billion.

What is the cause of this? Taxpayers now have the option to file their taxes online and have their refunds deposited directly into their bank accounts, and this filing method creates all kinds of opportunities for electronic fraud.

How does this affect you? Fraudsters who have certain pieces of your data can file a tax return under your identity and receive your refund. Fraudsters look for “at-risk” identities that they can use at scale to scam refunds out of the IRS. “At-risk” identities include those of deceased people, minors, and legitimate citizens. Fraudsters only need a valid name and Social Security number combination. How much time, effort, and money do you think it would take you to convince the IRS that a fraudster stole your identity, filed your taxes, and received your refund? Even though this trend is on the rise and the government has seen it, the onus would still be on you to prove that you were defrauded.

Read more…

Exploring web standards for high data density visualizations

A sneak peek at an upcoming visualization session from the 2013 Strata Conference in Santa Clara, Calif.

Strata Editor’s Note: Over the next few weeks, the Strata Community Site will be providing sneak peeks of upcoming sessions at the Strata Conference in Santa Clara. Nicolas’ sneak peek is the first in this series. 

Last year was a great year for data visualization at Twitter. Our Analytics team expanded and created a dedicated data visualization team, and some of our projects were released publicly with great feedback.

Our first public interactive of 2012 was a fun way to show how the Eurocup was experienced on Twitter. In this organic visualization, you can see how people cheered for their teams during each match, and how the tension and volume of tweets increased towards the finals.

Read more…