Commerce and censorship in in cross-cultural social media.
Culture has a huge impact on social media adoption and usage. In Measuring Culture, I talked about specific cultural traits and attitudes, and I described how those things are being measured on social media. For this article, I’ll outline broader patterns in cross-cultural social media, specifically regarding commerce and censorship.
Commerce finds a way
Commerce always finds a way. Whether restricted by red tape or blocked by citizens’ mutual distrust, money flows around obstacles. Most examples don’t have much cultural data-crunching associated with them yet, but there are intriguing gestures in that direction. For instance, a recent survey from the European Commission reported that 62% of European Internet users say they use non-native language sites to communicate with friends online, but only 18% would use a non-native language site to buy something. Read more…
Digital media influences culture -- and it's influenced by culture in turn
Digital media influences culture — and it’s influenced by culture in turn. Culture matters in business: Facebook just spent an astonishing $19 billion to acquire WhatsApp because of WhatsApp’s international presence. Culture also matters politically: Turkey’s leader recently made it the latest country to try blocking Twitter. How can we use data to understand culture’s impact on digital media adoption and usage? How can we even measure culture in the first place?
Dimensions of Culture, And How They’re Being Measured
Some cross-cultural elements are easy to see. For example, when a brand or platform expands into a new country, their stuff should obviously be translated. Sometimes marketers use basic analytical tools to cut words that are controversial in a given culture, like the Chinese localization service Kawo, which screens English tweets for words that would be sensitive in China before any actual translation is done.
Strata SC 2014 Session Postmortem
In February, GraphLab took a road trip to Strata, a Big Data conference organized by O’Reilly. It was a gathering of close to 3100 people–engineers, business folks, industry evangelists, and data scientists. We had a lot of fun meeting and socializing with our peers and customers. Amidst all the conference excitement, we presented two talks. Carlos Guestrin, our intrepid CEO, held a tutorial on large-scale machine learning. I gave a talk in the Hardcore Data Science track.
Insight from a Strata Santa Clara 2014 session
When you think about what goes into winning a Nobel Prize in a field like economics, it’s a lot like machine learning. In order to make a breakthrough, you need to identify an interesting theory for explaining the world, test your theory in practice to see if it holds up, and if it does, you’ve got a potential winner. The bigger and more significant the issue addressed by your theory, the more likely you are to win the prize.
In the world of business, there’s no bigger issue than helping a company be more successful, and that usually hinges on helping it deliver its products to those that need them. This is why I like to describe my company SalesPredict as helping our customers win the Nobel Prize in business, if such a thing existed.
Focusing attention on the present lets organizations pursue existing opportunities as opposed to projected ones
Slow and Unaware
It was 2005. The war in Iraq was raging. Many of us in the national security R&D community were developing responses to the deadliest threat facing U.S. soldiers: the improvised explosive device (IED). From the perspective of the U.S. military, the unthinkable was happening each and every day. The world’s most technologically advanced military was being dealt significant blows by insurgents making crude weapons from limited resources. How was this even possible?
The war exposed the limits of our unwavering faith in technology. We depended heavily on technology to provide us the advantage in an environment we did not understand. When that failed, we were slow to learn. Meanwhile the losses continued. We were being disrupted by a patient, persistent organization that rapidly experimented and adapted to conditions on the ground.
To regain the advantage, we needed to start by asking different questions. We needed to shift our focus from the devices that were destroying U.S. armored vehicles to the people responsible for building and deploying the weapons. This motivated new approaches to collect data that could expose elements of the insurgent network.
New organizations and modes of operation were also required to act swiftly when discoveries were made. By integrating intelligence and special operations capabilities into a single organization with crisp objectives and responsive leadership, the U.S. dramatically accelerated its ability to disrupt insurgent operations. Rapid orientation and action were key in this dynamic environment where opportunities persisted for an often unknown and very limited period of time.
This story holds important and under appreciated lessons that apply to the challenges numerous organizations face today. The ability to collect, store, and process large volumes of data doesn’t confer advantage by default. It’s still common to fixate on the wrong questions and fail to recover quickly when mistakes are made. To accelerate organizational learning with data, we need to think carefully about our objectives and have realistic expectations about what insights we can derive from measurement and analysis.
Insights from a business executive and law professor
If you develop software or manage databases, you’re probably at the point now where the phrase “Big Data” makes you roll your eyes. Yes, it’s hyped quite a lot these days. But, overexposed or not, the Big Data revolution raises a bunch of ethical issues related to privacy, confidentiality, transparency and identity. Who owns all that data that you’re analyzing? Are there limits to what kinds of inferences you can make, or what decisions can be made about people based on those inferences? Perhaps you’ve wondered about this yourself.
We’re obsessed by these questions. We’re a business executive and a law professor who’ve written about this question a lot, but our audience is usually lawyers. But because engineers are the ones who confront these questions on a daily basis, we think it’s essential to talk about these issues in the context of software development.
While there’s nothing particularly new about the analytics conducted in big data, the scale and ease with which it can all be done today changes the ethical framework of data analysis. Developers today can tap into remarkably varied and far-flung data sources. Just a few years ago, this kind of access would have been hard to imagine. The problem is that our ability to reveal patterns and new knowledge from previously unexamined troves of data is moving faster than our current legal and ethical guidelines can manage. We can now do things that were impossible a few years ago, and we’ve driven off the existing ethical and legal maps. If we fail to preserve the values we care about in our new digital society, then our big data capabilities risk abandoning these values for the sake of innovation and expediency.
By David Andrzejewski of SumoLogic
A few weeks ago I had the pleasure of hosting the machine data track of talks at Strata Santa Clara. Like “big data”, the phrase “machine data” is associated with multiple (sometimes conflicting) definitions, two prominent ones come from Curt Monash and Daniel Abadi. The focus of the machine data track is on data which is generated and/or collected automatically by machines. This includes software logs and sensor measurements from systems as varied as mobile phones, airplane engines, and data centers. The concept is closely related to the “internet of things”, which refers to the trend of increasing connectivity and instrumentation in existing devices, like home thermostats.
More data, more problems
This data can be useful for the early detection of operational problems or the discovery of opportunities for improved efficiency. However, the decoupling of data generation and collection from human action means that the volume of machine data can grow at machine scales (i.e., Moore’s Law), an issue raised by both Monash and Abadi. This explosive growth rate amplifies existing challenges associated with “big data”. In particular two common motifs among the talks at Strata were the difficulties around:
- mechanics: the technical details of data collection, storage, and analysis
- semantics: extracting understandable and actionable information from the data deluge
Today, it’s shocking (and honestly exciting) how much of my daily experience is determined by a recommender system. These systems drive amazing experiences everywhere, telling me where to eat, what to listen to, what to watch, what to read, and even who I should be friends with. Furthermore, information overload is making recommender systems indispensable, since I can’t find what I want on the web simply using keyword search tools. Recommenders are behind the success of industry leaders like Netflix, Google, Pandora, eHarmony, Facebook, and Amazon. It’s no surprise companies want to integrate recommender systems with their own online experiences. However, as I talk to team after team of smart industry engineers, it has become clear that building and managing these systems is usually a bit out of reach, especially given all the other demands on the team’s time.
Tools, Trends, What Pays (and What Doesn't) for Data Professionals
There is no shortage of news about the importance of data or the career opportunities within data. Yet a discussion of modern data tools can help us understand what the current data evolution is all about, and it can also be used as a guide for those considering stepping into the data space or progressing within it.
In our report, 2013 Data Science Salary Survey, we make our own data-driven contribution to the conversation. We collected a survey from attendees of the Strata Conference in New York and Santa Clara, California, about tool usage and salary.
Strata attendees span a wide spectrum within the data world: Hadoop experts and business leaders, software developers and analysts. By no means does everyone use data on a “Big” scale, but almost all attendees have some technical aspect to their role. Strata attendees may not represent a random sample of all professionals working with data, but they do represent a broad slice of the population. If there is a bias, it is likely toward the forefront of the data space, with attendees using the newest tools (or being very interested in learning about them).
Sneak peek at an upcoming tutorial at Strata Santa Clara 2014
Apache Hadoop 2.0 represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform – one that takes Hadoop beyond merely batch applications to taking its position as a ‘data operating system’ where HDFS is the file system and YARN is the operating system.
YARN is a re-architecture of Hadoop that allows multiple applications to run on the same platform. With YARN, applications run “in” Hadoop, instead of “on” Hadoop: