Apple Watch and the skin as interface

The success of Apple’s watch, and of wearables in general, may depend on brain plasticity.

Recently, to much fanfare, Apple launched a watch. Reviews were mixed. And the watch may thrive — after all, once upon a time, nobody knew they needed a tablet or an iPod. But at the same time, today’s tech consumer is markedly different from those at the dawn of the Web, and the watch faces a different market all together.

Apple_Watch

Apple Watches. Source: Apple.

One of the more positive reviews came from tech columnist Farhad Manjoo. In it, he argued that we’ll eventually give in to wearables for a variety of reasons.

“It was only on Day 4 that I began appreciating the ways in which the elegant $650 computer on my wrist was more than just another screen,” he wrote. “By notifying me of digital events as soon as they happened, and letting me act on them instantly, without having to fumble for my phone, the Watch became something like a natural extension of my body — a direct link, in a way that I’ve never felt before, from the digital world to my brain.”

On-body messaging and brain plasticity

Manjoo uses the term “on-body messaging” to describe the variety of specific vibrations the watch emits, and how quickly he came to accept them as second nature. The success of Apple’s watch, and of wearables in general, may be due to this brain plasticity. Read more…

Comment: 1

Topic modeling for the newbie

Learning the fundamentals of natural language processing.

Get “Data Science from Scratch” at 50% off with code DATA50. Editor’s note: This is an excerpt from our recent book Data Science from Scratch, by Joel Grus. It provides a survey of topics from statistics and probability to databases, from machine learning to MapReduce, giving the reader a foundation for understanding, and examples and ideas for learning more.

When we built our Data Scientists You Should Know recommender in Chapter 1, we simply looked for exact matches in people’s stated interests.

A more sophisticated approach to understanding our users’ interests might try to identify the topics that underlie those interests. A technique called Latent Dirichlet Analysis (LDA) is commonly used to identify common topics in a set of documents. We’ll apply it to documents that consist of each user’s interests.

LDA has some similarities to the Naive Bayes Classifier we built in Chapter 13, in that it assumes a probabilistic model for documents. We’ll gloss over the hairier mathematical details, but for our purposes the model assumes that:

  • There is some fixed number K of topics.
  • There is a random variable that assigns each topic an associated probability distribution over words. You should think of this distribution as the probability of seeing word w given topic k.
  • There is another random variable that assigns each document a probability distribution over topics. You should think of this distribution as the mixture of topics in document d.
  • Each word in a document was generated by first randomly picking a topic (from the document’s distribution of topics) and then randomly picking a word (from the topic’s distribution of words).

In particular, we have a collection of documents, each of which is a list of words. And we have a corresponding collection of document_topics that assigns a topic (here a number between 0 and K – 1) to each word in each document. Read more…

Comment: 1

Startups suggest big data is moving to the clouds

A look at the winners from a showcase of some of the most innovative big data startups.

At Strata + Hadoop World in London last week, we hosted a showcase of some of the most innovative big data startups. Our judges narrowed the field to 10 finalists, from whom they — and attendees — picked three winners and an audience choice.

Underscoring many of these companies was the move from software to services. As industries mature, we see a move from custom consulting to software and, ultimately, to utilities — something Simon Wardley underscored in his Data Driven Business Day talk, and which was reinforced by the announcement of tools like Google’s Bigtable service offering.

This trend was front and center at the showcase:

  • Winner Modgen, for example, generates recommendations and predictions, offering machine learning as a cloud-based service.
  • While second-place Brytlyt offers their high-performance database as an on-premise product, their horizontally scaled-out architecture really shines when the infrastructure is elastic and cloud based.
  • Finally, third-place OpenSensors’ real-time IoT message platform scales to millions of messages a second, letting anyone spin up a network of connected devices.

Ultimately, big data gives clouds something to do. Distributed sensors need a widely available, connected repository into which to report; databases need to grow and shrink with demand; and predictive models can be tuned better when they learn from many data sets. Read more…

Comment: 1

The tensor renaissance in data science

The O'Reilly Data Show Podcast: Anima Anandkumar on tensor decomposition techniques for machine learning.

glory_nosha_flickr

After sitting in on UC Irvine Professor Anima Anandkumar’s Strata + Hadoop World 2015 in San Jose presentationI wrote a post urging the data community to build tensor decomposition libraries for data science. The feedback I’ve gotten from readers has been extremely positive. During the latest episode of the O’Reilly Data Show Podcast, I sat down with Anandkumar to talk about tensor decomposition, machine learning, and the data science program at UC Irvine.

Modeling higher-order relationships

The natural question is: why use tensors when (large) matrices can already be challenging to work with? Proponents are quick to point out that tensors can model more complex relationships. Anandkumar explains:

Tensors are higher order generalizations of matrices. While matrices are two-dimensional arrays consisting of rows and columns, tensors are now multi-dimensional arrays. … For instance, you can picture tensors as a three-dimensional cube. In fact, I have here on my desk a Rubik’s Cube, and sometimes I use it to get a better understanding when I think about tensors.  … One of the biggest use of tensors is for representing higher order relationships. … If you want to only represent pair-wise relationships, say co-occurrence of every pair of words in a set of documents, then a matrix suffices. On the other hand, if you want to learn the probability of a range of triplets of words, then we need a tensor to record such relationships. These kinds of higher order relationships are not only important for text, but also, say, for social network analysis. You want to learn not only about who is immediate friends with whom, but, say, who is friends of friends of friends of someone, and so on. Tensors, as a whole, can represent much richer data structures than matrices.

Read more…

Comment: 1

The unwelcome guest: Why VMs aren’t the solution for next-gen applications

Scale-out applications need scaled-in virtualization.

scale_in_esterno_Mia_Felicita_Bertelli_FlickrData center operating systems are emerging as a first-class category of distributed system software. Hadoop, for example, is evolving from a MapReduce framework into YARN, a generic platform for scale-out applications.

To enable a rich ecosystem of diverse applications to coexist on these platforms, providing adequate isolation is crucial. The isolation mechanism must enforce resource limits, decouple software dependencies among applications and the host, provide security and privacy, confine failures, etc. Containers offer a simple and elegant solution to the problem. However, a question that comes up frequently is: Why not virtual machines (VMs)? After all, these systems face a number of the same challenges that have been solved by virtualization for traditional enterprise applications.

All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections” — David Wheeler

Read more…

Comment: 1

On the evolution of machine learning

From linear models to neural networks: an interview with Reza Zadeh.

Get notified when our free report, “Future of Machine Intelligence: Perspectives from Leading Practitioners,” is available for download. The following interview is one of many that will be included in the report.

As part of our ongoing series of interviews surveying the frontiers of machine intelligence, I recently interviewed Reza Zadeh. Reza is a Consulting Professor in the Institute for Computational and Mathematical Engineering at Stanford University and a Technical Advisor to Databricks. His work focuses on Machine Learning Theory and Applications, Distributed Computing, and Discrete Applied Mathematics.

Key Takeaways

  • Neural networks have made a comeback and are playing a growing role in new approaches to machine learning.
  • The greatest successes are being achieved via a supervised approach leveraging established algorithms.
  • Spark is an especially well-suited environment for distributed machine learning.

David Beyer: Tell us a bit about your work at Stanford

Reza Zadeh: At Stanford, I designed and teach distributed algorithms and optimization (CME 323) as well as a course called discrete mathematics and algorithms (CME 305). In the discrete mathematics course, I teach algorithms from a completely theoretical perspective, meaning that it is not tied to any programming language or framework, and we fill up whiteboards with many theorems and their proofs. Read more…

Comment: 1