- A Worm’s Mind in a Lego Body — the C. elegans worm’s 302 neurons have been mapped, modelled in open source code, and now hooked up to a Lego robot. The robot is reported to behave in ways similar to the real C. elegans: stimulation of the nose stopped forward motion, touching the anterior and posterior touch sensors made the robot move forward and back accordingly, and stimulating the food sensor made it move forward. There is video. A toy sketch of the sensor-to-motor idea appears after this list.
- Show and Tell: A Neural Image Caption Generator — Google Research paper on generating captions like “Two pizzas sitting on top of a stove top oven” from a photo. Wow.
- Big Data with the ELK Stack — Elasticsearch, Logstash, and Kibana. Interesting and powerful combination of tools! A minimal indexing sketch appears after this list.
- Apollo: Amazon’s Deployment Engine — Apollo will stripe the rolling update to simultaneously deploy to an equivalent number of hosts in each location. This keeps the fleet balanced and maximizes redundancy in the case of any unexpected events. When the fleet scales up to handle higher load, Apollo automatically installs the latest version of the software on the newly added hosts. Lust.
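Apollo itself is internal to Amazon, but the striping described in the item above is easy to illustrate: each wave of a rolling update takes the same number of hosts from every location, so no single location is drained faster than the others. A rough sketch in Python, with invented host and location names and a stand-in deploy() function:

```python
# Rough sketch of a "striped" rolling update: every wave deploys to the same
# number of hosts in each location, keeping the fleet balanced throughout.
# Host and location names are invented; deploy() is a stand-in for real work.
from itertools import zip_longest

FLEET = {
    "us-east": ["east-1", "east-2", "east-3", "east-4"],
    "us-west": ["west-1", "west-2", "west-3", "west-4"],
    "eu":      ["eu-1", "eu-2", "eu-3", "eu-4"],
}

def deploy(host, version):
    print(f"deploying {version} to {host}")

def striped_rolling_update(fleet, version, hosts_per_location_per_wave=2):
    """Roll through the fleet in waves that take an equal slice of every location."""
    n = hosts_per_location_per_wave
    chunked = [
        [hosts[i:i + n] for i in range(0, len(hosts), n)]
        for hosts in fleet.values()
    ]
    for wave in zip_longest(*chunked, fillvalue=[]):
        batch = [host for chunk in wave for host in chunk]
        for host in batch:
            deploy(host, version)
        # In a real system: wait for health checks on `batch` before the next wave.

striped_rolling_update(FLEET, "build-1234")
```

In practice each wave would also gate on health checks and rollback triggers before continuing, and the stripe size would be tuned against available capacity headroom.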
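For the worm-in-a-Lego-body item, a toy sketch (not the OpenWorm code) of the general idea: touch-sensor readings feed a small weighted mapping that stands in for the simulated nervous system, and the sign of the summed motor drive picks the robot's motion. All sensor names, weights, and thresholds below are invented for illustration.

```python
# Illustrative toy only: a made-up mapping from touch sensors to motor drive,
# loosely in the spirit of the worm-robot demo. Sensor names, weights, and
# thresholds are invented for this example and are not taken from OpenWorm.

SENSOR_TO_MOTOR = {
    "nose":      {"forward": -1.0, "reverse": 0.0},   # nose touch suppresses forward motion
    "anterior":  {"forward": +1.0, "reverse": -0.5},  # anterior touch -> move forward
    "posterior": {"forward": -0.5, "reverse": +1.0},  # posterior touch -> back up
    "food":      {"forward": +0.8, "reverse": 0.0},   # food signal -> move forward
}

def motor_command(active_sensors):
    """Sum the contributions of the active sensors and pick a drive direction."""
    forward = sum(SENSOR_TO_MOTOR[s]["forward"] for s in active_sensors)
    reverse = sum(SENSOR_TO_MOTOR[s]["reverse"] for s in active_sensors)
    if forward > reverse and forward > 0:
        return "drive_forward"
    if reverse > forward and reverse > 0:
        return "drive_backward"
    return "stop"

if __name__ == "__main__":
    print(motor_command({"food"}))       # drive_forward
    print(motor_command({"posterior"}))  # drive_backward
    print(motor_command({"nose"}))       # stop: forward drive is suppressed
```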
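For the ELK item, a minimal sketch of the ingest side, assuming an Elasticsearch node listening on localhost:9200. In a typical ELK setup Logstash does this shipping and Kibana sits on top for search and dashboards; the index and field names here are arbitrary, and the exact document endpoint varies across Elasticsearch versions.

```python
# Minimal sketch: index one log event directly into Elasticsearch over its
# REST API (the role Logstash normally plays in the ELK stack).
# Assumes an Elasticsearch node at localhost:9200; index and field names are
# arbitrary, and the endpoint path can differ between Elasticsearch versions.
import json
import urllib.request

event = {
    "@timestamp": "2014-12-01T10:15:00Z",
    "level": "ERROR",
    "message": "payment service timed out",
    "host": "web-03",
}

req = urllib.request.Request(
    "http://localhost:9200/logs-2014.12.01/_doc",  # one index per day is a common convention
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8"))
```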
"Big Data" entries
Experts from across the big data world came together for Strata + Hadoop World in Barcelona 2014. From the Internet of Things to data-driven fashion, we’ve gathered key insights from the event below.
#IoTH: The Internet of Things and Humans
“If we could start over with these capabilities we have now, how would we do it differently?” Tim O’Reilly continues to explore data and the Internet of Things through the lens of human empowerment and the ability to “use technology to give people superpowers.”
Rajiv Maheswaran talks about the tools and techniques required to analyze new kinds of sports data.
Many data scientists are comfortable working with structured operational data and unstructured text. Newer techniques such as deep learning have opened up data types like images, video, and audio.
Other common data sources are garnering attention. With the rise of mobile phones equipped with GPS, I’m meeting many more data scientists at start-ups and large companies who specialize in spatio-temporal pattern recognition. Analyzing “moving dots” requires specialized tools and techniques.
A few months ago, I sat down with Rajiv Maheswaran, founder and CEO of Second Spectrum, a company that applies analytics to sports tracking data. Maheswaran talked about this new kind of data and the challenge of finding patterns:
“It’s interesting because it’s a new type of data problem. Everybody knows that big data machine learning has done a lot of stuff in structured data, in photos, in translation for language, but moving dots is a very new kind of data where you haven’t figured out the right feature set to be able to find patterns from. There’s no language of moving dots, at least not that computers understand. People understand it very well, but there’s no computational language of moving dots that are interacting. We wanted to build that up, mostly because data about moving dots is very, very new. It’s only in the last five years, between phones and GPS and new tracking technologies, that moving data has actually emerged.”
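There is no standard "language of moving dots," but a minimal sketch of a first pass might look like the following: given (timestamp, player, x, y) tracking samples, derive simple spatio-temporal features such as per-player speed and pairwise spacing. The sample data and feature choices are invented for illustration and are not Second Spectrum's.

```python
# Illustrative sketch: turn raw "moving dots" (timestamp, player, x, y) into
# simple spatio-temporal features: per-player speed and pairwise distance.
# The sample data and the feature choices are invented for this example.
from collections import defaultdict
from math import hypot

# (t in seconds, player id, x in metres, y in metres)
tracking = [
    (0.0, "p1", 0.0, 0.0), (0.0, "p2", 5.0, 0.0),
    (0.5, "p1", 1.0, 0.5), (0.5, "p2", 5.0, 1.0),
    (1.0, "p1", 2.0, 1.0), (1.0, "p2", 4.5, 2.0),
]

def speeds(points):
    """Per-player speed between consecutive samples."""
    by_player = defaultdict(list)
    for t, player, x, y in sorted(points):
        by_player[player].append((t, x, y))
    return {
        player: [
            hypot(x2 - x1, y2 - y1) / (t2 - t1)
            for (t1, x1, y1), (t2, x2, y2) in zip(track, track[1:])
        ]
        for player, track in by_player.items()
    }

def pairwise_distance(points, a, b):
    """Distance between two players at each shared timestamp."""
    pos = {(t, p): (x, y) for t, p, x, y in points}
    return [
        hypot(pos[t, a][0] - pos[t, b][0], pos[t, a][1] - pos[t, b][1])
        for t in sorted({t for t, _, _, _ in points})
        if (t, a) in pos and (t, b) in pos
    ]

print(speeds(tracking))                      # per-player speed series
print(pairwise_distance(tracking, "p1", "p2"))
```

Real tracking systems add many more derived features (acceleration, spacing over time, formations), but the point stands: the raw dots only become useful once a feature vocabulary like this has been built on top of them.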
Schemas inevitably will change — Apache Avro offers an elegant solution.
When a team first starts to consider using Hadoop for data storage and processing, one of the first questions that comes up is: which file format should we use?
This is a reasonable question. HDFS, Hadoop’s data storage, is different from relational databases in that it does not impose any data format or schema. You can write any type of file to HDFS, and it’s up to you to process it later.
The usual first choice of file formats is either comma-delimited text files, since these are easy to dump from many databases, or JSON, often used for event data or data arriving from a REST API.
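A minimal sketch of that usual first choice: the same event records written as comma-delimited text and as newline-delimited JSON, both easy to generate and easy to eyeball. The record layout is invented for the example.

```python
# Sketch of the usual first choice: the same records as comma-delimited text
# and as newline-delimited JSON. The record layout is invented for the example.
import csv
import json

records = [
    {"user_id": 42, "event": "click", "ts": "2014-12-01T10:15:00Z"},
    {"user_id": 7,  "event": "view",  "ts": "2014-12-01T10:15:02Z"},
]

with open("events.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "event", "ts"])
    writer.writeheader()
    writer.writerows(records)

with open("events.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```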
There are many benefits to this text-based approach: text files are human-readable and therefore easy to debug and troubleshoot, they are easy to generate from existing data sources, and every application in the Hadoop ecosystem can process them.
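Avro is the article's answer to the fact that schemas inevitably change. A minimal sketch of schema evolution, assuming the third-party fastavro package: records written with the original writer schema are read back with a newer reader schema that adds a field with a default value.

```python
# Minimal sketch of Avro schema evolution, assuming the third-party `fastavro`
# package: records written with the old (writer) schema are read back with a
# newer reader schema that adds a field carrying a default value.
import io
from fastavro import writer, reader, parse_schema

writer_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "id",   "type": "long"},
        {"name": "name", "type": "string"},
    ],
})

# The schema later evolves: a new field with a default is added.
reader_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "id",      "type": "long"},
        {"name": "name",    "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
})

buf = io.BytesIO()
writer(buf, writer_schema, [{"id": 1, "name": "Ada"}])

buf.seek(0)
for record in reader(buf, reader_schema=reader_schema):
    print(record)  # {'id': 1, 'name': 'Ada', 'country': 'unknown'}
```

The same resolution rules let readers with older schemas ignore fields added by newer writers, which is why Avro suits long-lived data sitting in HDFS.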