- Behind the Scenes of a Dashboard Design — the design decisions that go into displaying complex info.
- Superconductor — a web framework for creating data visualizations that scale to real-time interactions with up to 1,000,000 data points. It compiles to WebCL, WebGL, and web workers. (via Ben Lorica)
- BIDMach: Large-scale Learning with Zero Memory Allocation (PDF) — GPU-accelerated machine learning. In this paper we describe a caching approach that allows code with complex matrix (graph) expressions at massive scale, i.e. multi-terabyte data, with zero memory allocation after the initial setup. (via Siah)
ENTRIES TAGGED "machine learning"
AI Book, Science Superstars, Engineering Ethics, and Crowdsourced Science
- Society of Mind — Marvin Minsky’s book now Creative-Commons licensed.
- Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology — The concentration of research output is declining at the department level but increasing at the individual level. [...] We speculate that this may be due to changing patterns of collaboration, perhaps caused by the rising burden of knowledge and the falling cost of communication, both of which increase the returns to collaboration. Indeed, we report evidence that the propensity to collaborate is rising over time. (via Sciblogs)
- As Engineers, We Must Consider the Ethical Implications of our Work (The Guardian) — applies to coders and designers as well.
- Eyewire — a game to crowdsource the mapping of 3D structure of neurons.
- SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
- madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
- Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
- Disguise Detection — using Raspberry Pi, Arduino, and Python.
Learning Machine Learning, Pokemon Coding, Drone Coverage, and Optimization Guide
- CalTech Machine Learning Video Library — a pile of video introductions to different machine learning concepts.
- Awesome Pokemon Hack — each inventory item has a number associated with it, they are kept at a particular memory location, and there’s a glitch in the game that executes code at that location so … you can program by assembling items and then triggering the glitch. SO COOL.
- Drone Footage of Bangkok Protests — including water cannons.
- The Mature Optimization Handbook — free, well thought out, and well written. My favourite line: In exchange for that saved space, you have created a hidden dependency on clairvoyance.
Internet Cities, Defying Google Glass, Deep Learning Book, and Open Paleoanthropology
- The Death and Life of Great Internet Cities — “The sense that you were given some space on the Internet, and allowed to do anything you wanted to in that space, it’s completely gone from these new social sites,” said Scott. “Like prisoners, or livestock, or anybody locked in institution, I am sure the residents of these new places don’t even notice the walls anymore.”
- What You’re Not Supposed To Do With Google Glass (Esquire) — Maybe I can put these interruptions to good use. I once read that in ancient Rome, when a general came home victorious, they’d throw him a triumphal parade. But there was always a slave who walked behind the general, whispering in his ear to keep him humble. “You are mortal,” the slave would say. I’ve always wanted a modern nonslave version of this — a way to remind myself to keep perspective. And Glass seemed the first gadget that would allow me to do that. In the morning, I schedule a series of messages to e-mail myself throughout the day. “You are mortal.” “You are going to die someday.” “Stop being a selfish bastard and think about others.” (via BoingBoing)
- Neural Networks and Deep Learning — Chapter 1 up and free, and there’s an IndieGogo campaign to fund the rest.
- What We Know and Don’t Know — That highly controlled approach creates the misconception that fossils come out of the ground with labels attached. Or worse, that discovery comes from cloaked geniuses instead of open discussion. We’re hoping to combat these misconceptions by pursuing an open approach. This is today’s evolutionary science, not the science of fifty years ago We’re here sharing science. [...] Science isn’t the answers, science is the process. Open science in paleoanthropology.
Tutorials for designers, data scientists, data engineers, and managers
As the Program Development Director for Strata Santa Clara 2014, I am pleased to announce that the tutorial session descriptions are now live. We’re pleased to offer several day-long immersions including the popular Data Driven Business Day and Hardcore Data Science tracks. We curated these topics as we wanted to appeal to a broad range of attendees including business users and managers, designers, data analysts/scientists, and data engineers. In the coming months we’ll have a series of guest posts from many of the instructors and communities behind the tutorials.
Analytics for Business Users
We’re offering a series of data intensive tutorials for non-programmers. John Foreman will use spreadsheets to demonstrate how data science techniques work step-by-step – a topic that should appeal to those tasked with advanced business analysis. Grammar of Graphics author, SYSTAT creator, and noted Statistician Leland Wilkinson, will teach an introductory course on analytics using an innovative expert system he helped build.
Data Science essentials
Scalding – a Scala API for Cascading – is one of the most popular open source projects in the Hadoop ecosystem. Vitaly Gordon will lead a hands-on tutorial on how to use Scalding to put together effective data processing workflows. Data analysts have long lamented the amount of time they spend on data wrangling. But what if you had access to tools and best practices that would make data wrangling less tedious? That’s exactly the tutorial that distinguished Professors and Trifacta co-founders, Joe Hellerstein and Jeff Heer, are offering.
The co-founders of Datascope Analytics are offering a glimpse into how they help clients identify the appropriate problem or opportunity to focus on by using design thinking (see the recent Datascope/IDEO post on Design Thinking and Data Science). We’re also happy to reprise the popular (Strata Santa Clara 2013) d3.js tutorial by Scott Murray.
Scan Win, Watson Platform, Metal Printer, and Microcontroller Python
- Google Wins Book Scanning Case (Giga Om) — will probably be appealed, though many authors will fear it’s good money after bad tilting at the fair use windmill.
- IBM Watson To Be A Platform (IBM) — press release indicates you’ll soon be able to develop your own apps that use Watson’s machine learning and text processing.
- MiniMetalMaker (IndieGogo) — 3D printer that can print detailed objects from specially blended metal clay and fire.
- MicroPython (KickStarter) — Python for Microcontrollers.
IP Woe, Deep Learning Intro, Rapid Prototyping Bots, 3D Display
- TPPA Trades Away Internet Freedoms (EFF) — commentary on the wikileaked text of the trade agreement.
- Deep Learning 101 — introduction to the machine learning trend of choice.
- Large Scale Rapid Prototyping Robots — an informal list of large rapid prototyping systems [...] including: big 3-axis systems that print plastic, sand, or cement; large robot arms with extruders and milling bits; and large industrial arms for bending metal and assembling modular structures.
- Dynamic Shape Display (MIT) — a Dynamic Shape Display that can render 3D content physically, so users can interact with digital information in a tangible way. inFORM can also interact with the physical world around it, for example moving objects on the table’s surface. (via Fast Company)
Tools for unlocking big data continue to get simpler
Here are a few observations based on conversations I had during the just concluded Strata NYC conference.
Interactive query analysis on Hadoop remains a hot area
A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A year after the launch of Impala, quite a few attendees I spoke with remained interested in the progress of SQL-on-Hadoop solutions. A trio from Hortonworks gave an update on recent improvements and changes to Hive1. A sign that Impala is gaining traction, Greg Rahn’s talk on Practical Performance Tuning for Impala was one of the best attended sessions in the conference. Ditto for a sponsored session on Kognitio’s latest features.
Existing SQL-on-Hadoop solutions require that users define a schema – an additional step given that a lot of data is increasingly in key-value or JSON format. In his talk Hadapt co-founder Daniel Abadi highlighted a solution2 that lets users query complex data types (Hadapt reserializes complex data types to speed up joins). I expect other SQL-on-Hadoop solutions to also offer query support for complex data types in the near future.
Empowering business users
With its launch at the conference, ClearStory joins Platfora and Datameer in the business analytics space. Each company builds tools that lets business users wade through large amounts of data, while emphasizing different areas. Platfora is for interactive visual analysis of massive data sets, while Datameer connects to many data sources (not just Hadoop), has started offering analytics, and can run on a laptop or cluster. Built primarily on the Berkeley stack (BDAS), ClearStory’s interesting platform encourages collaboration and simplifies data harmonization (fusing disparate data sources is a common bottleneck for business users). For organizations willing to tag and describe their data sets, Microsoft unveiled a tool that lets users query data using natural language (UK startup NeutrinoBI uses a similar “search interface”).