"algorithm" entries

Simpler workflow tools enable the rapid deployment of models

The importance of data science tools that let organizations easily combine, deploy, and maintain algorithms

Data science often depends on data pipelines that involve acquiring, transforming, and loading data. (If you’re fortunate, most of the data you need is already in usable form.) Data needs to be assembled and wrangled before it can be visualized and analyzed. Many companies have data engineers (adept at using workflow tools like Azkaban and Oozie) who manage pipelines for data scientists and analysts.

A workflow tool for data analysts: Chronos from airbnb
A raw bash scheduler written in Scala, Chronos is flexible, fault-tolerant, and distributed (it’s built on top of Mesos). What’s most interesting is that it makes the creation and maintenance of complex workflows more accessible: at least within airbnb, it’s heavily used by analysts.

Job orchestration and scheduling tools contain features that data scientists would appreciate. They make it easy for users to express dependencies (start a job upon the completion of another job) and retries (particularly in cloud computing settings, jobs can fail for a variety of reasons). Chronos comes with a web UI designed to let business analysts define, execute, and monitor workflows: a zoomable DAG highlights failed jobs and displays stats that can be used to identify bottlenecks. Chronos lets you include asynchronous jobs – a nice feature for data science pipelines that involve long-running calculations. It also lets you easily define repeating jobs over a finite time interval, something that comes in handy for short-lived experiments (e.g. A/B tests or multi-armed bandits).
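To make the two features above concrete, here is a minimal sketch of dependency ordering plus retries. It is not Chronos code or its API: the function `run_workflow`, the job names, and the `max_retries` parameter are all hypothetical, and the sketch runs jobs in a single process rather than on a cluster.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(jobs, deps, max_retries=2):
    """Run jobs in dependency order, retrying each failed job.

    jobs: dict mapping job name -> zero-argument callable (hypothetical jobs)
    deps: dict mapping job name -> set of job names that must finish first
    Returns a dict mapping job name -> number of attempts it needed.
    """
    attempts = {}
    graph = {name: deps.get(name, set()) for name in jobs}
    # static_order() yields every job after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                jobs[name]()
                attempts[name] = attempt
                break
            except Exception as exc:
                if attempt == max_retries + 1:
                    raise RuntimeError(
                        f"job {name!r} failed after {attempt} attempts"
                    ) from exc
    return attempts


# Illustrative pipeline: "transform" fails once, then succeeds on retry.
log = []
calls = {"transform": 0}


def extract():
    log.append("extract")


def transform():
    calls["transform"] += 1
    if calls["transform"] < 2:
        raise IOError("transient failure")
    log.append("transform")


def load():
    log.append("load")


jobs = {"load": load, "extract": extract, "transform": transform}
deps = {"transform": {"extract"}, "load": {"transform"}}
result = run_workflow(jobs, deps)
```

A real scheduler would add what this sketch leaves out: distributing jobs across machines, persisting state, and backing off between retries.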

Stacks get hacked: The inevitable rise of data warfare

The cycle of good, bad, and stable has happened at every layer of the stack. It will happen with big data, too.

First, technology is good. Then it gets bad. Then it gets stable.

This has been going on for a long time, likely since the invention of fire, knives, or the printed word. But I want to focus specifically on computing technology. The human race is busy colonizing a second online world and sticking prosthetic brains — today, we call them smartphones — in front of our eyes and ears. And stacks of technology on which they rely are vulnerable.

When we first created automatic phone switches, hackers quickly learned how to blow a Cap’n Crunch whistle to get free calls from pay phones. When consumers got modems, attackers soon figured out how to rapidly redial to get more than their fair share of time on a BBS, or to program scripts that could brute-force their way into others’ accounts. Eventually, we got better passwords and we fixed the pay phones and switches.

We moved up the networking stack, above the physical and link layers. We tasted TCP/IP, and found it good. Millions of us installed Trumpet Winsock on consumer machines. We were idealists rushing onto the wild open web and proclaiming it a new utopia. Then, because of the way the TCP handshake worked, hackers figured out how to DDoS people with things like SYN attacks. Escalation, and router hardening, ensued.

We built HTTP, and SQL, and more. At first, they were open, innocent, and helped us make huge advances in programming. Then attackers found ways to exploit their weaknesses with cross-site scripting and buffer overruns. They hacked armies of machines to do their bidding, flooding target networks and taking sites offline. Technologies like MP3s gave us an explosion in music, new business models, and abundant crowd-sourced audiobooks — even as they leveled a music industry with fresh forms of piracy for which we hadn’t even invented laws.

Strata Week: Add structured data, lose local flavor?

Wikidata's structure vs. diverse knowledge, and a look at the many factors behind Netflix's recommendations.

A critic says Wikidata could undermine Wikipedia's localized information. Also, Netflix explains why its recommendation engine is much more complicated than most people realize.

AI will eventually drive healthcare, but not anytime soon

A merging of artificial intelligence and healthcare is tougher than many realize.

People will eventually get better care from artificial intelligence, but for now, we should keep the algorithms focused on the data that we know is good and keep the doctors focused on the patients.

Strata Week: Unfortunately for some, Uber's dynamic pricing worked

Dynamic pricing angers some Uber users, Hadoop hits 1.0, a possible setback for open-access research.

Uber's dynamic pricing worked as intended on New Year's Eve, but not everyone is happy about that. Elsewhere, Hadoop reaches the 1.0 milestone and proposed legislation seeks to repeal an open-access research policy.

Four short links: 5 December 2011

Spatial Search, Exposing Your Phone's Perfidity, School Unconference, and Wikipedia Viz

  1. VP Trees — a data structure for fast spatial searching. A form of nearest neighbour, useful for melodies (PDF) and image retrieval (PDF) and poetry. (via Reddit)
  2. iYou — iTunes plugin to show you all the stuff your phone collects about you.
  3. Bar Camps in Primary Schools — NZ teacher deploys bar camps among students. Great things happen.
  4. Realtime Wikipedia Edits — fascinating and hypnotic and inspirational and appalling and irrelevant all at once.
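
For readers curious about the VP trees in the first link, the following is a minimal sketch of a vantage-point tree with nearest-neighbour search. It assumes 2D points and Euclidean distance for illustration; the linked melody and image applications would use their own metrics. The names `VPNode`, `build`, and `nearest` are hypothetical, and the only requirement on `dist` is that it obeys the triangle inequality.

```python
import math
import random


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from it to the other points
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build(points, dist, rng=None):
    """Recursively build a VP tree from a list of points."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    distances = sorted(dist(vp, p) for p in points)
    radius = distances[len(distances) // 2]
    inside = [p for p in points if dist(vp, p) <= radius]
    outside = [p for p in points if dist(vp, p) > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    # Search the side the query falls on first; visit the other side only
    # if the triangle inequality cannot rule out a closer point there.
    if d <= node.radius:
        best = nearest(node.inside, query, dist, best)
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


points = [(0, 0), (5, 5), (1, 2), (9, 1)]
tree = build(points, math.dist)
closest = nearest(tree, (1, 1), math.dist)  # (distance, point)
```

The pruning test is what makes the search fast: whole subtrees are skipped whenever the current best distance cannot reach across the vantage point's median radius.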

Four short links: 18 November 2011

Quantified Learner, Text Extraction, Backup Flickr, and Multitouch UI Awesomeness

  1. Learning With Quantified Self — this CS grad student broke Jeopardy records using an app he built himself to quantify and improve his ability to answer Jeopardy questions in different categories. This is an impressive short talk and well worth watching.
  2. Evaluating Text Extraction Algorithms — The gold standard of both datasets was produced by human annotators. 14 different algorithms were evaluated in terms of precision, recall and F1 score. The results have shown that the best open-source solution is the boilerpipe library. (via Hacker News)
  3. Parallel Flickr — tool for backing up your Flickr account. (Compare to one day of Flickr photos printed out)
  4. Quneo Multitouch Open Source MIDI and USB Pad (Kickstarter) — interesting to see companies using Kickstarter to seed interest in a product. This one looks a doozie: pads, sliders, rotary sensors, with LEDs underneath and open source drivers and SDK. Looks almost sophisticated enough to drive emacs :-)
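
The precision, recall, and F1 scores mentioned in the text-extraction item can be sketched as follows. The linked evaluation's exact scoring unit is not stated here, so this assumes, purely for illustration, that an extractor's output and the human-annotated gold standard are compared as sets of tokens. The function name and the sample tokens are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Score extracted tokens against a gold-standard set of tokens."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: what fraction of the extracted tokens are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: what fraction of the gold tokens were recovered.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical page: 6 gold tokens; the extractor returns 4, of which 3 are right.
gold = ["bloom", "filters", "make", "set", "reconciliation", "cheap"]
extracted = ["bloom", "filters", "make", "login"]
precision, recall, f1 = precision_recall_f1(extracted, gold)
# precision = 3/4, recall = 3/6, f1 = 2 * 0.75 * 0.5 / 1.25 = 0.6
```

F1 rewards extractors that balance the two: returning the whole page maximizes recall but ruins precision, and returning one safe sentence does the reverse.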

Four short links: 14 October 2011

Relativity in Short Words, Set Math, Design Inspiration, and Internet of Things

  1. Theory of Relativity in Words of Four Letters or Less — this does just what it says, and well too. I like it, as you may too. At the end, you may even know more than you do now.
  2. Effective Set Reconciliation Without Prior Context (PDF) — paper on using Bloom filters to do set union (deduplication) efficiently. Useful in distributed key-value stores and other big data tools.
  3. Mental Notes — each card has an insight from psychology research that’s useful with web design. Shuffle the deck, peel off a card, get ideas for improving your site. (via Tom Stafford)
  4. The Internet of Things To Come (Mike Kuniavsky) — Mike lays out the trends and technologies that will lead to an explosion in Internet of Things products. E.g., This abstraction of knowledge into silicon means that rather than starting from basic principles of electronics, designers can focus on what they’re trying to create, rather than which capacitor to use or how to tell the signal from the noise. He makes it clear that, right now, we have the rich petri dish in which great networked objects can be cultured.
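
As background for the Bloom filter item above, here is a minimal sketch of a plain Bloom filter used to skip keys that have probably been seen already while merging two key lists. This is not the construction from the linked paper; it is the basic textbook structure, and `BloomFilter`, `merge_without_duplicates`, and the sizing parameters are hypothetical choices for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def merge_without_duplicates(local_keys, remote_keys):
    """Append remote keys the filter has definitely not seen.

    Caveat: a false positive can cause a genuinely new key to be skipped,
    so this trades a small error rate for constant memory.
    """
    seen = BloomFilter()
    merged = []
    for key in list(local_keys) + list(remote_keys):
        if key not in seen:
            merged.append(key)
            seen.add(key)
    return merged


merged = merge_without_duplicates(["a", "b", "c"], ["b", "c", "d", "d"])
```

The filter never reports a key it has stored as absent, which is why duplicates cannot slip through; the cost is the occasional false positive noted in the docstring.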

Four short links: 18 July 2011

Organisational Warfare, RTFM, Timezone Shapefile, Microsoft Adventure

  1. Organisational Warfare (Simon Wardley) — notes on the commoditisation of software, with interesting analyses of the positions of some large players. On closer inspection, Salesforce seems to be doing more than just commoditisation with an ILC pattern, as can be clearly seen from the Radian6 acquisition. They also seem to be operating a tower and moat strategy, i.e. creating a tower of revenue (the service) around which is built a moat devoid of differential value with high barriers to entry. When their competitors finally wake up and realise that the future world of CRM is in this service space, they’ll discover a new player dominating this space who has not only removed many of the opportunities to differentiate (e.g. social CRM, mobile CRM) but built a large ecosystem that creates high rates of new innovation. This should be a fairly fatal combination.
  2. Learning to Win by Reading Manuals in a Monte-Carlo Framework (MIT) — starting with no prior knowledge of the game or its UI, the system learns how to play and to win by experimenting, and from parsed manual text. They used FreeCiv, and assessed the influence of parsing the manual shallowly and deeply. Trust MIT to turn RTFM into a paper. For a human-readable explanation, see the press release.
  3. A Shapefile of the TZ Timezones of the World — I have nothing but sympathy for the poor gentleman who compiled this. Political boundaries are notoriously arbitrary, and timezones are even worse because they don’t need a war to change. (via Matt Biddulph)
  4. Microsoft Adventure — 1979 Microsoft game for the TRS-80 has fascinating threads into the past and into what would become Microsoft’s future.