"deep learning" entries

Patrick Wendell on Spark’s roadmap, Spark R API, and deep learning on the horizon

The O'Reilly Radar Podcast: A special holiday cross-over of the O'Reilly Data Show Podcast.

Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.

In this special holiday episode of the Radar Podcast, we’re featuring a cross-over of the O’Reilly Data Show Podcast, which you can find on iTunes, Stitcher, TuneIn, or SoundCloud. O’Reilly’s Ben Lorica hosts that podcast, and in this episode, he chats with Apache Spark release manager and Databricks co-founder Patrick Wendell about the roadmap of Spark and where it’s headed, and interesting applications he’s seeing in the growing Spark ecosystem.

Here are some highlights from their chat:

We were really trying to solve research problems, so we were trying to work with the early users of Spark, getting feedback on what issues it had and what types of problems they were trying to solve with Spark, and then use that to influence the roadmap. It was definitely a more informal process, but from the very beginning, we were expressly user driven in the way we thought about building Spark, which is quite different than a lot of other open source projects. … From the beginning, we were focused on empowering other people and building platforms for other developers.

One of the early users was Conviva, a company that does analytics for real-time video distribution. They were a very early user of Spark, they continue to use it today, and a lot of their feedback was incorporated into our roadmap, especially around the types of APIs they wanted to have that would make data processing really simple for them, and of course, performance was a big issue for them very early on because in the business of optimizing real-time video streams, you want to be able to react really quickly when conditions change. … Early on, things like latency and performance were pretty important.

Read more…

Four short links: 4 December 2015

Bacterial Research, Open Source Swift, Deep Forger, and Prudent Crypto Engineering

  1. New Antibiotics Research Direction — most people don’t know that we can’t cultivate and isolate most of the microbes we know about.
  2. Swift now Open Source — Apache v2-licensed. An Apple exec is talking about it and its roadmap.
  3. Deep Forger User Guide — clever Twitter bot converting your photos into paintings in the style of famous artists, using deep learning tech.
  4. Prudent Engineering Practice for Cryptographic Protocols (PDF) — paper from the ’90s that is still useful today. Those principles are good for API design too. (via Adrian Colyer)

Kristian Hammond on truly democratizing data and the value of AI in the enterprise

The O'Reilly Radar Podcast: Narrative Science's foray into proprietary business data and bridging the data gap.

In this week’s episode, O’Reilly’s Mac Slocum chats with Kristian Hammond, Narrative Science’s chief scientist. Hammond talks about Natural Language Generation, Narrative Science’s shift into the world of business data, and evolving beyond the dashboard.

Here are a few highlights:

We’re not telling people what the data are; we’re telling people what has happened in the world through a view of that data. I don’t care what the numbers are; I care about who are my best salespeople, where are my logistical bottlenecks. Quill can do that analysis and then tell you — not make you fight with it, but just tell you — and tell you in a way that is understandable and includes an explanation about why it believes this to be the case. Our focus is entirely, a little bit in media, but almost entirely in proprietary business data, and in particular we really focus on financial services right now.

You can’t make good on that promise [of what big data was supposed to do] unless you communicate it in the right way. People don’t understand charts; they don’t understand graphs; they don’t understand lines on a page. They just don’t. We can’t be angry at them for being human. Instead we should actually have the machine do what it needs to do in order to fill that gap between what it knows and what people need to know.

Read more…

Four short links: 24 November 2015

Tabular Data, Distrusting Authority, Data is the Future, and Remote Working Challenges

  1. uitable — cute library for tabular data in console golang programs.
  2. Did Carnegie Mellon Attack Tor for the FBI? (Bruce Schneier) — The behavior of the researchers is reprehensible, but the real issue is that CERT Coordination Center (CERT/CC) has lost its credibility as an honest broker. The researchers discovered this vulnerability and submitted it to CERT. Neither the researchers nor CERT disclosed this vulnerability to the Tor Project. Instead, the researchers apparently used this vulnerability to deanonymize a large number of hidden service visitors and provide the information to the FBI. Does anyone still trust CERT to behave in the Internet’s best interests? Analogous to the CIA organizing a fake vaccination drive to get close to Osama. “Intelligence” agencies.
  3. Google Open-Sourcing TensorFlow Shows AI’s Future is Data not Code (Wired) — something we’ve been saying for a long time.
  4. Challenges of Working Remote (Moishe Lettvin) — the things that make working remote hard aren’t, primarily, logistical; they’re emotional.

Four short links: 13 November 2015

CEO Optimism, Fibbing Networking, GPU TensorFlow, and GUI Font Design

  1. CEO Optimism — CEOs always act on leading indicators of good news, but only act on lagging indicators of bad news. (Andy Grove)
  2. Fibbing — lie to your router table to get the most from your network. Clever!
  3. TensorFlow for GPUs — Amazon image of TensorFlow ready to run on their GPU compute cloud.
  4. metaflop — UI for metafont that makes it super-easy to design your own sweet-looking font. (via BoingBoing)

Four short links: 10 November 2015

TensorFlow Released, TensorFlow Described, Neural Networks Optimized, Cybersecurity as RealPolitik

  1. TensorFlow — Google released, as open source, their distributed machine learning system. The DataFlow programming framework is sweet, and the documentation is gorgeous. AMAZINGLY high-quality, sets the bar for any project. This may be 2015’s most important software release.
  2. TensorFlow White Paper (PDF) — Compared to DistBelief [G’s first scalable distributed inference and training system], TensorFlow’s programming model is more flexible, its performance is significantly better, and it supports training and using a broader range of models on a wider variety of heterogeneous hardware platforms.
  3. Neural Networks With Few Multiplications — paper with a method to eliminate most of the time-consuming floating point multiplications needed to update the intermediate virtual neurons as they learn. Speed has been one of the bugbears of deep neural networks.
  4. Cybersecurity as RealPolitik — Dan Geer’s excellent talk from 2014 BlackHat. When younger people ask my advice on what they should do or study to make a career in cyber security, I can only advise specialization. Those of us who were in the game early enough and who have managed to retain an over-arching generalist knowledge can’t be replaced very easily because while absorbing most new information most of the time may have been possible when we began practice, no person starting from scratch can do that now. Serial specialization is now all that can be done in any practical way. Just looking at the Black Hat program will confirm that being really good at any one of the many topics presented here all but requires shutting out the demands of being good at any others.

Four short links: 2 November 2015

Anti-Caching, Tyranny of Ratings, Distributed Deep Learning, and Sorting Rated Things

  1. Anti-Caching (PDF) — paper outlining a clever reframing of the database strategy of keeping frequently accessed things in-memory, namely pushing to disk the things that won’t be accessed … aka, “anti-caching.”
  2. The Rating Game (Verge) — Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings.
  3. Singa — Apache distributed deep learning platform turns 1.0.
  4. Scoring Items That Were Voted On or Rated — a Bayesian system to turn a set of ratings or up/down votes into a single score, such that you can sort a list from “best” to “worst.”
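
The idea in item 4 above, turning up/down votes into one sortable score, can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the linked article's method: it assumes a uniform Beta(1, 1) prior and scores each item by the posterior mean of its "upvote probability." The names `bayesian_score` and `rank_items` are hypothetical.

```python
from typing import List, Tuple


def bayesian_score(up: int, down: int,
                   prior_up: float = 1.0, prior_down: float = 1.0) -> float:
    """Posterior mean of a Beta(prior_up, prior_down) prior after the votes.

    Assumption: a uniform Beta(1, 1) prior by default, so an item with no
    votes scores 0.5 and items with few votes are pulled toward 0.5.
    """
    if up < 0 or down < 0:
        raise ValueError("vote counts must be non-negative")
    return (up + prior_up) / (up + down + prior_up + prior_down)


def rank_items(items: List[Tuple[str, int, int]]) -> List[str]:
    """Return item names sorted from best to worst by Bayesian score."""
    ordered = sorted(items,
                     key=lambda item: bayesian_score(item[1], item[2]),
                     reverse=True)
    return [name for name, _, _ in ordered]


# Each tuple is (name, upvotes, downvotes); the data is made up.
votes = [("A", 1, 0), ("B", 90, 10), ("C", 0, 0)]
ranking = rank_items(votes)
print(ranking)
```

With a raw upvote ratio, item A (one vote, 100% positive) would outrank item B (90 of 100 positive). The prior keeps a single vote from dominating, so B ranks first here.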

Four short links: 27 October 2015

Learning Neural Nets, Medium's Stack, Bacterial Materials, and Drone Data

  1. What a Deep Neural Net Thinks of Your Selfie — really easy to understand explanation of convolutional neural nets (the tech behind image recognition). No CS required.
  2. Medium’s Stack — interesting use of Protocol Buffers: We help our people work with data by treating the schemas as the spec, rigorously documenting messages and fields and publishing generated documentation from the .proto files.
  3. Bacterial Materials (Wired UK) — Showing a prototype worn by dancers, Yao demonstrated how bacteria-powered clothing can respond to the body’s needs. She has, in effect, created living clothes, ones that react in real time to heat and sweat mapping with tiny vents that would curl open or flatten closed as exertion levels demanded.
  4. Robots to the Rescue (NSF) — one 20-minute drone flight generated upwards of 800 photographs, each of which took at least one minute to inspect. This article is five lessons learned in the field of disaster robotics, and they’re all doozies.

Four short links: 22 October 2015

Predicting activity, systems replacement fail, Khan React style, and an interoperability system for the Web

  1. Predicting Daily Activities from Egocentric Images Using Deep Learning — Our technique achieves an overall accuracy of 83.07% in predicting a person’s activity [from images taken by a camera worn all day by a person] across the 19 activity classes.
  2. Trying to Replace Multiple Systems with One Can Lead to None (IEEE) — check out that final graph, it’s a doozy. It’s a graph of x against time, from various “this project is great, it will replace x systems with 1” claims about a single project. Software projects should come with giant warning labels: “most fail, you are about to set your money on fire. Are you sure? [Y/N/Abort/Restart]”
  3. Khan React Style Guide — in case you’re dipping your toes into the cool kids’ pool.
  4. ballista — An interoperability system for the modern Web. Like intents.

Four short links: 12 October 2015

Unattended Robots, Replicable Economics, Deep Learning Learnings, and TPP Problems

  1. Acquiring Object Experiences at Scale — software to let a robot examine a pile of objects, unattended overnight.
  2. Economics Apparently Not Replicable (PDF) — We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the six papers that use confidential data and the two papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.
  3. 26 Things I Learned in the Deep Learning Summer School — 20. When Frederick Jelinek and his team at IBM submitted one of the first papers on statistical machine translation to COLING in 1988, they got the following anonymous review: The validity of a statistical (information theoretic) approach to MT has indeed been recognized, as the authors mention, by Weaver as early as 1949. And was universally recognized as mistaken by 1950 (cf. Hutchins, MT – Past, Present, Future, Ellis Horwood, 1986, p. 30ff and references therein). The crude force of computers is not science. The paper is simply beyond the scope of COLING.
  4. The Final Leaked TPP Text is All That We Feared (EFF) — If you dig deeper, you’ll notice that all of the provisions that recognize the rights of the public are non-binding, whereas almost everything that benefits rightsholders is binding.