"Big Data" entries

Four short links: 7 August 2014

Material Design, Stewart's Slack, Sketching in Javascript, and Neural Networks and Deep Learning

  1. Material Design in the Google I/O App (Medium) — steps through design thinking as they put Google’s new design metaphor in place. I’ve been chewing on material design. It brings an internal consistency and logic to the Android world that Apple’s iOS and OS X visual worlds have been losing over the years. How long until web users expect this consistency too?
  2. Stewart and Slack (Wired) — profile of Foo Stewart Butterfield and his shiny Slack startup.
  3. p5js — a new Processing-inspired code-as-sketching in Javascript. Using the original metaphor of a software sketchbook, p5.js has a full set of drawing functionality. However, you’re not limited to your drawing canvas, you can think of your whole browser page as your sketch!
  4. Neural Networks and Deep Learning — a free online book to teach you … well, neural networks and deep learning.
Four short links: 6 August 2014

Mesa Database, Thumbstoppers, Impressive Research, and Microsoft Development

  1. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding G’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput of trillions of rows fetched per day, continuous updates on the order of millions of rows updated per second, strong consistency and repeatable query results even if a query involves multiple datacenters, and no SPOF. (via Greg Linden)
  2. Thumbstopping (Salon) — The prime goal of a Facebook ad campaign is to create an ad “so compelling that it would get people to stop scrolling through their news feeds,” reports the Times. This is known, in Facebook land, as a “thumbstopper.” And thus, the great promise of the digital revolution is realized: The best minds of our generation are obsessed with manipulating the movement of your thumb on a smartphone touch-screen.
  3. om3d — pose a model based on its occurrence in a photo, then update the photo after rotating and re-rendering the model. Research is doing some sweet things these days—this comes hot on the heels of recovering sounds from high-speed video of things like chip bags.
  4. Microsoft’s Development Practices (Ars Technica) — they get the devops religion but call it “combined engineering”. They get the idea of shared code bases, but call it “open source”. At least when they got the agile religion, they called it that. Check out the horror story of where they started: a two-year development process in which only about four months would be spent writing new code. Twice as long would be spent fixing that code. MSFT’s waterfall was the equivalent of American football, where there’s 11 minutes of actual play in the average 3h 12m game.
Four short links: 5 August 2014

Discussion Graph Tool, Superlinear Productivity, Go Concurrency, and R Map/Reduce Tools

  1. Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
  2. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. We document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
  3. coop — cheat sheet of the most common concurrency program flows in Go.
  4. Tessera — set of open source tools around Hadoop, R, and visualization.
Four short links: 1 August 2014

Data Storytelling Tools, Massive Dataset Mining, Failed Crowdsourcing, and IoT Networking

  1. Miso: Dataset, a JavaScript client-side data management and transformation library; Storyboard, a state and flow-control management library; and d3.chart, a framework for creating reusable charts with d3.js. Open source designed to expedite the creation of high-quality interactive storytelling and data visualisation content.
  2. Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
  3. Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
  4. Thread a ZigBee Killer? Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee but is neither compatible with it nor open source. The Novell NetWare of Things. Nick Hunn makes the argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
Four short links: 21 July 2014

Numenta Code, Soccer Robotics, Security Data Science, Open Wireless Router

  1. nupic (github) - GPL v3-licensed code from Numenta, at last. See their patent position.
  2. Robocup — soccer robotics contest; a condition of entry is that all code is open sourced after the contest. (via The Economist)
  3. Security Data Science Paper Collection — machine learning, big data, analysis, reports, all around security issues.
  4. Building an Open Wireless Router — EFF call for coders to help build a wireless router that’s more secure and more supportive of open sharing than current devices.

Four short links: 15 July 2014

Data Brokers, Car Data, Pattern Classification, and Hogwild Deep Learning

  1. Inside Data Brokers — very readable explanation of the data brokers and how their information is used to track advertising effectiveness.
  2. Elon, I Want My Data! — Tesla don’t give you access to the data that your car collects. Bodes poorly for the Internet of Sealed Boxes. (via BoingBoing)
  3. Pattern Classification (Github) — collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks.
  4. HOGWILD! (PDF) — the algorithm that Microsoft credit with the success of their Adam deep learning system.
Four short links: 9 July 2014

Developer Inequality, Weak Signals, Geek Feminism Wiki, and Reidentification Risks

  1. Developer Inequality (Jonathan Edwards) — The bigger injustice is that programming has become an elite: a vocation requiring rare talents, grueling training, and total dedication. The way things are today if you want to be a programmer you had best be someone like me on the autism spectrum who has spent their entire life mastering vast realms of arcane knowledge — and enjoys it. Normal humans are effectively excluded from developing software. (via Slashdot)
  2. Signals From Foo Camp (O’Reilly Radar) — useful for me (aka “the stuff I didn’t get to see”), hopefully useful to you too. Companies outside of Silicon Valley badly want to understand it and want to find ways to truly collaborate with it, but they’re worried that conversations can turn into competition. “Old industry” has incredible expertise and operates in very complex environments, and it has much to teach tech, if tech will listen. Silicon Valley isn’t an IT department for the world, it’s the competition.
  3. Feminist Point of View: Lessons from Running the Geek Feminism Wiki — deck from Alex’s OS Bridge session. Today’s awareness and actions around sexism in tech resulted from their actions, sometimes directly, sometimes indirectly.
  4. Big Data Should Not Be a Faith-Based Initiative (Cory Doctorow) — Re-identification is part of the Big Data revolution: among the new meanings we are learning to extract from huge corpuses of data is the identity of the people in that dataset. And since we’re commodifying and sharing these huge datasets, they will still be around in ten, twenty and fifty years, when those same Big Data advancements open up new ways of re-identifying — and harming — their subjects.
Four short links: 1 July 2014

Efficient Representation, Page Rendering, Graph Database, Warning Effectiveness

  1. word2vec: This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research paper Efficient Estimation of Word Representations in Vector Space.
  2. What Every Frontend Developer Should Know about Page Rendering: Rendering has to be optimized from the very beginning, when the page layout is being defined, as styles and scripts play the crucial role in page rendering. Professionals have to know certain tricks to avoid performance problems. This article does not study the inner browser mechanics in detail, but rather offers some common principles.
  3. Cayley: an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.
  4. Alice in Warningland (PDF) — We performed a field study with Google Chrome and Mozilla Firefox’s telemetry platforms, allowing us to collect data on 25,405,944 warning impressions. We find that browser security warnings can be successful: users clicked through fewer than a quarter of both browser’s malware and phishing warnings and third of Mozilla Firefox’s SSL warnings. We also find clickthrough rates as high as 70.2% for Google Chrome SSL warnings, indicating that the user experience of a warning can have tremendous impact on user behaviour.
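
For readers wondering what the two word2vec architectures in item 1 actually train on, here is a minimal sketch in Python. It is not the word2vec tool's code (that is a separate implementation); the function names `skipgram_pairs` and `cbow_examples` and the `window` parameter are made up for illustration. Skip-gram predicts each nearby word from the center word, while continuous bag-of-words predicts the center word from its neighbours, so the only difference shown here is how the training examples are cut from a sentence.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs: one pair per neighbour within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context_words, target) examples: all neighbours predict the center."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


if __name__ == "__main__":
    sentence = "the quick brown fox".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

A real trainer would then learn a vector per word by fitting a shallow model to these examples; that step is left out here.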
Four short links: 27 June 2014

Google MillWheel, 20yo Bug, Fast Real-Time Visualizations, and Google's Speed King

  1. MillWheel: Fault-Tolerant Stream Processing at Internet Scale — Google Research paper on the tech underlying the new cloud DataFlow tool. Watch the video. Yow.
  2. The Integer Overflow Bug That Went to Mars — long-standing (20 year old!) bug in a compression library prompts a wave of new releases. No word yet on whether NASA will upgrade the rover to avoid being pwned by Martian script kiddies. (update: I fell for a self-promoter. The Martians will need to find another attack vector. Huzzah!)
  3. epoch (github) — Fastly-produced open source general purpose real-time charting library for building beautiful, smooth, and high performance visualizations.
  4. Achieving Rapid Response Times in Large Online Services (YouTube) — Jeff Dean’s keynote at Velocity. He wrote … a lot of things for this. And now he’s into deep learning ….
Four short links: 24 June 2014

Failure of Imagination, Meat Failure Mode, Grand Challenges, and Data Programming

  1. Maximum Happy Imagination (Matt Jones) — questioning the true vision of Marc Andreessen’s recent Twitter discourse on the great future that awaits us. His analogies run out in the 20th century when it comes to the political, social and economic implications of his maximum happy imagination.
  2. The Mirrortocracy: It’s astonishing how many of the people conducting interviews and passing judgement on the careers of candidates have had no training at all on how to do it well. Aside from their own interviews, they may not have ever seen one. I’m all for learning on your own but at least when you write a program wrong it breaks. Without a natural feedback loop, interviewing mostly runs on myth and survivor bias.
  3. Longitude Prize — six prize areas, Grand Challenge style, in clean flight, antibiotic resistance, dementia, food, water, and overcoming paralysis. Mysteriously none for library system that avoids DLL hell.
  4. The Re-Emergence of Datalog: Michael Fogus overviews Datalog and provides examples of how it is implemented and used in Datomic, Cascalog, and the Bacwn Clojure library. See also notes from the talk.