"social software" entries

Four short links: 18 January 2016

Machine Learning Technical Debt, Audio Matching, Self-Tracking Research, and Baidu's Open Source Deep Learning Code

  1. Hidden Technical Debt in Machine Learning Systems (PDF) — We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.
  2. Large-Scale Content-Based Matching of MIDI and Audio Files — We present a system that can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata.
  3. Critical Social Research on Self-Tracking — I am currently working on an article that is a comprehensive review of both literatures, in the attempt to outline what each can contribute to understanding self-tracking as an ethos and a practice, and its wider sociocultural implications. Here is a reading list of the work from critical social researchers that I am aware of. Trigger warning: phrases like “The discursive construction of student subjectivities.”
  4. Warp-CTC — Baidu’s open source deep learning code. Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels.
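Warp-CTC is an optimized library; what follows is only a slow plain-Python illustration of what "without needing an alignment" means. CTC scores a label sequence by summing the probabilities of every frame-level path that collapses to it (merge repeated symbols, then drop blanks), and the forward ("alpha") recursion computes that sum without enumerating the paths. Assumptions: symbol 0 is the blank, `probs` holds already-normalized per-frame probabilities, and all function names are ours, not Warp-CTC's API.

```python
import itertools
import math

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def collapse(path):
    """CTC's many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_prob(probs, label):
    """P(label | input) via the CTC forward (alpha) recursion.

    probs[t][k] is the probability of symbol k at frame t.
    """
    ext = [BLANK]  # label with blanks interleaved: _ a _ b _
    for c in label:
        ext += [c, BLANK]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]  # a path may start with a blank...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]  # ...or with the first label symbol
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]  # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1][s - 1]  # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[t - 1][s - 2]  # skip a blank between different symbols
            alpha[t][s] = total * probs[t][ext[s]]
    # Valid paths end on the last label symbol or on the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


def ctc_loss(probs, label):
    return -math.log(ctc_prob(probs, label))


def brute_force_prob(probs, label):
    """Sum over every path that collapses to label (exponential; for checking only)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]  # 3 frames, symbols {blank, 1, 2}
print(ctc_prob(probs, [1, 2]), brute_force_prob(probs, [1, 2]))  # the two should agree
```

Real implementations work in log space to avoid underflow on long sequences; this sketch multiplies raw probabilities and is only suitable for toy inputs.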
Four short links: 4 November 2015

Data Dashboard, Feature Flags, Email Replies, and Invisible Bias

  1. re:dash — open source query editor, visualisations, dashboard for data from all sorts of databases (SQL, Elasticsearch, etc.)
  2. Feature-Flag-Driven Development — one of the key pieces of modern development systems.
  3. Gmail Suggesting Replies — In developing Smart Reply, we adhered to the same rigorous user privacy standards we’ve always held — in other words, no humans reading your email. This means researchers have to get machine learning to work on a data set that they themselves cannot read, which is a little like trying to solve a puzzle while blindfolded — but a challenge makes it more interesting!
  4. The Selective Laziness of Reasoning — Among those participants who accepted the manipulation and thus thought they were evaluating someone else’s argument, more than half (56% and 58%) rejected the arguments that were in fact their own. Moreover, participants were more likely to reject their own arguments for invalid than for valid answers. This demonstrates that people are more critical of other people’s arguments than of their own, without being overly critical: They are better able to tell valid from invalid arguments when the arguments are someone else’s rather than their own.
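Feature-flag-driven development usually involves rolling a feature out to a percentage of users. One common way to do that is to hash each user into a stable bucket per flag. A minimal sketch, assuming a hypothetical in-process `FeatureFlags` class (not any particular product's API):

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-process flag store with percentage rollouts."""

    def __init__(self):
        self._rollout = {}  # flag name -> percent of users (0-100)

    def set_rollout(self, flag, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag, user_id):
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag + user so each user lands in a stable bucket 0..99 for this flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent


flags = FeatureFlags()
users = [f"user{i}" for i in range(10_000)]
flags.set_rollout("new_checkout", 30)
enabled_30 = {u for u in users if flags.is_enabled("new_checkout", u)}
flags.set_rollout("new_checkout", 60)
enabled_60 = {u for u in users if flags.is_enabled("new_checkout", u)}
print(len(enabled_30), len(enabled_60), enabled_30 <= enabled_60)
```

Because a user's bucket never changes, raising the rollout from 30% to 60% only adds users; nobody who already had the feature loses it.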
Four short links: 2 November 2015

Anti-Caching, Tyranny of Ratings, Distributed Deep Learning, and Sorting Rated Things

  1. Anti-Caching (PDF) — paper outlining a clever reframing of the database strategy of keeping frequently accessed things in memory, namely pushing to disk the things that won’t be accessed … aka “anti-caching.”
  2. The Rating Game (Verge) — Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings.
  3. Singa — Apache distributed deep learning platform turns 1.0.
  4. Scoring Items That Were Voted On or Rated — a Bayesian system to turn a set of ratings or up/down votes into a single score, such that you can sort a list from “best” to “worst.”
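To make the anti-caching idea concrete: in a traditional disk-oriented database, disk is the primary store and memory caches hot pages; with anti-caching, memory is the primary store and only cold tuples are evicted to disk, so (as we understand the paper) each tuple lives in exactly one place. The toy model below is a sketch under simplifying assumptions: LRU order stands in for the paper's notion of "cold," a Python dict stands in for disk, and evicted data is fetched back synchronously rather than via the paper's transaction-level mechanism.

```python
from collections import OrderedDict


class AntiCacheStore:
    """Toy model: memory is the primary store; cold tuples are evicted to 'disk'."""

    def __init__(self, memory_capacity):
        if memory_capacity < 1:
            raise ValueError("memory_capacity must be at least 1")
        self.capacity = memory_capacity
        self.memory = OrderedDict()  # key -> value, least recently used first
        self.disk = {}               # stand-in for the on-disk anti-cache
        self.fetches = 0             # how often evicted data was read back

    def put(self, key, value):
        self.disk.pop(key, None)     # a tuple lives in exactly one place
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_cold()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key]
        if key in self.disk:
            self.fetches += 1
            value = self.disk.pop(key)    # un-evict: move the tuple back into memory
            self.put(key, value)
            return value
        raise KeyError(key)

    def _evict_cold(self):
        while len(self.memory) > self.capacity:
            key, value = self.memory.popitem(last=False)  # coldest tuple
            self.disk[key] = value


store = AntiCacheStore(memory_capacity=2)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(key, value)
print(sorted(store.memory), sorted(store.disk))  # ['b', 'c'] ['a']
value = store.get("a")
print(value, store.fetches)  # 1 1
```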
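One common Bayesian way to turn up/down votes into a sortable score is to put a Beta prior on each item's "true" approval rate and rank by a pessimistic point of the posterior, so that an item with one upvote does not outrank one with ninety. This is a sketch of that general approach, not necessarily the linked article's method; the uniform Beta(1, 1) prior, the 95% one-sided bound, and the normal approximation to the Beta are all our assumptions.

```python
import math
from statistics import NormalDist


def posterior_mean(up, down, prior_up=1.0, prior_down=1.0):
    """Mean of the Beta(up + prior_up, down + prior_down) posterior."""
    return (up + prior_up) / (up + down + prior_up + prior_down)


def lower_bound_score(up, down, prior_up=1.0, prior_down=1.0, confidence=0.95):
    """Posterior mean minus z standard deviations (normal approximation to the Beta)."""
    a, b = up + prior_up, down + prior_down
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    z = NormalDist().inv_cdf(confidence)
    return mean - z * math.sqrt(var)


items = {"A": (90, 10), "B": (1, 0), "C": (3, 7)}  # hypothetical (up, down) counts
ranking = sorted(items, key=lambda k: lower_bound_score(*items[k]), reverse=True)
print(ranking)
```

Items with few votes have wide posteriors, so their lower bound sits well below their mean; as votes accumulate, the score converges toward the observed approval rate.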
Four short links: 9 October 2015

Page Loads, Data Engines, Small Groups, and Political Misperception

  1. Ludicrously Fast Page Loads: A Guide for Full-Stack Devs (Nate Berkopec) — walks slowly through each step of page loading, using the Timeline in Chrome Developer Tools. Very easy to follow.
  2. Specialised and Hybrid Data Management and Processing Engines (Ben Lorica) — wrap-up of the data engines on show at Strata + Hadoop World NYC 2015.
  3. Power of Small Groups (Matt Webb) — Matt’s joined a small Slack community of like-minded friends. “There’s a space where articles written or edited by members automatically show up. I like that. I caught myself thinking: it’d be nice to have Last.FM here, too, and Dopplr. Nothing that requires much effort. Let’s also pull in Instagram. Automatic stuff so I can see what people are doing, and people can see what I’m doing. Just for this group. Back to those original intentions. Ambient awareness, togetherness.” Cf. Clay Shirky’s “situated software.” Everything useful from 2004 will be rebuilt once the fetish for scale passes.
  4. Asymmetric Misperceptions (PDF) — research into the systematic mismatch between how politicians think their constituents feel on issues, and how the constituents actually feel. Our findings underscore doubts that policymakers perceive opinion accurately: politicians maintain systematic misperceptions about constituents’ views, typically erring by over 10 percentage points, and entire groups of politicians maintain even more severe collective misperceptions. A second, post-election survey finds the electoral process fails to ameliorate these misperceptions.
Four short links: 1 September 2015

People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

  1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes,” not “people.”
  2. Statistical Patterns in Movie Ratings (PLOS ONE) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
  3. The Inspection Bias is Everywhere — In 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
  4. s3ql — a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
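The inspection bias can be computed exactly on a tiny friendship graph. This sketch uses a hypothetical star-shaped network (one hub, four leaves) and compares three averages: the friend count of a uniformly random user, the friend count reached by the two-step procedure in the quote (random user, then one of their friends), and the friend count when people are sampled in proportion to their friend count x, which is the "overrepresented by a factor of x" version. Users with no friends are assumed absent from the graph.

```python
from fractions import Fraction


def neighbours(edges):
    """Undirected adjacency lists from (a, b) friendship pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def mean_friends(adj):
    """Average friend count of a uniformly random user."""
    return Fraction(sum(len(f) for f in adj.values()), len(adj))


def mean_friends_of_random_friend(adj):
    """Pick a random user, then one of their friends at random (the quote's procedure)."""
    per_user = [Fraction(sum(len(adj[f]) for f in friends), len(friends))
                for friends in adj.values()]
    return sum(per_user) / len(adj)


def mean_friends_size_biased(adj):
    """Pick a person with probability proportional to x, their friend count."""
    degrees = [len(f) for f in adj.values()]
    return Fraction(sum(x * x for x in degrees), sum(degrees))


star = neighbours([("hub", f"leaf{i}") for i in range(1, 5)])  # hypothetical network
print(mean_friends(star))                   # 8/5
print(mean_friends_of_random_friend(star))  # 17/5
print(mean_friends_size_biased(star))       # 5/2
```

Both friend-side averages exceed the plain average here. For size-biased sampling this holds on every graph: by the Cauchy-Schwarz inequality, Σx²/Σx ≥ Σx/n.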
Four short links: 23 July 2015

Open Source, State of DevOps, History of Links, and Vote Rings

  1. The Future of Open Source (Allison Randal) — Inexperienced companies can cause a great deal of harm as they blunder around blindly in a collaborative project, throwing resources in ways that ultimately benefit no one, not even themselves. It is in our best interest as a community to actively engage with companies and teach them how to participate effectively, how to succeed at free software and open source. Their success feeds the success of free software and open source, which feeds the self-reinforcing cycle of accelerating software innovation.
  2. Puppet Labs’ State of DevOps Report (PDF) — Westrum’s model gives us the language to define and measure culture. Perhaps most interesting, Westrum’s model also predicts IT performance. This shows that information flow isn’t just essential to safety, it’s also a critical success factor for rapidly building and evolving resilient systems at scale.
  3. Beyond Conversation — tracing the history of the link from Memex to Web.
  4. Detecting Vote Rings in Product Hunt — worth implementing in every system that processes votes. Who are the jerks in a circle?
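The Product Hunt post describes its own approach, which we don't reproduce here. As a much simpler hypothetical heuristic, one can flag groups of three or more users connected by reciprocated votes, where each pair has voted for the other's submissions at least `min_mutual` times. The input format (a list of `(voter, author)` pairs) and both thresholds are assumptions.

```python
from collections import Counter


def vote_rings(votes, min_mutual=2, min_size=3):
    """Groups of users linked by reciprocated votes (a hypothetical heuristic)."""
    counts = Counter((voter, author) for voter, author in votes if voter != author)
    # Undirected "mutual voting" edges: both directions meet the threshold.
    adj = {}
    for (voter, author), n in counts.items():
        if n >= min_mutual and counts.get((author, voter), 0) >= min_mutual:
            adj.setdefault(voter, set()).add(author)
            adj.setdefault(author, set()).add(voter)
    # Connected components of the mutual-voting graph (iterative DFS).
    seen, rings = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(adj[user] - component)
        seen |= component
        if len(component) >= min_size:
            rings.append(component)
    return rings


ring = ["alice", "bob", "carol"]
votes = [(v, a) for v in ring for a in ring if v != a] * 2  # everyone upvotes everyone, twice
votes += [("dave", "alice")] * 3                              # one-way fan: not a ring
print(vote_rings(votes))
```

Reciprocity alone will also catch genuine friends who like each other's work, so in practice a signal like this would feed a review queue rather than trigger automatic penalties.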
Four short links: 22 June 2015

Power Analysis, Data at Scale, Open Source Fail, and Closing the Virtuous Loop

  1. Power Analysis of a Typical Psychology Experiment (Tom Stafford) — What this means is that if you don’t have a large effect, studies with between groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.
  2. The Future of Data at Scale — Data curation, on the other hand, is “the 800-pound gorilla in the corner,” says Stonebraker. “You can solve your volume problem with money. You can solve your velocity problem with money. Curation is just plain hard.” The traditional solution of extract, transform, and load (ETL) works for 10, 20, or 30 data sources, he says, but it doesn’t work for 500. To curate data at scale, you need automation and a human domain expert.
  3. Why Are We Still Explaining? (Stephen Walli) — Within 24 hours we received our first righteous patch. A simple 15-line change that provided a 10% boost in Just-in-Time compiler performance. And we politely thanked the contributor and explained we weren’t accepting changes yet. Another 24 hours and we received the first solid bug fix. It was golden. It included additional tests for the test suite to prove it was fixed. And we politely thanked the contributor and explained we weren’t accepting changes yet. And that was the last thing that was ever contributed.
  4. Blood Donors in Sweden Get a Text Message When Their Blood Helps Someone (Independent) — great idea to close the feedback loop. If you want to get more virtuous behaviour, make it a relationship and not a transaction. And if a warm feeling is all you have to offer in return, then offer it!
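To see why small between-groups studies are underpowered, here is a normal-approximation power calculation for a two-sided, two-sample comparison with standardized effect size d (Cohen's d) and n participants per group. The formula is a textbook approximation, not taken from Stafford's post, and the effect sizes in the loop are illustrative only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for effect size d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected test statistic if the effect is real
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Smallest n per group reaching the target power (same approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


for d in (0.8, 0.5, 0.2):  # illustrative "large", "medium", "small" effects
    print(d, round(power_two_sample(d, 30), 2), n_per_group_for_power(d))
```

With d = 0.5, this approximation needs 63 participants per group for 80% power (a t-based calculation gives slightly more), and 30 per group leaves the study at roughly a coin flip's chance of detecting a real effect.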
Four short links: 3 June 2015

Filter Design, Real-Time Analytics, Neural Turing Machines, and Evaluating Subjective Opinions

  1. How to Design Applied Filters — The most frequently observed issue during usability testing were filtering values changing placement when the user applied them – either to another position in the list of filtering values (typically the top) or to an “Applied filters” summary overview. During testing, the subjects were often confounded as they noticed that the filtering value they just clicked was suddenly “no longer there.”
  2. Twitter Heron — a real-time analytics platform that is fully API-compatible with Storm […] At Twitter, Heron is used as our primary streaming system, running hundreds of development and production topologies. Since Heron is efficient in terms of resource usage, after migrating all Twitter’s topologies to it we’ve seen an overall 3x reduction in hardware, causing a significant improvement in our infrastructure efficiency.
  3. ntm — an implementation of neural Turing machines. (via @fastml_extra)
  4. Bayesian Truth Serum — a scoring system for eliciting and evaluating subjective opinions from a group of respondents, in situations where the user of the method has no independent means of evaluating respondents’ honesty or their ability. It leverages respondents’ predictions about how other respondents will answer the same questions. Through these predictions, respondents reveal their meta-knowledge, which is knowledge of what other people know.
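As we understand Prelec's formulation, each respondent's score has two parts. The information score is log(x̄_k / ȳ_k) for the answer k they chose, where x̄_k is the fraction of respondents endorsing k and ȳ_k is the geometric mean of everyone's predicted fraction for k; answers that are "surprisingly common" relative to predictions score well. The prediction score, Σ_k x̄_k log(y_k / x̄_k), weighted by α, rewards predictions y close to the actual frequencies. A minimal sketch, assuming single-choice answers and strictly positive predictions; check the details against the original paper before relying on it.

```python
import math


def bts_scores(answers, predictions, alpha=1.0):
    """Bayesian Truth Serum scores (sketch).

    answers[r]: index of the option respondent r chose.
    predictions[r][k]: r's predicted fraction of respondents choosing option k (> 0).
    """
    n = len(answers)
    num_options = len(predictions[0])
    # Actual endorsement frequencies, x̄_k.
    xbar = [sum(1 for a in answers if a == k) / n for k in range(num_options)]
    # Log of the geometric mean of the predictions, log ȳ_k.
    log_ybar = [sum(math.log(p[k]) for p in predictions) / n for k in range(num_options)]
    scores = []
    for chosen, pred in zip(answers, predictions):
        information = math.log(xbar[chosen]) - log_ybar[chosen]
        # Options nobody chose (x̄_k = 0) contribute nothing to the prediction score.
        prediction = sum(xbar[k] * math.log(pred[k] / xbar[k])
                         for k in range(num_options) if xbar[k] > 0)
        scores.append(information + alpha * prediction)
    return scores


answers = [0, 0, 1]              # hypothetical: two respondents pick option 0, one picks 1
predictions = [[0.5, 0.5]] * 3   # everyone predicts a 50/50 split
print(bts_scores(answers, predictions))
```

Here option 0 was endorsed more often than predicted, so its two respondents earn a positive information score, while the option-1 respondent's is negative.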
Four short links: 28 May 2015

Messaging and Notifications, Game Postmortem, Recovering Robots, and Ethical AI

  1. Internet Trends 2015 (PDF) — Mary Meeker’s preso. Messaging + Notifications = Key Layers of Every Meaningful Mobile App, Messaging Leaders Aiming to Create Cross-Platform Operating Systems That Are Context-Persistent Communications Hubs for More & More Services. This year’s deck feels more superficial, less surprising than in years past.
  2. When the Land Goes Under the Sea — As it turns out: People really despise being told to not replay the game. Almost universally, the reaction to that was a kernel of unhappiness amidst mostly positive reviews. In retrospect, including that note was a mistake for a number of reasons. My favorite part of game postmortems is what the designers learned about how people approach experiences.
  3. Damage Recovery Algorithm for Robots (IEEE) — This illustrates how it’s possible to endow just about any robot with resiliency via this algorithm, as long as it’s got enough degrees of freedom to enable adaptive movement. Because otherwise the Terminators will just stop when we shoot them.
  4. The Counselor — short fiction about ethics, AI, and how good things become questionable.
Four short links: 17 April 2015

Distributed SQLite, Communicating Scientists, Learning from Failure, and Cat Convergence

  1. Replicating SQLite using Raft Consensus — a clever use of a consensus algorithm to build a distributed (replicated) SQLite.
  2. When Open Access is the Norm, How do Scientists Communicate? (PLOS) — From interviews I’ve conducted with researchers and software developers who are modeling aspects of modern online collaboration, I’ve highlighted the most useful and reproducible practices. (via Jon Udell)
  3. Meet DJ Patil — “It was this kind of moment when you realize: ‘Oh, my gosh, I am that stupid,’” he said.
  4. Interview with Bruce Sterling on the Convergence of Humans and Machines — If you are a human being, and you are doing computation, you are trying to multiply 17 times five in your head. It feels like thinking. Machines can multiply, too. They must be thinking. They can do math and you can do math. But the math you are doing is not really what cognition is about. Cognition is about stuff like seeing, maneuvering, having wants, desires. Your cat has cognition. Cats cannot multiply 17 times five. They have got their own umwelt (environment). But they are mammalian, you are a mammalian. They are actually a class that includes you. You are much more like your house cat than you are ever going to be like Siri. You and Siri converging, you and your house cat can converge a lot more easily. You can take the imaginary technologies that many post-human enthusiasts have talked about, and you could afflict all of them on a cat. Every one of them would work on a cat. The cat is an ideal laboratory animal for all these transitions and convergences that we want to make for human beings. (via Vaughan Bell)
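Raft itself (leader election, terms, log matching) is well beyond a short example. Returning to the replicated-SQLite link above, this sketch models only the core idea behind replicating SQLite through a consensus log: the leader appends each SQL statement to a log, the statement commits once a majority of nodes holds it, and every node applies committed statements to its own SQLite database in log order. Assumptions: a fixed leader that never fails, no terms or elections, and hypothetical `Node`/`Cluster` classes; this is not how the linked project is implemented.

```python
import sqlite3


class Node:
    def __init__(self, name):
        self.name = name
        self.log = []       # replicated log of SQL statements
        self.applied = 0    # number of log entries applied to the local database
        self.up = True
        self.db = sqlite3.connect(":memory:")

    def append(self, entry):
        if not self.up:
            return False
        self.log.append(entry)
        return True

    def apply_committed(self, commit_index):
        # Apply committed entries in log order, exactly once each.
        while self.applied < min(commit_index, len(self.log)):
            self.db.execute(self.log[self.applied])
            self.applied += 1
        self.db.commit()


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.leader = self.nodes[0]  # assumption: fixed leader that never goes down
        self.commit_index = 0

    def execute(self, sql):
        # Catch up any lagging live follower with entries it missed while down.
        for node in self.nodes:
            if node.up and len(node.log) < self.commit_index:
                node.log.extend(self.leader.log[len(node.log):self.commit_index])
        acks = sum(node.append(sql) for node in self.nodes)
        if acks * 2 <= len(self.nodes):
            # No majority: discard the uncommitted entry (a simplification).
            for node in self.nodes:
                if node.up and len(node.log) > self.commit_index:
                    node.log.pop()
            return False
        self.commit_index += 1
        for node in self.nodes:
            if node.up:
                node.apply_committed(self.commit_index)
        return True


cluster = Cluster(["a", "b", "c"])
cluster.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
cluster.execute("INSERT INTO kv VALUES ('x', '1')")
cluster.nodes[2].up = False
ok_with_majority = cluster.execute("INSERT INTO kv VALUES ('y', '2')")     # 2 of 3 nodes
cluster.nodes[1].up = False
ok_without_majority = cluster.execute("INSERT INTO kv VALUES ('z', '3')")  # 1 of 3 nodes
cluster.nodes[1].up = cluster.nodes[2].up = True
cluster.execute("INSERT INTO kv VALUES ('w', '4')")  # lagging nodes catch up first
print([sorted(n.db.execute("SELECT k FROM kv").fetchall()) for n in cluster.nodes])
```

Replicating statements rather than resulting data assumes every statement is deterministic; something like SQLite's `random()` would give each node a different result.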