"social graph" entries
Four short links: 23 March 2016
Graph Query, API Economy, Mutual Interest, and The Multithreading Organization
by Nat Torkington | @gnat | +Nat Torkington | March 23, 2016
- Dragon: A Distributed Graph Query Engine — Facebook describes its internal graph query engine. [T]he layout of these indices on storage is optimized based on a deeper understanding of query patterns (e.g., many queries are about friends), as opposed to accepting random sharding, which is common in these systems. Wisely, the system is tailored to the use cases they have and the patterns they see in access.
- Almost Everyone Is Doing the API Economy Wrong (TechCrunch) — Redux: your API should help you make money when the API customer makes money, and you should set clear expectations for what’s acceptable and what’s not. But every developer should be forced to write 100 times: “if you build on a platform you don’t own, you’re building on a potential and probable future competitor.”
- Traditional Economics Failed, Here’s a Blueprint — runs through the shifts happening in our thinking about the world and ourselves (simple to complex, independent to interdependent, rational calculators to irrational approximators, etc.) and concludes: True self-interest is mutual interest. The best way to improve your likelihood of surviving and thriving is to make sure those around you survive and thrive. See above API note.
- Blitzscaling (HBR) — as you move from village to city, functions are beginning to be differentiated; you’re really multithreading. I could write a thesis on the CAP theorem for business. And I have definitely worked for companies that have a “share nothing” approach to solving their threading issues.
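Dragon's actual index layout isn't spelled out in the note, but the contrast with random sharding can be illustrated with a toy: count how many shards a "friends of u" lookup has to contact when placement ignores query patterns versus when it keeps friend groups together. The graph, the shard count, and both placement functions are hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical friendship graph: two tight-knit groups, users 0-3 and users 4-7.
FRIENDS: Dict[int, Set[int]] = {
    u: {v for v in range(g * 4, g * 4 + 4) if v != u}
    for g in range(2)
    for u in range(g * 4, g * 4 + 4)
}

def hash_shard(user: int, num_shards: int = 2) -> int:
    """Query-oblivious placement (a stand-in for random or hash sharding)."""
    return user % num_shards

def community_shard(user: int) -> int:
    """Query-aware placement: each friend group lives on a single shard."""
    return user // 4

def shards_touched(user: int, place: Callable[[int], int]) -> int:
    """Distinct shards a 'friends of user' lookup must contact."""
    return len({place(friend) for friend in FRIENDS[user]})

for user in (0, 5):
    print(user, shards_touched(user, hash_shard), shards_touched(user, community_shard))
```

With these made-up groups every friends lookup fans out to both shards under hash placement but stays on one shard under community placement. Real graphs rarely partition this cleanly; the point is only that placement can follow the dominant query.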
Four short links: 4 March 2016
Snapchat's Business, Tracking Voters, Testing for Discriminatory Associations, and Assessing Impact
by Nat Torkington | @gnat | +Nat Torkington | March 4, 2016
- How Snapchat Built a Business by Confusing Olds (Bloomberg) — Advertisers don’t have a lot of good options to reach under-30s. The audiences of CBS, NBC, and ABC are, on average, in their 50s. Cable networks such as CNN and Fox News have it worse, with median viewerships near or past Social Security age. MTV’s median viewers are in their early 20s, but ratings have dropped in recent years. Marketers are understandably anxious, and Spiegel and his deputies have capitalized on those anxieties brilliantly by charging hundreds of thousands of dollars when Snapchat introduces an ad product.
- Tracking Voters — On the night of the Iowa caucus, Dstillery flagged all the [ad network-mediated ad] auctions that took place on phones in latitudes and longitudes near caucus locations. It wound up spotting 16,000 devices on caucus night, as those people had granted location privileges to the apps or devices that served them ads. It captured those mobile IDs and then looked up the characteristics associated with those IDs in order to make observations about the kind of people that went to Republican caucus locations (young parents) versus Democrat caucus locations. It drilled down further (e.g., ‘people who like NASCAR voted for Trump and Clinton’) by looking at which candidate won at a particular caucus location.
- Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit (arXiv) — We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm’s outputs (e.g., prices or labels) and user subpopulations, including sensitive groups (e.g., defined by race or gender). FairTest reports statistically significant associations to programmers as association bugs, ranked by their strength and likelihood of being unintentional, rather than necessary effects. See also slides from PrivacyCon. Source code not yet released.
- Inferring Causal Impact Using Bayesian Structural Time-Series Models (Adrian Colyer) — understanding the impact of an intervention by building a predictive model of what would have happened without the intervention, then diffing reality against that model.
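The Dstillery technique of flagging ad auctions whose reported latitude and longitude fell near caucus sites is essentially a geofence. A minimal sketch, assuming a haversine great-circle distance and a hypothetical 200 m radius; the real radius and pipeline aren't described, and the coordinates and device IDs below are made up.

```python
import math
from typing import Iterable, Set, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, metres

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def devices_near(
    auctions: Iterable[Tuple[str, float, float]],
    sites: Iterable[Tuple[float, float]],
    radius_m: float = 200.0,  # hypothetical geofence radius
) -> Set[str]:
    """IDs of devices whose ad-auction location lies within radius_m of any site."""
    sites = list(sites)  # iterated once per auction, so materialise it
    return {
        device_id
        for device_id, lat, lon in auctions
        if any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in sites)
    }

# Hypothetical data: one site and two auction records.
site = [(41.5868, -93.6250)]
auctions = [("dev-a", 41.5869, -93.6251), ("dev-b", 41.6000, -93.6000)]
print(devices_near(auctions, site))  # {'dev-a'}
```

"dev-a" is about 14 m from the site; "dev-b" is a couple of kilometres away. Joining the flagged IDs against profile data, as described, would be a separate step.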
Four short links: 18 February 2016
Potteresque Project, Tumblr Teens, Hartificial Hand, and Denied by Data
by Nat Torkington | @gnat | +Nat Torkington | February 18, 2016
- Homemade Weasley Clock (imgur) — construction photos of a clever Potter-inspired clock that shows where people are. (via Archie McPhee)
- Secret Lives of Tumblr Teens — teens perform joy on Instagram but confess sadness on Tumblr.
- Amazing Biomimetic Anthropomorphic Hand (IEEE Spectrum) — First, they laser scanned a human skeleton hand, and then 3D-printed artificial bones to match, which allowed them to duplicate the unfixed joint axes that we have […] The final parts to UW’s hand are the muscles, which are made up of an array of 10 Dynamixel servos, whose cable routing closely mimics the carpal tunnel of a human hand. Amazing detail!
- Life Insurance Can Gattaca You (FastCo) — “Unfortunately after carefully reviewing your application, we regret that we are unable to provide you with coverage because of your positive BRCA 1 gene,” the letter reads. In the U.S., about one in 400 women have a BRCA 1 or 2 gene, which is associated with increased risk of breast and ovarian cancer.
Four short links: 28 January 2016
Augmented Intelligence, Social Network Limits, Microsoft Research, and Google's Go
by Nat Torkington | @gnat | +Nat Torkington | January 28, 2016
- Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
- Do Online Social Media Remove Constraints That Limit the Size of Offline Social Networks? (Royal Society) — paper by Robin Dunbar. Answer: The data show that the size and range of online egocentric social networks, indexed as the number of Facebook friends, is similar to that of offline face-to-face networks.
- Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
- Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
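The split between a policy that narrows breadth and a value estimate that cuts depth can be shown on a toy game. This sketch assumes a take-away game (remove 1 to 3 stones; whoever takes the last stone wins), uses hand-written `toy_policy` and `toy_value` functions as hypothetical stand-ins for the two neural networks, and searches with depth-limited negamax rather than the Monte Carlo tree search AlphaGo actually uses.

```python
from typing import Callable, List

Policy = Callable[[int, int], float]  # (pile, move) -> preference score
Value = Callable[[int], float]        # pile -> estimated outcome for the player to move

def top_k_moves(pile: int, policy: Policy, k: int) -> List[int]:
    """Breadth reduction: keep only the k legal moves the policy rates highest."""
    legal = [m for m in (1, 2, 3) if m <= pile]
    return sorted(legal, key=lambda m: policy(pile, m), reverse=True)[:k]

def search(pile: int, depth: int, policy: Policy, value: Value, k: int) -> float:
    """Depth-limited negamax score for the player to move (+1 win, -1 loss)."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone
    if depth == 0:
        return value(pile)  # depth reduction: trust the estimate instead of playing on
    return max(-search(pile - m, depth - 1, policy, value, k) for m in top_k_moves(pile, policy, k))

def best_move(pile: int, depth: int, policy: Policy, value: Value, k: int) -> int:
    candidates = top_k_moves(pile, policy, k)
    return max(candidates, key=lambda m: -search(pile - m, depth - 1, policy, value, k))

# Hypothetical hand-written stand-ins for trained networks.
def toy_policy(pile: int, move: int) -> float:
    return 1.0 if (pile - move) % 4 == 0 else 0.0  # favour leaving a multiple of 4

def toy_value(pile: int) -> float:
    return -1.0 if pile % 4 == 0 else 1.0  # multiples of 4 are lost for the mover

print(best_move(10, depth=2, policy=toy_policy, value=toy_value, k=2))  # 2
```

With `k=2` and `depth=2`, only two of three moves are expanded at each node and the search stops two plies down, which is the same trade AlphaGo makes at vastly larger scale.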
Four short links: 5 January 2016
Inference with Privacy, RethinkDB Reliability, T-Mobile Choking Video, and Real-Time Streams
by Nat Torkington | @gnat | +Nat Torkington | January 5, 2016
- Privacy-Preserving Inference of Social Relationships from Location Data (PDF) — utilizes an untrusted server and computes the building blocks to support various social relationship studies, without disclosing location information to the server and other untrusted parties. (via CCC Blog)
- Jepsen takes on RethinkDB — the glowingest review I’ve seen from Aphyr. As far as I can ascertain, RethinkDB’s safety claims are accurate.
- T-Mobile’s BingeOn ‘Optimization’ Is Just Throttling (EFF) — T-Mobile has claimed that this practice isn’t really “throttling,” but we disagree. It’s clearly not “optimization,” since T-Mobile doesn’t alter the actual content of the video streams in any way.
- qminer — BSD-licensed data analytics platform for processing large-scale, real-time streams containing structured and unstructured data.
Four short links: 1 December 2015
Radical Candour, Historical Social Network, Compliance Opportunities, and Mobile Numbers
by Nat Torkington | @gnat | +Nat Torkington | December 1, 2015
- Radical Candour: The Surprising Secret to Being a Good Boss — this, every word, this. “Caring personally makes it much easier to do the next thing you have to do as a good boss, which is being willing to piss people off.”
- Six Degrees of Francis Bacon — recreates the British early modern social network to trace the personal relationships among figures like Bacon, Shakespeare, Isaac Newton, and many others. (via CMU)
- Last Bus Startup Standing (TechCrunch) — Vahabzadeh stressed that a key point of Chariot’s survival has been that the company has been above-board with the law from day one. “They haven’t cowboy-ed it,” said San Francisco supervisor Scott Wiener, a mass transit advocate who recently pushed for a master subway plan for the city. “They’ve been good about taking feedback and making sure they’re complying with the law. I’m a fan and think that private transportation options and rideshares have a significant role to play in making us a transit-first city.”
- Mobile App Developers are Suffering — the top 20 app publishers, representing less than 0.005% of all apps, earn 60% of all app store revenue. The article posits causes of the particularly extreme power law.
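"Degrees" in a social network like the Six Degrees of Francis Bacon project are shortest-path lengths, which breadth-first search finds on an unweighted graph. The project's own methods aren't described in the note; this is a generic sketch over a hypothetical toy network with placeholder names.

```python
from collections import deque
from typing import Dict, Iterable, Optional

def degrees_of_separation(graph: Dict[str, Iterable[str]], start: str, goal: str) -> Optional[int]:
    """Fewest acquaintance links from start to goal, found by breadth-first search."""
    if start == goal:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, dist = frontier.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return None  # no chain of acquaintances connects them

# Hypothetical undirected toy network (each link listed in both directions).
network = {
    "A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["A", "D"], "F": [],
}
print(degrees_of_separation(network, "A", "D"))  # 2
```

Because BFS explores people in order of distance, the first time it reaches the goal is guaranteed to be along a shortest chain.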
Four short links: 28 October 2015
DRM-Breaking Broken, IT Failures, Social Graph Search, and Dataviz Interview
by Nat Torkington | @gnat | +Nat Torkington | October 28, 2015
- Librarian of Congress Grants Limited DRM-Breaking Rights (Cory Doctorow) — The Copyright Office said you will be able to defeat locks on your car’s electronics, provided: You wait a year first (the power to impose waiting times on exemptions at these hearings is not anywhere in the statute, is without precedent, and has no basis in law); You only look at systems that do not interact with your car’s entertainment system (meaning that car makers can simply merge the CAN bus and the entertainment system and get around the rule altogether); Your mechanic does not break into your car — only you are allowed to do so. The whole analysis is worth reading—this is not a happy middle-ground; it’s a mess. And remember: there are plenty of countries without even these exemptions.
- Lessons from a Decade of IT Failures (IEEE Spectrum) — full of cautionary tales like this one: Note: No one has an authoritative set of financials on ECSS. That was made clear in the U.S. Senate investigation report, which expressed frustration and outrage that the Air Force couldn’t tell it what was spent on what, when it was spent, nor even what ECSS had planned to spend over time. Scary stories to tell children at night.
- Unicorn: A System for Searching the Social Graph (Facebook) — we describe the data model and query language supported by Unicorn, which is an online, in-memory social graph-aware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with good social proximity. It also supports queries that require multiple round-trips to leaves in order to retrieve objects that are more than one edge away from source nodes.
- Alberto Cairo Interview — So, what really matters to me is not the intention of the visualization – whether you created it to deceive or with the best of intentions; what matters is the result: if the public is informed or the public is misled. In terms of ethics, I am a consequentialist – meaning that what matters to me ethically is the consequences of our actions, not so much the intentions of our actions.
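Unicorn's query language and ranking aren't reproduced here. As a rough illustration of a query whose results sit two edges from the source, this sketch does one lookup per hop and ranks candidates by mutual-friend count, an assumed stand-in for "social proximity". The adjacency data is hypothetical; in a real system each hop would be a batched fan-out to index servers, not a dictionary lookup.

```python
from collections import Counter
from typing import Dict, List, Set

def friends_of_friends(friends: Dict[str, Set[str]], source: str, limit: int = 10) -> List[str]:
    """Two-hop query ranked by mutual-friend count (a simple proximity proxy)."""
    first_hop = friends.get(source, set())   # first round trip: source's friends
    mutual: Counter = Counter()
    for friend in first_hop:                 # second round trip: their friends
        for candidate in friends.get(friend, set()):
            if candidate != source and candidate not in first_hop:
                mutual[candidate] += 1
    ranked = sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [person for person, _ in ranked[:limit]]

# Hypothetical adjacency data.
graph = {
    "me": {"a", "b", "c"},
    "a": {"me", "b", "x", "y"},
    "b": {"me", "a", "x"},
    "c": {"me", "x", "y", "z"},
}
print(friends_of_friends(graph, "me"))  # ['x', 'y', 'z']
```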
Four short links: 2 October 2015
Automatic Environments, Majority Illusion, Bogus Licensing, and Orchestrating People and Machines
by Nat Torkington | @gnat | +Nat Torkington | October 2, 2015
- Announcing Otto — new HashiCorp tool that automatically builds development environments without any configuration; it can detect your project type and has built-in knowledge of industry-standard tools to set up a development environment that is ready to go. When you’re ready to deploy, Otto builds and manages an infrastructure, sets up servers, builds, and deploys the application.
- The Majority Illusion in Social Networks (arXiv) — if connectors do something, it’s perceived as more popular than if the same number of “unpopular” people in the social graph do it. (via MIT TR)
- Scientist Says Researchers in Immigrant-Friendly Countries Can’t Use His Software — software to build phylogenetic trees, but the author’s a loon. It’s another sign that it’s unwise to do science with non-free software.
- Orchestra — an open source system to orchestrate teams of experts and machines on complex projects.
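The majority illusion can be measured directly: for each node, check whether more than half of its neighbours have some attribute. A sketch on a hypothetical network of two connected hubs, each linked to four leaves, where only the hubs are "active".

```python
from typing import Dict, Iterable, Set, Tuple

def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    graph: Dict[str, Set[str]] = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def fooled_fraction(graph: Dict[str, Set[str]], active: Set[str]) -> float:
    """Share of nodes for which more than half of their neighbours are active."""
    fooled = sum(
        1
        for neighbours in graph.values()
        if neighbours and sum(n in active for n in neighbours) > len(neighbours) / 2
    )
    return fooled / len(graph)

# Hypothetical network: hubs H1 and H2 linked to each other and to four leaves.
leaves = ["L1", "L2", "L3", "L4"]
edges = [("H1", "H2")] + [(h, leaf) for h in ("H1", "H2") for leaf in leaves]
g = build_graph(edges)
active = {"H1", "H2"}  # only the well-connected nodes adopt the behaviour
print(len(active) / len(g), fooled_fraction(g, active))
```

Here a third of the nodes are active, yet every leaf sees an active majority among its neighbours, so two thirds of the network perceives the behaviour as the norm.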
Four short links: 15 April 2015
Facebook as Biometrics, Time Series Sequences, Programming Languages, and Oceanic Robots
by Nat Torkington | @gnat | +Nat Torkington | April 15, 2015
- Facebook Biometrics Cache (Business Insider) — Facebook has been accused of violating the privacy of its users by collecting their facial data, according to a class-action lawsuit filed last week. This data-collection program led to its well-known automatic face-tagging service. But it also helped Facebook create “the largest privately held stash of biometric face-recognition data in the world,” the Courthouse News Service reports.
- The Clustering of Time Series Subsequences is Meaningless (PDF) — Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any data set, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. From 2003, warning against sliding window techniques.
- Toolkits for the Mind (MIT TR) — Programming-language designer Guido van Rossum, who spent seven years at Google and now works at Dropbox, says that once a software company gets to be a certain size, the only way to stave off chaos is to use a language that requires more from the programmer up front. “It feels like it’s slowing you down because you have to say everything three times,” van Rossum says. Amen!
- Robots Roam Earth’s Imperiled Oceans (Wired) — It’s six feet long and shaped like an airliner, with two wings and a tail fin, and bears the message, “OCEANOGRAPHIC INSTRUMENT PLEASE DO NOT DISTURB.” All caps considered, though, it’s a more innocuous epigram than the one on a drone I saw back at the dock: “Not a weapon — Science Instrument.”
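One quick way to see why sliding-window subsequences are so constrained (our own illustration, not the paper's proof): every data point shows up at almost every offset across the windows, so the average of all subsequences is nearly flat. Since k-means centres, weighted by cluster size, average to the overall mean of the data, they are pinned to that flat line. The sawtooth series and window length below are hypothetical.

```python
from typing import List, Sequence

def sliding_windows(series: Sequence[float], w: int) -> List[Sequence[float]]:
    """Every length-w subsequence, one starting at each position."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def column_means(windows: List[Sequence[float]]) -> List[float]:
    """Mean value at each offset 0..w-1 across all windows."""
    n = len(windows)
    return [sum(win[j] for win in windows) / n for j in range(len(windows[0]))]

series = [i % 10 for i in range(1000)]  # hypothetical sawtooth; values span 0..9
means = column_means(sliding_windows(series, 10))
print(max(means) - min(means))  # tiny compared with the series' range of 9
```

The sawtooth has a strong, obvious shape, yet the mean subsequence varies by less than 0.01 across offsets.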
Four short links: 1 April 2015
Tuning Fanout, Moore's Law, 3D Everything, and Social Graph Analysis
by Nat Torkington | @gnat | +Nat Torkington | April 1, 2015
- Facebook’s Mystery Machine — The goal of this paper is very similar to that of Google Dapper[…]. Both work [to] try to figure out bottlenecks in performance in high fanout large-scale Internet services. Both work us[ing] similar methods, however this work (the mystery machine) tries to accomplish the task relying on less instrumentation than Google Dapper. The novelty of the mystery machine work is that it tries to infer the component call graph implicitly via mining the logs, whereas Google Dapper instrumented each call in a meticulous manner and explicitly obtained the entire call graph.
- The Multiple Lives of Moore’s Law — A shrinking transistor not only allowed more components to be crammed onto an integrated circuit but also made those transistors faster and less power hungry. This single factor has been responsible for much of the staying power of Moore’s Law, and it’s lasted through two very different incarnations. In the early days, a phase I call Moore’s Law 1.0, progress came by “scaling up”—adding more components to a chip. At first, the goal was simply to gobble up the discrete components of existing applications and put them in one reliable and inexpensive package. As a result, chips got bigger and more complex. The microprocessor, which emerged in the early 1970s, exemplifies this phase. But over the last few decades, progress in the semiconductor industry became dominated by Moore’s Law 2.0. This era is all about “scaling down,” driving down the size and cost of transistors even if the number of transistors per chip does not go up.
- BoXZY Rapid-Change FabLab: Mill, Laser Engraver, 3D Printer (Kickstarter) — project that promises you the ability to swap out heads to get different behaviour from the “move something in 3 dimensions” infrastructure in the box.
- SociaLite (Github) — a distributed query language for graph analysis and data mining. (via Ben Lorica)
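Inferring the call graph from logs instead of instrumenting every call can be sketched as hypothesis elimination: start by assuming every ordering between logged segments, then discard any ordering an observed trace contradicts. This simplified sketch handles only happens-before relationships, assumes each trace records a (start, end) timestamp per named segment, and is not Facebook's implementation; segment names and timestamps are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple

Trace = Dict[str, Tuple[float, float]]  # segment name -> (start, end) timestamps

def infer_happens_before(traces: Iterable[Trace]) -> Set[Tuple[str, str]]:
    """Keep every 'a finishes before b starts' hypothesis that no trace refutes."""
    traces = list(traces)
    segments = sorted(set().union(*(t.keys() for t in traces)))
    hypotheses = set(permutations(segments, 2))  # start with every ordered pair
    for trace in traces:
        for a, b in list(hypotheses):
            if a in trace and b in trace and trace[b][0] < trace[a][1]:
                hypotheses.discard((a, b))  # counterexample: b began before a ended
    return hypotheses

# Hypothetical request traces (timestamps in ms).
traces = [
    {"parse": (0, 1), "db": (1, 3), "render": (3, 4)},
    {"parse": (0, 1), "db": (2, 5), "render": (1.5, 2)},
]
print(sorted(infer_happens_before(traces)))  # [('parse', 'db'), ('parse', 'render')]
```

The first trace alone suggests db precedes render; the second, where render runs while db is still going, refutes that. The more traces observed, the fewer spurious orderings survive.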