"social graph" entries

Four short links: 28 January 2016

Augmented Intelligence, Social Network Limits, Microsoft Research, and Google's Go

  1. Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
  2. Do Online Social Media Remove Constraints That Limit the Size of Offline Social Networks? (Royal Society) — paper by Robin Dunbar. Answer: The data show that the size and range of online egocentric social networks, indexed as the number of Facebook friends, is similar to that of offline face-to-face networks.
  3. Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
  4. Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
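To make the breadth/depth split concrete, here is a toy sketch, not AlphaGo's method: AlphaGo uses Monte Carlo tree search with trained networks, whereas this uses a plain depth-limited negamax on a made-up take-away game. The hand-written `policy` and `value` functions are hypothetical stand-ins for the two networks. The policy prior limits how many moves are expanded at each node (breadth), and the value estimate replaces searching to the end of the game (depth).

```python
from typing import Dict, List

# Toy game (an assumption standing in for Go): one pile of stones, a move
# removes 1-3 stones, and the player who takes the last stone wins.
def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]

# Hypothetical "policy network": a hand-written prior over legal moves that
# prefers leaving the opponent a multiple of 4 stones.
def policy(stones: int) -> Dict[int, float]:
    scores = {m: (1.0 if (stones - m) % 4 == 0 else 0.1) for m in legal_moves(stones)}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}

# Hypothetical "value network": estimated outcome in [-1, 1] for the player
# to move, used instead of playing the game out to the end.
def value(stones: int) -> float:
    return -1.0 if stones % 4 == 0 else 1.0

def top_moves(stones: int, top_k: int) -> List[int]:
    priors = policy(stones)
    # Breadth reduction: keep only the top_k moves by prior probability.
    return sorted(priors, key=priors.get, reverse=True)[:top_k]

def search(stones: int, depth: int, top_k: int = 2) -> float:
    """Negamax score for the player to move."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so we lost
    if depth == 0:
        return value(stones)  # depth reduction: trust the value estimate
    return max(-search(stones - m, depth - 1, top_k) for m in top_moves(stones, top_k))

def best_move(stones: int, depth: int = 3, top_k: int = 2) -> int:
    return max(top_moves(stones, top_k),
               key=lambda m: -search(stones - m, depth - 1, top_k))
```

With 10 stones, `best_move(10)` picks 2, leaving the opponent 8. Because the search only ever looks at two candidate moves per node and stops after three plies, its cost stays small no matter how large the pile is; the price is that it is only as good as the policy and value estimates.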
Four short links: 5 January 2016

Inference with Privacy, RethinkDB Reliability, T-Mobile Choking Video, and Real-Time Streams

  1. Privacy-Preserving Inference of Social Relationships from Location Data (PDF) — utilizes an untrusted server and computes the building blocks to support various social relationship studies, without disclosing location information to the server and other untrusted parties. (via CCC Blog)
  2. Jepsen takes on Rethink — the glowingest review I’ve seen from Aphyr. As far as I can ascertain, RethinkDB’s safety claims are accurate.
  3. T-Mobile’s BingeOn ‘Optimization’ Is Just Throttling (EFF) — T-Mobile has claimed that this practice isn’t really “throttling,” but we disagree. It’s clearly not “optimization,” since T-Mobile doesn’t alter the actual content of the video streams in any way.
  4. qminer — BSD-licensed data analytics platform for processing large-scale, real-time streams containing structured and unstructured data.
Four short links: 1 December 2015

Radical Candour, Historical Social Network, Compliance Opportunities, and Mobile Numbers

  1. Radical Candour: The Surprising Secret to Being a Good Boss — this, every word, this. “Caring personally makes it much easier to do the next thing you have to do as a good boss, which is being willing to piss people off.”
  2. Six Degrees of Francis Bacon — recreates the British early modern social network to trace the personal relationships among figures like Bacon, Shakespeare, Isaac Newton, and many others. (via CMU)
  3. Last Bus Startup Standing (TechCrunch) — Vahabzadeh stressed that a key point of Chariot’s survival has been that the company has been above-board with the law from day one. “They haven’t cowboy-ed it,” said San Francisco supervisor Scott Wiener, a mass transit advocate who recently pushed for a master subway plan for the city. “They’ve been good about taking feedback and making sure they’re complying with the law. I’m a fan and think that private transportation options and rideshares have a significant role to play in making us a transit-first city.”
  4. Mobile App Developers are Suffering — the top 20 app publishers, representing less than 0.005% of all apps, earn 60% of all app store revenue. The article posits causes of the particularly extreme power law.
Four short links: 28 October 2015

DRM-Breaking Broken, IT Failures, Social Graph Search, and Dataviz Interview

  1. Librarian of Congress Grants Limited DRM-Breaking Rights (Cory Doctorow) — The Copyright Office said you will be able to defeat locks on your car’s electronics, provided: You wait a year first (the power to impose waiting times on exemptions at these hearings is not anywhere in the statute, is without precedent, and has no basis in law); You only look at systems that do not interact with your car’s entertainment system (meaning that car makers can simply merge the CAN bus and the entertainment system and get around the rule altogether); Your mechanic does not break into your car — only you are allowed to do so. The whole analysis is worth reading—this is not a happy middle-ground; it’s a mess. And remember: there are plenty of countries without even these exemptions.
  2. Lessons from a Decade of IT Failures (IEEE Spectrum) — full of cautionary tales, like this one about ECSS: No one has an authoritative set of financials on ECSS. That was made clear in the U.S. Senate investigation report, which expressed frustration and outrage that the Air Force couldn’t tell it what was spent on what, when it was spent, nor even what ECSS had planned to spend over time. Scary stories to tell children at night.
  3. Unicorn: A System for Searching the Social Graph (Facebook) — we describe the data model and query language supported by Unicorn, which is an online, in-memory social graph-aware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with good social proximity. It also supports queries that require multiple round-trips to leaves in order to retrieve objects that are more than one edge away from source nodes.
  4. Alberto Cairo Interview — So, what really matters to me is not the intention of the visualization – whether you created it to deceive or with the best of intentions; what matters is the result: if the public is informed or the public is misled. In terms of ethics, I am a consequentialist – meaning that what matters to me ethically is the consequences of our actions, not so much the intentions of our actions.
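The Unicorn item mentions queries that need multiple round-trips to reach objects more than one edge away. A rough sketch of that idea, under heavy simplifying assumptions: a single Python dict stands in for the sharded index servers, the `friend:<id>` term format and all names here are hypothetical, and ranking by mutual-friend count is only a crude proxy for the "social proximity" the paper describes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

class ToyGraphIndex:
    """Hypothetical in-memory index: a term like "friend:42" maps to a set of ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add_edge(self, edge_type: str, src: int, dst: int) -> None:
        self.postings[f"{edge_type}:{src}"].add(dst)

    def lookup(self, term: str) -> Set[int]:
        # .get() does not insert missing keys into the defaultdict.
        return self.postings.get(term, set())

def add_friendship(index: ToyGraphIndex, a: int, b: int) -> None:
    # Friendship is symmetric, so store the edge in both directions.
    index.add_edge("friend", a, b)
    index.add_edge("friend", b, a)

def friends_of_friends(index: ToyGraphIndex, user: int, limit: int = 10) -> List[Tuple[int, int]]:
    # Round trip 1: fetch the user's direct friends.
    first_hop = index.lookup(f"friend:{user}")
    # Round trip 2: one lookup per friend; count how often each second-hop id
    # appears, i.e. how many mutual friends it shares with the user.
    counts: Counter = Counter()
    for friend in first_hop:
        for candidate in index.lookup(f"friend:{friend}"):
            if candidate != user and candidate not in first_hop:
                counts[candidate] += 1
    # Rank by mutual-friend count, breaking ties by id for deterministic output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

For friendships 1-2, 1-3, 2-4, 3-4 and 3-5, `friends_of_friends(idx, 1)` returns `[(4, 2), (5, 1)]`: user 4 is reachable through two friends, user 5 through one.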
Four short links: 2 October 2015

Automatic Environments, Majority Illusion, Bogus Licensing, and Orchestrating People and Machines

  1. Announcing Otto — new HashiCorp tool that automatically builds development environments without any configuration; it can detect your project type and has built-in knowledge of industry-standard tools to set up a development environment that is ready to go. When you’re ready to deploy, Otto builds and manages an infrastructure, sets up servers, builds, and deploys the application.
  2. The Majority Illusion in Social Networks (arxiv) — if connectors do something, it’s perceived as more popular than if the same number of “unpopular” people in the social graph do it. (via MIT TR)
  3. Scientist Says Researchers in Immigrant-Friendly Countries Can’t Use His Software — software to build phylogenetic trees, but the author’s a loon. It’s another sign that it’s unwise to do science with non-free software.
  4. Orchestra — an open source system to orchestrate teams of experts and machines on complex projects.
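The majority illusion is easy to reproduce on a star graph. This minimal sketch (function name and graph are illustrative, not from the paper) computes the fraction of nodes for which more than half of their neighbours have some attribute. When only the hub is "active", 1 of 7 nodes is active globally, yet every leaf sees a 100% active neighbourhood; when a single leaf is active instead, the global rate is the same but nobody is fooled.

```python
from typing import Dict, Set

def majority_illusion_fraction(adj: Dict[int, Set[int]], active: Set[int]) -> float:
    """Fraction of nodes for which more than half of their neighbours are active."""
    fooled = 0
    for node, neighbours in adj.items():
        active_neighbours = sum(1 for n in neighbours if n in active)
        if neighbours and active_neighbours > len(neighbours) / 2:
            fooled += 1
    return fooled / len(adj)

# Star graph: hub 0 connected to leaves 1..6.
star: Dict[int, Set[int]] = {0: set(range(1, 7))}
star.update({leaf: {0} for leaf in range(1, 7)})

hub_active = majority_illusion_fraction(star, {0})   # 6 of 7 nodes fooled
leaf_active = majority_illusion_fraction(star, {1})  # no node fooled
```

The same number of active nodes produces opposite perceptions depending on whether the active node is well connected, which is the paper's point about connectors.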
Four short links: 15 April 2015

Facebook as Biometrics, Time Series Sequences, Programming Languages, and Oceanic Robots

  1. Facebook Biometrics Cache (Business Insider) — Facebook has been accused of violating the privacy of its users by collecting their facial data, according to a class-action lawsuit filed last week. This data-collection program led to its well-known automatic face-tagging service. But it also helped Facebook create “the largest privately held stash of biometric face-recognition data in the world,” the Courthouse News Service reports.
  2. Clustering of Time Series Subsequences is Meaningless (PDF) — Clustering of time series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any data set, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. From 2003, warning against sliding window techniques.
  3. Toolkits for the Mind (MIT TR) — Programming-language designer Guido van Rossum, who spent seven years at Google and now works at Dropbox, says that once a software company gets to be a certain size, the only way to stave off chaos is to use a language that requires more from the programmer up front. “It feels like it’s slowing you down because you have to say everything three times,” van Rossum says. Amen!
  4. Robots Roam Earth’s Imperiled Oceans (Wired) — It’s six feet long and shaped like an airliner, with two wings and a tail fin, and bears the message, “OCEANOGRAPHIC INSTRUMENT PLEASE DO NOT DISTURB.” All caps considered, though, it’s a more innocuous epigram than the one on a drone I saw back at the dock: “Not a weapon — Science Instrument.”
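A small sketch of the averaging effect behind the time-series paper, assuming plain (not z-normalized) sliding-window subsequences and k-means-style centroids; the paper's own theorem and experiments are more thorough. Every point of the series lands at nearly every offset of some window, so the element-wise mean of all subsequences is almost flat. k-means centroids, weighted by cluster size, average back to that overall mean, so the cluster centres are constrained to average to a near-constant line whatever the data look like.

```python
import random
from typing import List

def sliding_windows(series: List[float], w: int) -> List[List[float]]:
    """All length-w subsequences, one starting at every index."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

def mean_subsequence(windows: List[List[float]]) -> List[float]:
    """Element-wise mean of the windows (what size-weighted centroids average to)."""
    n, w = len(windows), len(windows[0])
    return [sum(win[j] for win in windows) / n for j in range(w)]

def random_walk(n: int, seed: int = 0) -> List[float]:
    """Synthetic test series: cumulative sum of Gaussian steps."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = random_walk(2000)
flat = mean_subsequence(sliding_windows(series, 32))
```

Adjacent positions of `flat` differ only by the few points that enter and leave the window range, divided by the number of windows (1969 here), so the mean subsequence varies by well under 2% of the series' range even though the random walk itself wanders widely.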
Four short links: 1 April 2015

Tuning Fanout, Moore's Law, 3D Everything, and Social Graph Analysis

  1. Facebook’s Mystery Machine — The goal of this paper is very similar to that of Google Dapper[…]. Both work [to] try to figure out bottlenecks in performance in high fanout large-scale Internet services. Both work us[ing] similar methods, however this work (the mystery machine) tries to accomplish the task relying on less instrumentation than Google Dapper. The novelty of the mystery machine work is that it tries to infer the component call graph implicitly via mining the logs, where as Google Dapper instrumented each call in a meticulous manner and explicitly obtained the entire call graph.
  2. The Multiple Lives of Moore’s Law — A shrinking transistor not only allowed more components to be crammed onto an integrated circuit but also made those transistors faster and less power hungry. This single factor has been responsible for much of the staying power of Moore’s Law, and it’s lasted through two very different incarnations. In the early days, a phase I call Moore’s Law 1.0, progress came by “scaling up”—adding more components to a chip. At first, the goal was simply to gobble up the discrete components of existing applications and put them in one reliable and inexpensive package. As a result, chips got bigger and more complex. The microprocessor, which emerged in the early 1970s, exemplifies this phase. But over the last few decades, progress in the semiconductor industry became dominated by Moore’s Law 2.0. This era is all about “scaling down,” driving down the size and cost of transistors even if the number of transistors per chip does not go up.
  3. BoXZY Rapid-Change FabLab: Mill, Laser Engraver, 3D Printer (Kickstarter) — project that promises you the ability to swap out heads to get different behaviour from the “move something in 3 dimensions” infrastructure in the box.
  4. SociaLite (GitHub) — a distributed query language for graph analysis and data mining. (via Ben Lorica)
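One way to infer a call graph from logs alone, as the Mystery Machine item describes, is to hypothesise every possible ordering and throw away any that a trace contradicts. The sketch below covers only "A happens before B" relationships; the trace format (segment name mapped to start and end times) and the segment names are assumptions for illustration, and the real system models more relationship types than this.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Assumed log format: each trace maps a segment name to (start_time, end_time).
Trace = Dict[str, Tuple[float, float]]

def infer_happens_before(traces: List[Trace]) -> Set[Tuple[str, str]]:
    segments = sorted(set().union(*(t.keys() for t in traces)))
    # Start from every ordered pair (A, B), read as "A ends before B starts".
    hypotheses = set(permutations(segments, 2))
    for trace in traces:
        for a, b in list(hypotheses):
            if a in trace and b in trace:
                a_end = trace[a][1]
                b_start = trace[b][0]
                if b_start < a_end:
                    # Counterexample: B began before A finished.
                    hypotheses.discard((a, b))
    return hypotheses

traces = [
    {"fetch": (0, 5), "parse": (5, 8), "render": (8, 12)},
    {"fetch": (0, 4), "parse": (4, 9), "render": (9, 11)},
    {"fetch": (0, 3), "parse": (3, 7), "render": (6, 10)},
]
```

With only the first two traces, `parse -> render` looks like a dependency. The third trace, where rendering starts before parsing ends, refutes it, leaving `fetch -> parse` and `fetch -> render`. More traces can only remove hypotheses, which is why this approach needs a large volume of logs rather than per-call instrumentation.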
Four short links: 13 February 2015

Web Post-Mortem, Data Flow, Hospital Robots, and Robust Complex Networks

  1. What Happened to Web Intents (Paul Kinlan) — I love post-mortems, and this is a thoughtful one.
  2. Apache NiFi — incubated open source project for data flow.
  3. Tug Hospital Robot (Wired) — It may have an adult voice, but Tug has a childlike air, even though in this hospital you’re supposed to treat it like a wheelchair-bound old lady. It’s just so innocent, so earnest, and at times, a bit helpless. If there’s enough stuff blocking its way in a corridor, for instance, it can’t reroute around the obstruction. This happened to the Tug we were trailing in pediatrics. “Oh, something’s in its way!” a woman in scrubs says with an expression like she herself had ruined the robot’s day. She tries moving the wheeled contraption but it won’t budge. “Uh, oh!” She shoves on it some more and finally gets it to move. “Go, Tug, go!” she exclaims as the robot, true to its programming, continues down the hall.
  4. Improving the Robustness of Complex Networks with Preserving Community Structure (PLoS ONE) — To improve robustness while minimizing the above three costly changes, we first seek to verify that the community structure of networks actually do identify the robustness and vulnerability of networks to some extent. Then, we propose an effective 3-step strategy for robustness improvement, which retains the degree distribution of a network, as well as preserves its community structure.
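The paper's own 3-step strategy is not reproduced here. As a generic illustration of the two constraints it mentions, the sketch below shows a degree-preserving double-edge swap (rewire a-b and c-d into a-d and c-b) that is accepted only when all four endpoints sit in the same community, so every node keeps its degree and each community keeps its count of internal edges. A real robustness optimiser would additionally keep only swaps that improve some robustness measure; that objective is left out, and the function name is hypothetical.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def community_preserving_swaps(edges: List[Edge], community: Dict[int, int],
                               n_attempts: int, seed: int = 0) -> List[Edge]:
    """Randomly rewire a simple undirected graph without changing degrees or communities."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]  # canonical (low, high) form
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Only rewire inside a single community, so intra-community edge
        # counts (and therefore community structure) are unchanged.
        if len({community[a], community[b], community[c], community[d]}) != 1:
            continue
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges
```

Each accepted swap takes one edge away from and gives one edge back to each of a, b, c and d, which is why the degree sequence is invariant; edges that cross communities, such as a bridge between two clusters, are never touched.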
Four short links: 5 August 2014

Discussion Graph Tool, Superlinear Productivity, Go Concurrency, and R Map/Reduce Tools

  1. Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
  2. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows that small groups exhibit superlinear productivity gains as they grow, an effect that weakens in larger groups. We document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
  3. coop — cheat sheet of the most common concurrency program flows in Go.
  4. Tessera — set of open source tools around Hadoop, R, and visualization.
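"Superlinear" here means output grows like `c * size**beta` with `beta > 1`, and `beta` is commonly estimated with an ordinary least-squares line through log-transformed data. This is a generic sketch on synthetic numbers, not the study's data and not necessarily its exact fitting procedure.

```python
import math
from typing import List, Tuple

def fit_scaling_exponent(sizes: List[float], outputs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(output) = log(c) + beta * log(size); returns (beta, c)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(o) for o in outputs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope on the log-log plot
    c = math.exp(mean_y - beta * mean_x)   # intercept back on the original scale
    return beta, c

# Synthetic groups whose output scales as 2 * size**1.3 (superlinear).
sizes = [1, 2, 4, 8, 16]
outputs = [2.0 * s ** 1.3 for s in sizes]
beta, c = fit_scaling_exponent(sizes, outputs)
```

On noise-free synthetic data the fit recovers `beta = 1.3` and `c = 2.0`; with real project data, the spread of fitted exponents across groups is what the study reports as varying with group size.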
Four short links: 9 June 2014

SQL against Text, Fake Social Networks, Hidden Biases, and Versioned Data

  1. textql — execute SQL against structured text like CSV or TSV.
  2. Social Network Structure of Fake Friends — author bought 4,000 Twitter followers and studied their relationships.
  3. Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
  4. CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.