- Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
- Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
- coop — cheat sheet of the most common concurrency program flows in Go.
- Tessera — set of open source tools around Hadoop, R, and visualization.
"social graph" entries
Name Analysis, Old UIs, Browser Crypto Social Network, and Smart Watch Displays
- How Well Does Name Analysis Work? (Pete Warden) — explanation of how those “turn a name into gender/ethnicity/etc” routines work, and how accurate they are. Age has the weakest correlation with names. There are actually some strong patterns by time of birth, with certain names widely recognized as old-fashioned or trendy, but those tend to be swamped by class and ethnicity-based differences in the popularity of names.
- Old Interfaces — a lazy-scrolling interface to Andy Baio’s collection of faux UIs from movies. (via Andy Baio)
- Pidder — browser-crypto’d social network, address book, messaging, RSS reader, and more.
- What I Learned From Researching Almost Every Single Smart Watch That Has Been Rumoured or Announced (Quartz) — interesting roundup of the different display technologies used in each of the smartwatches.
- Modeling Users’ Activity on Twitter Networks: Validation of Dunbar’s Number (PLoSone) — In this paper we analyze a dataset of Twitter conversations collected across six months involving 1.7 million individuals and test the theoretical cognitive limit on the number of stable social relationships known as Dunbar’s number. We find that the data are in agreement with Dunbar’s result; users can entertain a maximum of 100–200 stable relationships. Thus, the ‘economy of attention’ is limited in the online world by cognitive and biological constraints as predicted by Dunbar’s theory. We propose a simple model for users’ behavior that includes finite priority queuing and time resources that reproduces the observed social behavior.
- Mary Meeker’s Internet Trends (Slideshare) — check out slide 24, ~2x month-on-month growth for MyFitnessPal’s number of API calls, which Meeker users as a proxy for “fitness data on mobile + wearable devices”.
- What I Learned as an Oompa Loompa (Elaine Wherry) — working in a chocolate factory, learning the differences and overlaps between a web startup and an more traditional physical goods business. It’s so much easier to build a sustainable organization around a simple revenue model. There are no tensions between ad partners, distribution sites, engineering, and sales teams. There are fewer points of failure. Instead, everyone is aligned towards a simple goal: make something people want.
- Augmented Reality Futures (Quartz) — wrap-up of tech in the works and coming. Instruction is the bit that interests me, scaffolding our lives: While it isn’t on the market yet, Inglobe Technologies just previewed an augmented reality app that tracks and virtually labels the components of a car engine in real time. That would make popping the hood of your car on the side of the road much less scary. The app claims to simplify tasks like checking oil and topping up coolant fluid, even for novice mechanics.
Graph data is an area that has attracted many enthusiastic entrepreneurs and developers
The popular open source project GraphLab received a major boost early this week when a new company comprised of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push the limits of graph computation and develop new ideas”, but having a commercial company will accelerate development, and allow the hiring of resources dedicated to improving usability and documentation.
While social media placed graph data on the radar of many companies, similar data sets can be found in many domains including the life and health sciences, security, and financial services. Graph data is different enough that it necessitates special tools and techniques. Because tools were a bit too complex for casual users, in the past this meant graph data analytics was the province of specialists. Fortunately graph data is an area that has attracted many enthusiastic entrepreneurs and developers. The tools have improved and I expect things to get much easier for users in the future. A great place to learn more about tools for graph data, is at the upcoming GraphLab Workshop (on July 1st in SF).
Data wrangling: creating graphs
Before you can take advantage of the other tools mentioned in this post, you’ll need to turn your data (e.g., web pages) into graphs. GraphBuilder is an open source project from Intel, that uses Hadoop MapReduce1 to build graphs out of large data sets. Another option is the combination of GraphX/Spark described below. (A startup called Trifacta is building a general-purpose, data wrangling tool, that could help as well. )
Remixing Success, Scratch in the Browser, 3D Takedown, and Wolfram Network Analysis
- The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.
- Scratch 2.0 — my favourite first programming language for kids and adults, now in the browser! Downloadable version for offline use coming soon. See the overview for what’s new.
- State Dept Takedown on 3D-Printed Gun (Forbes) — The government says it wants to review the files for compliance with arms export control laws known as the International Traffic in Arms Regulations, or ITAR. By uploading the weapons files to the Internet and allowing them to be downloaded abroad, the letter implies Wilson’s high-tech gun group may have violated those export controls.
- Data Science of the Facebook World (Stephen Wolfram) — More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes. A few weeks ago we decided to start analyzing all this data… (via Phil Earnhardt)