- Mapping Twitter Topic Networks (Pew Internet) — Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation. Six structures are regularly observed: divided, unified, fragmented, clustered, and inward and outward hub and spoke structures. These are created as individuals choose whom to reply to or mention in their Twitter messages and the structures tell a story about the nature of the conversation. (via Washington Post)
- yasp — a fully functional web-based assembler development environment, including a real assembler, emulator, and debugger. The assembler dialect is a custom one, kept deliberately simple to make the learning curve as shallow as possible.
- The 12-Factor App — twelve habits of highly successful web developers, essentially.
- Fast Approximation of Betweenness Centrality through Sampling (PDF) — Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices in a network in terms of the fraction of shortest paths that pass through them. Exact computation in large networks is prohibitively expensive and fast approximation algorithms are required in these cases. We present two efficient randomized algorithms for betweenness estimation.
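The general idea of sampling-based betweenness estimation fits in a few lines. This is a minimal sketch and not the algorithm from the paper above, which samples shortest paths. It uses the simpler, well-known source-sampling approach instead: run Brandes' single-source dependency accumulation from a random subset of k vertices, then scale the partial sums by n/k. It assumes an unweighted graph stored as a dict of adjacency lists, and the function name `approx_betweenness` is hypothetical.

```python
import random
from collections import deque


def approx_betweenness(adj, num_samples, seed=None):
    """Estimate betweenness by sampling source vertices (hypothetical helper).

    adj: dict mapping each vertex to a list of neighbours (unweighted graph).
    Scores count ordered (source, target) pairs; halve them for the usual
    undirected convention.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    bc = {v: 0.0 for v in nodes}
    sources = rng.sample(nodes, min(num_samples, n))
    for s in sources:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Scale the partial sums to estimate the sum over all n sources.
    scale = n / len(sources)
    return {v: bc[v] * scale for v in nodes}
```

If `num_samples` equals the number of vertices, the result is exact. Fewer samples trade accuracy for speed, which is the trade-off the paper's algorithms are designed to control more carefully.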
Name Analysis, Old UIs, Browser Crypto Social Network, and Smart Watch Displays
- How Well Does Name Analysis Work? (Pete Warden) — explanation of how those “turn a name into gender/ethnicity/etc” routines work, and how accurate they are. Age has the weakest correlation with names. There are actually some strong patterns by time of birth, with certain names widely recognized as old-fashioned or trendy, but those tend to be swamped by class and ethnicity-based differences in the popularity of names.
- Old Interfaces — a lazy-scrolling interface to Andy Baio’s collection of faux UIs from movies. (via Andy Baio)
- Pidder — browser-crypto’d social network, address book, messaging, RSS reader, and more.
- What I Learned From Researching Almost Every Single Smart Watch That Has Been Rumoured or Announced (Quartz) — interesting roundup of the different display technologies used in each of the smartwatches.
- Modeling Users’ Activity on Twitter Networks: Validation of Dunbar’s Number (PLoS ONE) — In this paper we analyze a dataset of Twitter conversations collected across six months involving 1.7 million individuals and test the theoretical cognitive limit on the number of stable social relationships known as Dunbar’s number. We find that the data are in agreement with Dunbar’s result; users can entertain a maximum of 100–200 stable relationships. Thus, the ‘economy of attention’ is limited in the online world by cognitive and biological constraints as predicted by Dunbar’s theory. We propose a simple model for users’ behavior that includes finite priority queuing and time resources that reproduces the observed social behavior.
- Mary Meeker’s Internet Trends (Slideshare) — check out slide 24, ~2x month-on-month growth for MyFitnessPal’s number of API calls, which Meeker uses as a proxy for “fitness data on mobile + wearable devices”.
- What I Learned as an Oompa Loompa (Elaine Wherry) — working in a chocolate factory, learning the differences and overlaps between a web startup and a more traditional physical goods business. It’s so much easier to build a sustainable organization around a simple revenue model. There are no tensions between ad partners, distribution sites, engineering, and sales teams. There are fewer points of failure. Instead, everyone is aligned towards a simple goal: make something people want.
- Augmented Reality Futures (Quartz) — roundup of augmented reality tech that’s in the works or on the way. Instruction is the bit that interests me, scaffolding our lives: While it isn’t on the market yet, Inglobe Technologies just previewed an augmented reality app that tracks and virtually labels the components of a car engine in real time. That would make popping the hood of your car on the side of the road much less scary. The app claims to simplify tasks like checking oil and topping up coolant fluid, even for novice mechanics.
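The Dunbar’s number study depends on deciding which conversational ties count as stable relationships. As a rough illustration only, the sketch below counts, for each user, the contacts with whom replies flow in both directions at least `min_each_way` times. The threshold and the function name `stable_contacts` are hypothetical choices for this example and are not the paper’s criterion. The input is assumed to be a list of (author, recipient) pairs extracted from replies or mentions.

```python
from collections import Counter, defaultdict


def stable_contacts(replies, min_each_way=2):
    """Count reciprocated, repeated ties per user (hypothetical definition).

    replies: iterable of (author, recipient) pairs, one per reply or mention.
    min_each_way: hypothetical threshold; a tie counts only if each side
    addressed the other at least this many times.
    User IDs must be mutually comparable (e.g. all strings).
    """
    counts = Counter(replies)
    per_user = defaultdict(int)
    for (a, b), n_ab in counts.items():
        # Visit each unordered pair once (a < b); this also skips self-replies.
        # Counter returns 0 for missing keys without inserting them.
        if a < b and n_ab >= min_each_way and counts[(b, a)] >= min_each_way:
            per_user[a] += 1
            per_user[b] += 1
    return dict(per_user)
```

Users with no qualifying ties are absent from the returned dict. Comparing the distribution of these counts against a ceiling is the kind of check the study describes, though its actual tie definition differs.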
Graph data is an area that has attracted many enthusiastic entrepreneurs and developers
The popular open source project GraphLab received a major boost early this week when a new company, composed of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push the limits of graph computation and develop new ideas”, but having a commercial company will accelerate development and allow it to hire people dedicated to improving usability and documentation.
While social media placed graph data on the radar of many companies, similar data sets can be found in many domains including the life and health sciences, security, and financial services. Graph data is different enough that it necessitates special tools and techniques. In the past, the tools were a bit too complex for casual users, so graph data analytics was the province of specialists. Fortunately graph data is an area that has attracted many enthusiastic entrepreneurs and developers. The tools have improved and I expect things to get much easier for users in the future. A great place to learn more about tools for graph data is the upcoming GraphLab Workshop (on July 1st in SF).
Data wrangling: creating graphs
Before you can take advantage of the other tools mentioned in this post, you’ll need to turn your data (e.g., web pages) into graphs. GraphBuilder is an open source project from Intel that uses Hadoop MapReduce[1] to build graphs out of large data sets. Another option is the combination of GraphX/Spark described below. (A startup called Trifacta is building a general-purpose data wrangling tool that could help as well.)
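Turning web pages into a graph is conceptually a map step (extract the links from each page) followed by a reduce step (group edges by source). The sketch below shows that shape in plain Python on a single machine. It is not GraphBuilder’s API, and the function names are hypothetical. It uses the standard library’s `html.parser` to find `<a href>` links and `urllib.parse.urljoin` to resolve relative URLs.

```python
from collections import defaultdict
from html.parser import HTMLParser
from itertools import chain
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def map_page(url, html):
    """Map step: register the page as a vertex, then emit one edge per link."""
    yield url, None
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    for href in parser.links:
        yield url, urljoin(url, href)


def reduce_edges(pairs):
    """Reduce step: group targets by source, dropping duplicate links."""
    adjacency = defaultdict(set)
    for src, dst in pairs:
        targets = adjacency[src]  # creates the vertex even if it has no links
        if dst is not None:
            targets.add(dst)
    return {src: sorted(dsts) for src, dsts in adjacency.items()}


def build_link_graph(pages):
    """pages: dict mapping URL -> HTML. Returns {url: [linked urls]}."""
    return reduce_edges(chain.from_iterable(map_page(u, h) for u, h in pages.items()))
```

In a real Hadoop job the map and reduce functions would run on separate workers, and the framework’s shuffle would do the grouping that `reduce_edges` performs here in memory.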
Remixing Success, Scratch in the Browser, 3D Takedown, and Wolfram Network Analysis
- The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.
- Scratch 2.0 — my favourite first programming language for kids and adults, now in the browser! Downloadable version for offline use coming soon. See the overview for what’s new.
- State Dept Takedown on 3D-Printed Gun (Forbes) — The government says it wants to review the files for compliance with arms export control laws known as the International Traffic in Arms Regulations, or ITAR. By uploading the weapons files to the Internet and allowing them to be downloaded abroad, the letter implies Wilson’s high-tech gun group may have violated those export controls.
- Data Science of the Facebook World (Stephen Wolfram) — More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes. A few weeks ago we decided to start analyzing all this data… (via Phil Earnhardt)
A disk-based, single-node graph analytics system that scales to massive graphs
Designed specifically to run on a single computer with limited memory[1] (DRAM), since its release a few months ago GraphChi has been used to analyze graphs with billions of edges. Running on a single machine means deployment and debugging are simpler. In addition it is no longer necessary to find (optimal) graph partitions that minimize communication between compute nodes – the starting point for many distributed graph computations.
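One way a single machine can process a graph whose edges do not fit in DRAM is to keep only the per-vertex values in memory and stream the edge list from disk on every pass. The sketch below applies that idea to PageRank. It is a deliberately simplified illustration and not GraphChi’s method, which uses a more elaborate sharded scheme called Parallel Sliding Windows. The one-`src dst`-pair-per-line file format, the integer vertex IDs 0..n-1, and the function names are assumptions made for this example.

```python
def write_edge_file(edges, path):
    """Write one 'src dst' pair per line (hypothetical on-disk format)."""
    with open(path, "w") as f:
        for src, dst in edges:
            f.write(f"{src} {dst}\n")


def stream_edges(path):
    """Yield edges one at a time so the edge list never sits in memory."""
    with open(path) as f:
        for line in f:
            src, dst = line.split()
            yield int(src), int(dst)


def pagerank_from_disk(path, n, iterations=20, damping=0.85):
    """PageRank over vertices 0..n-1; only O(n) values are kept in RAM."""
    # First pass over the file: out-degrees.
    out_deg = [0] * n
    for src, _ in stream_edges(path):
        out_deg[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread evenly.
        dangling = sum(r for r, d in zip(rank, out_deg) if d == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # One sequential pass over the edges on disk per iteration.
        for src, dst in stream_edges(path):
            new_rank[dst] += damping * rank[src] / out_deg[src]
        rank = new_rank
    return rank
```

Each iteration costs one sequential read of the edge file, an access pattern disks handle well. The price is that every pass rereads all of the edges, which is the kind of I/O cost a scheme like GraphChi’s is designed to organize more efficiently.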
The stated goal of GraphChi is to “Compute on graphs with billions of edges, in a reasonable time, on a single PC.” One way to define “reasonable time” is to compare against the results produced by other graph processing systems. That’s exactly what GraphChi’s creators did in a recent paper. They found that GraphChi compared favorably to graph analytics packages such as Pegasus and Stanford GPS. While GraphChi was 2-3x slower[2] in some cases, it is easier to deploy, easier to debug, and way more energy efficient.