ENTRIES TAGGED "social graph"
SQL against Text, Fake Social Networks, Hidden Biases, and Versioned Data
- textql — execute SQL against structured text like CSV or TSV.
- Social Network Structure of Fake Friends — author bought 4,000 Twitter followers and studied their relationships.
- Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
- CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
Twitter Clusters, Web Assembly, Modern Web Practices, and Social Network Algorithms
- Mapping Twitter Topic Networks (Pew Internet) — Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation. Six structures are regularly observed: divided, unified, fragmented, clustered, and inward and outward hub and spoke structures. These are created as individuals choose whom to reply to or mention in their Twitter messages and the structures tell a story about the nature of the conversation. (via Washington Post)
- yasp — a fully functional web-based assembler development environment, including a real assembler, emulator and debugger. The assembler dialect is custom and deliberately very simple, so as to keep the learning curve as shallow as possible.
- The 12-Factor App — twelve habits of highly successful web developers, essentially.
- Fast Approximation of Betweenness Centrality through Sampling (PDF) — Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices in a network in terms of the fraction of shortest paths that pass through them. Exact computation in large networks is prohibitively expensive and fast approximation algorithms are required in these cases. We present two efficient randomized algorithms for betweenness estimation.
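To make the betweenness entry above concrete, here is a minimal sketch of one well-known way to approximate it: run Brandes-style shortest-path accumulation from a random sample of source vertices and scale the totals up. This is source sampling, not necessarily either of the estimators the paper presents, and the function names, the adjacency-dict input format, and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import deque


def single_source_dependencies(graph, source):
    """Brandes-style accumulation for one source on an unweighted graph.

    `graph` maps each vertex to a list of neighbours (assumed format).
    Returns, for every vertex v, the summed fraction of shortest paths
    from `source` that pass through v.
    """
    sigma = {v: 0 for v in graph}    # number of shortest paths from source
    dist = {v: -1 for v in graph}    # BFS distance, -1 means unvisited
    preds = {v: [] for v in graph}   # predecessors on shortest paths
    sigma[source] = 1
    dist[source] = 0
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    # Walk vertices farthest-first, pushing dependency back to predecessors.
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[source] = 0.0  # a source is never an interior vertex of its own paths
    return delta


def approx_betweenness(graph, num_samples, seed=None):
    """Estimate betweenness from `num_samples` random sources.

    Counts ordered (source, target) pairs, so halve the result to match
    the usual convention for undirected graphs.
    """
    rng = random.Random(seed)
    vertices = list(graph)
    scores = {v: 0.0 for v in vertices}
    for _ in range(num_samples):
        s = rng.choice(vertices)  # sampling with replacement
        for v, d in single_source_dependencies(graph, s).items():
            scores[v] += d
    scale = len(vertices) / num_samples
    return {v: score * scale for v, score in scores.items()}
```

Each sampled source costs one breadth-first search, so the work grows with the number of samples rather than with the number of vertices, which is the trade-off that makes approximation attractive on large networks.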
Commandline iMessage, Lovely Data, Software Plagiarism Detection, and 3D GIFs
SCADA Security, Graph Clustering, Facebook Flipbook, and Projections Illustrated
- Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
- mcl — Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
- Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
- Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
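The flow simulation behind mcl, listed above, can be sketched in a few lines. This is a rough pure-Python illustration of the algorithm as it is usually described (alternating expansion and inflation of a column-stochastic matrix), not the mcl tool's own implementation; the function names, the dense adjacency-matrix input, and the default parameters are all assumptions.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix, i.e. take two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize.

    This strengthens strong flows and weakens weak ones.
    """
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster a graph given as a dense 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Add self-loops so every column has a non-zero sum.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Each row that still carries flow names one cluster: its non-zero columns.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

The dense matrix keeps the sketch short but costs cubic time per expansion; scaling to large graphs would need sparse matrices and pruning of small entries.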
Wearables Mature, Network as Filter, To The Androidmobile, and U R Pwn3d
- Pebble Gets App Store (ReadWrite Web) — as both Pebble and MetaWatch go after the high-end watch market. Wearables becoming more than a nerd novelty.
- Thinking About the Network as Filter (JP Rangaswami) — Constant re-openings of the same debate as people try and get a synchronous outcome out of an asynchronous tool without the agreements and conventions in place to do it. He says friends are your social filters. You no longer have to read every email. When you come back from vacation, whatever has passed in the stream unread can stay unread but most social tools are built as collectors, not as filters. Looking forward to the rest in his series.
- Open Auto Alliance — The OAA is a global alliance of technology and auto industry leaders committed to bringing the Android platform to cars starting in 2014. “KidGamesPack 7 requires access to your history, SMS, location, network connectivity, speed, weight, in-car audio, and ABS control systems. Install or Cancel?”
- Jacob Appelbaum’s CCC Talk — transcript of an excellent talk. One of the scariest parts about this is that for this system or these sets of systems to exist, we have been kept vulnerable. So it is the case that if the Chinese, if the Russians, if people here wish to build this system, there’s nothing that stops them. And in fact the NSA has in a literal sense retarded the process by which we would secure the internet because it establishes a hegemony of power, their power in secret to do these things.
Name Analysis, Old UIs, Browser Crypto Social Network, and Smart Watch Displays
- How Well Does Name Analysis Work? (Pete Warden) — explanation of how those “turn a name into gender/ethnicity/etc” routines work, and how accurate they are. Age has the weakest correlation with names. There are actually some strong patterns by time of birth, with certain names widely recognized as old-fashioned or trendy, but those tend to be swamped by class and ethnicity-based differences in the popularity of names.
- Old Interfaces — a lazy-scrolling interface to Andy Baio’s collection of faux UIs from movies. (via Andy Baio)
- Pidder — browser-crypto’d social network, address book, messaging, RSS reader, and more.
- What I Learned From Researching Almost Every Single Smart Watch That Has Been Rumoured or Announced (Quartz) — interesting roundup of the different display technologies used in each of the smartwatches.
- Modeling Users’ Activity on Twitter Networks: Validation of Dunbar’s Number (PLoSone) — In this paper we analyze a dataset of Twitter conversations collected across six months involving 1.7 million individuals and test the theoretical cognitive limit on the number of stable social relationships known as Dunbar’s number. We find that the data are in agreement with Dunbar’s result; users can entertain a maximum of 100–200 stable relationships. Thus, the ‘economy of attention’ is limited in the online world by cognitive and biological constraints as predicted by Dunbar’s theory. We propose a simple model for users’ behavior that includes finite priority queuing and time resources that reproduces the observed social behavior.
- Mary Meeker’s Internet Trends (Slideshare) — check out slide 24, ~2x month-on-month growth for MyFitnessPal’s number of API calls, which Meeker uses as a proxy for “fitness data on mobile + wearable devices”.
- What I Learned as an Oompa Loompa (Elaine Wherry) — working in a chocolate factory, learning the differences and overlaps between a web startup and a more traditional physical goods business. It’s so much easier to build a sustainable organization around a simple revenue model. There are no tensions between ad partners, distribution sites, engineering, and sales teams. There are fewer points of failure. Instead, everyone is aligned towards a simple goal: make something people want.
- Augmented Reality Futures (Quartz) — wrap-up of tech in the works and coming. Instruction is the bit that interests me, scaffolding our lives: While it isn’t on the market yet, Inglobe Technologies just previewed an augmented reality app that tracks and virtually labels the components of a car engine in real time. That would make popping the hood of your car on the side of the road much less scary. The app claims to simplify tasks like checking oil and topping up coolant fluid, even for novice mechanics.
Graph data is an area that has attracted many enthusiastic entrepreneurs and developers
The popular open source project GraphLab received a major boost early this week when a new company, comprised of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push the limits of graph computation and develop new ideas”, but having a commercial company will accelerate development and allow the hiring of resources dedicated to improving usability and documentation.
While social media placed graph data on the radar of many companies, similar data sets can be found in many domains, including the life and health sciences, security, and financial services. Graph data is different enough that it necessitates special tools and techniques. In the past, those tools were too complex for casual users, so graph data analytics was the province of specialists. Fortunately, graph data is an area that has attracted many enthusiastic entrepreneurs and developers. The tools have improved, and I expect things to get much easier for users in the future. A great place to learn more about tools for graph data is the upcoming GraphLab Workshop (on July 1st in SF).
Data wrangling: creating graphs
Before you can take advantage of the other tools mentioned in this post, you’ll need to turn your data (e.g., web pages) into graphs. GraphBuilder is an open source project from Intel that uses Hadoop MapReduce to build graphs out of large data sets. Another option is the combination of GraphX/Spark described below. (A startup called Trifacta is building a general-purpose data wrangling tool that could help as well.)
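As a small-scale illustration of what turning web pages into a graph involves, the sketch below extracts hyperlinks from a handful of in-memory HTML documents and produces an adjacency list. It uses only the Python standard library and is not GraphBuilder's or GraphX's API; the names `LinkExtractor` and `build_link_graph`, and the URL-to-HTML dictionary input, are hypothetical choices for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_link_graph(pages):
    """Turn {url: html} into {url: sorted list of distinct linked URLs}.

    Each page becomes a vertex and each hyperlink a directed edge.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = sorted(set(parser.links))
    return graph
```

On data sets that fit on one machine, an adjacency list like this is often enough; distributed tools become relevant when the edge list no longer fits in memory.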
Remixing Success, Scratch in the Browser, 3D Takedown, and Wolfram Network Analysis
- The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.
- Scratch 2.0 — my favourite first programming language for kids and adults, now in the browser! Downloadable version for offline use coming soon. See the overview for what’s new.
- State Dept Takedown on 3D-Printed Gun (Forbes) — The government says it wants to review the files for compliance with arms export control laws known as the International Traffic in Arms Regulations, or ITAR. By uploading the weapons files to the Internet and allowing them to be downloaded abroad, the letter implies Wilson’s high-tech gun group may have violated those export controls.
- Data Science of the Facebook World (Stephen Wolfram) — More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes. A few weeks ago we decided to start analyzing all this data… (via Phil Earnhardt)