- Find Communities — algorithm for uncovering communities in networks of millions of nodes, for producing identifiable subgroups as in LinkedIn InMaps. (via Matt Biddulph’s Delicious links)
- Seven Ways to Think Like The Web (Jon Udell) — seven principles that will head off a lot of mistakes. They should be seared into the minds of anyone working in the web. 2. Pass by reference rather than by value. [pass URLs, not copies of data] […] Why? Nobody else cares about your data as much as you do. If other people and other systems source your data from a canonical URL that you advertise and control, then they will always get data that’s as timely and accurate as you care to make it.
- Interview with Marco Arment (Rands in Repose) — Most people assume that online readers primarily view a small number of big-name sites. Nearly everyone who guesses at Instapaper’s top-saved-domain list and its proportions is wrong. The most-saved site is usually The New York Times, The Guardian, or another major traditional newspaper. But it’s only about 2% of all saved articles. The top 10 saved domains are only about 11% of saved articles. (via Courtney Johnston’s Instapaper Feed)
"social graph" entries
Work on data projects that matter, data journalism, and a social graph of the Marvel universe.
This week's big data news includes a call for Data Without Borders, data journalism catches the Knight Foundation's attention, IBM's new big data appliance, and a social graph built around the Marvel universe.
Compressing Graphs, Authentication Usability, Extreme Design, and Rails Geo
- On Compressing Social Networks (PDF) — paper looking at the theory and practice of compressing social network graphs. Our main innovation here is to come up with a quick and useful method for generating an ordering on the social network nodes so that nodes with lots of common neighbors are near each other in the ordering, a property which is useful for compression (via My Biased Coin, via Matt Biddulph on Delicious)
- Requiring Email and Passwords for New Accounts (Instapaper blog) — a list of reasons why the simple signup method of “pick a username, passwords are optional” turned out to be trouble in the long run. (via Courtney Johnston’s Instapaper feed)
- Extreme Design — building the amazing spacelog.org in an equally-amazing fashion. I want a fort.
- rgeo — a new geo library for Rails. (via Daniel Azuma via Glen Barnes on Twitter)
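The ordering idea in the graph compression entry above can be illustrated with a small sketch. It is not the paper's implementation: it assumes a min-hash ("shingle") style ordering on a toy directed graph, and `shingle_order`, `shared_with_previous`, and the sample data are hypothetical names made up for illustration. The point is only that nodes with many common neighbors land next to each other, which gives an encoder neighbor lists it can copy rather than store again.

```python
import random


def shingle_order(adj, seed=0):
    """Order nodes by the min-hash ("shingle") of their neighbor sets.

    Nodes that share many neighbors are likely to share the same
    lowest-ranked neighbor, so they end up adjacent in the ordering.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Random permutation of the nodes, used as the hash function.
    rank = {n: i for i, n in enumerate(rng.sample(nodes, len(nodes)))}

    def shingle(n):
        # Nodes with no out-links get the largest value and sort last.
        return min((rank[m] for m in adj[n]), default=len(nodes))

    return sorted(nodes, key=lambda n: (shingle(n), n))


def shared_with_previous(adj, order):
    """Count the neighbors each node shares with its predecessor in `order`.

    A reference-style encoder could copy these from the previous node.
    """
    return sum(len(adj[a] & adj[b]) for a, b in zip(order, order[1:]))


# Toy graph (assumed data): node -> set of out-neighbors.
adj = {
    "a1": {"x", "y"}, "b1": {"p", "q"},
    "a2": {"x", "y"}, "b2": {"p", "q"},
    "a3": {"x", "y"}, "b3": {"p", "q"},
    "x": set(), "y": set(), "p": set(), "q": set(),
}

interleaved = ["a1", "b1", "a2", "b2", "a3", "b3", "x", "y", "p", "q"]
print(shared_with_previous(adj, interleaved))          # 0
print(shared_with_previous(adj, shingle_order(adj)))   # 8
```

In the toy graph the three "a" nodes and the three "b" nodes have identical neighbor sets, so any seed groups them together. On real data the grouping is only probabilistic.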
There's a difference between people you know and the people you're like.
Social search is similar to pre-Google traditional search: results feel arbitrary and unreliable. But a focus on similarity could push social search into a new phase.
Preservation, Scaling Social Networks, Monetizing Music, and Android Unopened Source
- Digital Continuity Conference Proceedings — proceedings from a New Zealand conference on digital archiving, preservation, and access for archives, museums, libraries, etc.
- What Are The Scaling Issues to Keep in Mind While Developing a Social Network Feed? (Quora) — insight into why you see the failwhale. (via kellan on Twitter)
- Fan Feeding Frenzy — Amanda Palmer sells $15k in merch and music in 3 minutes via Bandcamp. Is the record available on iTunes yet? Absolutely not. We have nothing against iTunes, it’ll end up there eventually I’m sure, but it was important for us to do this in as close to a DIY manner as possible. If we were just using iTunes, we couldn’t be doing tie-ins with physical product, monitoring our stats (live), and helping people in real-time when they have a question regarding the service. Being able to do all of those things and having such a transparent format in which to do it has been a dream come true. We all buy stuff on the iTunes store – or AmazonMP3 or whatever – but it’s not THE way artists should be connecting to fans, and it’s certainly not the way someone is going to capture the most revenue on a new release. (via BoingBoing)
- Sad State of Open Source in Android Tablets — With the exception of Barnes & Noble’s Nook e-reader, a device that isn’t even really a tablet, I found no tablet manufacturer who was complying with the minimum of their legal open source requirements under GNU GPL. Let alone supporting community development.
Open Data, Open PCR, Open Sara Winge, and Open Source Big Graph Mining
- Learning from Libraries: the Literacy Challenge of Open Data (David Eaves) — a powerful continuation of the theme from my Rethinking Open Data post. David observes that dumping data over the fence isn’t enough; we must help citizens engage. We have a model for that help, in the form of libraries: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data or public policy literate citizenry, we build them so that citizens may become literate in data, visualization, coding and public policy.
- OpenPCR on Kickstarter — In 1983, Kary Mullis first developed PCR, for which he later received a Nobel Prize. But the tool is still expensive, even though the technology is almost 30 years old. If computing grew at the same pace, we would all still be paying $2,000+ for a 1 MHz Apple II computer. Innovation in biotech needs a kick start!
- Wingeing It — profile of O’Reilly’s wonderful Sara Winge by the ever fabulous Quinn Norton.
- PEGASUS — petascale graph mining toolkit from CMU. See their most recent publication. (via univerself on Delicious)
Legal XML, Big Social Data, Crowdsourcing Tips, Copyright Balkanization
- XML in Legislature/Parliament Environments (Sean McGrath) — quite detailed background on the use of XML in legislation drafting systems, and the problems caused by convention in that world, page/line number citations in particular. (Quick gloat: NZ’s legislature management system is kick-ass, and soon we’ll switch from print authoritative to digital authoritative)
- Large-Scale Social Media Analysis with Hadoop — In this tutorial we will discuss the use of Hadoop for processing large-scale social data sets. We will first cover the map/reduce paradigm in general and subsequently discuss the particulars of Hadoop’s implementation. We will then present several use cases for Hadoop in analyzing example data sets, examining the design and implementation of various algorithms with an emphasis on social network analysis. Accompanying data sets and code will be made available. (via atlamp on Delicious)
- Breaking Monotony with Meaning: Motivation in Crowdsourcing Markets (Crowdflower) — This finding has important implications for those who employ labor in crowdsourcing markets. Companies and intermediaries should develop an understanding of what motivates the people who work on tasks. Employers must think beyond monetary incentives and consider how they can reward workers through non-monetary incentives such as by changing how workers perceive their task. Alienated workers are less likely to do work if they don’t know the context of the work they are doing and employers may find they can get more work done for the same wages simply by telling turkers why they are working.
- Balkanizing the Web — The very absurdity of the global digital system is revealing itself. It created all the instruments for global access and, then, turned around and arbitrarily restricted its commercial use, paving the way for piracy. Think about it: our broadband networks now allow seamless streaming of films, TV shows, music and, soon, of a variety of multimedia products; we have created sophisticated transaction systems; we are getting extraordinary devices to enjoy all this; there is a growing English-speaking population that, for a significant part of it, is solvent and eager to buy this globalized culture and information. But guess what? Instead of a well-crafted, smoothly flowing distribution (and payment) system, we have these Cupertino, Seattle or Los Angeles-engineered restrictions. The U.S. insists on exporting harsh copyright penalties and restrictions, while not exporting license agreements and Fair Use, so the rest of the world gets very grumpy.
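The map/reduce paradigm mentioned in the Hadoop tutorial entry above can be shown in miniature. This is a plain in-memory Python sketch, not Hadoop's API: `run_mapreduce`, `map_edge`, `reduce_counts`, and the edge list are hypothetical names and data chosen for illustration. It counts followers per user from a list of "follower, followed" edges, a simple social network analysis task.

```python
from collections import defaultdict


def map_edge(edge):
    """Map step: emit (followed_user, 1) for each follower -> followed edge."""
    follower, followed = edge
    yield followed, 1


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one user."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Assumed sample data: (follower, followed) pairs.
edges = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]
follower_counts = run_mapreduce(edges, map_edge, reduce_counts)
print(follower_counts)  # {'ann': 1, 'bob': 3}
```

In a real cluster the map and reduce calls would run on different machines and the grouping step would move data over the network. The sketch keeps only the shape of the computation.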
Secrets to Success, Sousveillance, Etherpad Lives, Personal Social Networks
- The Ten Commandments of Rock and Roll (BoingBoing) — ten rules that should be posted in every workplace as a guide to how to fail poisonously.
- Snapscouts — rather creepy sousveillance site. It’s up to you to keep America safe! If you see something suspicious, Snap it! If you see someone who doesn’t belong, Snap it! Not sure if someone or something is suspicious? Snap it anyway! I like the idea of promoting a shared interest in keeping us all safe, but I’m not sure SnapScouts is there yet. (update: Ha, it’s a brilliant joke! See the comments for more)
- Diaspora Kickstarter Project — team looking for seed funding to write an AGPLed “privacy aware, personally controlled, do-it-all distributed open source social network” (no news of dessert topping or floor wax applicability). Received 2.5x their requested funding in a few days.
Gov App Building, Android FPS, Graph Mining, Keeping Fit
- Who Is Going To Build The New Public Services? — a thoughtful exploration of the possibilities and challenges of third parties building public software systems. There’s a lot of talk of “just put up the data and we’ll build the apps” but I think this is a more substantial consideration of which apps can be built by whom.
- Quake 3 for Android — kiss the weekend goodbye, Nexus One owners! My theory is that no platform has “made it” until a first person shooter has been ported to it. (via BoingBoing)
- Graph Mining — slides and reading list from seminar series at UCSB on different aspects of mining graphs. Relevant because, obviously, social networks are one such graph to be mined.
- Treadmill Desk — I want one. Staying fit while working at a sedentary job is important but not easy. I tried to type while using a stepper, but that’s just a recipe for incomprehensible typing fail. (via BoingBoing)
Social Network Search for Morons, Bulking Up Bio Data, Better E-Mail, Better Standards
- Spokeo — abysmal indictment of society, first prize in mankind’s race to the bottom. Uncover personal photos, videos, and secrets … GUARANTEED! Spokeo deep searches within 48 major social networks to find truly mouth-watering news about friends and coworkers. PS, anybody who gives their gmail username and password to a site that specializes in dishing dirt can only be described as a fucking idiot. (via Jim Stogdill, who was equally disappointed in our species)
- Biologists rally to sequence ‘neglected’ microbes (Nature) — The Genomic Encyclopedia of Bacteria and Archaea is a project to sequence genomes from more branches of the evolutionary tree of life. Eisen’s team selected and sequenced more than 100 ‘neglected’ species that lacked close relatives among the 1,000 genomes already in GenBank. The researchers reported earlier this year at the JGI’s Fourth Annual User Meeting that even mapping the first 56 of these microbes’ genomes increased the rate of discovery of new gene and protein families with new biological properties. It also improved the researchers’ ability to predict the role of genes with unknown functions in already sequenced organisms. (via Jonathan Eisen)
- Mail Learning: The What and the How (Simon Cozens) — a few things that a really good mail analysis tool needs to do. I hope that my mail client and server do these out of the box in the next five years.
- Introducing the Open Web Foundation Agreement — The Open Web Foundation Agreement itself establishes the copyright and patent rights for a specification, ensuring that downstream consumers may freely implement and reuse the licensed specification without seeking further permission. In addition to the agreement itself, we also created an easy-to-read “Deed” that provides a high level overview of the agreement. Applying the open source approach to better standards.