- Digital Music Consumption on the Internet: Evidence from Clickstream Data (Scribd) — The goal of this paper is to analyze the behavior of digital music consumers on the Internet. Using clickstream data on a panel of more than 16,000 European consumers, we estimate the effects of illegal downloading and legal streaming on the legal purchases of digital music. Our results suggest that Internet users do not view illegal downloading as a substitute to legal digital music. Although positive and signiﬁcant, our estimated elasticities are essentially zero: a 10% increase in clicks on illegal downloading websites leads to a 0.2% increase in clicks on legal purchases websites. Online music streaming services are found to have a somewhat larger (but still small) effect on the purchases of digital sound recordings, suggesting complementarities between these two modes of music consumption. According to our results, a 10% increase in clicks on legal streaming websites lead to up to a 0.7% increase in clicks on legal digital purchases websites. We ﬁnd important cross country difference in these eﬀects. A paper from the EU commission’s in-house science service. (via Don Christie)
- Six Degrees of Francis Bacon — data-driven research into “the early-modern social network”. (via Jonathan Gray)
- Internet Census 2012 — scanning the net via botnet. Appalling how many unsecured devices are directly connected to the net. Also appalling how underused the address space is.
Visualizing City Data, Gigabits Unrealized, Use Open Source, and Bad IPs Cluster
- VizCities Dev Diary — step-by-step recount of how they brought London’s data to life, SimCity-style.
- Google Fibre Isn’t That Impressive — For [gigabit broadband] to become truly useful and necessary, we’ll need to see a long-term feedback loop of utility and acceptance. First, super-fast lines must allow us to do things that we can’t do with the pedestrian internet. This will prompt more people to demand gigabit lines, which will in turn invite developers to create more apps that require high speed, and so on. What I discovered in Kansas City is that this cycle has not yet begun. Or, as Ars Technica put it recently, “The rest of the internet is too slow for Google Fibre.”
- gov.uk Recommendations on Open Source — Use open source software in preference to proprietary or closed source alternatives, in particular for operating systems, networking software, Web servers, databases and programming languages.
- Internet Bad Neighbourhoods (PDF) — bilingual PhD thesis. The idea behind the Internet Bad Neighborhood concept is that the probability of a host in behaving badly increases if its neighboring hosts (i.e., hosts within the same subnetwork) also behave badly. This idea, in turn, can be exploited to improve current Internet security solutions, since it provides an indirect approach to predict new sources of attacks (neighboring hosts of malicious ones).
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
Search Ads Meh, Hacked Website Help, Web Design Sins, and Lazy Correlations
- Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We ﬁnd that new and infrequent users are positively inﬂuenced by ads but that existing loyal users whose purchasing behavior is not inﬂuenced by paid search account for most of the advertising expenses, resulting in average returns that are negative. We discuss substitution to other channels and implications for advertising decisions in large ﬁrms. eBay-commissioned research, so salt to taste. (via Guardian)
- Google’s Help for Hacked Webmasters — what it says.
- 14 Lousy Web Design Trends Making a Comeback Thanks to HTML 5 — “mystery meat icons” a pet bugbear of mine.
- The Human Microbiome 101 (SlideShare) — SciFoo alum Jonathan Eisen’s talk. Informative, but super-notable for “complexity is astonishing, massive risk for false positive associations”. Remember this the next time your Big Data Scientist (aka kid with R) tells you one surprising variable predicts 66% of anything. I wish I had the audio from this talk!
On Anonymous, Information Rights, RSS Readers, and CDN Sec
- Our Weirdness is Free (Gabriella Coleman) — Often lacking an overarching strategy, Anonymous operates tactically, along the lines proposed by the French Jesuit thinker Michel de Certeau. “Because it does not have a place, a tactic depends on time—it is always on the watch for opportunities that must be seized ‘on the wing,’” he writes in The Practice of Everyday Life (1980). “Whatever it wins, it does not keep. It must constantly manipulate events in order to turn them into ‘opportunities.’ The weak must continually turn to their own ends forces alien to them.” (via Jonas Kubilius)
- Information Rights and Copy Rights (YouTube) — Justice David Harvey’s keynote at Australian Digital Alliance forum, proposing balance of rights. (via Alastair Thompson)
- NewsBlur (GitHub) — one of the many trending repos in the wake of the announcement of Google Reader’s case of terminal lack of relevance to Google+. See also Tiny Tiny RSS, FastLadder, and a million repos empty but for “TODO” files listing the almighty RSS reading features yet to be added to the empty file. Also found: this obsessive guide to Reader’s history.
- The Pentester’s Guide to Akamai (PDF) — This paper summarizes the findings from NCC’s research into Akamai while providing advice to
companies wish to gain the maximum security when leveraging their solutions.
HTML DRM, Visualizing Medical Sciences, Lifelong Learning, and Hardware Hackery
- What Tim Berners-Lee Doesn’t Know About HTML DRM (Guardian) — Cory Doctorow lays it out straight. HTML DRM is a bad idea, no two ways. The future of the Web is the future of the world, because everything we do today involves the net and everything we’ll do tomorrow will require it. Now it proposes to sell out that trust, on the grounds that Big Content will lock up its “content” in Flash if it doesn’t get a veto over Web-innovation. [...] The W3C has a duty to send the DRM-peddlers packing, just as the US courts did in the case of digital TV.
- Visualizing the Topical Structure of the Medical Sciences: A Self-Organizing Map Approach (PLOSone) — a high-resolution visualization of the medical knowledge domain using the self-organizing map (SOM) method, based on a corpus of over two million publications.
- What Teens Get About The Internet That Parents Don’t (The Atlantic) — the Internet has been a lifeline for self-directed learning and connection to peers. In our research, we found that parents more often than not have a negative view of the role of the Internet in learning, but young people almost always have a positive one. (via Clive Thompson)
- Portable C64 — beautiful piece of C64 hardware hacking to embed a screen and battery in it. (via Hackaday)
Chrome Tricks, Sins of Journaling, Icon Font, and Sweet PD
- One Tab — turn tabs into lists, easily. (via Andy Baio)
- Deep Impact: Unintended Consequences of Journal Rank — These data confirm previous suspicions: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altogether.
- Genericons — useful straightforward icon font.
- Public Domain Review Fundraising — Over the course of our two years we’ve created a large and ever growing archive of some of the most interesting and unusual artefacts in the history of art, literature and ideas. Love the idea of some limited edition reprints of these gorgeous works!
Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks
- mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
- Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
- What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
- Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.