- The Effect of Group Attachment and Social Position on Prosocial Behavior (PLoSone) — notable, in my mind, for We conducted lab-in-the-field experiments involving 2,597 members of producer organizations in rural Uganda. cf the recently reported “rich are more selfish than poor” findings, which (like a lot of behavioural economics research) studies Berkeley undergrads who weren’t smart enough to figure out what was being studied.
- elephant — a HTTP key/value store with full-text search and fast queries. Still a work in progress.
- geary (IndieGoGo) — a beautiful modern open-source email client. Found this roughly the same time as elasticinbox open source, reliable, distributed, scalable email store. Open source email action starting?
- The Faraday Copter (YouTube) — Tesla coil and quadrocopter madness. (via Jeff Jonas)
ENTRIES TAGGED "data"
Social Science, YAKVS, Open Source Mail, and Tesla Coil and Quadrocopter Fun
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks
- mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
- Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
- What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
- Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.
Kate Crawford argues for caution and care in data-driven decision making.
The biggest problems will almost always be those for which the size of the data is part of the problem.
In-memory data management brings data close to the computation.
Data Jurisdiction, TimBL Frowns, Google Transparency, and Secure Tools
- FISA Amendment Hits Non-Citizens — FISAAA essentially makes it lawful for the US to conduct purely political surveillance on foreigners’ data accessible in US Cloud providers. [...] [A] US judiciary subcommittee on FISAAA in 2008 stated that the Fourth Amendment has no relevance to non-US persons. Americans, think about how you’d feel keeping your email, CRM, accounts, and presentations on Russian or Chinese servers given the trust you have in those regimes. That’s how the rest of the world feels about American-provided services. Which jurisdiction isn’t constantly into invasive snooping, yet still has great bandwidth?
- Tim Berners-Lee Opposes Government Snooping — “The whole thing seems to me fraught with massive dangers and I don’t think it’s a good idea,” he said in reply to a question about the Australian government’s data retention plan.
- Google’s Approach to Government Requests for Information (Google Blog) — they’ve raised the dialogue about civil liberties by being so open about the requests for information they receive. Telcos and banks still regard these requests as a dirty secret that can’t be talked about, whereas Google gets headlines in NPR and CBS for it.
- Open Internet Tools Project — supports and incubates a collection of free and open source projects that enable anonymous, secure, reliable, and unrestricted communication on the Internet. Its goal is to enable people to talk directly to each other without being censored, surveilled or restricted.
- Bruce Sterling Interview — It changed my work profoundly when I realized I could talk to a global audience on the Internet, although I was legally limited from doing that by national publishing systems. The lack of any global book market has much reduced my interest in publishing books. National systems don’t “publish” me, but rather conceal me. This especially happens to writers outside the Anglophone market, but I know a lot of them, and I’ve become sensitized to their issues. It’s one of the general issues of globalization.
- bAdmin — database of default usernames and passwords for popular software. (via Reddit /r/netsec)
- Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone (Uri Simonsohn) — I argue that requiring authors to post the raw data supporting their published results has, among many other benefits, that of making fraud much less likely to go undetected. I illustrate this point by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results. (via The Atlantic)