- The Obfuscation of Culture — Tumblr and LJ users sep ar ate w ords thr ou gh o dd spacin g in o rde r to fo ol sea rc h en g i nes. Chinese users hide political messages in image attachments to seemingly benign posts on Weibo. General Pretraeus communicated solely through draft mode. 4chan scares away the faint of heart with porn. More technically astute groups communicate through obscure messaging systems. (via Beta Knowledge)
- log2viz — an open-source demonstration of the logs-as-data concept for Heroku apps. Log in and select one of your apps to see a live-updating dashboard of its web activity.
- Doctorow at LoC (YouTube) — video of Cory Doctorow’s talk on ebooks, libraries, and copyright at the Library of Congress.
- When TED Lost Control of its Crowd (HBR) — golden case study. You can’t “manage” a crowd—or a community—through transactional exchanges or economic incentives. You need something stronger: shared purpose
ENTRIES TAGGED "data"
- A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project. Even litcrit becoming a data game.
- Easy6502 — get started writing 6502 assembly language. Fun way to get started with low-level coding.
- How Analytics Really Work at a Small Startup (Pete Warden) — The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?). Nice rundown of tools and systems he uses, with plug for KissMetrics.
Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks
- mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
- Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
- What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
- Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.
Kate Crawford argues for caution and care in data-driven decision making.
The biggest problems will almost always be those for which the size of the data is part of the problem.
In-memory data management brings data close to the computation.
Data Jurisdiction, TimBL Frowns, Google Transparency, and Secure Tools
- FISA Amendment Hits Non-Citizens — FISAAA essentially makes it lawful for the US to conduct purely political surveillance on foreigners’ data accessible in US Cloud providers. [...] [A] US judiciary subcommittee on FISAAA in 2008 stated that the Fourth Amendment has no relevance to non-US persons. Americans, think about how you’d feel keeping your email, CRM, accounts, and presentations on Russian or Chinese servers given the trust you have in those regimes. That’s how the rest of the world feels about American-provided services. Which jurisdiction isn’t constantly into invasive snooping, yet still has great bandwidth?
- Tim Berners-Lee Opposes Government Snooping — “The whole thing seems to me fraught with massive dangers and I don’t think it’s a good idea,” he said in reply to a question about the Australian government’s data retention plan.
- Google’s Approach to Government Requests for Information (Google Blog) — they’ve raised the dialogue about civil liberties by being so open about the requests for information they receive. Telcos and banks still regard these requests as a dirty secret that can’t be talked about, whereas Google gets headlines in NPR and CBS for it.
- Open Internet Tools Project — supports and incubates a collection of free and open source projects that enable anonymous, secure, reliable, and unrestricted communication on the Internet. Its goal is to enable people to talk directly to each other without being censored, surveilled or restricted.
- Bruce Sterling Interview — It changed my work profoundly when I realized I could talk to a global audience on the Internet, although I was legally limited from doing that by national publishing systems. The lack of any global book market has much reduced my interest in publishing books. National systems don’t “publish” me, but rather conceal me. This especially happens to writers outside the Anglophone market, but I know a lot of them, and I’ve become sensitized to their issues. It’s one of the general issues of globalization.
- bAdmin — database of default usernames and passwords for popular software. (via Reddit /r/netsec)
- Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone (Uri Simonsohn) — I argue that requiring authors to post the raw data supporting their published results has, among many other benefits, that of making fraud much less likely to go undetected. I illustrate this point by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results. (via The Atlantic)
A joint effort by New York City, San Francisco, and Yelp brings government health data into Yelp reviews.
Industrial Control System Security, Geographic Pricing, Hacker Scouting, pressureNET Visualization
- Improving the Security Posture of Industrial Control Systems (NSA) — common-sense that owners of ICS should already be doing, but which (because it comes from the NSA) hopefully they’ll listen to. See also Wired article on NSA targeting domestic SCADA systems.
- Geographic Pricing Online (Wall Street) — Staples, Discover Financial Services, Rosetta Stone, and Home Depot offer discounts if you’re close to a competitor, higher prices otherwise. [U]sing geography as a pricing tool can also reinforce patterns that e-commerce had promised to erase: prices that are higher in areas with less competition, including rural or poor areas. It diminishes the Internet’s role as an equalizer.
- Hacker Scouting (NPR) — teaching kids to be safe and competent in the world of technology, just as traditional scouting teaches them to be safe and competent in the world of nature.
- pressureNET Data Visualization — open source barometric data-gathering software which runs on Android devices. Source is on GitHub.