ENTRIES TAGGED "data"

Four short links: 27 March 2013

Social Science, YAKVS, Open Source Mail, and Tesla Coil and Quadrocopter Fun

  1. The Effect of Group Attachment and Social Position on Prosocial Behavior (PLoSone) — notable, in my mind, for “We conducted lab-in-the-field experiments involving 2,597 members of producer organizations in rural Uganda.” Cf. the recently reported “rich are more selfish than poor” findings, which (like a lot of behavioural economics research) rest on studies of Berkeley undergrads who weren’t smart enough to figure out what was being studied.
  2. elephant: a HTTP key/value store with full-text search and fast queries. Still a work in progress.
  3. geary (IndieGoGo) — a beautiful modern open-source email client. Found this at roughly the same time as elasticinbox, an open source, reliable, distributed, scalable email store. Is open source email action starting?
  4. The Faraday Copter (YouTube) — Tesla coil and quadrocopter madness. (via Jeff Jonas)

Four short links: 21 March 2013

Obfuscation, Logging, Copyright, and Control

  1. The Obfuscation of Culture: Tumblr and LJ users sep ar ate w ords thr ou gh o dd spacin g in o rde r to fo ol sea rc h en g i nes. Chinese users hide political messages in image attachments to seemingly benign posts on Weibo. General Petraeus communicated solely through draft mode. 4chan scares away the faint of heart with porn. More technically astute groups communicate through obscure messaging systems. (via Beta Knowledge)
  2. log2viz: an open-source demonstration of the logs-as-data concept for Heroku apps. Log in and select one of your apps to see a live-updating dashboard of its web activity.
  3. Doctorow at LoC (YouTube) — video of Cory Doctorow’s talk on ebooks, libraries, and copyright at the Library of Congress.
  4. When TED Lost Control of its Crowd (HBR) — golden case study. “You can’t ‘manage’ a crowd—or a community—through transactional exchanges or economic incentives. You need something stronger: shared purpose.”

Four short links: 18 March 2013

Big Lit Data, 6502 Assembly, Small Startup Analytics, and Javascript Heatmaps

  1. A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method (PDF) — “This project was simultaneously an experiment in developing quantitative and computational methods for tracing changes in literary language. We wanted to see how far quantifiable features such as word usage could be pushed toward the investigation of literary history. Could we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? To this end, we present a second set of results, the techniques and methodological lessons gained in the course of designing and running this project.” Even litcrit is becoming a data game.
  2. Easy6502: get started writing 6502 assembly language. Fun way to get started with low-level coding.
  3. How Analytics Really Work at a Small Startup (Pete Warden) — “The key for us is that we’re using the information we get primarily for decision-making (should we build out feature X?) rather than optimization (how can we improve feature X?).” Nice rundown of the tools and systems he uses, with a plug for KissMetrics.
  4. webgl-heatmap (GitHub) — a JavaScript library for high performance heatmap display.

Four short links: 8 March 2013

Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks

  1. mlcomp: a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
  2. Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
  3. What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
  4. Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.

Untangling algorithmic illusions from reality in big data

Kate Crawford argues for caution and care in data-driven decision making.

Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif. about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, and hammered home the constant need for context in conclusions. Video of her talk is embedded below: Crawford explored…

Big data is dead, long live big data: Thoughts heading to Strata

The biggest problems will almost always be those for which the size of the data is part of the problem.

A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.” You don’t have to read much industry news to get…

An update on in-memory data management

In-memory data management brings data close to the computation.

By Ben Lorica and Roger Magoulas. We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have emerged, some expected, others surprising. Performance improves as you put data as close to the computation as…

Four short links: 29 January 2013

Data Jurisdiction, TimBL Frowns, Google Transparency, and Secure Tools

  1. FISA Amendment Hits Non-Citizens: FISAAA essentially makes it lawful for the US to conduct purely political surveillance on foreigners’ data accessible in US Cloud providers. [...] [A] US judiciary subcommittee on FISAAA in 2008 stated that the Fourth Amendment has no relevance to non-US persons. Americans, think about how you’d feel keeping your email, CRM, accounts, and presentations on Russian or Chinese servers given the trust you have in those regimes. That’s how the rest of the world feels about American-provided services. Which jurisdiction isn’t constantly into invasive snooping, yet still has great bandwidth?
  2. Tim Berners-Lee Opposes Government Snooping: “The whole thing seems to me fraught with massive dangers and I don’t think it’s a good idea,” he said in reply to a question about the Australian government’s data retention plan.
  3. Google’s Approach to Government Requests for Information (Google Blog) — they’ve raised the dialogue about civil liberties by being so open about the requests for information they receive. Telcos and banks still regard these requests as a dirty secret that can’t be talked about, whereas Google gets headlines in NPR and CBS for it.
  4. Open Internet Tools Project: supports and incubates a collection of free and open source projects that enable anonymous, secure, reliable, and unrestricted communication on the Internet. Its goal is to enable people to talk directly to each other without being censored, surveilled or restricted.

Four short links: 18 January 2013

Audience Fragmentation, Default Passwords, Fabricated Data, and Javascript in Minecraft

  1. Bruce Sterling Interview: “It changed my work profoundly when I realized I could talk to a global audience on the Internet, although I was legally limited from doing that by national publishing systems. The lack of any global book market has much reduced my interest in publishing books. National systems don’t ‘publish’ me, but rather conceal me. This especially happens to writers outside the Anglophone market, but I know a lot of them, and I’ve become sensitized to their issues. It’s one of the general issues of globalization.”
  2. bAdmin — database of default usernames and passwords for popular software. (via Reddit /r/netsec)
  3. Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone (Uri Simonsohn) — I argue that requiring authors to post the raw data supporting their published results has, among many other benefits, that of making fraud much less likely to go undetected. I illustrate this point by describing two cases of fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster’s explanations for his anomalous results. (via The Atlantic)
  4. ScriptCraft — Javascript in Minecraft. Important because All The Kids play Minecraft. (via Javascript Weekly)

Yelp partners with NYC and SF on restaurant inspection data

A joint effort by New York City, San Francisco, and Yelp brings government health data into Yelp reviews.

One of the key notions in my “Government as a Platform” advocacy has been that there are other ways to partner with the private sector besides hiring contractors and buying technology. One of the best of these is to provide data that can be used by the private sector to build or enrich their own citizen-facing services. Yes,…