ENTRIES TAGGED "data"
Four short links: 14 May 2013
Privacy: Gone in 150ms, Pen-Testing Tablet, Low-Level in Lua, and Metaphor Identification Shootout
- Behind the Banner — visualization of what happens in the 150ms when the cabal of data vultures decide which ad to show you. They pass around your data as enthusiastically as a pipe at a Grateful Dead concert, and you’ve just as much chance of getting it back. (via John Battelle)
- pwnpad — Nexus 7 with Android and Ubuntu, high-gain USB Bluetooth, ethernet adapter, and a gorgeous suite of security tools. (via Kyle Young)
- Terra — a simple, statically-typed, compiled language with manual memory management [...] designed from the beginning to interoperate with Lua. Terra functions are first-class Lua values created using the terra keyword. When needed they are JIT-compiled to machine code. (via Hacker News)
- Metaphor Identification in Large Texts Corpora (PLOSone) — The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus.
Another Serving of Data Skepticism
I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by!
That announcement pushed me to write another post about data skepticism. The past few days, I’ve seen a resurgence of the slogan that correlation…
A different take on data skepticism
Our tools should make common cases easy and safe, but that's not the reality today.
Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can offer very misleading results. Maybe data science is best left to the pros? Mike…
Data skepticism
If data scientists aren't skeptical about how they use and analyze data, who will be?
A couple of months ago, I wrote that “big data” is heading toward the trough of a hype curve as a result of oversized hype and promises. That’s certainly true. I see more expressions of skepticism about the value of data every day. Some of the skepticism is a reaction against the hype; a lot of it arises…
The re-emergence of time-series
Researchers begin to scale up pattern recognition, machine-learning, and data management tools.
My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional forays into machine-learning (clustering, classification, anomalies). More recently, I’ve been closely following the emergence of…
Four short links: 5 April 2013
Hi-Res Long-Distance, Robot Ants, Data Liberation, and Network Neutrality
- Millimetre-Accuracy 3D Imaging From 1km Away (The Register) — With further development, Heriot-Watt University Research Fellow Aongus McCarthy says, the system could end up both portable and with a range of up to 10 km. See the paper for the full story.
- Robot Ants With Pheromones of Light (PLoS Comp Biol) — see also the video. (via IEEE Spectrum’s AI blog)
- tabula — open source tool for liberating data tables trapped inside PDF files. (via Source)
- There’s No Economic Imperative to Reconsider an Open Internet (SSRN) — The debate on the neutrality of Internet access isn’t new, and if its intensity varies over time, it has for a long while tainted the relationship between Internet Service Providers (ISPs) and Online Service Providers (OSPs). This paper explores the economic relationship between these two types of players, examines in laymen’s terms how the traffic can be routed efficiently and the associated cost of that routing. The paper then assesses various arguments in support of net discrimination to conclude that there is no threat to the internet economy such that reconsidering something as precious as an open internet would be necessary. (via Hamish MacEwan)
Four short links: 4 April 2013
Bootstrap Fun, Digital Public Library, Snake Robots, and Aboriginal Data
- geo-bootstrap — Twitter Bootstrap fork that looks like a classic geocities page. Because. (via Narciso Jaramillo)
- Digital Public Library of America — public libraries sharing full text and metadata for scans, coordinating digitisation, maximum reuse. See The Verge piece. (via Dan Cohen)
- Snake Robots — I don’t think this is a joke. The snake robot’s versatile abilities make it a useful tool for reaching locations or viewpoints that humans or other equipment cannot. The robots are able to climb to a high vantage point, maneuver through a variety of terrains, and fit through tight spaces like fences or pipes. These abilities can be useful for scouting and reconnaissance applications in either urban or natural environments. Watch the video, the nightmares will haunt you. (via Aaron Straup Cope)
- The Power of Data in Aboriginal Hands (PDF) — critique of government statistical data gathering of Aboriginal populations. That ABS [Australian Bureau of Statistics] survey is designed to assist governments, commentators or academics who want to construct policies that shape our lives or encourage a one-sided public discourse about us and our position in the Australian nation. The survey does not provide information that Indigenous people can use to advance our position because the data is aggregated at the national or state level or within the broad ABS categories of very remote, remote, regional or urban Australia. These categories are constructed in the imagination of the Australian nation state. They are not geographic, social or cultural spaces that have relevance to Aboriginal people. [...] The Australian nation’s foundation document of 1901 explicitly excluded Indigenous people from being counted in the national census. That provision in the constitution, combined with Section 51, sub section 26, which empowered the Commonwealth to make special laws for ‘the people of any race, other than the Aboriginal race in any State’ was an unambiguous and defining statement about Australian nation building. The Founding Fathers mandated the federated governments of Australia to oversee the disappearance of Aboriginal people in Australia.
Four short links: 3 April 2013
Binary Data Is Back, Scala Data, Visualization Grammar, and Pastebin Monitor
- Cap'n Proto — open source faster protocol buffers (binary data interchange format and RPC system).
- Saddle — a high performance data manipulation library for Scala.
- Vega — a visualization grammar, a declarative format for creating, saving and sharing visualization designs. (via Flowing Data)
- dumpmon — Twitter bot that monitors paste sites for password dumps and other sensitive information. Source on github, see the announcement for more.
Four short links: 27 March 2013
Social Science, YAKVS, Open Source Mail, and Tesla Coil and Quadrocopter Fun
- The Effect of Group Attachment and Social Position on Prosocial Behavior (PLoSone) — notable, in my mind, for “We conducted lab-in-the-field experiments involving 2,597 members of producer organizations in rural Uganda.” Cf. the recently reported “rich are more selfish than poor” findings, which (like a lot of behavioural economics research) studied Berkeley undergrads who weren’t smart enough to figure out what was being studied.
- elephant — an HTTP key/value store with full-text search and fast queries. Still a work in progress.
- geary (IndieGoGo) — a beautiful modern open-source email client. Found this at roughly the same time as elasticinbox, an open source, reliable, distributed, scalable email store. Open source email action starting?
- The Faraday Copter (YouTube) — Tesla coil and quadrocopter madness. (via Jeff Jonas)
Four short links: 21 March 2013
Obfuscation, Logging, Copyright, and Control
- The Obfuscation of Culture — Tumblr and LJ users sep ar ate w ords thr ou gh o dd spacin g in o rde r to fo ol sea rc h en g i nes. Chinese users hide political messages in image attachments to seemingly benign posts on Weibo. General Petraeus communicated solely through draft mode. 4chan scares away the faint of heart with porn. More technically astute groups communicate through obscure messaging systems. (via Beta Knowledge)
- log2viz — an open-source demonstration of the logs-as-data concept for Heroku apps. Log in and select one of your apps to see a live-updating dashboard of its web activity.
- Doctorow at LoC (YouTube) — video of Cory Doctorow’s talk on ebooks, libraries, and copyright at the Library of Congress.
- When TED Lost Control of its Crowd (HBR) — golden case study. You can’t “manage” a crowd—or a community—through transactional exchanges or economic incentives. You need something stronger: shared purpose.
Radar