ENTRIES TAGGED "data"

Big data vs. big reality

It's not the data itself but what you do with it that counts.

This post originally appeared on Cumulus Partners. It’s republished with permission. Quentin Hardy’s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every human enterprise. His post is titled “Why Big Data is Not Truth,” and I recommend it for anyone…

Four short links: 28 May 2013

Geeky Primer, Visible CSS, Remote Working, and Raspberry Pi Sentiment Server

  1. My Little Geek — children’s primer with a geeky bent. A is for Android, B is for Binary, C is for Caffeine …. They have a Kickstarter for two sequels: numbers and shapes.
  2. Visible CSS Rules: Enter a url to see how the css rules interact with that page.
  3. How to Work Remotely — none of this is rocket science, it’s all true and things we had to learn the hard way.
  4. Raspberry Pi Twitter Sentiment Server — step-by-step guide, and github repo for the lazy. (via Jason Bell)

Four short links: 22 May 2013

New Kinect, Surveillance of Things, How to Criticise, and Compensating for Population

  1. XBox One Kinect Controller (Guardian) — the new Kinect controller can detect gaze, heartbeat, and the buttons on your shirt.
  2. Surveillance and the Internet of Things (Bruce Schneier) — Lots has been written about the “Internet of Things” and how it will change society for the better. It’s true that it will make a lot of wonderful things possible, but the “Internet of Things” will also allow for an even greater amount of surveillance than there is today. The Internet of Things gives the governments and corporations that follow our every move something they don’t yet have: eyes and ears.
  3. Daniel Dennett’s Intuition Pumps (extract): How to compose a successful critical commentary: 1. Attempt to re-express your target’s position so clearly, vividly and fairly that your target says: “Thanks, I wish I’d thought of putting it that way.” 2. List any points of agreement (especially if they are not matters of general or widespread agreement). 3. Mention anything you have learned from your target. 4. Only then are you permitted to say so much as a word of rebuttal or criticism.
  4. New Data Science Toolkit Out (Pete Warden) — with population data to let you compensate for population in your heatmaps. No more “gosh, EVERYTHING is more prevalent where there are lots of people!” meaningless charts.
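The idea of compensating for population can be sketched in a few lines of Python. This is a minimal illustration, not the Data Science Toolkit's own API: the function name `per_capita_rates`, the region names, and all the numbers are made up for the example. It divides each region's raw count by its population, so a heatmap shades the rate rather than the headcount.

```python
def per_capita_rates(counts, populations, per=100_000):
    """Convert raw event counts into rates per `per` residents.

    `counts` and `populations` are dicts keyed by region name.
    Regions with a missing or zero population are skipped, since
    no meaningful rate can be computed for them.
    """
    rates = {}
    for region, count in counts.items():
        population = populations.get(region)
        if not population:
            continue
        rates[region] = count * per / population
    return rates


# Hypothetical data: the big city has far more events in absolute terms...
counts = {"Metropolis": 5000, "Smallville": 50}
populations = {"Metropolis": 5_000_000, "Smallville": 10_000}

# ...but the small town has the higher rate once population is accounted for.
rates = per_capita_rates(counts, populations)
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.1f} per 100,000")
```

Plotting `rates` instead of `counts` is what keeps the map from simply redrawing where people live.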

Four short links: 14 May 2013

Privacy: Gone in 150ms, Pen-Testing Tablet, Low-Level in Lua, and Metaphor Identification Shootout

  1. Behind the Banner — visualization of what happens in the 150ms when the cabal of data vultures decide which ad to show you. They pass around your data as enthusiastically as a pipe at a Grateful Dead concert, and you’ve just as much chance of getting it back. (via John Battelle)
  2. pwnpad — Nexus 7 with Android and Ubuntu, high-gain USB bluetooth, ethernet adapter, and a gorgeous suite of security tools. (via Kyle Young)
  3. Terra: a simple, statically-typed, compiled language with manual memory management [...] designed from the beginning to interoperate with Lua. Terra functions are first-class Lua values created using the terra keyword. When needed they are JIT-compiled to machine code. (via Hacker News)
  4. Metaphor Identification in Large Texts Corpora (PLOSone) — The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus.

Another Serving of Data Skepticism

I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by! That announcement pushed me to write another post about data skepticism. The past few days, I’ve seen a resurgence of the slogan that correlation…

A different take on data skepticism

Our tools should make common cases easy and safe, but that's not the reality today.

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can offer very misleading results. Maybe data science is best left to the pros? Mike…

Data skepticism

If data scientists aren't skeptical about how they use and analyze data, who will be?

A couple of months ago, I wrote that “big data” is heading toward the trough of a hype curve as a result of oversized hype and promises. That’s certainly true. I see more expressions of skepticism about the value of data every day. Some of the skepticism is a reaction against the hype; a lot of it arises…

The re-emergence of time-series

Researchers begin to scale up pattern recognition, machine-learning, and data management tools.

My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional forays into machine-learning (clustering, classification, anomalies). More recently, I’ve been closely following the emergence of…

Four short links: 5 April 2013

Hi-Res Long-Distance, Robot Ants, Data Liberation, and Network Neutrality

  1. Millimetre-Accuracy 3D Imaging From 1km Away (The Register) — With further development, Heriot-Watt University Research Fellow Aongus McCarthy says, the system could end up both portable and with a range of up to 10 Km. See the paper for the full story.
  2. Robot Ants With Pheromones of Light (PLoS Comp Biol) — see also the video. (via IEEE Spectrum’s AI blog)
  3. tabula — open source tool for liberating data tables trapped inside PDF files. (via Source)
  4. There’s No Economic Imperative to Reconsider an Open Internet (SSRN) — The debate on the neutrality of Internet access isn’t new, and if its intensity varies over time, it has for a long while tainted the relationship between Internet Service Providers (ISPs) and Online Service Providers (OSPs). This paper explores the economic relationship between these two types of players, examines in laymen’s terms how the traffic can be routed efficiently and the associated cost of that routing. The paper then assesses various arguments in support of net discrimination to conclude that there is no threat to the internet economy such that reconsidering something as precious as an open internet would be necessary. (via Hamish MacEwan)

Four short links: 4 April 2013

Bootstrap Fun, Digital Public Library, Snake Robots, and Aboriginal Data

  1. geo-bootstrap — Twitter Bootstrap fork that looks like a classic geocities page. Because. (via Narciso Jaramillo)
  2. Digital Public Library of America — public libraries sharing full text and metadata for scans, coordinating digitisation, maximum reuse. See The Verge piece. (via Dan Cohen)
  3. Snake Robots — I don’t think this is a joke. The snake robot’s versatile abilities make it a useful tool for reaching locations or viewpoints that humans or other equipment cannot. The robots are able to climb to a high vantage point, maneuver through a variety of terrains, and fit through tight spaces like fences or pipes. These abilities can be useful for scouting and reconnaissance applications in either urban or natural environments. Watch the video, the nightmares will haunt you. (via Aaron Straup Cope)
  4. The Power of Data in Aboriginal Hands (PDF) — critique of government statistical data gathering of Aboriginal populations. That ABS [Australian Bureau of Statistics] survey is designed to assist governments, commentators or academics who want to construct policies that shape our lives or encourage a one-sided public discourse about us and our position in the Australian nation. The survey does not provide information that Indigenous people can use to advance our position because the data is aggregated at the national or state level or within the broad ABS categories of very remote, remote, regional or urban Australia. These categories are constructed in the imagination of the Australian nation state. They are not geographic, social or cultural spaces that have relevance to Aboriginal people. [...] The Australian nation’s foundation document of 1901 explicitly excluded Indigenous people from being counted in the national census. That provision in the constitution, combined with Section 51, sub section 26, which empowered the Commonwealth to make special laws for ‘the people of any race, other than the Aboriginal race in any State’ was an unambiguous and defining statement about Australian nation building. The Founding Fathers mandated the federated governments of Australia to oversee the disappearance of Aboriginal people in Australia.