ENTRIES TAGGED "data science"

The backlash against big data, continued

Ignore the hype. Learn to be a data skeptic.

Yawn. Yet another article trashing “big data,” this time an op-ed in the Times. This one is better than most, and ends with the truism that data isn’t a silver bullet. It certainly isn’t. I’ll spare you all the links (most of which are much less insightful than the Times piece), but the backlash against “big data” is clearly in…
Read Full Post | Comments: 5 |

Decision making under uncertainty

Edge contributors say it's time to retire the search for one-size-fits-all answers.

The 2014 Edge Annual Question (EAQ) is out. This year, the question posed to the contributors is: What scientific idea is ready for retirement? As usual with the EAQ, it provokes thought and promotes discussion. I have only read through a fraction of the responses so far, but I think it is important to highlight a few Edge contributors who answered with a…
Read Full Post | Comment |

Four short links: 9 December 2013

Surveillance Demarcation, NYT Data Scientist, 2D Dart, and Bayesian Database

  1. Reform Government Surveillance — hard not to view this as a demarcation dispute. “Ruthlessly collecting every detail of online behaviour is something we do clandestinely for advertising purposes, it shouldn’t be corrupted because of your obsession over national security!”
  2. Brian Abelson — Data Scientist at the New York Times, blogging what he finds. He tackles questions like what makes a news app “successful” and how might we measure it. Found via this engaging interview at the quease-makingly named Content Strategist.
  3. StageXL — Flash-like 2D package for Dart.
  4. BayesDB lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries. Open source.
Comment: 1 |

Burning the silos

The boundaries created by traditional management are just getting in the way of reducing product cycle times.

If I’ve seen any theme come up repeatedly over the past year, it’s getting product cycle times down. It’s not the sexiest or most interesting theme, but it’s everywhere: if it’s not on the front burner, it’s always simmering in the background. Cutting product cycles to the bare minimum is one of the main themes of the Velocity Conference and…
Read Full Post | Comment: 1 |

Four short links: 22 May 2013

New Kinect, Surveillance of Things, How to Criticise, and Compensating for Population

  1. XBox One Kinect Controller (Guardian) — the new Kinect controller can detect gaze, heartbeat, and the buttons on your shirt.
  2. Surveillance and the Internet of Things (Bruce Schneier) — Lots has been written about the “Internet of Things” and how it will change society for the better. It’s true that it will make a lot of wonderful things possible, but the “Internet of Things” will also allow for an even greater amount of surveillance than there is today. The Internet of Things gives the governments and corporations that follow our every move something they don’t yet have: eyes and ears.
  3. Daniel Dennett’s Intuition Pumps (extract). How to compose a successful critical commentary: 1. Attempt to re-express your target’s position so clearly, vividly and fairly that your target says: “Thanks, I wish I’d thought of putting it that way.” 2. List any points of agreement (especially if they are not matters of general or widespread agreement). 3. Mention anything you have learned from your target. 4. Only then are you permitted to say so much as a word of rebuttal or criticism.
  4. New Data Science Toolkit Out (Pete Warden) — with population data to let you compensate for population in your heatmaps. No more “gosh, EVERYTHING is more prevalent where there are lots of people!” meaningless charts.
Comment |

Four short links: 10 May 2013

Remixing Success, Scratch in the Browser, 3D Takedown, and Wolfram Network Analysis

  1. The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. (3) Remixes are more likely to attract remixers than de novo projects.
  2. Scratch 2.0 — my favourite first programming language for kids and adults, now in the browser! Downloadable version for offline use coming soon. See the overview for what’s new.
  3. State Dept Takedown on 3D-Printed Gun (Forbes) — The government says it wants to review the files for compliance with arms export control laws known as the International Traffic in Arms Regulations, or ITAR. By uploading the weapons files to the Internet and allowing them to be downloaded abroad, the letter implies Wilson’s high-tech gun group may have violated those export controls.
  4. Data Science of the Facebook World (Stephen Wolfram) — More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes. A few weeks ago we decided to start analyzing all this data… (via Phil Earnhardt)
Comment |

Steering the ship that is data science

Ideas on avoiding the data science equivalent of "repair-ware."

Mike Loukides recently recapped a conversation we’d had about leading indicators for data science efforts in an organization. We also pondered where the role of data scientist is headed and realized we could treat software development as a prototype case. It’s easy (if not eerie) to draw parallels between the Internet boom of…
Read Full Post | Comment: 1 |

Another Serving of Data Skepticism

I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by! That announcement pushed me to write another post about data skepticism. The past few days, I’ve seen a resurgence of the slogan that correlation…
Read Full Post | Comments: 3 |

Leading Indicators

In a conversation with Q Ethan McCallum (who should be credited as co-author), we wondered how to evaluate data science groups. If you’re looking at an organization’s data science group from the outside, possibly as a potential employee, what can you use to evaluate it? It’s not a simple problem under the best of conditions: you’re not an…
Read Full Post | Comment: 1 |

A different take on data skepticism

Our tools should make common cases easy and safe, but that's not the reality today.

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can offer very misleading results. Maybe data science is best left to the pros? Mike…
Read Full Post | Comment: 1 |