ENTRIES TAGGED "machine learning"

Four short links: 19 June 2013

Thread Problems, Better Image Search, Open Standards, and GitHub Maps

  1. Multithreading is Hard: The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
  2. Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
  3. Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the UX of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
  4. GitHub Renders GeoJSON: Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
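For anyone who wants to try the GeoJSON rendering in item 4, here is a minimal sketch of producing a GeoJSON file from Python. The place names, coordinates, the `point_feature` helper, and the `places.geojson` filename are all illustrative assumptions, not part of GitHub's announcement. The one detail worth remembering is that GeoJSON positions are written as longitude first, then latitude.

```python
import json


def point_feature(name, lon, lat):
    """Build one GeoJSON Feature for a point.

    GeoJSON positions are [longitude, latitude], in that order.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


# Hypothetical sample places, chosen only for illustration.
features = [
    point_feature("Sebastopol, CA", -122.8239, 38.4021),
    point_feature("Wellington, NZ", 174.7762, -41.2865),
]

# A FeatureCollection is the usual top-level wrapper for several features.
collection = {"type": "FeatureCollection", "features": features}
geojson_text = json.dumps(collection, indent=2)

if __name__ == "__main__":
    # Redirect this output to a file such as places.geojson (hypothetical name)
    # and commit it to a repository to see how it is displayed.
    print(geojson_text)
```

Swapping latitude and longitude is the most common mistake here, and it tends to show up as points in the wrong hemisphere.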
Four short links: 17 June 2013

Deep Learning, Internet of UX Nightmares, Mozilla Science Lab, and Ground-Up Computing

  1. Weekend Reads on Deep Learning (Alex Dong) — an article and two videos unpacking “deep learning” such as multilayer neural networks.
  2. The Internet of Actual Things: “I have 10 reliable activations remaining,” your bulb will report via some ridiculous light-bulbs app on your phone. “Now just nine. Remember me when I’m gone.” (via Andy Baio)
  3. Announcing the Mozilla Science Lab (Kaitlin Thaney) — We also want to find ways of supporting and innovating with the research community – building bridges between projects, running experiments of our own, and building community. We have an initial idea of where to start, but want to start an open dialogue to figure out together how to best do that, and where we can be of most value.
  4. NAND to Tetris: The site contains all the software tools and project materials necessary to build a general-purpose computer system from the ground up. We also provide a set of lectures designed to support a typical course on the subject. (via Hacker News)
Four short links: 7 June 2013

Open Source BigTable, Robots Lost, Changing the World, Secrecy Binge

  1. Accumulo — NSA’s BigTable implementation, released as an Apache project.
  2. How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
  3. PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
  4. What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.
Four short links: 4 June 2013

Distributed Browser-Based Computation, Streaming Regex, Preventing SQL Injections, and SVM for Faster Deep Learning

  1. WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica)
  2. sregex (GitHub) — A non-backtracking regex engine library for large data streams. See also slide notes from a YAPC::NA talk. (via Ivan Ristic)
  3. Bobby Tables — a guide to preventing SQL injections. (via Andy Lester)
  4. Deep Learning Using Support Vector Machines (arXiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Olivier Grisel)
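The SQL injection advice in item 3 usually comes down to one habit: pass user input as a bound parameter instead of pasting it into the SQL string. The sketch below illustrates that habit with Python's standard `sqlite3` module. The `students` table, the `add_student` helper, and the hostile input are assumptions made up for the example, not taken from the guide.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students (name) VALUES ('Alice')")

# Hypothetical hostile input in the spirit of the guide's namesake.
hostile = "Robert'); DROP TABLE students;--"


def add_student(connection, name):
    """Insert a student using a bound parameter (the ? placeholder).

    The driver sends the value separately from the SQL text, so quotes
    and semicolons in the name are stored as data, never run as SQL.
    """
    connection.execute("INSERT INTO students (name) VALUES (?)", (name,))


add_student(conn, hostile)

# The table still exists and holds the odd name as an ordinary string.
names = [row[0] for row in conn.execute("SELECT name FROM students ORDER BY rowid")]

if __name__ == "__main__":
    for stored_name in names:
        print(stored_name)
```

Building the same statement with string formatting would hand the quote characters to the SQL parser, which is exactly the opening an injection needs.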
Four short links: 16 May 2013

Internet Filter Creep, Innovating in E-Mail/Gmail, Connected Devices Business Strategy, and Ecology Recapitulates Photography

  1. Australian Filter Scope Creep: The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios.
  2. Embedding Actions in Gmail — after years of benign neglect, it’s good to see Gmail worked on again. We’ve said for years that email’s a fertile ground for doing stuff better, and Google seem to have the religion. (see Send Money with Gmail for more).
  3. What Keeps Me Up at Night (Matt Webb) — Matt’s building a business around connected devices. Here he explains why the category could be owned by any of the big players. In times like this I remember Howard Aiken’s advice: Don’t worry about people stealing your ideas. If it is original you will have to ram it down their throats.
  4. Image Texture Predicts Avian Density and Species Richness (PLOSone) — Surprisingly and interestingly, remotely sensed vegetation structure measures (i.e., image texture) were often better predictors of avian density and species richness than field-measured vegetation structure, and thus show promise as a valuable tool for mapping habitat quality and characterizing biodiversity across broad areas.
Four short links: 15 May 2013

Glass Face, Hardware Pricing: High, Hardware Pricing: Hard, Medical Image Search

  1. Facial Recognition in Google Glass (Mashable) — this makes Glass umpty more attractive to me. It was created in a hackathon for doctors to use with patients, but I need it wired into my eyeballs.
  2. How to Price Your Hardware Project: At the end of the day you are picking a price that enables you to stay in business. As @meganauman says, “Profit is not something to add at the end, it is something to plan for in the beginning.”
  3. Hardware Pricing (Matt Webb) — When products connect to the cloud, the cost structure changes once again. On the one hand, there are ongoing network costs which have to be paid by someone. You can do that with a cut of transactions on the platform, by absorbing the network cost upfront in the RRP, or with user-pays subscription.
  4. Dicoogle — open source medical image search. Written up in PLOSone paper.

A different take on data skepticism

Our tools should make common cases easy and safe, but that's not the reality today.

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can offer very misleading results. Maybe data science is best left to the pros? Mike…

Data skepticism

If data scientists aren't skeptical about how they use and analyze data, who will be?

A couple of months ago, I wrote that “big data” is heading toward the trough of a hype curve as a result of oversized hype and promises. That’s certainly true. I see more expressions of skepticism about the value of data every day. Some of the skepticism is a reaction against the hype; a lot of it arises…
Four short links: 9 April 2013

Electric Monks, Moore's Law's Death Spiral, Trafficking Technology, and Product Management

  1. Automated Essay Grading To Come to EdX (NY Times) — shortly after we get software that writes stories for us, we get software to read them for us.
  2. AMD Calls End of Moore’s Law in Ten Years (ComputerWorld) — story based on this video, where Michio Kaku lays out the timeline for Moore’s Law’s wind-down and the spin-up of new technology.
  3. Addressing Human Trafficking Through Technology (danah boyd) — technologists love to make tech and then assert it’ll help people. Danah’s work on teens and now trafficking steers us to do what works, rather than what is showy or easiest.
  4. Product Management (Rowan Simpson) — hand this to anyone who asks what product management actually is. Excellent explanation.
Four short links: 1 April 2013

Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity

  1. MLDemos: an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
  2. kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
  3. Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
  4. Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
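The uniqueness idea in item 4 can be illustrated with a toy calculation: given a few known (place, hour) points about a person, count how many traces in the dataset contain all of them. This is only a sketch under loud assumptions. The three traces are invented, the `fraction_unique` helper is hypothetical, and it takes the first k points in sorted order for repeatability, where a real study would draw points at random from real data.

```python
def fraction_unique(traces, k):
    """Return the share of users pinned down by k of their own points.

    traces maps a user id to a set of (place, hour) points. For each
    user we take the first k points in sorted order (a simplification
    for repeatability) and check whether any other trace also contains
    all of them.
    """
    unique = 0
    for user, points in traces.items():
        known = set(sorted(points)[:k])
        matches = [other for other, pts in traces.items() if known <= pts]
        if matches == [user]:
            unique += 1
    return unique / len(traces)


# Invented toy traces: (antenna id, hour of day). Not real data.
toy_traces = {
    "a": {(1, 9), (2, 10), (3, 11)},
    "b": {(1, 9), (2, 10), (4, 11)},
    "c": {(1, 9), (5, 10), (6, 11)},
}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, fraction_unique(toy_traces, k))
```

Even in this tiny example, each extra known point separates more people from the crowd, which is the intuition behind the paper's result.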