ENTRIES TAGGED "machine learning"
Distributed Browser-Based Computation, Streaming Regex, Preventing SQL Injections, and SVM for Faster Deep Learning
- WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica)
- sregex (GitHub) — A non-backtracking regex engine library for large data streams. See also slide notes from a YAPC::NA talk. (via Ivan Ristic)
- Bobby Tables — a guide to preventing SQL injections. (via Andy Lester)
- Deep Learning Using Support Vector Machines (arXiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Olivier Grisel)
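For the Bobby Tables entry, the usual core advice for preventing SQL injection is to pass user input as bound parameters instead of pasting it into the query string. A minimal sketch with Python's built-in sqlite3 module follows; the `students` table and the `find_student` and `demo` helpers are made up for illustration and are not taken from the guide itself.

```python
import sqlite3


def find_student(conn, name):
    # The ? placeholder makes sqlite3 bind `name` as data,
    # so it is never parsed as part of the SQL statement.
    cur = conn.execute(
        "SELECT id, name FROM students WHERE name = ?", (name,)
    )
    return cur.fetchall()


def demo():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO students (name) VALUES (?)", [("Alice",), ("Bob",)]
    )
    # A hostile "name" in the spirit of the Bobby Tables joke.
    hostile = "Robert'); DROP TABLE students;--"
    rows = find_student(conn, hostile)
    remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
    conn.close()
    return rows, remaining
```

Calling `demo()` looks up the hostile string as an ordinary (non-matching) name, and the table keeps both of its rows. Other database drivers use different placeholder styles, so check the driver's own documentation before copying the `?` syntax.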
Internet Filter Creep, Innovating in E-Mail/Gmail, Connected Devices Business Strategy, and Ecology Recapitulates Photography
- Australian Filter Scope Creep — The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios.
- Embedding Actions in Gmail — after years of benign neglect, it’s good to see Gmail worked on again. We’ve said for years that email’s a fertile ground for doing stuff better, and Google seem to have the religion. (see Send Money with Gmail for more).
- What Keeps Me Up at Night (Matt Webb) — Matt’s building a business around connected devices. Here he explains why the category could be owned by any of the big players. In times like this I remember Howard Aiken’s advice: Don’t worry about people stealing your ideas. If it is original you will have to ram it down their throats.
- Image Texture Predicts Avian Density and Species Richness (PLOS ONE) — Surprisingly and interestingly, remotely sensed vegetation structure measures (i.e., image texture) were often better predictors of avian density and species richness than field-measured vegetation structure, and thus show promise as a valuable tool for mapping habitat quality and characterizing biodiversity across broad areas.
Glass Face, Hardware Pricing: High, Hardware Pricing: Hard, Medical Image Search
- Facial Recognition in Google Glass (Mashable) — this makes Glass umpty more attractive to me. It was created in a hackathon for doctors to use with patients, but I need it wired into my eyeballs.
- How to Price Your Hardware Project — At the end of the day you are picking a price that enables you to stay in business. As @meganauman says, “Profit is not something to add at the end, it is something to plan for in the beginning.”
- Hardware Pricing (Matt Webb) — When products connect to the cloud, the cost structure changes once again. On the one hand, there are ongoing network costs which have to be paid by someone. You can do that with a cut of transactions on the platform, by absorbing the network cost upfront in the RRP, or with user-pays subscription.
- Dicoogle — open source medical image search. Written up in a PLOS ONE paper.
Our tools should make common cases easy and safe, but that's not the reality today.
If data scientists aren't skeptical about how they use and analyze data, who will be?
Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity
- MLDemos — an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
- kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
- Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
- Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
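To make the "four spatio-temporal points" claim in the Unique in the Crowd entry concrete, here is a rough sketch of how such a uniqueness test could be run. It assumes each person's trace is a set of (hour, antenna) pairs; the `fraction_unique` function and that data layout are illustrative assumptions, not the paper's code or data format.

```python
import random


def fraction_unique(traces, p, seed=0):
    """Estimate the share of people pinned down by p points of their trace.

    traces: dict mapping a person id to a set of (hour, antenna) points.
    For each person with at least p points, draw p of them at random and
    count how many traces in the dataset contain all of the drawn points.
    The person counts as unique when only their own trace matches.
    """
    rng = random.Random(seed)
    unique = 0
    eligible = 0
    for points in traces.values():
        if len(points) < p:
            continue  # not enough points to draw a sample of size p
        eligible += 1
        sample = set(rng.sample(sorted(points), p))
        matches = sum(1 for other in traces.values() if sample <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Tiny hand-made example: "a" and "c" share a trace, "b" differs.
toy = {
    "a": {(0, 1), (1, 2)},
    "b": {(0, 1), (1, 3)},
    "c": {(0, 1), (1, 2)},
}
share = fraction_unique(toy, p=2)  # only "b" is unique here
```

Coarsening the data, for example by binning hours together or merging neighbouring antennas before building the sets, is the kind of change the abstract says reduces uniqueness only slowly.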
Analytics vs Learning, Reproducible Science, Ramping up Military Internet Attacks, and Compressed Sensing
- Analytics for Learning — Since doing good learning analytics is hard, we often do easy learning analytics and pretend that they are good instead. But pretending doesn’t make it so. (via Dan Meyer)
- Reproducible Research — a list of links to related work about reproducible research, reproducible research papers, etc. (via Stijn Debrouwere)
- Pentagon Deploying 100+ Cyber Teams — The organization defending military networks — cyber protection forces — will comprise more than 60 teams, a Pentagon official said. The other two organizations — combat mission forces and national mission forces — will conduct offensive operations. I’ll repeat that: offensive operations.
- Towards Deterministic Compressed Sensing (PDF) — instead of taking lots of data and then compressing it by throwing some away, can we take only a few samples and reconstruct the original from those? (The paper is more mathematically sound than my handwaving explanation.) See also Compressed sensing and big data from the Practical Quant. (via Ben Lorica)
Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks
- mlcomp — a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
- Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
- What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
- Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOS ONE) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.