ENTRIES TAGGED "machine learning"

Four short links: 16 May 2013

Internet Filter Creep, Innovating in E-Mail/Gmail, Connected Devices Business Strategy, and Ecology Recapitulates Photography

  1. Australian Filter Scope Creep: The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios.
  2. Embedding Actions in Gmail — after years of benign neglect, it’s good to see Gmail worked on again. We’ve said for years that email’s a fertile ground for doing stuff better, and Google seem to have the religion. (see Send Money with Gmail for more).
  3. What Keeps Me Up at Night (Matt Webb) — Matt’s building a business around connected devices. Here he explains why the category could be owned by any of the big players. In times like this I remember Howard Aiken’s advice: Don’t worry about people stealing your ideas. If it is original you will have to ram it down their throats.
  4. Image Texture Predicts Avian Density and Species Richness (PLOSone) — Surprisingly and interestingly, remotely sensed vegetation structure measures (i.e., image texture) were often better predictors of avian density and species richness than field-measured vegetation structure, and thus show promise as a valuable tool for mapping habitat quality and characterizing biodiversity across broad areas.
Four short links: 15 May 2013

Glass Face, Hardware Pricing: High, Hardware Pricing: Hard, Medical Image Search

  1. Facial Recognition in Google Glass (Mashable) — this makes Glass umpty more attractive to me. It was created in a hackathon for doctors to use with patients, but I need it wired into my eyeballs.
  2. How to Price Your Hardware Project: At the end of the day you are picking a price that enables you to stay in business. As @meganauman says, “Profit is not something to add at the end, it is something to plan for in the beginning.”
  3. Hardware Pricing (Matt Webb) — When products connect to the cloud, the cost structure changes once again. On the one hand, there are ongoing network costs which have to be paid by someone. You can do that with a cut of transactions on the platform, by absorbing the network cost upfront in the RRP, or with user-pays subscription.
  4. Dicoogle — open source medical image search. Written up in a PLOSone paper.

A different take on data skepticism

Our tools should make common cases easy and safe, but that's not the reality today.

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can offer very misleading results. Maybe data science is best left to the pros? Mike…

Data skepticism

If data scientists aren't skeptical about how they use and analyze data, who will be?

A couple of months ago, I wrote that “big data” is heading toward the trough of a hype curve as a result of oversized hype and promises. That’s certainly true. I see more expressions of skepticism about the value of data every day. Some of the skepticism is a reaction against the hype; a lot of it arises…
Four short links: 9 April 2013

Electric Monks, Moore's Law's Death Spiral, Trafficking Technology, and Product Management

  1. Automated Essay Grading To Come to EdX (NY Times) — shortly after we get software that writes stories for us, we get software to read them for us.
  2. AMD Calls End of Moore’s Law in Ten Years (ComputerWorld) — story based on this video, where Michio Kaku lays out the timeline for Moore’s Law’s wind-down and the spin-up of new technology.
  3. Addressing Human Trafficking Through Technology (danah boyd) — technologists love to make tech and then assert it’ll help people. Danah’s work on teens and now trafficking steers us to do what works, rather than what is showy or easiest.
  4. Product Management (Rowan Simpson) — hand this to anyone who asks what product management actually is. Excellent explanation.
Four short links: 1 April 2013

Machine Learning Demos, iOS Debugging, Industrial Internet, and Deanonymity

  1. MLDemos: an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and reward maximization. (via Mark Alen)
  2. kiln (GitHub) — open source extensible on-device debugging framework for iOS apps.
  3. Industrial Internet — the O’Reilly report on the industrial Internet of things is out. Prasad suggests an illustration: for every car with a rain sensor today, there are more than 10 that don’t have one. Instead of an optical sensor that turns on windshield wipers when it sees water, imagine the human in the car as a sensor — probably somewhat more discerning than the optical sensor in knowing what wiper setting is appropriate. A car could broadcast its wiper setting, along with its location, to the cloud. “Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction.
  4. Unique in the Crowd: The Privacy Bounds of Human Mobility (PDF, Nature) — We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. As Edd observed, “You are a unique snowflake, after all.” (via Alasdair Allan)
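The uniqueness measurement described in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method or data: the traces are synthetic, each user is modelled as a set of (hour, antenna) points, and the names `make_synthetic_traces` and `fraction_unique` are hypothetical.

```python
import random


def make_synthetic_traces(num_users, num_antennas, rng):
    """Build fake traces: one random antenna per hour of a day for each user.

    Synthetic stand-in data (an assumption), not the carrier dataset.
    """
    traces = {}
    for user in range(num_users):
        traces[user] = {(hour, rng.randrange(num_antennas)) for hour in range(24)}
    return traces


def fraction_unique(traces, num_points, rng):
    """Fraction of users singled out by `num_points` points drawn from their trace."""
    unique = 0
    for user, trace in traces.items():
        # Draw a few known points for this user (the "outside information").
        sample = set(rng.sample(sorted(trace), num_points))
        # Every trace containing all sampled points is a candidate identity.
        candidates = [u for u, t in traces.items() if sample <= t]
        if candidates == [user]:
            unique += 1
    return unique / len(traces)


rng = random.Random(0)
traces = make_synthetic_traces(num_users=200, num_antennas=50, rng=rng)
one_point = fraction_unique(traces, 1, rng)
four_points = fraction_unique(traces, 4, rng)
print(f"unique with 1 point: {one_point:.2f}, with 4 points: {four_points:.2f}")
```

On this toy data a single point rarely isolates anyone, while four points isolate almost everyone. Real traces are far less random than these, so the numbers say nothing about the 95% figure quoted above.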
Four short links: 25 March 2013

Analytics vs Learning, Reproducible Science, Ramping up Military Internet Attacks, and Compressed Sensing

  1. Analytics for Learning: Since doing good learning analytics is hard, we often do easy learning analytics and pretend that they are good instead. But pretending doesn’t make it so. (via Dan Meyer)
  2. Reproducible Research — a list of links to related work about reproducible research, reproducible research papers, etc. (via Stijn Debrouwere)
  3. Pentagon Deploying 100+ Cyber Teams: The organization defending military networks — cyber protection forces — will comprise more than 60 teams, a Pentagon official said. The other two organizations — combat mission forces and national mission forces — will conduct offensive operations. I’ll repeat that: offensive operations.
  4. Towards Deterministic Compressed Sensing (PDF) — instead of taking lots of data, compressing by throwing some away, can we only take a few samples and reconstruct the original from that? (more mathematically sound than my handwaving explanation). See also Compressed sensing and big data from the Practical Quant. (via Ben Lorica)
Four short links: 11 March 2013

Ransom Money, High School CS, Wikipedia Links, and Social Teens

  1. Adventures in the Ransom Trade — between insurance, protection, and ransoms, Sean Gourley describes it as “one of the more interesting grey markets.” (via Sean Gourley)
  2. About High School Computer Science Teachers (Selena Deckelmann) — Selena gets an education in the state of high school computer science education.
  3. Learning From Big Data (Google Research) — the Wikilinks Corpus: 40 million total disambiguated mentions within over 10 million web pages [...] The mentions are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If we think of each page on Wikipedia as an entity (an idea we’ve discussed before), then the anchor text can be thought of as a mention of the corresponding entity.
  4. Teens Have Always Gone Where Identity Isn’t: if you look back at one of the first dominant social platforms, AOL Instant Messenger, it looks a lot like the pseudonymous Tumblr and Snapchat of today in many respects. You used an avatar that was not your face. Your screenname was not indexed and not personally identifiable (mine was Goober1310).
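The mention-finding rule quoted in item 3 (keep links to Wikipedia whose anchor text closely matches the target page title) can be sketched roughly as follows. This is a minimal illustration, not Google's pipeline: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, the sample links, and the function names `title_from_url` and `find_mentions` are all assumptions.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote, urlparse


def title_from_url(url):
    """Return the page title encoded in a Wikipedia URL, or None for other URLs."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        return None
    if not parsed.path.startswith("/wiki/"):
        return None
    # Wikipedia encodes spaces in titles as underscores.
    return unquote(parsed.path[len("/wiki/"):]).replace("_", " ")


def find_mentions(links, threshold=0.8):
    """Keep (anchor, title) pairs whose anchor text closely matches the title.

    `threshold` is an illustrative cut-off, not a value from the source.
    """
    mentions = []
    for anchor, url in links:
        title = title_from_url(url)
        if title is None:
            continue
        score = SequenceMatcher(None, anchor.lower(), title.lower()).ratio()
        if score >= threshold:
            mentions.append((anchor, title))
    return mentions


# Hypothetical (anchor text, link target) pairs scraped from a web page.
links = [
    ("Alan Turing", "https://en.wikipedia.org/wiki/Alan_Turing"),
    ("click here", "https://en.wikipedia.org/wiki/Alan_Turing"),
    ("Alan Turing", "https://example.com/wiki/Alan_Turing"),
]
print(find_mentions(links))
```

Only the first link survives: the second has anchor text unrelated to the title, and the third does not point at Wikipedia.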
Four short links: 8 March 2013

Comparing Algorithms, Programming & Visual Arts, Data Brokers, and Your Brain on Ebooks

  1. mlcomp: a free website for objectively comparing machine learning programs across various datasets for multiple problem domains.
  2. Printing Code: Programming and the Visual Arts (Vimeo) — Rune Madsen’s talk from Heroku’s Waza. (via Andrew Odewahn)
  3. What Data Brokers Know About You (ProPublica) — excellent run-down on the compilers of big data about us. Where are they getting all this info? The stores where you shop sell it to them.
  4. Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media (PLOSone) — Comprehension accuracy did not differ across the three media for either group and EEG and eye fixations were the same. Yet readers stated they preferred paper. That preference, the authors conclude, isn’t because it’s less readable. From this perspective, the subjective ratings of our participants (and those in previous studies) may be viewed as attitudes within a period of cultural change.
Four short links: 7 March 2013

Drug Interactions from Search History, Web Satire, Visible Peer Review, and Rights-based Copyright

  1. Pharmacovigilance — Signals from The Crowd (PDF) — in the NY Times’ words: Using automated software tools to examine queries by 6 million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol-lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar. (via New York Times)
  2. The World Wide Web is Moving to AOL — best satire you’ll read this month.
  3. Review History for Perceptual elements in Penn & Teller’s “Cups and Balls” magic trick — PeerJ makes peer review history available for the articles it publishes. Not only does this build reputation for peer reviewers who want it, but it is also a wonderful insight into how paranoid science must be to defend against mistakes in data interpretation. (The finished paper is fun, too)
  4. A New Basis for Copyright: NZ’s most technically-literate judge floats an idea for how copyright might be reimagined in a more useful way for the modern age by considering it in terms of human rights. Perhaps there should be consideration of a new copyright model that recognises content user rights against a backdrop of the right to receive and impart information and a truly balanced approach to information and expression that recognises that ideas expressed are building blocks for new ideas. Underpinning this must be a recognition on the part of content owners that the properties of new technologies dictate our responses, our behaviours, our values and our ways of thinking. These should not be seen as a threat but an opportunity. It cannot be a one-way street with traffic heading only in the direction dictated by content owners.