ENTRIES TAGGED "machine learning"

Strata Week: Data prospecting with Kaggle

Kaggle now accepting data before a contest, HP's Autonomy purchase comes into focus, Cloudera's new Hadoop distribution.

In this week's data news, Kaggle launches Prospect, HP unveils its big data plans, and Cloudera releases CDH4 (the latest version of its Hadoop distribution).

Four short links: 10 May 2012

Illuminated Mario, Touchstone Facts, Calculating Spamicity, and Abstract Quantified Self

  1. Gravity in the Margins (Got Medieval) — illuminating illuminated manuscripts with Mario. (via BoingBoing)
  2. Hours Days, Who’s Counting? (Jon Udell) — What prompted me to check? My friend Mike Caulfield, who’s been teaching and writing about quantitative literacy, says it’s because in this case I did have some touchstone facts parked in my head, including the number 10 million (roughly) for barrels of oil imported daily to the US. The reason I’ve been working through a bunch of WolframAlpha exercises lately is that I know I don’t have those touchstones in other areas, and want to develop them. The idea of “touchstone facts” resonates with me.
  3. Spotting Fake Reviewer Groups in Consumer Reviews (PDF) — gotta love any paper that says: “We calculated the ‘spamicity’ (degree of spam) of each group by assigning 1 point for each spam judgment, 0.5 point for each borderline judgment and 0 point for each non-spam judgment a group received and took the average of all 8 labelers.” (via Google Research Blog)
  4. Visualizing Physical Activity Using Abstract Ambient Art (Quantified Self) — kinda like the iTunes visualizer but for your Fitbit Tracker.
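The spamicity scoring quoted in item 3 is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic as the quote describes it, not the paper's code: the label strings, the `POINTS` table, and the `spamicity` function name are all assumptions made for the example.

```python
# Points per judgment, as described in the quoted passage.
# The label strings themselves are hypothetical.
POINTS = {"spam": 1.0, "borderline": 0.5, "non-spam": 0.0}


def spamicity(judgments):
    """Average the points over all labelers' judgments for one group."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(POINTS[j] for j in judgments) / len(judgments)


# Made-up example: 8 labelers, 5 say spam, 2 borderline, 1 non-spam.
labels = ["spam"] * 5 + ["borderline"] * 2 + ["non-spam"]
score = spamicity(labels)  # (5 * 1 + 2 * 0.5 + 1 * 0) / 8 = 0.75
```

A score near 1 means most labelers judged the group to be spam; a score near 0 means most did not.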

Four short links: 2 May 2012

Elective Dickery, Probabilistic Data Analysis, Data Cleaning, and SSL Security

  1. Punting on SxSW (Brad Feld) — I came across this old post and thought: if you can make money by being a dick, or make money by being a caring family person, why would you choose to be a dick? As far as I can tell, being a dick is optional. Brogrammers, take note. Be more like Brad Feld, who prioritises his family and acts accordingly.
  2. Probabilistic Structures for Data Mining — readable introduction to useful algorithms and data structures, showing the trade-offs between their performance, reliability, and resource use. (via Hacker News)
  3. Dataset — a JavaScript library for transforming, querying, and manipulating data from different sources.
  4. Many HTTPS Servers are Insecure — 75% still vulnerable to the BEAST attack.

Four short links: 19 April 2012

Text Similarity, Designing Engagement, Clustering Stories, and Prince of Persia

  1. Superfastmatch — open source text comparison tool, used to locate plagiarism/churnalism in online news sites. You can pull out the text engine and use it for your own “find where this text is used elsewhere” applications (e.g., what’s being forwarded out in email, how much of this RFP is copy and paste, what’s NOT boilerplate in this contract, etc.). (via Pete Warden)
  2. Ten Design Principles for Engaging Math Tasks (Dan Meyer) — education gold, engagement gold, and some serious ideas you can use in your own apps.
  3. Clustering Related Stories (Jenny Finkel) — description of how to cluster related stories, talks about some of the tricks. Interesting without being too scary.
  4. Prince of Persia (GitHub) — I have waited to see if the novelty wore off, but I still find this cool: 1980s source code on GitHub.
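The “find where this text is used elsewhere” idea from item 1 can be illustrated with a toy word-shingle comparison. This is not Superfastmatch's algorithm or API, just a minimal sketch of one common approach; the `shingles` and `containment` names and the three-word window are assumptions made for the example.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(query, document, n=3):
    """Fraction of the query's shingles that also appear in the document."""
    q = shingles(query, n)
    if not q:
        return 0.0  # query shorter than n words: nothing to compare
    return len(q & shingles(document, n)) / len(q)


# Made-up example: the whole query phrase is reused inside the document.
reused = containment("the quick brown fox",
                     "he said the quick brown fox jumped")
```

A containment close to 1 suggests the query text was largely copied into the document; a real tool would also need to handle punctuation, scale, and indexing.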

What it takes to build great machine learning products

Rich machine learning products come from skilled and knowledgeable teams.

Specific insights into a problem and careful model design separate a machine learning system that doesn't work from one that people will actually use.

Top Stories: April 9-13, 2012

Carsharing boosts city governments, why complex systems fail, and what web ops teams could do with big data.

This week on O'Reilly: How Zipcar's technology is saving big money for U.S. city governments, why scalable clouds need simple parts, and pondering the possibilities of web ops and machine learning.

Editorial Radar with Mike Loukides & Mike Hendrickson

Discussion on machine learning, 3D printing, devices and JavaScript

In this first episode of "Editorial Radar," O'Reilly editors Mike Loukides and Mike Hendrickson discuss the important technologies they're tracking.

Top Stories: March 19-23, 2012

Google Maps alternatives, inside Dart, and the upside of offline.

This week on O'Reilly: StreetEasy's Sebastian Delmont explained why his team left Google Maps behind, we looked at the ins and outs of the Dart programming platform, and Jim Stogdill considered the alternatives to always-on living.

Strata Week: Machine learning vs domain expertise

Debating the data skills of machines and experts, a key data move for Microsoft, and Google Analytics gets social.

This week's data news includes another look at the Strata Conference's debate about machine learning versus subject matter expertise, Raghu Ramakrishnan moves from Yahoo to Microsoft, and more social data comes to Google Analytics.

Four short links: 16 March 2012

Squirrel Targeting with Computer Vision, Audio Recognition, Single Page Apps, and Persisting at Failing

  1. Militarizing Your Backyard With Python and Computer Vision (video) — using a water cannon, computer video, Arduino, and Python to keep marauding squirrel hordes under control. See the finished result for Yakkity Saxed moist rodent goodness.
  2. Soundbite — dialogue search for Apple’s Final Cut Pro and Adobe Premiere Pro. Boris Soundbite quickly and accurately finds any word or phrase spoken in recorded media. Shoot squirrels with computer vision, search audio with computer hearing. We live in the future, people. (via Andy Baio)
  3. Single Page Apps with Backbone.js — interesting and detailed dissection of how one site did it. In a single page app, the server sends back one HTML file, which then changes (via JavaScript) in response to the user’s activity, possibly with API calls happening in the background, but the browser is very definitely not requesting more full HTML pages from the server. The idea is to gain speed (pull less across the wire each time the page changes) and also to build the web page in the language you already know (JavaScript).
  4. Why Finish Books? (NY Review of Books) — the more bad books you finish, the fewer good ones you’ll have time to start. Applying this to the rest of life is left as an exercise for the reader.