- Trusting Browser Code (Tim Bray) — on the fundamental weakness of the ‘net as manifest in the browser.
- Deep Learning in the Raspberry Pi (Pete Warden) — $30 now gets you a computer you can run deep learning algorithms on. Awesome.
- Announcing Docker Hub and Official Repositories — as Docker went 1.0 and people rave about how they use it, comes this. They’re thinking hard about “integrating into the build ship run loop”, which aligns well with DevOps-enabling tool use.
- Apple’s Secure Database for Users (Ian Waring) — excellent breakdown of how Apple have gone out of their way to make their cloud database product safe and robust. They may be slow to “the cloud” but they have decades of experience having users as customers instead of products.
ENTRIES TAGGED "machine learning"
More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists
Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that as a data scientist he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans1 take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses Crowdsourcing services to perform structured extraction (converting semi/unstructured data into structured data).
Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as (if not more) important than choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.
CrowdAnalytix breaks up projects in two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (independent variable(s)) and are asked to propose features (predictors) and brief explanations for why they might prove useful. A panel of judges evaluate2 features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.