- Web Search Education (Google) — lesson plans and materials for teaching people how to use search, from operators to critically evaluating sites. This latter area is the weakest: when I teach innocents about the web, I show them organic vs paid results, discuss why people advertise, how people pay for their sites, noticing domain names and organizations, etc. I wonder how much of the weakness of Google’s materials is due to their business model.
- Metroid Source Code — reverse-engineered source code from the original classic Metroid. (via Hacker News)
- Speaker Recognition From Encrypted VoIP Communications (PDF) — speaker identification, even on encrypted VoIP communications, is 70-75% accurate among a pool of 10 candidates. Impressive. (via Bruce Schneier)
- SQL Injection Cheat Sheet — rundown of the different techniques for doing SQL injection. (via Gaëtan De Brucker)
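
To make the SQL injection item concrete, here is a minimal sketch of the basic technique and the usual defence, using Python's built-in `sqlite3` module. The table, the column names, and the `lookup_unsafe` / `lookup_safe` helpers are hypothetical and are not taken from the cheat sheet.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: the user's input is pasted straight into the SQL text,
    # so a quote character in `name` can change the shape of the query.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized: the value travels separately from the SQL text,
    # so it is only ever treated as data.
    cursor = conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    return [row[0] for row in cursor]


conn = make_db()
payload = "nobody' OR '1'='1"
print(lookup_unsafe(conn, payload))  # every secret leaks
print(lookup_safe(conn, payload))    # no user has that literal name
```

The unsafe version ends up running `... WHERE name = 'nobody' OR '1'='1'`, which matches every row; the parameterized version looks for a user literally named `nobody' OR '1'='1` and finds none.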
Search Education, Classic Source, Analyzing Encrypted VoIP, and SQL Injection
Education Startups, Smartphone Robotics, Google SQL, and Deleted Timezones
- Why Education Startups Do Not Succeed — This fundamental investment vs. expenditure mindset changes everything. You think of education as fundamentally a quality problem. The average person thinks of education as fundamentally a cost problem. This and many other insights that repay the reading. (via Hacker News)
- Romo — smartphone robotics platform Kickstarter project.
- Google Cloud SQL — Google offers proper SQL for AppEngine. Edd notes that this happened just as Oracle offered a NoSQL server. Worth remembering that the label on the technology isn’t a magic bullet to solve your problems: SQL and NoSQL aren’t what’s important, you still must understand how they work with your particular data types and patterns of access.
- Olson Timezone Database Deleted — the USA permits copyrighting of facts, whereas facts [not being the product of a creative act] are not copyrightable in much of the rest of the world. One of the sources for historical timezone data threatened legal action, and the maintainers chose to delete their database. This is a bugger: without it, there’s no way to map GMT onto local time for arbitrary times in the past.
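
Why the historical data in the timezone item matters can be sketched in a few lines: converting a UTC instant to local time means finding which offset rule was in force at that instant. The table below is a hand-written, three-entry stand-in covering US Eastern time in 2011 only; it is an assumption for illustration and is not the Olson database format. `TRANSITIONS` and `to_local` are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# (UTC instant at which a rule takes effect, offset from UTC in hours).
# Illustrative only: US Eastern time for 2011, nothing earlier or later.
TRANSITIONS = [
    (datetime(2011, 1, 1, 5, 0, tzinfo=UTC), -5),   # standard time (EST)
    (datetime(2011, 3, 13, 7, 0, tzinfo=UTC), -4),  # daylight time begins (EDT)
    (datetime(2011, 11, 6, 6, 0, tzinfo=UTC), -5),  # back to standard time
]


def to_local(utc_dt):
    """Convert an aware UTC datetime to local time using the rule table."""
    starts = [start for start, _ in TRANSITIONS]
    index = bisect_right(starts, utc_dt) - 1
    if index < 0:
        # Without historical records there is nothing to look up.
        raise ValueError("no rule recorded for this instant")
    offset = timedelta(hours=TRANSITIONS[index][1])
    return utc_dt.astimezone(timezone(offset))


print(to_local(datetime(2011, 7, 4, 16, 0, tzinfo=UTC)))    # summer: UTC-4
print(to_local(datetime(2011, 12, 25, 16, 0, tzinfo=UTC)))  # winter: UTC-5
```

Any instant before the first entry raises an error, which is the position every program would be in for past dates if the historical rules were unavailable.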
Gamification Critique, Google+ API, Time Series Visualization, and SQL on Map-Reduce
- A Quick Buck by Copy and Paste — scorching review of O’Reilly’s Gamification by Design title. tl;dr: reviewer, he does not love. Tim responded on Google Plus. Also on the gamification wtfront, Mozilla Open Badges. It talks about establishing a part of online identity, but to me it feels a little like a Mozilla Open Gradients project would: cargocult-confusing the surface for the substance.
- Google+ API Launched — first piece of a Google+ API is released. It provides read-only programmatic access to people, posts, checkins, and shares. Activities are retrieved as triples of (subject, verb, object), which is semweb cute and ticks the social object box, but is unlikely in present form to reverse declining numbers of users.
- Cube — open source time-series visualization software from Square, built on MongoDB, Node, and Redis. As Artur Bergman noted, the bigger news might be that Square is using MongoDB (known meh).
- Tenzing — an SQL implementation on top of Map/Reduce. Tenzing supports a mostly complete SQL implementation (with several extensions) combined with several key characteristics such as heterogeneity, high performance, scalability, reliability, metadata awareness, low latency, support for columnar storage and structured data, and easy extensibility. Tenzing is currently used internally at Google by 1000+ employees and serves 10000+ queries per day over 1.5 petabytes of compressed data. In this paper, we describe the architecture and implementation of Tenzing, and present benchmarks of typical analytical queries. (via Raphaël Valyi)
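
As a rough illustration of what "SQL on top of Map/Reduce" means, here is a single-process sketch of how a query such as `SELECT country, COUNT(*), SUM(amount) FROM sales GROUP BY country` breaks down into map, shuffle, and reduce steps. This is a toy under stated assumptions, not Tenzing's implementation; the `sales` rows and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit (group key, value) for each input row."""
    for row in rows:
        yield row["country"], row["amount"]


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: compute COUNT(*) and SUM(amount) per key."""
    return {key: (len(values), sum(values)) for key, values in groups.items()}


def group_by_country(rows):
    return reduce_phase(shuffle(map_phase(rows)))


sales = [
    {"country": "NZ", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "NZ", "amount": 7},
]
print(group_by_country(sales))
```

In a real cluster the map and reduce steps run on many machines and the shuffle moves data across the network; the structure of the computation is the same.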
SQL Injection, Optical Stick, SQL for Crowdsourcing, and DIY Medical Records
- SQL Injection Pocket Reference (Google Docs) — just what it sounds like. (via ModSecurity SQL Injection Challenge: Lessons Learned)
- isostick: The Optical Drive in a Stick (KickStarter) — clever! A USB memory stick with drivers that emulate optical drives so you can boot off .iso files you’ve put on the memory stick. (via Extreme Tech)
- CrowdDB: Answering Queries with Crowdsourcing (Berkeley) — CrowdDB uses human input via crowdsourcing to process queries that neither database systems nor search engines can adequately answer. It uses SQL both as a language for posing complex queries and as a way to model data. (via Big Data)
- The DIY Electronic Medical Record (Bryce Roberts) — I had a record of my daily weight, my exercising (catalogued by type), my walking, my calories burned and now, with the addition of Zeo, my nightly sleep patterns. All of this data had been passively collected with little to no manual input required from me. Total investment in this personal sensor network was in the range of a couple hundred dollars. And, as I rummaged through my data it began to hit me that what I’ve really been doing is creating my own DIY Electronic Medical Record. The Quantified Self is about more than obsessively cataloguing your bowel movements in low-contrast infographics. I’m less enthused by the opportunities to publicly perform private data, a-la the wifi body scale, than I am by opportunities to gain personal insight.
Martin Hall explains how Karmasphere is integrating Hadoop into enterprises.
You don't have to throw away existing investments in skills and tools to use Hadoop for big data, as Karmasphere's Martin Hall explains.
- MySQL EXPLAINer — visualize the output of the MySQL EXPLAIN command. (via eonarts on Twitter)
- Google Code University — updated with new classes, including C++ and Android app development.
- Cloudtop Applications (Anil Dash) — Anil calling “trend” on multiplatform native apps with cloud storage. Another layer in the Web 2.0 story Tim’s been telling for years, with some interesting observations from Anil, such as: Cloudtop apps seem to use completely proprietary APIs, and nobody seems overly troubled by the fact they have purpose-built interfaces.
The SimpleGeo CTO and former Digg architect discusses NoSQL and location's future.
I recently had a long conversation with Joe Stump, CTO of SimpleGeo, about location, geodata, and the NoSQL movement. Stump, who was formerly lead architect at Digg, had a lot to say. Here are the highlights; you can find the full interview elsewhere on Radar.
Open Source CMS and OPAC, Timely SQL, A Bid Secret, Basic Research
- Scriblio — open source CMS and catalogue built on WordPress, with faceted search and browse. (via titine on Delicious)
- Useful Temporal Functions and Queries — SQL tricksies for those working with timeseries data. (via mbiddulph on Delicious)
- Optimal Starting Prices for Negotiations and Auctions — Mind Hacks discussion of a research paper on whether high or low initial prices lead to higher price outcomes in negotiations and online auctions. Many negotiation books recommend waiting for the other side to offer first. However, existing empirical research contradicts this conventional wisdom: The final outcome in single and multi-issue negotiations, both in the United States and Thailand, often depends on whether the buyer or the seller makes the first offer. Indeed, the final price tends to be higher when a seller (who wants a higher price and thus sets a high first offer) makes the first offer than when the buyer (who offers a low first offer to achieve a low final price) goes first.
- WiFi Science History — Australian scientist studies black holes in the 70s, has to develop a way of piecing together signals that have been distorted as they travel through space. Realizes, when he starts playing with networked computers in the late 80s, that this same technique would let you “cut the wires”. A decade later it emerged as a critical part of wireless networking. As Aaron Small says, it shows the value of basic research, where you don’t have immediate applications in mind and can’t show short-term deliverables or an application to a current high-value problem.
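
For the temporal-queries item in this list, one common time-series trick is bucketing timestamps into fixed intervals before aggregating. The sketch below uses SQLite through Python's `sqlite3` module purely for convenience; the linked article may target a different database, and the `readings` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2011-10-01 09:05:00", 10.0),
        ("2011-10-01 09:45:00", 20.0),
        ("2011-10-01 10:15:00", 30.0),
    ],
)

# Truncate each timestamp to the start of its hour, then aggregate per bucket.
hourly = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:00', ts) AS hour,
           COUNT(*)   AS n,
           AVG(value) AS mean_value
    FROM readings
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

for row in hourly:
    print(row)
```

Other databases spell the truncation differently, but the pattern of grouping on a truncated timestamp carries over.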
As the web increasingly becomes real-time, marketers and publishers need analytic tools that can produce real-time reports. As an example, the basic task of calculating the number of unique users is typically done in batch mode (e.g. daily) and in many cases using a random sample from relevant log files. If unique user counts can be accurately computed in real-time, publishers and marketers can mount A/B tests or referral analysis to dynamically adjust their campaigns.
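
One way to count unique users over a stream without storing every ID is a small probabilistic sketch. The example below uses the k-minimum-values (KMV) estimator, chosen here only for illustration; the text above does not name a specific algorithm. `KMVCounter` and the user ID format are hypothetical.

```python
import hashlib
import heapq


class KMVCounter:
    """Approximate distinct counter that keeps only the k smallest hashes."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is on top
        self._members = set()  # hashes currently held in the heap

    @staticmethod
    def _hash(user_id):
        # Map the ID to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, user_id):
        h = self._hash(user_id)
        if h in self._members:
            return  # already counted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct IDs seen so far
        kth_smallest = -self._heap[0]
        return int(round((self.k - 1) / kth_smallest))


counter = KMVCounter(k=1024)
for i in range(50000):
    counter.add(f"user-{i % 20000}")  # 20,000 distinct IDs with repeats
print(counter.estimate())
```

Memory stays bounded by `k` regardless of traffic, and the estimate can be read at any moment, which is what makes this kind of structure suited to real-time reporting. The relative error shrinks roughly as one over the square root of `k`.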