- intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
- F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
- Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)
ENTRIES TAGGED "Google"
Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client
Mobile Image Cache, Google on Net Neutrality, Future of Programming, and PSD Files in Ruby
- How to Easily Resize and Cache Images for the Mobile Web (Pete Warden) — I set up a server running the excellent ImageProxy open-source project, and then I placed a Cloudfront CDN in front of it to cache the results. (a how-to covering the tricksy bits)
- Google’s Position on Net Neutrality Changes? (Wired) — At issue is Google Fiber’s Terms of Service, which contains a broad prohibition against customers attaching “servers” to its ultrafast 1 Gbps network in Kansas City. Google wants to ban the use of servers because it plans to offer a business class offering in the future. [...] In its response [to a complaint], Google defended its sweeping ban by citing the very ISPs it opposed through the years-long fight for rules that require broadband providers to treat all packets equally.
- The Future of Programming (Bret Victor) — gorgeous slides, fascinating talk, and this advice from Alan Kay: I think the trick with knowledge is to “acquire it, and forget all except the perfume” — because it is noisy and sometimes drowns out one’s own “brain voices”. The perfume part is important because it will help find the knowledge again to help get to the destinations the inner urges pick.
- psd.rb — Ruby code for reading PSD files (MIT licensed).
Retreading old topics can be a powerful source of epiphany, sometimes more so than simple extra-box thinking. I was a computer science student, of course I knew statistics. But my recent years as a NoSQL (or better stated: distributed systems) junkie have irreparably colored my worldview, filtering every metaphor with a tinge of information management.
Lounging on a half-world plane ride has its benefits, namely, the opportunity to read. Most of my Delta flight from Tel Aviv back home to Portland lacked both wifi and (in my case) a workable laptop power source. So instead, I devoured Nate Silver’s book, The Signal and the Noise. When Nate reintroduced me to the concept of statistical overfit, and relatedly underfit, I could not help but consider these cases in light of the modern problem of distributed data management, namely, operators (you may call these operators DBAs, but please, not to their faces).
When collecting information, be it for a psychological profile of chimp mating rituals, or plotting datapoints in search of the Higgs Boson, the ultimate goal is to find some sort of usable signal, some trend in the data. Not every point is useful, and in fact, any individual could be downright abnormal. This is why we need several points to spot a trend. The world rarely gives us anything clearer than a jumble of anecdotes. But plotted together, occasionally a pattern emerges. This pattern, if repeatable and useful for prediction, becomes a working theory. This is science, and is generally considered a good method for making decisions.
On the other hand, when lacking experience, we tend to over value the experience of others when we assume they have more. This works in straightforward cases, like learning to cook a burger (watch someone make one, copy their process). This isn’t so useful as similarities diverge. Watching someone make a cake won’t tell you much about the process of crafting a burger. Folks like to call this cargo cult behavior.
How Fit are You, Bro?
You need to extract useful information from experience (which I’ll use the math-y sounding word datapoints). Having a collection of datapoints to choose from is useful, but that’s only one part of the process of decision-making. I’m not speaking of a necessarily formal process here, but in the case of database operators, merely a collection of experience. Reality tends to be fairly biased toward facts (despite the desire of many people for this to not be the case). Given enough experience, especially if that experience is factual, we tend to make better and better decisions more inline with reality. That’s pretty much the essence of prediction. Our mushy human brains are more-or-less good at that, at least, better than other animals. It’s why we have computers and Everybody Loves Raymond, and my cat pees in a box.
Imagine you have a sufficient amount of relevant datapoints that you can plot on a chart. Assuming the axes have any relation to each other, and the data is sound, a trend may emerge, such as a line, or some other bounding shape. A signal is relevant data that corresponds to the rules we discover by best fit. Noise is everything else. It’s somewhat circular sounding logic, and it’s really hard to know what is really a signal. This is why science is hard, and so is choosing a proper database. We’re always checking our assumptions, and one solid counter signal can really be disastrous for a model. We may have been wrong all along, missing only enough data. As Einstein famously said in response to the book 100 Authors Against Einstein: “If I were wrong, then one would have been enough!”
Database operators (and programmers forced to play this role) must make predictions all the time, against a seemingly endless series of questions. How much data can I handle? What kind of latency can I expect? How many servers will I need, and how much work to manage them?
So, like all decision making processes, we refer to experience. The problem is, as our industry demands increasing scale, very few people actually have much experience managing giant scale systems. We tend to draw our assumptions from our limited, or biased smaller scale experience, and extrapolate outward. The theories we then tend to concoct are not the optimal fit that we desire, but instead tend to be overfit.
Overfit is when we have a limited amount of data, and overstate its general implications. If we imagine a plot of likely failure scenarios against a limited number of servers, we may be tempted to believe our biggest odds of failure are insufficient RAM, or disk failure. After all, my network has never given me problems, but I sure have lost a hard drive or two. We take these assumptions, which are only somewhat relevant to the realities of scalable systems and divine some rules for ourselves that entirely miss the point.
In a real distributed system, network issues tend to consume most of our interest. Single-server consistency is a solved problem, and most (worthwhile) distributed databases have some sense of built in redundancy (usually replication, the root of all distributed evil).
The App Store model has increased the uncertainty of the software release process
The recent unavailability of the Apple Developer’s Portal just underscores how increasingly dependent developers have become on third parties during the software lifecycle. For those who are not following the fun and games, the developer.apple.com sites, which include much of the functionality needed to develop Mac and iOS applications, has been unavailable for more than a week as of this writing. Although iTunes Connect, the portal used to actually deploy apps to the App Stores, has remained available, the remainder of the site territory has been off-limits. This is all thanks to a security intrusion (evidently by an over-zealous researcher.)
The App Store model has fundamentally changed how software is distributed, mostly for the better (IMHO), but it has also removed some of the control of the release process from the hands of the developers and companies they work for. As I have spelled out previously in my book on iOS enterprise development, the fact that Apple has the final say on if and when software goes into the store has required more conservative release timelines. If you want to release on the first of September, you need to count back at least two weeks for “gold master”, because you need to upload the app, potentially go through a round of rejection from Apple, and then upload a fixed version.
Android apps don’t suffer from this lag, because most of the Android stores don’t do any significant checking of the applications uploaded to them. The Devil’s Deal that Apple developers have made with Apple is that in return for the longer wait time to get apps in the store (and having to follow Apple’s rules), they get a de facto seal of approval from Apple. In other words, it is assumed that apps in the iTunes store are more stringently policed and less likely to crash or do harm (deliberately or else-wise.)
The current downtime has brought that deal into question, however. Suddenly, developers who need new provisioning certificates, passbook certificates, or push notification certificates find themselves with nowhere to go. Even if iTunes Connect is available, it doesn’t do you any good if you can’t get a distribution certificate to sign your app for the store. I’m sure that there are developers at this moment who have had their finely tuned release strategies thrown into disarray by the in-availability of the developer portal.
Being essentially at the mercy of Apple’s whims (or Google’s, for that matter) can’t be a pleasant sensation for a company or individual trying to get a new piece of software out the door. The question that the developer community will have to answer is if the benefits of the App Store model make it worth the hassles, in the long run.
Rules of the Internet, Bigness of the Data, Wifi ADCs, and Google Flirts with Client-Side Encryption
- Ten Rules of the Internet (Anil Dash) — they’re all candidates for becoming “Dash’s Law”. I like this one the most: When a company or industry is facing changes to its business due to technology, it will argue against the need for change based on the moral importance of its work, rather than trying to understand the social underpinnings.
- Data Storage by Vertical (Quartz) — The US alone is home to 898 exabytes (1 EB = 1 billion gigabytes)—nearly a third of the global total. By contrast, Western Europe has 19% and China has 13%. Legally, much of that data itself is property of the consumers or companies who generate it, and licensed to companies that are responsible for it. And in the US—a digital universe of 898 exabytes (1 EB = 1 billion gigabytes)—companies have some kind of liability or responsibility for 77% of all that data.
- x-OSC — a wireless I/O board that provides just about any software with access to 32 high-performance analogue/digital channels via OSC messages over WiFi. There is no user programmable firmware and no software or drivers to install making x-OSC immediately compatible with any WiFi-enabled platform. All internal settings can be adjusted using any web browser.
- Google Experimenting with Encrypting Google Drive (CNet) — If that’s the case, a government agency serving a search warrant or subpoena on Google would be unable to obtain the unencrypted plain text of customer files. But the government might be able to convince a judge to grant a wiretap order, forcing Google to intercept and divulge the user’s login information the next time the user types it in. Advertising depends on the service provider being able to read your data. Either your Drive’s contents aren’t valuable to Google advertising, or it won’t be a host-resistant encryption process.
Microvideos for MIcrohelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Alex Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
OSCON 2013 Speaker Series
Note: Amy Unruh, Google Cloud Platform Developer Relations, is just one of the many fantastic speakers we have at OSCON this year. If you are interested in attending to check out Amy’s talk or the many other cool sessions, click over to the OSCON website where you can use the discount code OS13PROG to get 20% off your registration fee.
At this year’s Google I/O, we launched the PHP runtime for Google App Engine, part of the Google Cloud Platform. App Engine is a service that lets you build web apps using the same scalable infrastructure that powers many of Google’s own applications. With App Engine, there are no servers to maintain; you just upload your application, and it’s ready to go.
App Engine’s services support and simplify many aspects of app development. One of those services is Task Queues, which lets you easily add asynchronous background processing to your PHP app, and allows you to simultaneously make your applications more responsive, more reliable, and more scalable.
The App Engine Task Queue service allows your application to define tasks, add them to a queue, and then use the queue to process them asynchronously, in the background. App Engine automatically scales processing capacity to match your queue configuration and processing volume. You define a Task by specifying the application-specific URL of a handler for the task, along with (optionally) parameters or a payload for the task, and other settings, then add it to a Task Queue.
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the ux of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
Could technology be bringing people closer together?
I had quite an experience at Maker Faire this weekend. So instead of a follow up on Google I/O today I’m going talk about how wearables, specifically Google Glass, seem to be bringing people closer together rather than farther apart. So, more on Google I/O later in the week.
A Tale of Two Events
I first broke out my Google Glass at Google I/O where Glass Explorers and Googlers filled the Moscone West sporting the device. Glass Explorers are those that pre-ordered the I/O last year and winners of the #IfIHadGlass contest. The mood towards Glass at I/O was, generally, split into the have’s and have not’s. Those with them proudly showed them off while others fell into the following camps: carefully measured excitement, cool intrigue, and those who were over it. I think for the most part the subdued reaction was a reflection of attendees wanting to be able to get into the action immediately. It was a shame that Glass wasn’t available for purchase to those at I/O this year.
In stark contrast to that reaction was the response I received from attendees of this past weekend’s Maker Faire. My first inkling of what was ahead were the whispers. I would hear excitedly, “Is that the Google Glass?” which made me smile. However, when I met up with my 11:30 a.m. appointment at his booth and started talking about and sharing the Glass with him and his colleagues a mob quickly formed. Frankly, I got scared for a moment as a mass of people forced inward towards me, and then thought what if someone just takes off with these? But, no one did. These mini-mobs happened to me twice, both times in the Electronics area (not surprisingly). The outcome of these Glass flash mobs, however, was quite simply lovely. Individuals were polite, asked me questions, wanted to take pictures of themselves with it and that was it. Throughout the day people would comment on them, stop me to talk, but it was always a pleasure with people smiling ear to ear when I had them play with the device.
What will wearables really mean to society?
The quick answer for now—who knows? I have to say I was a bit overwhelmed by all of this social engagement. I had anticipated some notice, but this? Now, granted, the attendees of a Maker Faire might skew towards being interested in new gadgets and devices but my experience was unexpected—and wonderful. I talked to more random, happy people at this event than I have in a long while. It has given me a new perspective on recent issues that have come up regarding the Glass, such as invasion of privacy and the idea that we are disconnecting with the world more and more via personal devices, when in fact I was finding just the opposite. Maybe in time everyone will have a Glass or have seen one and it won’t be a big deal. But for now, it is generating interaction and discussion about technology with young and old alike.
Oh, and here you can see what it is like to be attacked by a T-Rex from my POV via the Glass, scary stuff. Click here to see the T-Rex Attack.
This will be the first post in a series on my journey through the world with Glass.
Two views on new Google Maps; a look at predictive, intelligent apps; and Aaron Swartz's and Kevin Poulsen's anonymous inbox launches.
Google aims for a new level of map customization
Google introduced a new version of Google maps at Google I/O this week that learns from each use to customize itself to individual users, adapting based on user clicks and searches. A post on the Google blog outlines the updates, which include recommendations for places you might enjoy (based upon your map activity), ratings and reviews, integrated Google Earth, and tours generated from user photos, to name a few.