- Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler metadata. Our data production pipeline runs compiler clusters over all Google’s code and third-party code, extracting syntactic and semantic information. The data is then indexed and served to a wide variety of clients with specialized needs. The entire ecosystem is evolving into an extensible platform that permits languages, tools, clients and build systems to interoperate in well-defined, standardized protocols.
- Deep Learning for Semantic Analysis — When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases.
- Fireshell — workflow tools and framework for front-end developers.
ENTRIES TAGGED "Google"
Google Code Analysis, Deep Learning, Front-End Workflow, and SICP in JS
Google's Data Centers, Top Engineers, Hiring, and Git Explained
- Google Has Spent 21 Billion on Data Centers — The company invested a record $1.6 billion in its data centers in the second quarter of 2013. Puts my impulse-purchased second external hard-drive into context, doesn’t it honey?
- 10x Engineer (Shanley) — in which the idea that it’s scientifically shown that some engineers are innately 10x others is given a rough and vigorous debunking.
- How to Hire — great advice, including “Poaching is the titty twister of Silicon Valley relationships”.
- Think Like a Git — a guide to git, for the perplexed.
Fanout Architectures, In-Browser Emulation, Paean to Programmability, and Social Hardware
- Achieving Rapid Response Times in Large Online Services (PDF) — slides from a talk by Jeff Dean on fanout architectures. (via Alex Dong)
- Go Ahead, Mess with Texas Instruments (The Atlantic) — School typically assumes that answers fall neatly into categories of “right” and “wrong.” As a conventional tool for computing “right” answers, calculators often legitimize this idea; the calculator solves problems, gives answers. But once an endorsed, conventional calculator becomes a subversive, programmable computer it destabilizes this polarity. Programming undermines the distinction between “right” and “wrong” by emphasizing the fluidity between the two. In programming, there is no “right” answer. Sure, a program might not compile or run, but making it offers multiple pathways to success, many of which are only discovered through a series of generative failures. Programming does not reify “rightness;” instead, it orients the programmer toward intentional reading, debugging, and refining of language to ensure clarity.
- When A Spouse Puts On Google Glass (NY Times) — Google Glass made me realize how comparably social mobile phones are. [...] People gather around phones to watch YouTube videos or look at a funny tweet together or jointly analyze a text from a friend. With Glass, there was no such sharing.
Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client
- intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
- F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
- Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)
Mobile Image Cache, Google on Net Neutrality, Future of Programming, and PSD Files in Ruby
- How to Easily Resize and Cache Images for the Mobile Web (Pete Warden) — I set up a server running the excellent ImageProxy open-source project, and then I placed a Cloudfront CDN in front of it to cache the results. (a how-to covering the tricksy bits)
- Google’s Position on Net Neutrality Changes? (Wired) — At issue is Google Fiber’s Terms of Service, which contains a broad prohibition against customers attaching “servers” to its ultrafast 1 Gbps network in Kansas City. Google wants to ban the use of servers because it plans to offer a business class offering in the future. [...] In its response [to a complaint], Google defended its sweeping ban by citing the very ISPs it opposed through the years-long fight for rules that require broadband providers to treat all packets equally.
- The Future of Programming (Bret Victor) — gorgeous slides, fascinating talk, and this advice from Alan Kay: I think the trick with knowledge is to “acquire it, and forget all except the perfume” — because it is noisy and sometimes drowns out one’s own “brain voices”. The perfume part is important because it will help find the knowledge again to help get to the destinations the inner urges pick.
- psd.rb — Ruby code for reading PSD files (MIT licensed).
Rules of the Internet, Bigness of the Data, Wifi ADCs, and Google Flirts with Client-Side Encryption
- Ten Rules of the Internet (Anil Dash) — they’re all candidates for becoming “Dash’s Law”. I like this one the most: When a company or industry is facing changes to its business due to technology, it will argue against the need for change based on the moral importance of its work, rather than trying to understand the social underpinnings.
- Data Storage by Vertical (Quartz) — The US alone is home to 898 exabytes (1 EB = 1 billion gigabytes)—nearly a third of the global total. By contrast, Western Europe has 19% and China has 13%. Legally, much of that data itself is property of the consumers or companies who generate it, and licensed to companies that are responsible for it. And in the US—a digital universe of 898 exabytes (1 EB = 1 billion gigabytes)—companies have some kind of liability or responsibility for 77% of all that data.
- x-OSC — a wireless I/O board that provides just about any software with access to 32 high-performance analogue/digital channels via OSC messages over WiFi. There is no user programmable firmware and no software or drivers to install making x-OSC immediately compatible with any WiFi-enabled platform. All internal settings can be adjusted using any web browser.
- Google Experimenting with Encrypting Google Drive (CNet) — If that’s the case, a government agency serving a search warrant or subpoena on Google would be unable to obtain the unencrypted plain text of customer files. But the government might be able to convince a judge to grant a wiretap order, forcing Google to intercept and divulge the user’s login information the next time the user types it in. Advertising depends on the service provider being able to read your data. Either your Drive’s contents aren’t valuable to Google advertising, or it won’t be a host-resistant encryption process.
Microvideos for MIcrohelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Alex Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the ux of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.