ENTRIES TAGGED "research"
Google Code Analysis, Deep Learning, Front-End Workflow, and SICP in JS
- Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler metadata. Our data production pipeline runs compiler clusters over all Google’s code and third-party code, extracting syntactic and semantic information. The data is then indexed and served to a wide variety of clients with specialized needs. The entire ecosystem is evolving into an extensible platform that permits languages, tools, clients and build systems to interoperate in well-defined, standardized protocols.
- Deep Learning for Semantic Analysis — When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases.
- Fireshell — workflow tools and framework for front-end developers.
Google's Data Centers, Top Engineers, Hiring, and Git Explained
- Google Has Spent 21 Billion on Data Centers — The company invested a record $1.6 billion in its data centers in the second quarter of 2013. Puts my impulse-purchased second external hard-drive into context, doesn’t it honey?
- 10x Engineer (Shanley) — in which the idea that it’s scientifically shown that some engineers are innately 10x others is given a rough and vigorous debunking.
- How to Hire — great advice, including “Poaching is the titty twister of Silicon Valley relationships”.
- Think Like a Git — a guide to git, for the perplexed.
Verified Web, Verified Base64, Theorem Prover, and Fast Events in C
- Quark — a web browser with a formally-proven kernel.
- High-Assurance Base64 — formally verified C implementation of Base64.
- z3 — fast theorem prover from Microsoft Research.
- libphenom (GitHub) — Facebook’s open sourced eventing framework. (High-scalability, natch)
Remote Work, Raspberry Pi Code Machine, Low-Latency Data Processing, and Probabilistic Table Parsing
- Fog Creek’s Remote Work Policy — In the absence of new information, the assumption is that you’re producing. When you step outside the HQ work environment, you should flip that burden of proof. The burden is on you to show that you’re being productive. Is that because we don’t trust you? No. It’s because a few normal ways of staying involved (face time, informal chats, lunch) have been removed.
- MillWheel (PDF) — a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous ﬂow of records, all within the envelope of the framework’s fault-tolerance guarantees. From Google Research.
- Probabilistic Scraping of Plain Text Tables — the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.
Google Play Services, Self-Signed Kernels, Visualising Scientific Papers, and New Microcontroller
- How Google’s Defragging Android (Ars Technica) — Android’s becoming a pudgy microkernel for the Google Play Services layer that’s in userland, closed source, and a way to bypass carriers’ lag for upgrades.
- Booting a Self-Signed Linux Kernel (Greg Kroah-Hartman) — procedures for how to boot a self-signed Linux kernel on a platform so that you do not have to rely on any external signing authority.
- Paperscape — A map of scientific papers from the arXiv.
- Trinket — Adafruit’s latest microcontroller board. Small but perfectly formed.
Fanout Architectures, In-Browser Emulation, Paean to Programmability, and Social Hardware
- Achieving Rapid Response Times in Large Online Services (PDF) — slides from a talk by Jeff Dean on fanout architectures. (via Alex Dong)
- Go Ahead, Mess with Texas Instruments (The Atlantic) — School typically assumes that answers fall neatly into categories of “right” and “wrong.” As a conventional tool for computing “right” answers, calculators often legitimize this idea; the calculator solves problems, gives answers. But once an endorsed, conventional calculator becomes a subversive, programmable computer it destabilizes this polarity. Programming undermines the distinction between “right” and “wrong” by emphasizing the fluidity between the two. In programming, there is no “right” answer. Sure, a program might not compile or run, but making it offers multiple pathways to success, many of which are only discovered through a series of generative failures. Programming does not reify “rightness;” instead, it orients the programmer toward intentional reading, debugging, and refining of language to ensure clarity.
- When A Spouse Puts On Google Glass (NY Times) — Google Glass made me realize how comparably social mobile phones are. [...] People gather around phones to watch YouTube videos or look at a funny tweet together or jointly analyze a text from a friend. With Glass, there was no such sharing.
Flexible Layouts, Web Components, Distributed SQL Database, and Reverse-Engineering Dropbox Client
- intention.js — manipulates the DOM via HTML attributes. The methods for manipulation are placed with the elements themselves, so flexible layouts don’t seem so abstract and messy.
- F1: A Distributed SQL Database That Scales — a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.
- Looking Inside The (Drop)Box (PDF) — This paper presents new and generic techniques, to reverse engineer frozen Python applications, which are not limited to just the Dropbox world. We describe a method to bypass Dropbox’s two factor authentication and hijack Dropbox accounts. Additionally, generic techniques to intercept SSL data using code injection techniques and monkey patching are presented. (via Tech Republic)
Distrusting CA Certs, Brain Talk, Ineffective Interventions, and Visual A/B Tools
- Reducing the Roots of Some Evil (Etsy) — Based on our first two months of data we have removed a number of unused CA certificates from some pilot systems to test the effects, and will run CAWatch for a full six months to build up a more comprehensive view of what CAs are in active use. Sign of how broken the CA system for SSL is. (via Alex Dong)
- Mind the Brain — PLOS podcast interviews Sci Foo alum and delicious neuroscience brain of awesome, Vaughan Bell. (via Fabiana Kubke)
- How Often are Ineffective Interventions Still Used in Practice? (PLOSone) — tl;dr: 8% of the time. Imagine the number if you asked how often ineffective software development practices are still used.
- Announcing Evan’s Awesome A/B Tools — I am calling these tools awesome because they are intuitive, visual, and easy-to-use. Unlike other online statistical calculators you’ve probably seen, they’ll help you understand what’s going on “under the hood” of common statistical tests, and by providing ample visual context, they make it easy for you to explain p-values and confidence intervals to your boss. (And they’re free!)
Driverless Intersections, Quantum Information, Low-Energy Wireless Networking, and Scammy Game Tactics
- Autonomous Intersection Management Project — a scalable, safe, and efficient multiagent framework for managing autonomous vehicles at intersections. (via How Driverless Cars Could Reshape Cities)
- Quantum Information (New Scientist) — a gentle romp through the possible and the actual for those who are new to the subject.
- Ambient Backscatter (PDF) — a new communication primitive where devices communicate by backscattering ambient RF signals. Our design avoids the expensive process of generating radio waves; backscatter communication is orders of magnitude more power-efﬁcient than traditional radio communication. (via Hacker News)
- Top Free-to-Play Monetization Tricks (Gamasutra) — amazingly evil ways that free games lure you into paying. At this point the user must choose to either spend about $1 or lose their rewards, lose their stamina (which they could get back for another $1), and lose their progress. To the brain this is not just a loss of time. If I spend an hour writing a paper and then something happens and my writing gets erased, this is much more painful to me than the loss of an hour. The same type of achievement loss is in effect here. Note that in this model the player could be defeated multiple times in the boss battle and in getting to the boss battle, thus spending several dollars per dungeon.
Huxley Beat Orwell?, Cloud Keys, Motorola's DARPA, and Internet Archive Credit Union
- Huxley vs Orwell — buy Amusing Ourselves to Death if this rings true. The future is here, it’s just not evenly surveilled. (via rone)
- KeyMe — keys in the cloud. (Digital designs as backups for physical objects)
- Motorola Advanced Technology and Products Group — The philosophy behind Motorola ATAP is to create an organization with the same level of appetite for technology advancement as DARPA, but with a consumer focus. It is a pretty interesting place to be. And they hired the excellent Johnny Chung Lee.
- Internet Credit Union — Internet Archive starts a Credit Union. Can’t wait to see memes on debit cards.