OpenRefine — (edited: 7 Dec 2013) Google abandoned Google bought Freebase’s GridWorks, turned it into the excellent Refine tool for working with data sets, now picked up and developed by open source community.
CC 4.0 Out — The 4.0 licenses are extremely well-suited for use by governments and publishers of public sector information and other data, especially for those in the European Union. This is due to the expansion in license scope, which now covers sui generis database rights that exist there and in a handful of other countries.
Algorithms and Accountability — Thus, the appearance of an autocompletion suggestion during the search process might make people decide to search for this suggestion although they didn’t have the intention to. A recent paper by Baker and Potts (2013) consequently questions “the extent to which such algorithms inadvertently help to perpetuate negative stereotypes”. (via New Aesthetic Tumblr)
Udacity/Thrun Profile — A student taking college algebra in person was 52% more likely to pass than one taking a Udacity class, making the $150 price tag–roughly one-third the normal in-state tuition–seem like something less than a bargain. In which Udacity pivots to hiring-sponsored workforce training and the new educational revolution looks remarkably like sponsored content.
Amazon is Building Substations (GigaOm) — the company even has firmware engineers whose job it is to rewrite the archaic code that normally runs on the switchgear designed to control the flow of power to electricity infrastructure. Pretty sure that wasn’t a line item in the pitch deck for “the first Internet bookstore”.
Panoramic Images — throw the camera in the air, get a 360×360 image from 36 2-megapixel lenses. Not sure that throwing was previously a recognised UI gesture.
Science Not as Self-Correcting As It Thinks (Economist) — REALLY good discussion of the shortcomings in statistical practice by scientists, peer-review failures, and the complexities of experimental procedure and fuzziness of what reproducibility might actually mean.
Reproducibility Initiative Receives Grant to Validate Landmark Cancer Studies — The key experimental findings from each cancer study will be replicated by experts from the Science Exchange network according to best practices for replication established by the Center for Open Science through the Center’s Open Science Framework, and the impact of the replications will be tracked on Mendeley’s research analytics platform. All of the ultimate publications and data will be freely available online, providing the first publicly available complete dataset of replicated biomedical research and representing a major advancement in the study of reproducibility of research.
Reimagine the Chemistry Set — $50k prize in contest to design a “chemistry set” type kit that will engage kids as young as 8 and inspire people who are 88. We’re looking for ideas that encourage kids to explore, create, build and question. We’re looking for ideas that honor kids’ curiosity about how things work. Backed by the Moore Foundation and Society for Science and the Public.
BF Skinner’s Baby Make Project (BoingBoing) — I got to read some of Skinner’s original writing on the Air-Crib recently and couple of things stuck out to me. First, it cracked me up. The article, published in 1959 in Cumulative Record, is written in the kind of extra-enthusiastic voice you’re used to hearing Makers use to describe particularly exciting DIY projects.
Redecentralize — project highlighting developers and software that disintermediates the ad-serving parasites preying on our human communication.
The Internet Will Suck All Creative Content Out of the World (David Byrne) — persuasively argued that labels are making all the money from streaming services like Spotify, et al. Musicians are increasingly suspicious of the money and equity changing hands between these services and record labels – both money and equity has been exchanged based on content and assets that artists produced but seem to have no say over. Spotify gave $500m in advances to major labels in the US for the right to license their catalogues.
Your Car is About to go Open Source (ComputerWorld) — an open-source IVI operating system would create a reusable platform consisting of core services, middleware and open application layer interfaces that eliminate the redundant efforts to create separate proprietary systems. Leaving them to differentiate the traditional way: ad-retargeting and spyware.
Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler metadata. Our data production pipeline runs compiler clusters over all Google’s code and third-party code, extracting syntactic and semantic information. The data is then indexed and served to a wide variety of clients with specialized needs. The entire ecosystem is evolving into an extensible platform that permits languages, tools, clients and build systems to interoperate in well-defined, standardized protocols.
Deep Learning for Semantic Analysis — When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases.
Fireshell — workflow tools and framework for front-end developers.
Researchers Can Slip an Undetectable Trojan into Intel’s Ivy Bridge CPUs (Ars Technica) — The exploit works by severely reducing the amount of entropy the RNG normally uses, from 128 bits to 32 bits. The hack is similar to stacking a deck of cards during a game of Bridge. Keys generated with an altered chip would be so predictable an adversary could guess them with little time or effort required. The severely weakened RNG isn’t detected by any of the “Built-In Self-Tests” required for the P800-90 and FIPS 140-2 compliance certifications mandated by the National Institute of Standards and Technology.
rethinkdb — open-source distributed JSON document database with a pleasant and powerful query language.
Teach Kids Programming — a collection of resources. I start on Scratch much sooner, and 12+ definitely need the Arduino, but generally I agree with the things I recognise, and have a few to research …
Fog Creek’s Remote Work Policy — In the absence of new information, the assumption is that you’re producing. When you step outside the HQ work environment, you should flip that burden of proof. The burden is on you to show that you’re being productive. Is that because we don’t trust you? No. It’s because a few normal ways of staying involved (face time, informal chats, lunch) have been removed.
MillWheel (PDF) — a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous ﬂow of records, all within the envelope of the framework’s fault-tolerance guarantees. From Google Research.
Probabilistic Scraping of Plain Text Tables — the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.