Spinal Cord Injury Breakthrough by Software — This wasn’t the result of a new, long-term study, but a meta-analysis of $60 million worth of basic research written off as useless 20 years ago. The analysis was done by a team of neuroscientists and statisticians led by the University of California San Francisco, partnering with the software firm Ayasdi and using mathematical and machine learning techniques that hadn’t been invented yet when the trials took place.
The Assassination Complex (The Intercept) — America’s drone program’s weaknesses highlighted in new document dump: Taken together, the secret documents lead to the conclusion that Washington’s 14-year high-value targeting campaign suffers from an overreliance on signals intelligence, an apparently incalculable civilian toll, and — due to a preference for assassination rather than capture — an inability to extract potentially valuable intelligence from terror suspects.
Ashley Madison’s Fembot Con (Gizmodo) — As documents from company e-mails now reveal, 80% of first purchases on Ashley Madison were a result of a man trying to contact a bot, or reading a message from one.
Why Futurism Has a Cultural Blindspot (Nautilus) — As the psychologist George Loewenstein and colleagues have argued, in a phenomenon they termed “projection bias,” people “tend to exaggerate the degree to which their future tastes will resemble their current tastes.”
Mind-Controlled Prosthetic Arm (Quartz) — The robotic arm is connected by wires that link up to the wearer’s motor cortex — the part of the brain that controls muscle movement — and sensory cortex, which identifies tactile sensations when you touch things. The wires from the motor cortex allow the wearer to control the motion of the robot arm, and pressure sensors in the arm that connect back into the sensory cortex give the wearer the sensation that they are touching something.
Brain-Machine-Interface for Exoskeleton — no need to worry about the “think of sex every seven seconds” trope, the new system allows users to move forwards, turn left and right, sit and stand simply by staring at one of five flickering LEDs.
P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
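To make the p-versus-α distinction concrete, here is a minimal simulation sketch. It is not taken from the paper: the two-sided z-test with known variance, the sample size, and the function names are all assumptions chosen for illustration. Under a true null hypothesis, α is the long-run rejection rate of the procedure, fixed before any data is seen, while each p value is a number computed from a single dataset.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(sample, sigma=1.0):
    """p value of a two-sided z-test of H0: mean == 0, assuming known sigma."""
    z = fmean(sample) * len(sample) ** 0.5 / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_null(trials=2000, n=30, alpha=0.05, seed=0):
    """Run many experiments in which H0 is true and record every p value."""
    rng = random.Random(seed)
    p_values = [
        two_sided_p([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    # alpha describes the procedure: the fraction of true nulls it rejects.
    rejection_rate = sum(p < alpha for p in p_values) / trials
    return p_values, rejection_rate


if __name__ == "__main__":
    p_values, rate = simulate_null()
    print("long-run rejection rate under H0:", rate)
    print("first five p values:", [round(p, 3) for p in p_values[:5]])
```

The rejection rate should sit near the chosen α, whereas the individual p values scatter across the unit interval. Reading any single one of them as "the error probability" is the confusion the paper describes.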
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
UK Copyright Law Permits Researchers to Data Mine — changes mean copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if a university mines the journals, finds a new possibility for a Thing, and that possibility is verified experimentally, is that Thing the university’s to license commercially for profit?
Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
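A rough sketch of how a word graph with a decaying window and pruning might fit together. This is not the paper's algorithm: the class name, the bigram edges, the per-post exponential decay, and the pruning threshold are all assumptions made for illustration.

```python
from collections import defaultdict


class DecayingWordGraph:
    """Bigram graph whose edge weights fade as newer posts arrive (illustrative only)."""

    def __init__(self, decay=0.5, prune_below=0.1):
        self.decay = decay              # multiplier applied to old weights per post
        self.prune_below = prune_below  # edges lighter than this are dropped
        self.edges = defaultdict(float)  # (word, next_word) -> weight

    def add_post(self, text):
        # Age every existing edge, then prune the ones that have faded away.
        for edge in list(self.edges):
            self.edges[edge] *= self.decay
            if self.edges[edge] < self.prune_below:
                del self.edges[edge]
        # Add the new post's adjacent word pairs at full weight.
        words = text.lower().split()
        for a, b in zip(words, words[1:]):
            self.edges[(a, b)] += 1.0

    def top_edges(self, k=3):
        """Heaviest edges first; ties broken alphabetically for determinism."""
        return sorted(self.edges.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    graph = DecayingWordGraph(decay=0.5, prune_below=0.2)
    for post in ["a b", "a b", "c d", "c d", "c d"]:
        graph.add_post(post)
    print(graph.top_edges())
```

Decay keeps the graph focused on what the stream is saying now, and pruning keeps memory bounded, which is plausibly where the time and memory savings come from.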
Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
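The scan-then-act idea can be sketched in a few lines. This is a conceptual illustration only, not fail2ban's code or configuration format: the regex, the `scan` function, and the `max_retry` parameter are assumptions, and the sketch ignores the time window a real tool would apply. It builds an iptables command string and hands it to a callback rather than executing anything.

```python
import re
from collections import Counter

# Assumed log format: the usual sshd "Failed password ... from <ip>" message.
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d{1,3}(?:\.\d{1,3}){3})")


def scan(lines, max_retry=3, on_ban=print):
    """Count failed logins per IP and fire on_ban once an IP hits max_retry."""
    failures = Counter()
    banned = []
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if not match:
            continue
        ip = match.group(1)
        failures[ip] += 1
        if failures[ip] == max_retry:
            banned.append(ip)
            # Hand the action to the caller; nothing is executed here.
            on_ban(f"iptables -I INPUT -s {ip} -j DROP")
    return banned


if __name__ == "__main__":
    sample = [
        "sshd[101]: Failed password for root from 203.0.113.5 port 22 ssh2",
        "sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 22 ssh2",
        "sshd[103]: Accepted password for alice from 198.51.100.7 port 22 ssh2",
        "sshd[104]: Failed password for root from 203.0.113.5 port 22 ssh2",
    ]
    scan(sample)
```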
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler metadata. Our data production pipeline runs compiler clusters over all Google’s code and third-party code, extracting syntactic and semantic information. The data is then indexed and served to a wide variety of clients with specialized needs. The entire ecosystem is evolving into an extensible platform that permits languages, tools, clients and build systems to interoperate in well-defined, standardized protocols.
Deep Learning for Semantic Analysis — When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effect of contrastive conjunctions as well as negation and its scope at various tree levels for both positive and negative phrases.
Fireshell — workflow tools and framework for front-end developers.
textfsm — Python module which implements a template based state machine for parsing semi-formatted text. Originally developed to allow programmatic access to information returned from the command line interface (CLI) of networking devices. TextFSM was developed internally at Google and released under the Apache 2.0 licence for the benefit of the wider community.
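For a feel of what a template-based state machine for semi-formatted text means, here is a toy version using only the standard library. It does not use TextFSM's template syntax or API: the `RULES` table, the `parse` function, and the sample CLI output are hypothetical stand-ins.

```python
import re

# Hypothetical mini-template: state -> list of (pattern, emit_record, next_state).
RULES = {
    "Start": [
        (re.compile(r"^Interface (?P<name>\S+)"), False, "Detail"),
    ],
    "Detail": [
        (re.compile(r"^\s+MTU (?P<mtu>\d+)"), True, "Start"),
    ],
}


def parse(text, rules=RULES, start="Start"):
    """Walk the text line by line, filling a record from named regex groups."""
    state, record, rows = start, {}, []
    for line in text.splitlines():
        for pattern, emit, next_state in rules[state]:
            match = pattern.search(line)
            if match:
                record.update(match.groupdict())
                if emit:
                    # The rule marks the record as complete: store it, start a new one.
                    rows.append(record)
                    record = {}
                state = next_state
                break  # first matching rule wins for this line
    return rows


if __name__ == "__main__":
    cli_output = "Interface eth0\n  MTU 1500\nInterface eth1\n  MTU 9000\n"
    print(parse(cli_output))
```

The appeal of the approach is that the parsing knowledge lives in a declarative table rather than in hand-written control flow, so supporting another device's output means writing a new template, not new code.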
The Money is in the Bitcoin Protocol (Vikram Kumar) — some of the basics in this post as well as how people are thinking about using the Bitcoin protocol to do some very innovative things. MUST. READ.
Deep Learning Using Support Vector Machines (Arxiv) — we are proposing to train all layers of the deep networks by backpropagating gradients through the top level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop’s face expression recognition challenge. (via Olivier Grisel)
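A small sketch of what swapping softmax for a linear SVM at the top layer could look like. The one-vs-rest squared hinge loss used here is an assumption about the formulation, and the function name is made up for illustration. The function returns the loss together with its gradient with respect to the class scores, which is the quantity that would be backpropagated into the lower layers.

```python
def squared_hinge_loss_and_grad(scores, label):
    """One-vs-rest squared hinge loss over class scores (assumed formulation).

    scores: list of top-layer outputs, one per class.
    label:  index of the true class.
    Returns (loss, gradient of the loss with respect to each score).
    """
    loss, grad = 0.0, []
    for k, score in enumerate(scores):
        target = 1.0 if k == label else -1.0       # +1 for the true class, -1 otherwise
        margin = max(0.0, 1.0 - target * score)    # zero once the margin is satisfied
        loss += margin ** 2
        grad.append(-2.0 * target * margin)        # d(margin^2) / d(score)
    return loss, grad


if __name__ == "__main__":
    loss, grad = squared_hinge_loss_and_grad([2.0, -2.0, 0.5], label=0)
    print(loss, grad)
```

Unlike softmax cross-entropy, classes that already clear the margin contribute zero gradient, so training effort goes only to the scores that still violate it.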