Chimera (Paper a Day) — the authors summarise six main lessons learned while building Chimera: (1) Things break down at large scale; (2) Both learning and hand-crafted rules are critical; (3) Crowdsourcing is critical, but must be closely monitored; (4) Crowdsourcing must be coupled with in-house analysts and developers; (5) Outsourcing does not work at a very large scale; (6) Hybrid human-machine systems are here to stay.
Microsoft Embedding Research — To break down the walls between its research group and the rest of the company, Microsoft reassigned about half of its more than 1,000 research staff in September 2014 to a new group called MSR NExT. Its focus is on projects with greater impact to the company rather than pure research. Meanwhile, the other half of Microsoft Research is getting pushed to find more significant ways it can contribute to the company’s products. The challenge is how to avoid short-term thinking from your research team. For instance, Facebook assigns some staff to focus on long-term research, and Google’s DeepMind group in London conducts pure AI research without immediate commercial considerations.
Google’s Go-Playing AI — The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network,” predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network,” is then used to reduce the depth of the search tree — estimating the winner in each position in place of searching all the way to the end of the game.
You Can’t Destroy a Village to Save It (EFF) — EFF have a clever compromise for W3C’s proposal for DRM on the Web. [T]he W3C could have its cake and eat it, too. It could adopt a rule that requires members who help make DRM standards to promise not to sue people who report bugs in tools that conform to those standards, nor could they sue people just for making a standards-based tool that connected to theirs. They could make DRM, but only if they made sure that they took steps to stop that DRM from being used to attack the open Web. I hope the W3C take it.
Copyright Law Shouldn’t Keep Me From Fixing a Tractor (Slate) — When a farmer friend of mine wanted to know if there was a way to tweak the copyrighted software of his broken tractor, I knew it was going to be rough. The only way to get around the DMCA’s restriction on software tinkering is to ask the Copyright Office for an exemption at the Section 1201 Rulemaking, an arduous proceeding that takes place just once every three years.
License to Drive — I have difficulty viewing No Drive Day as imminent. We’re maybe 95% there, but that last 5% will be a lengthy slog.
2015 CCC Videos — collected talks from the 32nd Chaos Computer Congress conference.
An Integrated Bayesian Approach for Effective Multi-Truth Discovery (PDF) — Integrating data from multiple sources has been increasingly becoming commonplace in both Web and the emerging Internet of Things (IoT) applications to support collective intelligence and collaborative decision-making. Unfortunately, it is not unusual that the information about a single item comes from different sources, which might be noisy, out-of-date, or even erroneous. It is therefore of paramount importance to resolve such conflicts among the data and to find out which piece of information is more reliable.
A Psychological Exploration of Engagement in Geek Culture — Seven studies (N = 2354) develop the Geek Culture Engagement Scale (GCES) to quantify geek engagement and assess its relationships to theoretically relevant personality and individual differences variables. These studies present evidence that individuals may engage in geek culture in order to maintain narcissistic self-views (the great fantasy migration hypothesis), to fulfill belongingness needs (the belongingness hypothesis), and to satisfy needs for creative expression (the need for engagement hypothesis). Geek engagement is found to be associated with elevated grandiose narcissism, extraversion, openness to experience, depression, and subjective well-being across multiple samples.
How Machines Write Poetry — Harmon would love to have writers or other experts judge FIGURE8’s work, too. Her online subjects tended to rate the similes better if they were obvious. “The snow continued like a heavy rain” got high scores, for example, even though Harmon thought this was quite a bad effort on FIGURE8’s part. She preferred “the snow falls like a dead cat,” which got only middling ratings from humans. “They might have been cat lovers,” she says.FIGURE8 (PDF) system generates figurative language.
The Decisions the Pentagon Wants to Leave to Robots — “You cannot have a human operator operating at human speed fighting back at determined cyber tech,” Work said. “You are going to need have a learning machine that does that.” I for one welcome our new robot script kiddie overlords.
Love in the Age of Big Data — Over decades, John has observed more than 3,000 couples longitudinally, discovering patterns of argument and subtle behaviors that can predict whether a couple would be happily partnered years later or unhappy or divorced. Turns out, “don’t be a jerk” is good advice for marriages, too. (via Cory Doctorow)
Spinal Cord Injury Breakthrough by Software — This wasn’t the result of a new, long-term study, but a meta-analysis of $60 million worth of basic research written off as useless 20 years ago by a team of neuroscientists and statisticians led by the University of California San Francisco and partnering with the software firm Ayasdi, using mathematical and machine learning techniques that hadn’t been invented yet when the trials took place.
The Assassination Complex (The Intercept) — America’s drone program’s weaknesses highlighted in new document dump: Taken together, the secret documents lead to the conclusion that Washington’s 14-year high-value targeting campaign suffers from an overreliance on signals intelligence, an apparently incalculable civilian toll, and — due to a preference for assassination rather than capture — an inability to extract potentially valuable intelligence from terror suspects.
Ashley Madison’s Fembot Con (Gizmodo) — As documents from company e-mails now reveal, 80% of first purchases on Ashley Madison were a result of a man trying to contact a bot, or reading a message from one.
Why Futurism Has a Cultural Blindspot (Nautilus) — As the psychologist George Lowenstein and colleagues have argued, in a phenomenon they termed “projection bias,” people “tend to exaggerate the degree to which their future tastes will resemble their current tastes.”
Mind-Controlled Prosthetic Arm (Quartz) — The robotic arm is connected by wires that link up to the wearer’s motor cortex — the part of the brain that controls muscle movement — and sensory cortex, which identifies tactile sensations when you touch things. The wires from the motor cortex allow the wearer to control the motion of the robot arm, and pressure sensors in the arm that connect back into the sensory cortex give the wearer the sensation that they are touching something.
Brain-Machine-Interface for Exoskeleton — no need to worry about the “think of sex every seven seconds” trope, the new system allows users to move forwards, turn left and right, sit and stand simply by staring at one of five flickering LEDs.
P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
UK Copyright Law Permits Researchers to Data Mine — changes mean Copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if university mines the journals, finds new possibility for a Thing, is verified experimentally, is that Thing the university’s to license commercially for profit?
Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.