"science" entries

Four short links: 17 April 2015

Distributed SQLite, Communicating Scientists, Learning from Failure, and Cat Convergence

by Nat Torkington | @gnat | +Nat Torkington | April 17, 2015

Replicating SQLite using Raft Consensus — clever, he used a consensus algorithm to build a distributed (replicated) SQLite.
When Open Access is the Norm, How do Scientists Communicate? (PLOS) — From interviews I’ve conducted with researchers and software developers who are modeling aspects of modern online collaboration, I’ve highlighted the most useful and reproducible practices. (via Jon Udell)
Meet DJ Patil — “It was this kind of moment when you realize: ‘Oh, my gosh, I am that stupid,’” he said.
Interview with Bruce Sterling on the Convergence of Humans and Machines — If you are a human being, and you are doing computation, you are trying to multiply 17 times five in your head. It feels like thinking. Machines can multiply, too. They must be thinking. They can do math and you can do math. But the math you are doing is not really what cognition is about. Cognition is about stuff like seeing, maneuvering, having wants, desires. Your cat has cognition. Cats cannot multiply 17 times five. They have got their own umwelt (environment). But they are mammalian, you are a mammalian. They are actually a class that includes you. You are much more like your house cat than you are ever going to be like Siri. You and Siri converging, you and your house cat can converge a lot more easily. You can take the imaginary technologies that many post-human enthusiasts have talked about, and you could afflict all of them on a cat. Every one of them would work on a cat. The cat is an ideal laboratory animal for all these transitions and convergences that we want to make for human beings. (via Vaughan Bell)

Four short links: 7 April 2015

JavaScript Numeric Methods, Misunderstood Statistics, Web Speed, and Sentiment Analysis

by Nat Torkington | @gnat | +Nat Torkington | April 7, 2015

NumericJS — numerical methods in JavaScript.
P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
Breaking the 1000ms Time to Glass Mobile Barrier (YouTube) —
See also slides. Stay under 250 ms to feel “fast.” Stay under 1000 ms to keep users’ attention.
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.

Four short links: 30 March 2015

Philosophical Research, Reading Turing, Security Exercises, and Golang Madness

by Nat Torkington | @gnat | +Nat Torkington | March 30, 2015

The Trolley and the Psychopath — Not only does a “utilitarian” response (“just kill the fat guy”) not actually reflect a utilitarian outlook, it may actually be driven by broad antisocial tendencies, such as lowered empathy and a reduced aversion to causing someone harm. Questionably expanding scope of claims in the behavioural philosophy research. (via Ed Yong)
Summary of Computing Machinery and Intelligence (1950) by Alan Turing (Jack Hoy) — still interesting and relevant today. cf Why Aren’t We Reading Turing
Exploit Exercises — a variety of virtual machines, documentation, and challenges that can be used to learn about a variety of computer security issues, such as privilege escalation, vulnerability analysis, exploit development, debugging, reverse engineering, and general cyber security issues.
GopherJS — golang to Javascript compiler so you can experience the ease of typed compiled languages in the security and stability of the browser platform.

Four short links: 19 March 2015

Changing Behaviour, Building Filters, Public Access, and Working Capital

by Nat Torkington | @gnat | +Nat Torkington | March 19, 2015

Using Monitoring Dashboards to Change Behaviour — [After years of neglect] One day we wrote some brittle Ruby scripts that polled various services. They collated the metrics into a simple database and we automated some email reports and built a dashboard showing key service metrics. We pinpointed issues that we wanted to show people. Things like the login times, how long it would take to search for certain keywords in the app, and how many users were actually using the service, along with costs and other interesting facts. We sent out the link to the dashboard at 9am on Monday morning, before the weekly management call. Within 2 weeks most problems were addressed. It is very difficult to combat data, especially when it is laid out in an easy to understand way.
Quiet Mitsubishi Cars — noise-cancelling on phone calls by using machine learning to build the filters.
NSF Requiring Public Access — NSF will require that articles in peer-reviewed scholarly journals and papers in juried conference proceedings or transactions be deposited in a public access compliant repository and be available for download, reading, and analysis within one year of publication.
Filtered for Capital (Matt Webb) — It’s important to get a credit line [for hardware startups] because growing organically isn’t possible — even if half your sell-in price is margin, you can only afford to grow your batch size at 50% per cycle… and whether it’s credit or re-investing the margin, all that growth incurs risk, because the items aren’t pre-sold. There are double binds all over the place here.

Four short links: 11 February 2015

Crowdsourcing Working, etcd DKVS, Psychology Progress, and Inferring Logfile Rules

by Nat Torkington | @gnat | +Nat Torkington | February 11, 2015

Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
Sequence: Automated Analyzer for Reducing 100k Messages to 10s of Patterns — induces patterns from the examples in log files.

Four short links: 7 January 2015

Program Synthesis, Data Culture, Metrics, and Information Biology

by Nat Torkington | @gnat | +Nat Torkington | January 7, 2015

Program Synthesis Explained — The promise of program synthesis is that programmers can stop telling computers how to do things, and focus instead on telling them what they want to do. Inductive program synthesis tackles this problem with fairly vague specifications and, although many of the algorithms seem intractable, in practice they work remarkably well.
Creating a Data-Driven Culture — new (free!) ebook from Hilary Mason and DJ Patil. The editor of that team is the luckiest human being alive.
Ev Williams on Metrics — a master-class in how to think about and measure what matters. If what you care about — or are trying to report on — is impact on the world, it all gets very slippery. You’re not measuring a rectangle, you’re measuring a multi-dimensional space. You have to accept that things are very imperfectly measured and just try to learn as much as you can from multiple metrics and anecdotes.
Nature, the IT Wizard (Nautilus) — a fun walk through the connections between information theory, computation, and biology.

Four short links: 26 December 2014

Science Software, Better Bitmaps, Pushy Internet, and Graphical Perception

by Nat Torkington | @gnat | +Nat Torkington | December 26, 2014

How Bad Software Leads to Bad Science — 21% of scientists who write software have never received training in software development.
Roaring Bitmaps — compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression.
Two Eras of the Internet: From Pull to Push (Chris Dixon) — in which the consumer becomes the infinite sink for an unending and constant stream of updates, media, and social mobile local offers to swipe right on brands near you.
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods (PDF) — research on how well people decode visual cues. In order: Position along a common scale e.g. scatter plot; Position on identical but nonaligned scales e.g. multiple scatter plots; Length e.g. bar chart; Angle & Slope (tie) e.g. pie chart; Area e.g. bubbles; Volume, density, and color saturation (tie) e.g. heatmap; Color hue e.g. newsmap. (via Flowing Data)

Four short links: 1 December 2014

Marketing Color, Brain Time, Who Could Have Foreseen 19100, and ASCII Cam

by Nat Torkington | @gnat | +Nat Torkington | December 1, 2014

Psychology of Color in Marketing and Branding — sidesteps the myths and chromobollocks, and gives the simplest pictorial views of some basic colour choice systems that I found very useful.
Brain Time (David Eagleman) — the visual system is a distributed system with some flexible built-in consistency. So if the visual brain wants to get events correct timewise, it may have only one choice: wait for the slowest information to arrive. To accomplish this, it must wait about a tenth of a second. In the early days of television broadcasting, engineers worried about the problem of keeping audio and video signals synchronized. Then they accidentally discovered that they had around a hundred milliseconds of slop: As long as the signals arrived within this window, viewers’ brains would automatically resynchronize the signals; outside that tenth-of-a-second window, it suddenly looked like a badly dubbed movie.
CS Bumper Stickers (PDF) — Allocate four digits for the year part of a date: a new millenium is coming. —David Martin. From 1985.
ASCIIcam — real-time ASCII output from your videocamera. This is doing terrible things to my internal chronometer. Is it 2014 or 1984? Yes!

Four short links: 25 November 2014

NSA Playset, Open Access, XSS Framework, and Security Test Cases

by Nat Torkington | @gnat | +Nat Torkington | November 25, 2014

Michael Ossman and the NSA Playset — the guy who read the leaked descriptions of the NSA’s toolchest, built them, and open sourced the designs. One device, dubbed TWILIGHTVEGETABLE, is a knock off of an NSA-built GSM cell phone that’s designed to sniff and monitor Internet traffic. The ANT catalog lists it for $15,000; the NSA Playset researchers built one using a USB flash drive, a cheap SDR, and an antenna, for about $50. The most expensive device, a drone that spies on WiFi traffic called PORCUPINEMASQUERADE, costs about $600 to assemble. At Defcon, a complete NSA Playset toolkit was auctioned by the EFF for $2,250.
Gates Foundation Announces World’s Strongest Policy on Open Access Research (Nature) — Once made open, papers must be published under a license that legally allows unrestricted re-use — including for commercial purposes. This might include ‘mining’ the text with computer software to draw conclusions and mix it with other work, distributing translations of the text, or selling republished versions. CC-BY! We believe that published research resulting from our funding should be promptly and broadly disseminated.
Xenotix — an advanced Cross Site Scripting (XSS) vulnerability detection and exploitation framework. It provides Zero False Positive scan results with its unique Triple Browser Engine (Trident, WebKit, and Gecko) embedded scanner. It is claimed to have the world’s 2nd largest XSS Payloads of about 4700+ distinctive XSS Payloads for effective XSS vulnerability detection and WAF Bypass. Xenotix Scripting Engine allows you to create custom test cases and addons over the Xenotix API. It is incorporated with a feature-rich Information Gathering module for target Reconnaissance. The Exploit Framework includes offensive XSS exploitation modules for Penetration Testing and Proof of Concept creation.
Firing Range — Google’s open source set of web security test cases for scanners.