- UK Copyright Law Permits Researchers to Data Mine — under the new rules, copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes. However, researchers who use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if a university mines the journals and finds a new possibility for a Thing that is then verified experimentally, is that Thing the university’s to license commercially for profit?
- Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
- Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
- fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
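The log-scanning idea behind a tool like fail2ban can be sketched in a few lines. This is only an illustration, not fail2ban's actual code or configuration: the log format, the regular expression, the `MAX_RETRIES` threshold, and the `find_offenders` helper are all assumptions made up for the example, and the "action" is reduced to returning the offending addresses rather than touching iptables.

```python
import re
from collections import Counter

# Hypothetical log format for this sketch; real logs vary by service.
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")
MAX_RETRIES = 3  # assumed threshold for the example, not a fail2ban default


def find_offenders(lines, max_retries=MAX_RETRIES):
    """Return a sorted list of IPs with at least max_retries failed logins."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_retries)


# Made-up sample lines using documentation-only IP ranges.
sample_log = [
    "Jan 10 10:00:01 host sshd[1]: Failed password for root from 203.0.113.5 port 22",
    "Jan 10 10:00:03 host sshd[1]: Failed password for root from 203.0.113.5 port 22",
    "Jan 10 10:00:04 host sshd[1]: Accepted password for alice from 198.51.100.7 port 22",
    "Jan 10 10:00:06 host sshd[1]: Failed password for invalid user admin from 203.0.113.5 port 22",
    "Jan 10 10:00:09 host sshd[1]: Failed password for bob from 198.51.100.9 port 22",
]

offenders = find_offenders(sample_log)
print(offenders)
```

A real deployment would also expire old failures after a time window and hand each offender to a ban action; both are left out here to keep the sketch short.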
ENTRIES TAGGED "science"
Failure of Imagination, Meat Failure Mode, Grand Challenges, and Data Programming
- Maximum Happy Imagination (Matt Jones) — questioning the true vision of Marc Andreessen’s recent Twitter discourse on the great future that awaits us. His analogies run out in the 20th century when it comes to the political, social and economic implications of his maximum happy imagination.
- The Mirrortocracy — It’s astonishing how many of the people conducting interviews and passing judgement on the careers of candidates have had no training at all on how to do it well. Aside from their own interviews, they may not have ever seen one. I’m all for learning on your own but at least when you write a program wrong it breaks. Without a natural feedback loop, interviewing mostly runs on myth and survivor bias.
- Longitude Prize — six prize areas, Grand Challenge style, in clean flight, antibiotic resistance, dementia, food, water, and overcoming paralysis. Mysteriously, none for a library system that avoids DLL hell.
- The Re-Emergence of Datalog — Michael Fogus overviews Datalog and provides examples of how it is implemented and used in Datomic, Cascalog, and the Bacwn Clojure library. See also notes from the talk.
Right to Mine, Summarising Microblogs, C Sucks for Stats, and Scanning Logfiles
- 16 Interviewing Tips for User Studies — these apply to many situations beyond user interviews, too.
- The Backlash Against Big Data contd. (Mike Loukides) — Learn to be a data skeptic. That doesn’t mean becoming skeptical about the value of data; it means asking the hard questions that anyone claiming to be a data scientist should ask. Think carefully about the questions you’re asking, the data you have to work with, and the results that you’re getting. And learn that data is about enabling intelligent discussions, not about turning a crank and having the right answer pop out.
- The Science of Science Writing (American Scientist) — also applicable beyond the specific field for which it was written.
Time Series, CT Scanner, Reading List, and Origami Microscope
Internet of Listeners, Mobile Deep Belief, Crowdsourced Spectrum Data, and Quantum Minecraft
- Jasper Project — an open source platform for developing always-on, voice-controlled applications. Shouting is the new swiping—I eagerly await Gartner touting the Internet-of-things-that-misunderstand-you.
- DeepBeliefSDK — deep neural network library for iOS. (via Pete Warden)
- Microsoft Spectrum Observatory — crowdsourcing spectrum utilisation information. Just open sourced their code.
- qcraft — beginner’s guide to quantum physics in Minecraft. (via Nelson Minar)
Understanding Image Processing, Sharing Data, Fixing Bad Science, and Delightful Dashboard
- 2D Image Post-Processing Techniques and Algorithms (DIY Drones) — understanding how automated image matching and processing tools work means you can also get a better understanding of how to shoot your images and what to avoid in order to get good matches.
- Scientists Need to Learn to Share — despite science’s reputation for rigor, sloppiness is a substantial problem in some fields. You’re much more likely to check your work and follow best data-handling practices when you know someone is going to run your code and parse your data.
- METRICS — Meta-Research Innovation Center at Stanford. John Ioannidis has a posse: connecting researchers into weak science, running conferences, creating a “journal watch”, and engaging policy makers. (says The Economist)
- Grafana — elegant dashboard for graphite (the realtime data graphing engine).
Web Past, Web Future, Automated Jerkholism, and Science Education
- High Volume Web Sites — Tim Berners-Lee answers my question on provisioning a popular web server in 1993. The info.cern.ch server which has the Subject Catalogue gets probably a relatively high usage, about 10k requests a day, or (thinks…) one every 9 seconds. The CPU load is negligible. In fact, of course, the peak rate is higher, but still it’s not really a factor. That was when the server forked a subprocess for each request, too. See also one of my early contributions to the nascent field of web operations (language alert).
- Tim Berners-Lee Calls For Web Magna Carta (Guardian) — Unless we have an open, neutral internet we can rely on without worrying about what’s happening at the back door, we can’t have open government, good democracy, good healthcare, connected communities and diversity of culture. It’s not naive to think we can have that, but it is naive to think we can just sit back and get it.
- BroApp — Automatically message your girlfriend sweet things so you can spend more time with the Bros. Reminds me of the Electric Monk in Dirk Gently’s Holistic Detective Agency. The monk notices that humans have machines to watch TV for them. Now we have machines to be shitty boyfriends for us. (via Beta Knowledge)
- World Science U — quick answers, short courses, long MOOCs. I wonder how you’d know whether this was effective at increasing scientific literacy, and therefore whether it’d be worth doing for computational thought or programming.
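The back-of-envelope figure in the Berners-Lee quote above (about 10k requests a day, or roughly one every 9 seconds) is easy to check. The helper names below are invented for this sketch, and the calculation assumes requests spread evenly over the day, which the quote itself notes is not true at peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_requests(requests_per_day):
    """Average gap between requests, assuming an even spread over the day."""
    return SECONDS_PER_DAY / requests_per_day


def requests_per_second(requests_per_day):
    """Average request rate, under the same even-spread assumption."""
    return requests_per_day / SECONDS_PER_DAY


interval = seconds_between_requests(10_000)
print(f"{interval:.2f} seconds between requests")  # prints "8.64 seconds between requests"
```

So "one every 9 seconds" is a fair rounding of 8.64 seconds, which works out to a little over 0.1 requests per second on average.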
Minecraft+Pi+Python, Science Torrents, Web App Performance Measurement, and Streaming Data
- Programming Minecraft Pi with Python — an early draft, but shows promise for kids. (via Raspberry Pi)
- Terasaur — BitTorrent for mad-large files, making it easy for datasets to be saved and exchanged.
- Bucky — Open-source tool to measure the performance of your web app directly from your users’ browsers. Nifty graph.
- Zoe Keating’s Streaming Payouts — actual data on a real musician’s distribution and revenues through various channels. Hint: streaming is tragicomically low-paying. (via Andy Baio)
Definitive answers require further testing
The following is from the second issue of BioCoder, the quarterly newsletter for synthetic biologists, DIY biologists, neurobiologists, and more.
Within DIYbio, one cannot escape the hacking metaphor. The metaphor is ubiquitous and, to a point, useful. The term connotes both productive play with an existing technology aimed at improvement and, at the same time, play with sinister undertones. In this sense, hacking captures the promise and pitfalls of the dual uses any mature technology might be put to, whether that technology is as dramatic as nuclear power/weapons or as mundane as a free/premium software license. But every metaphor has its limits. Pushed too far, metaphors break down, and instead of illuminating, they obscure. Which brings me to ask: how far can the hacking metaphor be pushed within DIYbio—at least the part of DIYbio falling in line with synthetic biology?
In-Game Economy, AI Ethics, Data Repository, and Regulated Disruption
- $200k of Spaceships Destroyed (The Verge) — More than 2,200 of the game’s players, members of EVE’s largest alliances, came together to shoot each other out of the sky. The resultant damage was valued at more than $200,000 of real-world money. [...] Already, the battle has had an impact on the economics and politics of EVE’s universe: as both sides scramble to rearm and rebuild, the price of in-game resource tritanium is starting to rise. “This sort of conflict,” Coker said, “is what science fiction warned us about.”
- Google Now Has an AI Ethics Committee (HuffPo) — sorry for the HuffPo link. One of the requirements of the DeepMind acquisition was that Google agreed to create an AI safety and ethics review board to ensure this technology is developed safely. Page’s First Law of Robotics: A robot may not block an advertisement, nor, through inaction, allow an advertisement to come to harm.
- Academic Torrents — a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds built on BitTorrent.
- Hack Schools Meet California Regulators (Venturebeat) — turns out vocational training is a regulated profession. Regulation meets disruption, annihilate in burst of press releases.