- Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
- reVerb — open source (GPLv3) toolkit for extracting (argument, relation, argument) triples from text. See the paper for more details.
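As a rough illustration of what such a check involves, here is a minimal Python sketch that compares a file with its copy. The helper `compare_copy` is hypothetical and not part of Backup Bouncer, which tests more than this; the sketch looks only at contents, permission bits, owner, modification time, and BSD file flags where the platform has them.

```python
import filecmp
import os
import stat


def compare_copy(src: str, dst: str) -> list[str]:
    """Return the names of properties that differ between src and its copy dst.

    Hypothetical helper for illustration only; a real backup test suite
    examines many more properties than these.
    """
    s, d = os.stat(src), os.stat(dst)
    problems = []
    # Byte-for-byte comparison (shallow=False forces the files to be read).
    if not filecmp.cmp(src, dst, shallow=False):
        problems.append("contents")
    # Permission bits only; S_IMODE masks off the file-type bits.
    if stat.S_IMODE(s.st_mode) != stat.S_IMODE(d.st_mode):
        problems.append("permissions")
    if (s.st_uid, s.st_gid) != (d.st_uid, d.st_gid):
        problems.append("owner")
    # Modification times compared at whole-second resolution.
    if int(s.st_mtime) != int(d.st_mtime):
        problems.append("mtime")
    # BSD file flags exist only on some platforms (e.g. macOS, FreeBSD);
    # elsewhere both sides default to 0.
    if getattr(s, "st_flags", 0) != getattr(d, "st_flags", 0):
        problems.append("flags")
    return problems
```

For a source file whose modification time is in the past, `shutil.copy` (which copies contents and permission bits) would be flagged for `mtime`, while `shutil.copy2`, which also copies metadata such as timestamps, should pass.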
The problem for publishers is that customers don't know what a book is anymore.
The publishing world needs some new language that describes what happens and, more importantly, what is possible when the words are separated from the paper.
STM in Python, Static Web is Back, Cyberwar, and Virtual Language Education
- STM in PyPy — a proposal to add software transactional memory to PyPy, the all-Python Python interpreter, as a way of simplifying concurrent programming. I first learned about STM from Haskell’s Simon Peyton Jones at OSCON. (via Nelson Minar)
- Werner Vogels’ Static Web Site on S3 — nice writeup of the toolchain to publish a web site to static files served from S3.
- China Inadvertently Reveals State-Sponsored Hacking — if UK, US, French, Israeli, or Chinese citizens believe their government doesn’t have malware and penetration teams working on extracting information from foreign governments, they’re dreaming.
- MyChinese360 — virtual foreign language instruction in Mandarin, including “virtual visits” to Chinese landmarks. The ability to get native speakers virtually into the classroom makes the Internet a huge asset for rural schools. (via Lucy Gray)
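PyPy's proposal is to build STM into the interpreter itself, so ordinary Python code would not need to change. The library-level toy below only illustrates the core STM idea under simplifying assumptions: reads are logged with a version number, writes are buffered, and at commit time the transaction is validated and either published atomically or re-run. The names `TVar`, `Transaction`, and `atomically` are hypothetical (borrowed loosely from Haskell's STM vocabulary), and a single global commit lock stands in for the finer-grained machinery a real implementation would use.

```python
import threading

# One lock guards only the short validate-and-commit step and consistent
# reads of (value, version) pairs; transaction bodies run without holding it.
_commit_lock = threading.Lock()


class TVar:
    """A transactional variable: a value plus a version bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> new value, buffered until commit

    def read(self, tv):
        if tv in self.writes:  # read-your-own-writes
            return self.writes[tv]
        with _commit_lock:  # read value and version as a consistent pair
            value, version = tv.value, tv.version
        self.reads.setdefault(tv, version)
        return value

    def write(self, tv, value):
        self.writes[tv] = value  # invisible to other threads until commit

    def commit(self):
        with _commit_lock:
            # Validate: fail if anything we read has since been committed over.
            if any(tv.version != v for tv, v in self.reads.items()):
                return False
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1
            return True


def atomically(fn):
    """Run fn(tx) as a transaction, re-running it from scratch until it commits."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


# Usage: move 10 units between two accounts atomically.
a, b = TVar(100), TVar(0)


def transfer(tx, amount=10):
    tx.write(a, tx.read(a) - amount)
    tx.write(b, tx.read(b) + amount)


atomically(transfer)
```

Because this sketch validates only at commit time, a transaction body may briefly see an inconsistent mix of values before it is retried; production STM systems add checks to rule that out.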
Illusions, Crowdsourcing, Translations, and Favourite Numbers
- Illusion Contest — an annual open contest for optical illusions. Every year new perceptual illusions are discovered, exploiting hitherto unresearched areas of our brain’s functioning.
- Citizen Science Alliance — the team behind Galaxy Zoo, who help other researchers in need of crowdsourcing support.
- Ancient Lives — crowdsourced translation and reconstruction of ancient papyri from Oxyrhynchus, a collection that has already yielded new gospels (and a fragment of Revelation in which the number of the beast is 616, not 666).
- Favourite Number — tell a story about your favourite number. Alex Bellos is behind it and has talked about the great stories he’s collected so far. Contribute now, and watch this space to learn more about the stories.
Organising Conferences, Moving to the JVM, Language Crowdsourcing, and Bayesian Computing
- Conference Organisers Handbook — an accurate guide to running a two-day, 300-person conference. See also the Yet Another Perl Conference (YAPC) guidelines.
- Twitter Shifting More Code to JVM — it’s interesting how, at scale, the web cool kids must turn to some of the tools and techniques of the scorned Enterprise. Some. Business Process Workflow XML Schemas will never find love.
- Luis von Ahn on Duolingo — from the team that gave us “OCR books as you verify you are a human” CAPTCHAs comes “learn a new language as you translate the web”. I would love to try this; it sounds great (and is an example of what crowdsourcing can be).
- Fully Bayesian Computing (PDF) — “A fully Bayesian computing environment calls for the possibility of defining vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow treating these objects much like any variables or numeric arrays. Working within the statistical package R, we introduce a new object-oriented framework based on a new random variable data type that is implicitly represented by simulations.” Perl made text processing easy because strings were first-class objects with a rich set of functions to operate on them, and Node.js has a sweet HTTP library; it’s interesting to see how much more intuitive an algorithm becomes when random variables are a data type. (via BigData)
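To make the idea concrete, here is a minimal Python sketch of a random-variable type that is represented implicitly by simulation draws. The paper's framework lives in R; the class `RV` and its methods are hypothetical names for illustration, and arithmetic simply operates draw-by-draw, so derived quantities carry their uncertainty along automatically.

```python
import random
import statistics


class RV:
    """A random variable represented implicitly by a list of simulation draws."""

    def __init__(self, draws):
        self.draws = list(draws)

    @classmethod
    def normal(cls, mu, sigma, n=4000, rng=None):
        rng = rng or random.Random()
        return cls(rng.gauss(mu, sigma) for _ in range(n))

    def _lift(self, other, op):
        # Combine draw-by-draw; a plain number is applied to every draw.
        if isinstance(other, RV):
            if len(other.draws) != len(self.draws):
                raise ValueError("random variables need the same number of draws")
            return RV(op(x, y) for x, y in zip(self.draws, other.draws))
        return RV(op(x, other) for x in self.draws)

    def __add__(self, other):
        return self._lift(other, lambda x, y: x + y)

    def __sub__(self, other):
        return self._lift(other, lambda x, y: x - y)

    def __rsub__(self, other):
        return self._lift(other, lambda x, y: y - x)

    def __mul__(self, other):
        return self._lift(other, lambda x, y: x * y)

    __radd__ = __add__
    __rmul__ = __mul__

    def __gt__(self, other):
        # A comparison yields an indicator variable: 1.0 where true, else 0.0.
        return self._lift(other, lambda x, y: 1.0 if x > y else 0.0)

    def mean(self):
        return statistics.fmean(self.draws)

    def sd(self):
        return statistics.stdev(self.draws)

    def quantile(self, q):
        # Simple empirical quantile (nearest rank, no interpolation).
        s = sorted(self.draws)
        return s[min(int(q * len(s)), len(s) - 1)]


# Usage: two uncertain rates, and the probability that the first exceeds the second.
rng = random.Random(0)
theta_a = RV.normal(0.30, 0.05, rng=rng)
theta_b = RV.normal(0.25, 0.05, rng=rng)
diff = theta_a - theta_b
print(diff.mean(), (diff > 0).mean())
```

Pairing draws by index is what keeps dependence intact: `theta_a - theta_a` is exactly zero in every draw, rather than the difference of two independent copies. Intervals then fall out of simple summaries, e.g. `diff.quantile(0.025)` and `diff.quantile(0.975)`.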
Internet Access Rights, Statistical Peace, Vintage Jobs, and Errata Etymology
- Right to Access the Internet — a survey of the rights to Internet access in different countries.
- Peace Through Statistics — three ex-Yugoslavian statisticians nominated for the Nobel Peace Prize. In war-torn and impoverished countries, statistics provides a welcome arena in which science runs independent of ethnicity and religion. With so few resources, many countries are graduating few, if any, PhDs in statistical sciences. These statisticians collaboratively began a campaign to collect the basics underlying statistics and statistics education, with the hope of increasing access to statistical ideas, knowledge and training around the world.
- Vintage Steve Jobs (YouTube) — he’s launching the “Think Different” campaign, and it’s a great reminder of what a powerful speaker he is and of how he thinks about marketing.
- Anatomy of a Fake Quotation (The Atlantic) — deconstructing how the words of a 24-year-old English teacher in Japan sped around the world, misattributed to Martin Luther King.
One-Click Zeroed Down Under, Piracy, One Site To Rule Them All, and English Language
- Telstra Scores Patent Win over Amazon (ZDNet) — The delegate of the Commissioner of Patents, Ed Knock, found this week that Amazon’s 1-click buy facility “lacks novelty [and] an inventive step”, making Amazon’s claim unpatentable.
- The Final Answer for What To Do To Prevent Piracy (Jeff Vogel) — His advice is to do the minimum to encourage people to pay, as anything beyond that will inconvenience your paying customers and do little to nothing to prevent piracy.
- alpha.gov.uk — an experimental prototype of a single interface to all government services. Governments have been trying these for years. This one’s different: it’s not built by the highest bidder, it’s the result of a lean team headed by the stellar Tom Loosemore (ex-BBC). It’s prototyping the idea of using lightweight, reusable, syndication-friendly components (decision trees, calculators, guides, etc.) to build such a site. My suspicion, though, is that government websites are a people problem, not a technology problem.
- A StackExchange for the English Language — what’s the collective noun for pedants?
A Princeton search algorithm uses language indicators to measure importance.
A search algorithm being developed by Princeton University researchers parses language to determine relevance. Academic application is one possibility, but this type of algorithm could also extend to news recommendations.
Big data as a discipline or a conference topic is still in its formative years.
Big data is a massive opportunity, but the language used to describe it ("gold rush," "data deluge," "firehose," etc.) reveals that we're still searching for its identity.