"language" entries

The paperless book

The problem for publishers is that customers don't know what a book is anymore.

The publishing world needs some new language that describes what happens and, more importantly, what is possible when the words are separated from the paper.

Four short links: 6 September 2011

Javascript Primitives, Test Backups, Learn Triples, and Scale Javascript

  1. The Secret Life of Javascript Primitives — good writing and clever headlines can make even the dullest topic seem interesting. This is interesting, I hasten to add.
  2. Backup Bouncer — software to test how effective your backup tools are: you copy files to a test area by whatever means you like, then run this tool to see whether permissions, flags, owners, contents, timestamps, etc. are preserved. (via Joshua Schachter)
  3. reVerb — open source (GPLv3) toolkit for learning triples from text. See the paper for more details.
  4. Patterns for Large-Scale Javascript Architecture — enterprise (aka “scalable”) architectures for Javascript apps.
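
The kind of check described for Backup Bouncer above (copy files, then see what survived) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Backup Bouncer itself works: the function names `file_digest` and `compare_copy` are made up here, and the sketch looks only at permissions, owner, modification time, and contents, not at flags or other metadata.

```python
import hashlib
import os
import stat


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_copy(original, copy):
    """Hypothetical helper: list the attributes that differ between a file and its copy."""
    a, b = os.stat(original), os.stat(copy)
    differences = []
    # Compare only the permission bits, not the file-type bits.
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        differences.append("permissions")
    if (a.st_uid, a.st_gid) != (b.st_uid, b.st_gid):
        differences.append("owner")
    # Whole seconds only, since some tools and filesystems drop sub-second precision.
    if int(a.st_mtime) != int(b.st_mtime):
        differences.append("mtime")
    if file_digest(original) != file_digest(copy):
        differences.append("contents")
    return differences
```

Calling `compare_copy("source/file", "restored/file")` on a file and its restored copy returns an empty list when all four attributes were preserved.
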
Four short links: 24 August 2011

STM in Python, Static Web is Back, Cyberwar, and Virtual Language Education

  1. STM in PyPy — a proposal to add software transactional memory to the all-Python Python interpreter as a way of simplifying concurrent programming. I first learned about STM from Haskell’s Simon Peyton Jones at OSCON. (via Nelson Minar)
  2. Werner Vogels’ Static Web Site on S3 — nice writeup of the toolchain to publish a web site to static files served from S3.
  3. China Inadvertently Reveals State-Sponsored Hacking — if UK, US, French, Israeli, or Chinese citizens believe their government doesn’t have malware and penetration teams working on extracting information from foreign governments, they’re dreaming.
  4. MyChinese360 — virtual foreign language instruction in Mandarin, including “virtual visits” to Chinese landmarks. The ability to get native speakers virtually into the classroom makes the Internet a huge asset for rural schools. (via Lucy Gray)
Four short links: 15 August 2011

Illusions, Crowdsourcing, Translations, and Favourite Numbers

  1. Illusion Contest — every year they run an open contest for optical illusions. Every year new perceptual illusions are discovered, exploiting hitherto unresearched areas of our brain’s functioning.
  2. Citizen Science Alliance — the team behind GalaxyZoo, who help other researchers in need of crowdsourcing support.
  3. Ancient Lives — crowdsourced translation and reconstruction of ancient papyri from Oxyrhynchus. The project has already found new gospels (in which the number of the beast is 616, not 666).
  4. Favourite Number — tell a story about your favourite number. Alex Bellos is behind it, and talked about the great stories he’s collected so far. Contribute now, watch this space to learn more about the stories.
Four short links: 5 July 2011

Organising Conferences, Moving to the JVM, Language Crowdsourcing, and Bayesian Computing

  1. Conference Organisers Handbook — accurate guide to running a two-day 300-person conference. See also Yet Another Perl Conference guidelines.
  2. Twitter Shifting More Code to JVM — interesting how, at scale, there are some tools and techniques of the scorned Enterprise that the web cool kids must turn to. Some. Business Process Workflow XML Schemas will never find love.
  3. Luis von Ahn on Duolingo — from the team that gave us “OCR books as you verify you are a human” CAPTCHAs comes “learn a new language as you translate the web”. I would love to try this; it sounds great (and is an example of what crowdsourcing can be).
  4. Fully Bayesian Computing (PDF) — “A fully Bayesian computing environment calls for the possibility of defining vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow treating these objects much like any variables or numeric arrays. Working within the statistical package R, we introduce a new object-oriented framework based on a new random variable data type that is implicitly represented by simulations.” Perl made text processing easy because strings were first-class objects with a rich set of functions to operate on them; Node.js has a sweet HTTP library; it’s interesting to see how much more intuitive an algorithm becomes when random variables are a data type. (via BigData)
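
To make the "random variables as a data type" idea from the last item concrete, here is a rough Python sketch of a value that is represented by its simulated draws and still behaves like a number in arithmetic. It is not the paper's R framework: the class name `RandomVariable`, its methods, and the choice of 10,000 draws are all assumptions made for illustration.

```python
import random
import statistics


class RandomVariable:
    """Hypothetical type: a quantity represented implicitly by simulated draws."""

    def __init__(self, draws):
        self.draws = list(draws)

    @classmethod
    def normal(cls, mean, sd, n=10_000, seed=None):
        """Build a variable from n draws of a normal distribution."""
        rng = random.Random(seed)
        return cls(rng.gauss(mean, sd) for _ in range(n))

    def _combine(self, other, op):
        if isinstance(other, RandomVariable):
            if len(other.draws) != len(self.draws):
                raise ValueError("random variables need the same number of draws")
            # Combine draw by draw, so the result is again a set of simulations.
            return RandomVariable(op(a, b) for a, b in zip(self.draws, other.draws))
        # A plain number is a deterministic quantity: apply it to every draw.
        return RandomVariable(op(a, other) for a in self.draws)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    __radd__ = __add__

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__

    def mean(self):
        return statistics.fmean(self.draws)

    def sd(self):
        return statistics.stdev(self.draws)


x = RandomVariable.normal(0.0, 1.0, seed=1)
y = RandomVariable.normal(5.0, 2.0, seed=2)
z = 2 * x + y + 1  # mixes random and deterministic quantities in one expression
print(round(z.mean(), 2), round(z.sd(), 2))
```

The expression for `z` reads like ordinary arithmetic, while the summaries at the end are estimated from the draws. For these inputs the mean should land near 6 and the standard deviation near the square root of 8, up to simulation noise.
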
Four short links: 27 May 2011

Twitter DB, Data Reliance, Open Source Architectures, and Short-Form Bullying

  1. flockdb (Github) — Twitter’s open source scalable fault-tolerant distributed key-store graph database. (via Twitter’s open source projects page)
  2. How to Kill Innovation in Five Easy Steps (Tech Republic) — point four is interesting: “Rely too heavily on data and dashboards.” It’s good to be reminded of the contra side to the big-data-can-be-mined-for-all-truths attitudes flying around.
  3. Architecture of Open Source Applications — CC-licensed book available through Lulu or for free download. Lots of interesting stories and design decisions to draw from. I know when I learned how Perl worked on the inside, I learned a hell of a lot that I could apply later in life and respected its creators all the more.
  4. Bullying in 140 Letters — it’s about an Australian storm in a teacup, but it made me consider the short-form medium. Short-form negativity can have the added colour/resonance of being snarky and funny. Hard to add colour to short-form positive comments, though. Much harder to be funny and positive than to be funny and negative. Have we inadvertently created a medium where, thanks to the quirks of our language and the way we communicate, it favours negativity over positivity?
Four short links: 19 May 2011

Internet Access Rights, Statistical Peace, Vintage Jobs, and Errata Etymology

  1. Right to Access the Internet — a survey of the right to access the Internet in different countries.
  2. Peace Through Statistics — three ex-Yugoslavian statisticians nominated for Nobel Peace Prize. In war-torn and impoverished countries, statistics provides a welcome arena in which science runs independent of ethnicity and religion. With so few resources, many countries are graduating few, if any, PhDs in statistical sciences. These statisticians collaboratively began a campaign to collect together the basics underlying statistics and statistics education, with the hope of increasing access to statistical ideas, knowledge and training around the world.
  3. Vintage Steve Jobs (YouTube) — he’s launching the “Think Different” campaign, but it’s a great reminder of what a powerful speaker he is and a look at how he thinks about marketing.
  4. Anatomy of a Fake Quotation (The Atlantic) — deconstructing how the words of a 24-year-old English teacher in Japan sped around the world, attributed to Martin Luther King.
Four short links: 12 May 2011

One-Click Zeroed Down Under, Piracy, One Site To Rule Them All, and English Language

  1. Telstra Scores Patent Win over Amazon (ZDNet) — “The delegate of the Commissioner of Patents, Ed Knock, found this week that Amazon’s 1-click buy facility ‘lacks novelty [and] an inventive step’, making Amazon’s claim unpatentable.”
  2. The Final Answer for What To Do To Prevent Piracy (Jeff Vogel) — His advice is to do the minimum to encourage people to pay, as “Anything beyond that will inconvenience your paying customers and do little to nothing to prevent piracy.”
  3. alpha.gov.uk — an experimental prototype of a single interface to all government services. Governments have been trying these for years. This one’s different: it’s not built by the highest bidder, it’s the result of a lean team headed by the stellar Tom Loosemore (ex-BBC). It’s prototyping the idea of using lightweight reusable syndication-friendly components (decision trees, calculators, guides, etc.) to build such a site. My suspicion, though, is that government websites are a people problem, not a technology problem.
  4. A StackExchange for the English Language — what’s the collective noun for pedants?

Smarter search looks for influence rather than links

A Princeton search algorithm uses language indicators to measure importance.

A search algorithm being developed by Princeton University researchers parses language to determine relevance. Academic application is one possibility, but this type of algorithm could also extend to news recommendations.

Big Data: An opportunity in search of a metaphor

Big data as a discipline or a conference topic is still in its formative years.

Big data is a massive opportunity, but the language used to describe it ("goldrush," "data deluge," "firehose," etc.) reveals we're still searching for its identity.