ENTRIES TAGGED "text"

Four short links: 9 September 2014

Go Text, Science Consensus, Broadcast Fallacy, and In-Browser Swift

  1. bleve — A modern text indexing library for Go.
  2. Scientific Consensus Has A Bad Reputation—And Doesn’t Deserve It (Ars Technica) — a lovely explanation of how informal consensus works in science. NB for anyone building social software which attempts to formalise and automate consensus.
  3. TiVo Mega — 24TB of RAID storage, six tuners for capturing broadcasts. Which is rather like building the International Space Station and then hitching it to six horses for launch. Who at this point would make a $5k bet that everything you want to see on a TV will be broadcast by a cable company?
  4. runswift — an in-browser client for compiling and running basic Swift functionality.
Four short links: 27 February 2014

A Fine Rant, Continuous Deployment, IBeacon Spec, and LaTeX Gets a Collaborative Multiplayer Mode

  1. Our Comrade, The Electron (Maciej Ceglowski) — a walk through the life of the inventor of the Theremin, with a pointed rant about how we came to build the surveillance state for the state. One of the best conference talks ever, and I was in the audience for it!
  2. go.cd — continuous deployment and delivery automation tool from Thoughtworks, nothing to do with the Go programming language. The name is difficult to search for, so naturally we needed the added confusion of two projects sharing the name. Continuous deployment is an important part of devops (“the job of our programmers is not to write code, it is to deploy working code into production”—who said this? I’ve lost the reference already).
  3. Apple iBeacon Developer Programme — info locked up behind registration. Sign an NDA to get the specs, free to use the name. Interesting because iBeacon and other Bluetooth LE implementations are promising steps to building a network of things. (via Beekn)
  4. ShareLaTeX — massively multiplayer online LaTeX. Open sourced.
Four short links: 16 October 2013

New Math, Business Math, Summarising Text, Clipping Images

  1. Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It (Jennifer Ouellette) — Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he believes is already underway.
  2. Is Google Jumping the Shark? (Seth Godin) — Public companies almost inevitably seek to grow profits faster than expected, which means beyond the organic growth that comes from doing what made them great in the first place. In order to gain that profit, it’s typical to hire people and reward them for measuring and increasing profits, even at the expense of what the company originally set out to do. Eloquent redux.
  3. textteaser — open source text summarisation algorithm.
  4. Clipping Magic — Instantly create masks, cutouts, and clipping paths online.
Pictures that propel prose

How illustrations and a clear path can enhance a story.

A clear reading path isn't always a bad thing. Here's an example where imagery advances the narrative and guides the reader along a defined trajectory.

Keeping images and text in sync

Two examples of how digital images and associated text can stick together.

The fluidity of digital content occasionally sends images in one direction and text in another. Here's a look at two design experiments that keep digital assets together.

The blurring line between speech and text

We all say things we regret, and now we all write things we regret.

Recent social media gaffes show that our definitions and thresholds for speech and text must evolve. A third category has emerged: Internet-based updates that marry the ephemeral nature of speech and the archival permanence of text.

Four short links: 21 April 2011

Regular Expressions, Mac Git, Open Source Patents, and Pepys Lessons

  1. Rubular — a way to write and test regular expressions interactively. Very cool. (via Adam Fields)
  2. gitx — OS X UI for git. (via Marc Hedlund)
  3. Open Source Critical to Competition (Simon Phipps) — DOJ and German Federal Cartel Office see danger for open source in Novell’s patents being acquired by a consortium of Oracle, Microsoft, Apple, and EMC (fancy!) and are taking steps to ensure open source is protected.
  4. My Talk about Samuel Pepys’s Diary as an Online Story (Phil Gyford) — I love the ways Phil has stretched and repurposed the web’s affects for storytelling. Listen to this talk. (via BoingBoing)
Four short links: 5 November 2010

Stream Processing, Semantic Web, Location Services, and PDF Extraction

  1. S4 — S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Open-sourced (Apache license) by Yahoo!.
  2. RDF and Semantic Web: Can We Reach Escape Velocity? (PDF) — spot-on presentation from the data.gov.uk linked data advisor. It nails, clearly and in only 12 slides, why there’s still resistance to linked data uptake and what should happen to change this. Amen! (via Simon St Laurent)
  3. Pew Internet Report on Location-based Services — 10% of online Hispanics use these services – significantly more than online whites (3%) or online blacks (5%).
  4. Slate — Python library for extracting text from PDFs easily.
Four short links: 6 August 2010

Amazon Margins, Crowdsourced Science, Data Tool Opensourced, Document Splitting

  1. AWS: Forget the Revenue, Did You See the Margins? (RedMonk) — According to UBS, Amazon Web Services gross margins for the years 2006 through 2014 are 47%, 48%, 48%, 49%, 49%, 50%, 50.5%, 51%, 53%. (These are analyst projections, so take them with a grain of salt, but those are some sweet margins if they’re even close to accurate.)
  2. Science Pipes — an environment in which students, educators, citizens, resource managers, and scientists can create and share analyses and visualizations of biodiversity data. It is built to support inquiry-based learning, allowing analysis results and visualizations to be dynamically incorporated into web sites (e.g. blogs) for dissemination and consumption beyond SciencePipes.org itself. (via mikeloukides on Twitter)
  3. ScraperWiki Source Code — AGPL-licensed source to ScraperWiki, a tool for data storage, cleaning, search, visualization, and export.
  4. Doc split — a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages…)
Four short links: 17 June 2010

Statistical Jeopardy Wins, Mobile Taxonomy, Geodata Mystery, and Machine Learning Blog

  1. What is IBM’s Watson? (NY Times) — IBM joining the big data machine learning race, and hatching a Blue Gene system that can answer Jeopardy questions. Does good, not great, and is getting better.
  2. Google Lays Out its Mobile Strategy (InformationWeek) — notable to me for this: Rechis said that Google breaks down mobile users into three behavior groups: A. “Repetitive now” B. “Bored now” C. “Urgent now”. A useful way to look at it. (via Tim)
  3. BP GIS and the Mysteriously Vanishing Letter — intrigue in the geodata world. This post makes it sound as though cleanup data is going into a box behind BP’s firewall, and the folks who said “um, the government should be the depot, because it needs to know it has a guaranteed-untampered and guaranteed-able-to-access copy of this data” were fired. For more info, including on the data that is available, see the geowanking thread.
  4. Streamhacker — a blog talking about text mining and other good things, with nltk code you can run. (via heraldxchaos on Delicious)