ENTRIES TAGGED "text"
New Math, Business Math, Summarising Text, Clipping Images
- Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It (Jennifer Ouellette) — Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he believes is already underway.
- Is Google Jumping the Shark? (Seth Godin) — Public companies almost inevitably seek to grow profits faster than expected, which means beyond the organic growth that comes from doing what made them great in the first place. In order to gain that profit, it’s typical to hire people and reward them for measuring and increasing profits, even at the expense of what the company originally set out to do. Eloquent redux.
- textteaser — open source text summarisation algorithm.
- Clipping Magic — Instantly create masks, cutouts, and clipping paths online.
How illustrations and a clear path can enhance a story.
A clear reading path isn't always a bad thing. Here's an example where imagery advances the narrative and guides the reader along a defined trajectory.
Two examples of how digital images and associated text can stick together.
The fluidity of digital content occasionally sends images in one direction and text in another. Here's a look at two design experiments that keep digital assets together.
Regular Expressions, Mac Git, Open Source Patents, and Pepys Lessons
- Rubular — a way to write and test regular expressions interactively. Very cool. (via Adam Fields)
- gitx — OS X UI for Git. (via Marc Hedlund)
- Open Source Critical to Competition (Simon Phipps) — DOJ and German Federal Cartel Office see danger for open source in Novell’s patents being acquired by a consortium of Oracle, Microsoft, Apple, and EMC (fancy!) and are taking steps to ensure open source is protected.
- My Talk about Samuel Pepys’s Diary as an Online Story (Phil Gyford) — I love the ways Phil has stretched and repurposed the web’s affects for storytelling. Listen to this talk. (via BoingBoing)
Stream Processing, Semantic Web, Location Services, and PDF Extraction
- S4 — S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Open-sourced (Apache license) by Yahoo!.
- RDF and Semantic Web: Can We Reach Escape Velocity? (PDF) — spot-on presentation from the data.gov.uk linked data advisor. It nails, clearly and in only 12 slides, why there’s still resistance to linked data uptake and what should happen to change this. Amen! (via Simon St Laurent)
- Pew Internet Report on Location-based Services — 10% of online Hispanics use these services – significantly more than online whites (3%) or online blacks (5%).
- Slate — Python library for extracting text from PDFs easily.
Amazon Margins, Crowdsourced Science, Data Tool Opensourced, Document Splitting
- AWS: Forget the Revenue, Did You See the Margins? (RedMonk) — According to UBS, Amazon Web Services gross margins for the years 2006 through 2014 are 47%, 48%, 48%, 49%, 49%, 50%, 50.5%, 51%, 53%. (These are analyst projections, so take them with a grain of salt, but those are some sweet margins if they’re even close to accurate.)
- Science Pipes — an environment in which students, educators, citizens, resource managers, and scientists can create and share analyses and visualizations of biodiversity data. It is built to support inquiry-based learning, allowing analysis results and visualizations to be dynamically incorporated into web sites (e.g. blogs) for dissemination and consumption beyond SciencePipes.org itself. (via mikeloukides on Twitter)
- ScraperWiki Source Code — AGPL-licensed source for ScraperWiki, a tool for data storage, cleaning, search, visualization, and export.
- Doc split — a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages…)
Statistical Jeopardy Wins, Mobile Taxonomy, Geodata Mystery, and Machine Learning Blog
- What is IBM’s Watson? (NY Times) — IBM joining the big data machine learning race, and hatching a Blue Gene system that can answer Jeopardy questions. Does good, not great, and is getting better.
- Google Lays Out its Mobile Strategy (InformationWeek) — notable to me for this: Rechis said that Google breaks mobile users down into three behavior groups: A. “Repetitive now” B. “Bored now” C. “Urgent now”. A useful way to look at it. (via Tim)
- BP GIS and the Mysteriously Vanishing Letter — intrigue in the geodata world. This post makes it sound as though cleanup data is going into a box behind BP’s firewall, and the folks who said “um, the government should be the depot, because it needs to know it has a guaranteed-untampered and guaranteed-able-to-access copy of this data” were fired. For more info, including on the data that is available, see the geowanking thread.
- Streamhacker — a blog talking about text mining and other good things, with nltk code you can run. (via heraldxchaos on Delicious)
Open Facebook, Internet Stats, Handling Interviews, and Textual Relationships
- Don’t Simply Build a More Open Facebook, Build a Better One — Most people don’t care so much about whether technology is “open” or “closed” so long as it works. (Case in point: iPhone.) Rather than starting your plans by picking which “open” standards you’ll use, start by designing a better social networking service and then determine how “open” specs will help you build that service. (via David Recordon)
- Internet Stats from Google — very nice categorized factoids about internet use, technology, trends, etc. 64% of C-level executives conduct six or more searches per day to locate business information.
- Qualitative Methods for IS Research — summary of qualitative methods (interviews, documents, observation data) as applied to IS. Written for academics, so you have to choke back passive voice vomit (sorry, “passive voice vomit must be choked back”) but it’s got lots of useful information on approaches and tools. (via johnny723 on Twitter)
- Social Signaling and Language Use — turns out the stopwords like “to”, “be”, and “on” are the ones that indicate manager-subordinate relationships. In so many fields I see again and again that you should keep data at each stage of transformation, because transforming for one use prevents others. (via terrycojones on Twitter)
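The point about keeping data at each stage of transformation can be made concrete with a rough sketch. Everything here is an assumption for illustration: the tiny `STOPWORDS` set, the `process` and `stopword_rate` helpers, and the sample sentence are all hypothetical and are not taken from the linked study. The idea is simply that if you only keep the stopword-stripped text, the signal the study relies on is gone.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"to", "be", "on", "the", "a", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def process(text):
    """Return every stage of the pipeline instead of only the last one."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in STOPWORDS]
    return {"raw": text, "tokens": tokens, "content": content}


def stopword_rate(stages):
    """Fraction of tokens that are stopwords; needs the unfiltered stage."""
    tokens = stages["tokens"]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in STOPWORDS)
    return hits / len(tokens)


stages = process("Please be on time to the meeting")
print(stages["content"])       # filtered view, fine for topic-style analysis
print(stopword_rate(stages))   # only computable because "tokens" was kept
```

Had `process` returned only `content`, the second measurement would be impossible without going back to the source text.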
Digital Texts 2.0 is an interesting application for Facebook that lets you group and share digital material. It's intriguing to see cutting-edge development occurring in this space. From the Digital Texts 2.0 about page: Digital Texts 2.0 was undertaken by Dr. Stéfan Sinclair as an initiative to experiment with applying the principles of Web 2.0 to the realm…