ENTRIES TAGGED "data mining"

Four short links: 23 August 2012

Computational Social Science, Infrastructure Drives Design, Narcodrones Imminent, and Muscle Memory

  1. Computational Social Science (Nature) — Facebook and Twitter data drives social science analysis. (via Vaughan Bell)
  2. The Single Most Important Object in the Global Economy (Slate) — Companies like Ikea have literally designed products around pallets: Its “Bang” mug, notes Colin White in his book Strategic Management, has had three redesigns, each done not for aesthetics but to ensure that more mugs would fit on a pallet (not to mention in a customer’s cupboard). (via Boing Boing)
  3. Narco Ultralights (Wired) — it’s just a matter of time until there are no humans on the ultralights. Remote-controlled narcodrones can’t be far away.
  4. Shortcut Foo — a typing tutor for editors, photoshop, and the commandline, to build muscle memory of frequently-used keystrokes. Brilliant! (via Irene Ros)

Mining the astronomical literature

A clever data project shows the promise of open and freely accessible academic literature.

There is a huge debate right now about making academic literature freely accessible and moving toward open access. But what would be possible if people stopped talking about it and just dug in and got on with it?

NASA’s Astrophysics Data System (ADS), hosted by the Smithsonian Astrophysical Observatory (SAO), has quietly been working away since the mid-’90s. Without much, if any, fanfare amongst the other disciplines, it has moved astronomers into a world where access to the literature is just a given. It’s something they don’t have to think about all that much.

The ADS service provides access to abstracts for virtually all of the astronomical literature. But it also provides access to the full text of more than half a million papers, going right back to the start of peer-reviewed journals in the 1800s. The service has links to online data archives, along with reference and citation information for each of the papers, and it’s all searchable and downloadable.

Number of papers published in the three main astronomy journals each year. CREDIT: Robert Simpson

The existence of the ADS, along with the arXiv pre-print server, has meant that most astronomers haven’t seen the inside of a brick-built library since the late 1990s.

It also makes astronomy almost uniquely well placed for interesting data mining experiments, experiments that hint at what the rest of academia could do if they followed astronomy’s lead. The fact that the discipline’s literature has been scanned, archived, indexed and catalogued, and placed behind a RESTful API makes it a treasure trove, both for hypothesis generation and sociological research.

Four short links: 24 May 2012

Maker Tribe, Concept Mapping, Magic Wand, and Site Performance Matters

  1. Last Saturday My Son Found His People at the Maker Faire — aww to the power of INFINITY.
  2. Dictionaries Linking Words to Concepts (Google Research) — Wikipedia entries for concepts, text strings from searches and the oppressed workers down the Text Mines, and a count indicating how often the two were related.
  3. Magic Wand (Kickstarter) — I don’t want the game, I want a Bluetooth magic wand. I don’t want to click the OK button, I want to wave a wand and make it so! (via Pete Warden)
  4. E-Commerce Performance (Luke Wroblewski) — If a page load takes more than two seconds, 40% are likely to abandon that site. This is why you should follow Steve Souders like a hawk: if your site is slower than it could be, you’re leaving money on the table.
Four short links: 8 February 2012

Text Mining, Unstoppable Sociality, Unicode Fun, and Scholarly Publishing

  1. Mavuno — an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
  2. Cow Clicker — Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers [...] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
  3. Unicode Has a Pile of Poo Character (BoingBoing) — this is perfect.
  4. The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) — an excellent summary of how researchers and publishers view each other and their place in the world.

Unstructured data is worth the effort when you’ve got the right tools

Alyona Medelyan and Anna Divoli on the opportunities in chaotic data.

Alyona Medelyan and Anna Divoli are inventing tools to help companies contend with vast quantities of fuzzy data. They discuss their work and what lies ahead for big data in this interview.

Four short links: 13 January 2012

Internet in Culture, Flash Security Tool, Haptic E-Books, and Facebook Mining Private Updates

  1. How The Internet Gets Inside Us (The New Yorker) — at any given moment, our most complicated machine will be taken as a model of human intelligence, and whatever media kids favor will be identified as the cause of our stupidity. When there were automatic looms, the mind was like an automatic loom; and, since young people in the loom period liked novels, it was the cheap novel that was degrading our minds. When there were telephone exchanges, the mind was like a telephone exchange, and, in the same period, since the nickelodeon reigned, moving pictures were making us dumb. When mainframe computers arrived and television was what kids liked, the mind was like a mainframe and television was the engine of our idiocy. Some machine is always showing us Mind; some entertainment derived from the machine is always showing us Non-Mind. (via Tom Armitage)
  2. SWFScan — Windows-only Flash decompiler to find hardcoded credentials, keys, and URLs. (via Mauricio Freitas)
  3. Paranga — haptic interface for flipping through an ebook. (via Ben Bashford)
  4. Facebook Gives Politico Deep Access to Users' Political Sentiments (All Things D) — Facebook will analyse all public and private updates that mention candidates, and an exclusive partner will “use” the results. Remember, if you’re not paying for it, then you’re the product and not the customer.
Four short links: 12 January 2012

Smart Meter Snitches, Company Culture, Text Classification, and Live Face Substitution

  1. Smart Hacking for Privacy — you can mine smart power meter data (or even snoop it) to learn what’s on the TV. Wow. (You can also watch the talk.) (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) — a short read but thought-provoking. It’s easy to create mindless mantras, but I’ve seen the technique that Bryce describes and (when done well) it’s highly effective.
  3. hydrat (Google Code) — a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) — Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real-time, with celebrities such as that of Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.
Four short links: 26 December 2011

Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing

  1. Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
  3. Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
  4. Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
Four short links: 23 December 2011

Preview Colourblindness, Commandline Datamining, Open Source Indexing, and Javascript Time Series

  1. See the World as a Colour-Blind Person Would — filters that let you see images as protanopes, deuteranopes, and even tritanopes would see them. I am protanoptic (if that’s a word) and I can vouch that the “after” pix look the same as “before” to me. Take care: about 8% of men have some form of colourblindness and hate you and your “red is bad, green is good” visual cues. (via Flowing Data)
  2. Waffles — seeks to be the world’s most comprehensive collection of command-line tools for machine learning and data mining.
  3. LinkedIn Open Sources Index and Query Services — full-text index and retrieval engine, APIs, and a framework to manage indexes on infrastructure-as-a-service.
  4. Rickshaw — a JavaScript toolkit for creating interactive time series graphs.