ENTRIES TAGGED "data mining"

Four short links: 8 February 2012

Text Mining, Unstoppable Sociality, Unicode Fun, and Scholarly Publishing

  1. Mavuno — an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
  2. Cow Clicker — Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers [...] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
  3. Unicode Has a Pile of Poo Character (BoingBoing) — this is perfect.
  4. The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) — an excellent summary of how researchers and publishers view each other and their place in the world.
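Mavuno runs its text mining as Hadoop jobs. The map, shuffle, and reduce pattern behind such jobs can be sketched in a single Python process. This is a generic illustration of MapReduce term counting, not Mavuno's API, and the function names are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit (term, 1) for every lowercased, whitespace-separated token.
    for token in doc.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group emitted values by key, which Hadoop does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Sum the partial counts for one term.
    return key, sum(values)


def term_counts(docs: list[str]) -> dict[str, int]:
    # Chain the three phases; on a cluster each phase runs in parallel.
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(term_counts(["Hadoop mines text", "text mining at scale"]))
```

On a real cluster, the map and reduce functions run on many machines, and the framework performs the shuffle over the network.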

Unstructured data is worth the effort when you've got the right tools

Alyona Medelyan and Anna Divoli on the opportunities in chaotic data.

Alyona Medelyan and Anna Divoli are inventing tools to help companies contend with vast quantities of fuzzy data. They discuss their work and what lies ahead for big data in this interview.

Four short links: 13 January 2012

Internet in Culture, Flash Security Tool, Haptic E-Books, and Facebook Mining Private Updates

  1. How The Internet Gets Inside Us (The New Yorker) — at any given moment, our most complicated machine will be taken as a model of human intelligence, and whatever media kids favor will be identified as the cause of our stupidity. When there were automatic looms, the mind was like an automatic loom; and, since young people in the loom period liked novels, it was the cheap novel that was degrading our minds. When there were telephone exchanges, the mind was like a telephone exchange, and, in the same period, since the nickelodeon reigned, moving pictures were making us dumb. When mainframe computers arrived and television was what kids liked, the mind was like a mainframe and television was the engine of our idiocy. Some machine is always showing us Mind; some entertainment derived from the machine is always showing us Non-Mind. (via Tom Armitage)
  2. SWFScan — Windows-only Flash decompiler to find hardcoded credentials, keys, and URLs. (via Mauricio Freitas)
  3. Paranga — haptic interface for flipping through an ebook. (via Ben Bashford)
  4. Facebook Gives Politico Deep Access to Users’ Political Sentiments (All Things D) — Facebook will analyse all public and private updates that mention candidates, and an exclusive partner will “use” the results. Remember, if you’re not paying for it, then you’re the product and not the customer.
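Facebook has not published its method. As a hedged illustration only, a crude measure of "sentiment toward candidates" scores each post that mentions a candidate by counting positive lexicon words and subtracting negative ones. The candidate names and the tiny word lists are hypothetical placeholders. The sketch also only matches single-word names.

```python
import re

# Hypothetical placeholder lexicons; real sentiment lexicons hold thousands of entries.
POSITIVE = {"great", "love", "strong", "win"}
NEGATIVE = {"bad", "hate", "weak", "lose"}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def candidate_sentiment(posts: list[str], candidates: list[str]) -> dict[str, int]:
    """Sum (positive - negative) word counts over the posts that mention each candidate."""
    scores = {name: 0 for name in candidates}
    for post in posts:
        tokens = tokenize(post)
        polarity = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        for name in candidates:
            if name.lower() in tokens:  # single-token name match only
                scores[name] += polarity
    return scores


if __name__ == "__main__":
    posts = ["I love Smith, a strong debater", "Jones looked weak and bad tonight"]
    print(candidate_sentiment(posts, ["Smith", "Jones"]))
```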
Four short links: 12 January 2012

Smart Meter Snitches, Company Culture, Text Classification, and Live Face Substitution

  1. Smart Hacking for Privacy — you can mine smart power meter data (or even snoop on it) to learn what’s on the TV. Wow. (You can also watch the talk.) (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) — a short read but thought-provoking. It’s easy to create mindless mantras, but I’ve seen the technique that Bryce describes and (when done well) it’s highly effective.
  3. hydrat (Google Code) — a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) — Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real time, with those of celebrities such as Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.
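hydrat's own declarative configuration isn't shown here. As a generic illustration of the kind of text classification task it frames, the following is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing. The class name and toy training data are hypothetical, and the code is not hydrat's API.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum over tokens of log P(token | class), smoothed with +1
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayes().fit(
        ["cheap pills now", "meeting at noon", "cheap cheap offer", "lunch meeting today"],
        ["spam", "ham", "spam", "ham"],
    )
    print(clf.predict("cheap offer today"))
```

The sketch works in log space because multiplying many small probabilities would underflow.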
Four short links: 26 December 2011

Text Analysis Bundle, Scala Probabilistic Modeling, Game Analytics, and Encouraging Writing

  1. Pattern — a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) — Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application, not in their existence. This is good news for everyone with a new application.
  3. Playtomic — analytics as a service for gaming companies to learn what players actually do in their games. There aren’t many fields untouched by analytics.
  4. Write or Die — iPad app for writers where, if you don’t keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.
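Clustering, one of the fundamentals mentioned in the Pattern entry, usually starts from a similarity measure between documents. The following is a minimal sketch of cosine similarity over raw term counts in plain Python. It does not use Pattern's API, and production code would normally weight terms first, for example with TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0  # treat an empty text as dissimilar to everything


if __name__ == "__main__":
    print(cosine_similarity("open source text mining", "text mining tools"))
```

A clustering algorithm would compute this score for pairs of documents and then group together the documents whose similarity is high.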
Four short links: 23 December 2011

Preview Colourblindness, Commandline Datamining, Open Source Indexing, and Javascript Time Series

  1. See the World as a Colour-Blind Person Would — filters that let you see images as protanopes, deuteranopes, and even tritanopes would see them. I am protanoptic (if that’s a word) and I can vouch that the “after” pix look the same as “before” to me. Care, because about 8% of men have some form of colourblindness and hate you and your “red is bad, green is good” visual cues. (via Flowing Data)
  2. Waffles — seeks to be the world’s most comprehensive collection of command-line tools for machine learning and data mining.
  3. LinkedIn Open Sources Index and Query Services — full-text index and retrieval engine, APIs, and a framework to manage indexes on infrastructure-as-a-service.
  4. Rickshaw — a JavaScript toolkit for creating interactive time series graphs.

Demoting Halder: A wild look at social tracking and sentiment analysis

You no longer have control over where a first impression occurs.

My short story, "Demoting Halder," was supposed to lay out an alternative reality where social tracking and sentiment analysis had taken over society. As the story evolved, I wondered if the reality in the story is something we're living right now.

If your data practices were made public, would you be nervous?

Solon Barocas on data mining's reputation and the ethics of data collection.

Solon Barocas, a doctoral student at New York University, discusses consumer perceptions of data mining and how companies and data scientists can shape data mining's reputation.

Report from Open Source convention health track, 2011

OSCON shows that open source health care, although it hasn't broken into the mainstream yet, already inspires a passionate and highly competent community.

Four short links: 14 July 2011

Microchip Archaeology, OSM Map Library, Feedback Loops for Public Expenditure, and Mind-reading Big Data

  1. Digging into Technology’s Past — stories of the amazing work behind the Visual 6502 project and how they reconstructed and simulated the legendary 6502 chip. To analyze and then preserve the 6502, James treated it like the site of an excavation. First, he needed to expose the actual chip by removing its packaging of essentially “billiard-ball plastic.” He eroded the casing by squirting it with very hot, concentrated sulfuric acid. After cleaning the chip with an ultrasonic cleaner—much like what’s used for dentures or contact lenses—he could see its top layer.
  2. Leaflet — BSD-licensed lightweight JavaScript library for interactive maps, using OpenStreetMap.
  3. Too Many Public Works Built on Rosy Scenarios (Bloomberg) — a feedback loop built on real data to improve the accuracy of infrastructure project cost estimates. He would like to see better incentives — punishment for errors, rewards for accuracy — combined with a requirement that forecasts not only consider the expected characteristics of the specific project but, once that calculation is made, adjust the estimate based on an “outside view,” reflecting the cost overruns of similar projects. That way, the “unexpected” problems that happen over and over again would be taken into consideration.
    Such scrutiny would, of course, make some projects look much less appealing — which is exactly what has happened in the U.K., where “reference-class forecasting” is now required. “The government stopped a number of projects dead in their tracks when they saw the forecasts,” Flyvbjerg says. “This had never happened before.”
  4. Neurovigil Gets Cash Injection To Read Your Mind (FastCompany) — “an anonymous American industrialist and technology visionary” put tens of millions into this company, which has hardware to gather mineable data. iBrain promises to open a huge pipeline of data with its powerful but simple brain-reading tech, which is gaining traction thanks to technological advances. But the other half of the potentially lucrative equation is the ability to analyze the trove of data coming from iBrain. And that’s where NeuroVigil’s SPEARS algorithm enters the picture. Not only is the company simplifying collection of brain data with a device that can be relatively comfortably worn during all sorts of tasks–sleeping, driving, watching advertising–but the combination of iBrain and SPEARS multiplies the efficiency of data analysis. (via Vaughan Bell)
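The “outside view” in the Bloomberg piece on public works can be sketched in a few steps:

1. Collect the cost overruns observed in a reference class of similar completed projects.
2. Pick the overrun level the planner is willing to risk exceeding, such as the 80th percentile.
3. Scale the inside-view estimate by that overrun.

The historical overruns, the 80% level, and the nearest-rank percentile are all hypothetical choices for illustration. The percentile method is one simple option among several.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Smallest observed value such that at least pct% of the data is <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def reference_class_estimate(inside_view: float, past_overruns: list[float], pct: float = 80.0) -> float:
    """Uplift an inside-view cost estimate by the pct-th percentile overrun of similar projects.

    Overruns are fractions: 0.25 means a project cost 25% more than its forecast.
    """
    uplift = nearest_rank_percentile(past_overruns, pct)
    return inside_view * (1 + uplift)


if __name__ == "__main__":
    # Hypothetical overruns from ten comparable completed projects.
    overruns = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.45, 0.60, 0.90]
    print(reference_class_estimate(100.0, overruns))
```

Raising `pct` buys more confidence that the budget will hold, at the price of a larger up-front estimate. That larger number is what can make some projects look much less appealing.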