ENTRIES TAGGED "data mining"

Demoting Halder: A wild look at social tracking and sentiment analysis

You no longer have control over where a first impression occurs.

My short story, "Demoting Halder," was supposed to lay out an alternative reality where social tracking and sentiment analysis had taken over society. As the story evolved, I wondered if the reality in the story is something we're living right now.

Comments: 2
If your data practices were made public, would you be nervous?

Solon Barocas on data mining's reputation and the ethics of data collection.

Solon Barocas, a doctoral student at New York University, discusses consumer perceptions of data mining and how companies and data scientists can shape data mining's reputation.

Comment

Report from Open Source convention health track, 2011

OSCon shows that open source health care, although it hasn't broken into the mainstream yet, already inspires a passionate and highly competent community.

Comments: 3
Four short links: 14 July 2011

Microchip Archaeology, OSM Map Library, Feedback Loops for Public Expenditure, and Mind-reading Big Data

  1. Digging into Technology’s Past — stories of the amazing work behind the visual 6502 project and how they reconstructed and simulated the legendary 6502 chip. To analyze and then preserve the 6502, James treated it like the site of an excavation. First, he needed to expose the actual chip by removing its packaging of essentially “billiard-ball plastic.” He eroded the casing by squirting it with very hot, concentrated sulfuric acid. After cleaning the chip with an ultrasonic cleaner—much like what’s used for dentures or contact lenses—he could see its top layer.
  2. Leaflet — BSD-licensed lightweight JavaScript library for interactive maps, using OpenStreetMap.
  3. Too Many Public Works Built on Rosy Scenarios (Bloomberg) — a feedback loop with real data being built to improve accuracy estimating infrastructure project costs. He would like to see better incentives — punishment for errors, rewards for accuracy — combined with a requirement that forecasts not only consider the expected characteristics of the specific project but, once that calculation is made, adjust the estimate based on an “outside view,” reflecting the cost overruns of similar projects. That way, the “unexpected” problems that happen over and over again would be taken into consideration.
    Such scrutiny would, of course, make some projects look much less appealing — which is exactly what has happened in the U.K., where “reference-class forecasting” is now required. “The government stopped a number of projects dead in their tracks when they saw the forecasts,” Flyvbjerg says. “This had never happened before.”
  4. Neurovigil Gets Cash Injection To Read Your Mind (FastCompany) — “an anonymous American industrialist and technology visionary” put tens of millions into this company, which has hardware to gather mineable data. iBrain promises to open a huge pipeline of data with its powerful but simple brain-reading tech, which is gaining traction thanks to technological advances. But the other half of the potentially lucrative equation is the ability to analyze the trove of data coming from iBrain. And that’s where NeuroVigil’s SPEARS algorithm enters the picture. Not only is the company simplifying collection of brain data with a device that can be relatively comfortably worn during all sorts of tasks–sleeping, driving, watching advertising–but the combination of iBrain and SPEARS multiplies the efficiency of data analysis. (via Vaughan Bell)
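
The "outside view" adjustment quoted in item 3 can be sketched in a few lines of Python. This is only an illustration of the general idea, not Flyvbjerg's actual procedure: the function name `outside_view_estimate`, the nearest-rank quantile rule, and the sample overrun ratios are all assumptions invented for the example.

```python
import math


def outside_view_estimate(inside_estimate, past_overrun_ratios, quantile=0.5):
    """Scale a project's own cost estimate by the overruns of similar projects.

    inside_estimate: the forecast built from the project's own characteristics.
    past_overrun_ratios: actual cost / forecast cost for comparable past
        projects (hypothetical data supplied by the caller).
    quantile: how cautious the uplift is (0.5 uses the median overrun,
        higher values guard against worse overruns).
    """
    if not past_overrun_ratios:
        raise ValueError("need at least one comparable project")
    ratios = sorted(past_overrun_ratios)
    # Nearest-rank quantile: the smallest ratio that at least `quantile`
    # of the comparable projects stayed at or below.
    index = min(len(ratios) - 1, max(0, math.ceil(quantile * len(ratios)) - 1))
    return inside_estimate * ratios[index]


# Made-up overrun ratios for five comparable projects.
ratios = [1.0, 1.1, 1.2, 1.5, 2.0]
print(outside_view_estimate(100.0, ratios))                # median uplift
print(outside_view_estimate(100.0, ratios, quantile=1.0))  # worst case seen
```

A higher `quantile` makes a project look more expensive up front, which is the effect the article describes: some proposals stop looking appealing once the history of similar projects is priced in.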
Comment
Four short links: 13 June 2011

Remote Fingerprint Scans, Playdough Circuits, Update-Sync, and Tweet Failage

  1. AIRPrint — prototype box scans a fingerprint from six feet away. (via Greg Linden)
  2. Squishy Circuits — teaching electronic circuits with conductive and insulating playdough. (via Hacker News)
  3. GraphLab — alternative take on Map-Reduce, called Update-Sync, where tasks run on connected sets of nodes rather than on one node at a time.
  4. Tower Bridge Closed — the @towerbridge account was a cute hack from Tom Armitage, whereby the public site for the London Tower Bridge was scraped and connected to Twitter, so you would see tweets like “I am closing after the MV Dixie has passed Upstream” and get a feel for the ambient activity in your city. Twitter turned over @towerbridge to the most tediously vomit-in-your-own-mouth-they’re-so-anodyne beige corporate tweets ever (account description: “Leading tourist attraction situated inside Tower Bridge”, sample tweet: “Looking for something to do it the City this weekend, check out http://www.visitthecity.co.uk/ and you’re always welcome at @TowerBridge”) and deleted the past history of tweets. Way to embrace the community of engaged passionate fans, guys! Welcome to Twitter, try not to step in your social media strategy as you cross the threshold–oh no, too late.
Comments Off
Four short links: 3 June 2011

Distributed Drug Money, Science Game, Beautiful Machine Learning, and Stream Event Processing

  1. Silk Road (Gawker) — Tor-delivered “web” site that is like an eBay for drugs, currency is Bitcoins. Jeff Garzik, a member of the Bitcoin core development team, says in an email that bitcoin is not as anonymous as the denizens of Silk Road would like to believe. He explains that because all Bitcoin transactions are recorded in a public log, though the identities of all the parties are anonymous, law enforcement could use sophisticated network analysis techniques to parse the transaction flow and track down individual Bitcoin users. “Attempting major illicit transactions with bitcoin, given existing statistical analysis techniques deployed in the field by law enforcement, is pretty damned dumb,” he says. The site is viewable here, and here’s a discussion of delivering hidden web sites with Tor. (via Nelson Minar)
  2. Dr Waller — a big game using DC Comics characters where players end up crowdsourcing science on GalaxyZoo. A nice variant on the captcha/ESP-style game that Luis von Ahn is known for. (via BoingBoing)
  3. Machine Learning Demos — hypnotically beautiful. Code for download.
  4. Esper — stream event processing engine, GPLv2-licensed Java. (via Stream Event Processing with Esper and Edd Dumbill)
Comments Off
Four short links: 24 May 2011

Kindle List, Insider Knowledge, Google News Archive Archived, and Work Week in Video

  1. Delivereads — genius idea, a mailing list for Kindles. Yes, if you can send email then you can be a Kindle publisher. (via Sacha Judd)
  2. Abnormal Returns From the Common Stock Investments of Members of the U.S. House of Representatives — We measure abnormal returns for more than 16,000 common stock transactions made by approximately 300 House delegates from 1985 to 2001. Consistent with the study of Senatorial trading activity, we find stocks purchased by Representatives also earn significant positive abnormal returns (albeit considerably smaller returns). A portfolio that mimics the purchases of House Members beats the market by 55 basis points per month (approximately 6% annually). (via Ellen Miller)
  3. Google News Archive Ends — hypothesizes that old material was “too hard” to make sense of, but that seems unlikely to me. More likely is that it wasn’t useful enough to their machine learning efforts. Newspapers can have their scanned/OCRed content for free now that the program is being closed.
  4. Week Report 310 — BERG’s first (that I’ve seen) video report of the week, and it’s a cracker. No newsreel, just some really clever evocation of the mood of the place and the nature of the projects. I continue to be impressed by the BERG crew’s conscious creation of culture.
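
For readers who want to check the "55 basis points per month (approximately 6% annually)" figure in item 2, here is a small worked conversion. The excerpt does not say how the authors annualized, so both the simple and the compounded versions are shown; the function name `annualize_monthly_excess` is made up for this sketch.

```python
def annualize_monthly_excess(basis_points_per_month, compound=True):
    """Convert a monthly excess return in basis points to an annual fraction."""
    monthly = basis_points_per_month / 10_000  # 1 basis point = 0.01%
    if compound:
        # Reinvest each month's excess return for twelve months.
        return (1 + monthly) ** 12 - 1
    # Simple annualization: twelve times the monthly figure.
    return monthly * 12


simple = annualize_monthly_excess(55, compound=False)
compounded = annualize_monthly_excess(55)
print(f"simple: {simple:.2%}, compounded: {compounded:.2%}")
```

Simple annualization gives about 6.6% and compounding about 6.8%, so the abstract's "approximately 6%" is a slightly conservative rounding either way.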
Comments Off
Strata Week: Overcharging algorithms

Algorithms go awry on Amazon, the future of Hadoop at Yahoo, and the Supreme Court mulls data mining

In this Strata Week: Algorithm pricing on Amazon pushes the price of a biology book to astronomical levels, Yahoo weighs the future of Hadoop, and the Supreme Court hears arguments about a Vermont law restricting the data mining of prescription records.

Comment: 1
Four short links: 25 February 2011

Banshee Bucks, Log Mining, Visualization Secrets, and Repression Tools

  1. Canonical’s New Plan for Banshee — Canonical prepare the Linux distribution Ubuntu. They will distribute the popular iTunes-alike Banshee, but instead of the standard Amazon store plugin (which generates much $ in affiliate revenue for the GNOME Foundation) they will have Canonical’s own Amazon store plugin and keep 75% of the revenue (25% going to the GNOME Foundation). They’re legally within their rights, and it underscores for me how the goal of providing freedom from control is incompatible with a goal of making money. Free and open source software gives self-determination with software, and that includes the right to replace your money pump with theirs.
  2. Oluolu — an open source query log mining tool which works on Hadoop. This tool provides resources to add new features to search engines. Concretely Oluolu supports automatic dictionary creation such as spelling correction, context queries or frequent query n-grams from query log data. The dictionaries are applied to search engines to add features such as ‘did you mean’ or ‘related keyword suggestion’ service in search engines. (via Matt Biddulph on Delicious)
  3. Information is Beautiful Process (David McCandless) — David’s process for creating his beautiful and moving visualizations.
  4. Facebook for Repressive Regimes — The purpose of this blog post is not to help repressive regimes use Facebook better, but rather to warn activists about the risks they face when using Facebook. (via Justine Sanderson on Delicious)
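
To make the "frequent query n-grams" idea from item 2 concrete, here is a single-process sketch of counting word n-grams across a query log. It is not Oluolu's implementation (which runs on Hadoop and whose internals are not described here); the function name `frequent_query_ngrams`, the whitespace tokenization, and the sample queries are assumptions for illustration.

```python
from collections import Counter


def frequent_query_ngrams(queries, n=2, min_count=2):
    """Count word n-grams across search queries and keep the frequent ones.

    queries: iterable of raw query strings (hypothetical log lines).
    n: number of consecutive words per n-gram.
    min_count: drop n-grams seen fewer times than this.
    """
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        # Slide a window of n words across the query.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


# Made-up query log.
log = ["cheap flights london", "Cheap flights paris", "flights london"]
print(frequent_query_ngrams(log))
```

A dictionary like this is the kind of raw material a "related keyword suggestion" feature could draw on: word pairs that recur across many users' queries are candidates for suggestions.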
Comment: 1
Four short links: 26 January 2011

Identifying Communities, Web Principles, Wiring Library, and Instapaper Interview

  1. Find Communities — algorithm for uncovering communities in networks of millions of nodes, for producing identifiable subgroups as in LinkedIn InMaps. (via Matt Biddulph’s Delicious links)
  2. Seven Ways to Think Like The Web (Jon Udell) — seven principles that will head off a lot of mistakes. They should be seared into the minds of anyone working in the web. 2. Pass by reference rather than by value. [pass URLs, not copies of data] [...] Why? Nobody else cares about your data as much as you do. If other people and other systems source your data from a canonical URL that you advertise and control, then they will always get data that’s as timely and accurate as you care to make it.
  3. Wire It — an open-source javascript library to create web wirable interfaces for dataflow applications, visual programming languages, graphical modeling, or graph editors. (via Pete Warden)
  4. Interview with Marco Arment (Rands in Repose) — Most people assume that online readers primarily view a small number of big-name sites. Nearly everyone who guesses at Instapaper’s top-saved-domain list and its proportions is wrong. The most-saved site is usually The New York Times, The Guardian, or another major traditional newspaper. But it’s only about 2% of all saved articles. The top 10 saved domains are only about 11% of saved articles. (via Courtney Johnston’s Instapaper Feed)
Comments Off