ENTRIES TAGGED "scraping"

Four short links: 13 September 2013

Remote Work, Raspberry Pi Code Machine, Low-Latency Data Processing, and Probabilistic Table Parsing

  1. Fog Creek’s Remote Work Policy: In the absence of new information, the assumption is that you’re producing. When you step outside the HQ work environment, you should flip that burden of proof. The burden is on you to show that you’re being productive. Is that because we don’t trust you? No. It’s because a few normal ways of staying involved (face time, informal chats, lunch) have been removed.
  2. Coder (GitHub) — a free, open source project that turns a Raspberry Pi into a simple platform that educators and parents can use to teach the basics of building for the web. New coders can craft small projects in HTML, CSS, and Javascript, right from the web browser.
  3. MillWheel (PDF) — a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. From Google Research.
  4. Probabilistic Scraping of Plain Text Tables: the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.
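
The idea in item 4, classifying every row of a table at once under hard structural constraints while using only weak per-row signals, can be illustrated with a toy sketch. This is not the linked article's implementation: the article describes a mixed integer/linear program, whereas the sketch below swaps in a brute-force search over the same kind of objective so that it runs with the standard library alone. The label set, the ordering rule (header rows, then data rows, then footer rows), the function name classify_rows, and the example probabilities are all assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def classify_rows(row_probs):
    """Pick the most likely labelling that obeys header -> data -> footer order.

    row_probs is a list of dicts, one per table row, mapping the hypothetical
    labels "header", "data" and "footer" to a noisy probability.
    """
    n = len(row_probs)
    best_score, best_labels = -math.inf, None
    # Two cut points 0 <= i <= j <= n split the rows into three blocks.
    for i, j in combinations_with_replacement(range(n + 1), 2):
        if j - i < 1:
            continue  # assumed hard constraint: at least one data row
        labels = ["header"] * i + ["data"] * (j - i) + ["footer"] * (n - j)
        # Soft evidence: sum of log-probabilities of the chosen labels.
        score = sum(
            math.log(max(probs[label], 1e-12))
            for probs, label in zip(row_probs, labels)
        )
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels


# Made-up scores: row 3 looks like a header on its own, but a header
# cannot appear in the middle of the data block.
example_rows = [
    {"header": 0.7, "data": 0.2, "footer": 0.1},
    {"header": 0.1, "data": 0.8, "footer": 0.1},
    {"header": 0.6, "data": 0.3, "footer": 0.1},
    {"header": 0.1, "data": 0.7, "footer": 0.2},
    {"header": 0.1, "data": 0.2, "footer": 0.7},
]

if __name__ == "__main__":
    print(classify_rows(example_rows))
```

Labelling each row independently would call the third row a header. Scoring whole labellings against the ordering constraint overrides that one noisy signal, which is the effect the article attributes to its constraint-based formulation. A real solver matters once the constraints become too rich to enumerate.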
Scraping, cleaning, and selling big data

Infochimps execs discuss the challenges of data scraping.

In this interview with Infochimps CEO Nick Ducoff, CTO Flip Kromer, and business development manager Dick Hall, we take a look at the technical and legal aspects of data scraping.

Four short links: 25 January 2011

Scalable Scraping, iPad Tactility, Emotional Failbots, and Asking Good Questions

  1. node.io — distributed node.js-based scraper system.
  2. Joystick-It — adhesive joystick for the iPad. Compare the Fling analogue joystick. Tactile accessories for the iPad—hot new product category or futile attempt to make a stripped-down demi-computer into an aftermarket pimped-out hackomatic? (via Aza Raskin on Twitter)
  3. Programmed for Love (Chronicle of Higher Education) — Sherry Turkle sees the danger in social hardware emulating emotion. Companies will soon sell robots designed to baby-sit children, replace workers in nursing homes, and serve as companions for people with disabilities. All of which to Turkle is demeaning, “transgressive,” and damaging to our collective sense of humanity. It’s not that she’s against robots as helpers—building cars, vacuuming floors, and helping to bathe the sick are one thing. She’s concerned about robots that want to be buddies, implicitly promising an emotional connection they can never deliver. (via BoingBoing)
  4. Asking the Right Questions (Expert Labs) — Andy Baio compiled a list of how Q&A sites like StackOverflow, Quora, Yahoo! Answers, etc. steer people towards asking questions whose answers will improve the site (and away from flamage, chitchat, etc.). The secret sauce to social software is the invisible walls that steer people towards productive behaviour.