Four short links: 22 June 2011

DOM Snitch, Hadoop in Scala, Pregel in Hadoop in Scala, Reflections on the Company

  1. DOM Snitch, an experimental Chrome extension that enables developers and testers to identify insecure practices commonly found in client-side code. See also the introductory post. (via Hacker News)
  2. Spark — Hadoop-alike in Scala. Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can outperform Hadoop by 30x. However, you can use Spark’s convenient API for general data processing too. (via Hilary Mason)
  3. Bagel, an implementation of the Pregel graph processing framework on Spark. (via Olivier Grisel)
  4. Week 315 (Matt Webb) — read this entire post. It will make you smarter. The company’s decisions aren’t actually the shareholders’ decisions. A company has a culture which is not the simple sum of the opinions of the people in it. A CEO can never be said to perform an action in the way that a human body can be said to perform an action, like picking an apple. A company is a weird, complex thing, and rather than attempt (uselessly) to reduce it to people within it, it makes more sense – to me – to approach it as an alien being and attempt to understand its biology and momentums only with reference to itself. Having done that, we can then use metaphors to attempt to explain its behaviour: we can say that it follows profit, or it takes an innovative step, or that it is middle-aged, or that it treats the environment badly, or that it takes risks. None of these statements is literally true, but they can be useful to have in mind when attempting to negotiate with these bizarre, massive creatures. If anyone wonders why I link heavily to BERG’s work, it’s because they have some incredibly thoughtful and creative people who are focused and productive, and it’s Webb’s laser-like genius that makes it possible. They’re doing a lot of subtle new things and it’s a delight and privilege to watch them grow and reflect.
Four short links: 24 March 2010

Digital Subscriptions, Graph Database, Data Science, and High Speed Compression

  1. Digital Subscription Prices — the NY Times in context. Aie.
  2. Trinity — Microsoft Research graph database. (via Hacker News)
  3. Data Science Toolkit — prepackaged EC2 image of most useful data tools. (via Pete Warden)
  4. Snappy — Google’s open-sourced compression library, as used in BigTable and MapReduce. The emphasis is on speed at the expense of compression ratio: output is 20-100% bigger than zlib’s.
Strata Gems: Make beautiful graphs of your Twitter network

Use Gephi and Python to find your personal communities

With a bit of Python and the Gephi graph tool, exploring your own Twitter network is a great way to learn about analyzing networks, and the results definitely have a "wow" factor.


Strata Gems: Explore and visualize graphs with Gephi

Powerful open source graph manipulation

A Photoshop for data, Gephi is a powerful tool for exploring and presenting data as a graph. It's easy to get started with sample data sets, then import your own by generating files in a standard graph format.
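
As a rough illustration of "generating files in a standard graph format", here is a minimal Python sketch that writes a small edge list as GraphML, one of the formats Gephi can import. The `follows` data, the function names, and the output filename are all hypothetical, and the full post may well use a different format or library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML format.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(edges, directed=True):
    """Build a GraphML document from (source, target) pairs of node names."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        id="G",
        edgedefault="directed" if directed else "undirected",
    )
    # Declare every node that appears in at least one edge.
    for name in sorted({node for edge in edges for node in edge}):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=name)
    # Then declare the edges themselves.
    for i, (source, target) in enumerate(edges):
        ET.SubElement(
            graph, f"{{{GRAPHML_NS}}}edge", id=f"e{i}", source=source, target=target
        )
    return ET.ElementTree(root)


def write_graphml(edges, path, directed=True):
    """Write the edge list to `path` as a GraphML file."""
    build_graphml(edges, directed).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Made-up "who follows whom" pairs, purely for illustration.
    follows = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]
    write_graphml(follows, "follows.graphml")
```

The resulting file can then be opened from Gephi's File menu. The sketch uses only the standard library so that it runs without extra installs.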

Four short links: 1 November 2010

Crap Phones, HTML Editors, Digital Rights Minimization, and Data Munging

  1. The Most Popular Phone in the World (Gizmodo) — I have a mate who does prototyping R&D type stuff at a telco and this is his phone. “Why’d you carry a crap phone like that?” “Because this is the most popular phone with our customers.” The Gizmodo article talks about an upcoming Nokia that looks very promising: full keyboard, camera, et al. for under $100. (via Andrew Hedges on Twitter)
  2. Aloha Editor — very nice open source (AGPL3) HTML5 text editor widget for web apps. (via Jessy Cowan-Sharp on Twitter)
  3. How Do We Solve a Problem Like Geographic Restrictions — if you’re building a new business in the US around ebooks, digital music, or digital video, then be aware that your international uptake will be absolutely buggerized by rights issues. YouTube is the only US media site that doesn’t suck for overseas users: don’t rave to us about Hulu, it’s inaccessible to the rest of the world. (via Liza Daly on Twitter)
  4. Needlebase — tool with AI-type smarts to help you merge, munge, and export data. Check out Thread, the query language, for an interesting way of querying graphs. Was made by ITA Software, now owned by Google. Wonder what it’ll be wrapped into or released as …