Mining the social web, again

If you want to engage with the data that's surrounding you, Mining the Social Web is the best place to start.

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist. And this book is without a doubt the best and most thorough approach to mining Twitter data out there. But that’s only a starting point. We hear a lot in the press about sentiment analysis and mining unstructured text data; this book shows you how to do it. If you need to mine the data in web pages or email archives, this book shows you how. And if you want to understand how people collaborate on projects, Mining the Social Web is the only place I’ve seen that analyzes GitHub data.

All of the examples in the book are available on GitHub. In addition to the example code, which is bundled into IPython notebooks, Matthew has provided a VirtualBox VM that installs Python, all the libraries you need to run the examples, the examples themselves, and an IPython server. Checking out the examples is as simple as installing VirtualBox, installing Vagrant, cloning the 2nd edition’s GitHub repository, and typing “vagrant up.” (This quick start guide summarizes all of that.) You can execute the examples for yourself in the virtual machine; modify them; and use the virtual machine for your own projects, since it’s a fully functional Linux system with Python, Java, MongoDB, and other necessities pre-installed. You can view this as a book with accompanying examples in a particularly nice package, or you can view the book as “premium support” for an open source project that consists of the examples and the VM.
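To make those setup steps concrete, here is a minimal Python sketch that checks whether the three tools are on your PATH and lists the commands you would then run. It is an illustration, not the book's quick start guide: `REPO_URL` is a placeholder to replace with the real repository address, the checkout directory name `mtsw2e` is hypothetical, and the sketch assumes the Vagrantfile sits at the top level of the checkout. It only prints commands and does not execute them.

```python
import shutil

# Placeholder (assumption): substitute the actual URL of the 2nd edition's repository.
REPO_URL = "https://github.com/<owner>/<repository>.git"

# Command-line programs installed by VirtualBox, Vagrant, and Git respectively.
REQUIRED_TOOLS = ("VBoxManage", "vagrant", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def quick_start_commands(repo_url, checkout_dir="mtsw2e"):
    """Return (command, working_directory) pairs in the order to run them.

    Assumes the Vagrantfile is at the top level of the checkout, so
    "vagrant up" is run from inside `checkout_dir`.
    """
    return [
        (["git", "clone", repo_url, checkout_dir], "."),
        (["vagrant", "up"], checkout_dir),
    ]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    for command, cwd in quick_start_commands(REPO_URL):
        print(f"(in {cwd}) $ {' '.join(command)}")
```

Printing rather than running the commands keeps the sketch safe to try anywhere; once the output looks right, you can type the same commands into a terminal.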

If you want to engage with the data that’s surrounding you, Mining the Social Web is the best place to start. Use it to learn, to experiment, and to build your own data projects.


  • I don’t understand why you need a server to view a notebook or code snippets. Many people use this web/server-based IPython. Why? What does it do that Komodo IDE doesn’t?

    • The beauty of IPython Notebook is that it’s great for hackers.

      It fills a sweet spot between being a REPL (interpreter), an IDE, and a fancy rich text editor. It’s perhaps not a tool that you’d use for all of your Python development, but it’s wonderful for data science experiments, hacking, and pedagogy.

      It might seem a little bit lightweight at first for a bona fide software developer, but perhaps that’s the point. I wouldn’t look at it so much as a replacement for an IDE like Komodo or the PyDev plugin for Eclipse. Instead, I’d look at it as a much more accessible interpreter that is great for hacking and collaboration.

      And BTW, you actually don’t need a server to *view* an IPython Notebook, but you do need one in order to interactively work on the code in a notebook. For example, you can render the notebook as HTML (amongst other formats such as PDF and .py files) and serve it up, which can be nice for collaboration or portability. e.g. all of the numbered examples from Mining the Social Web are anchor-linked here in a way that’s convenient to navigate and easy on the eyes: