Restructuring the Web with Git

Can version control manage content?

Web designers? Git? GitHub? Aren’t those for programmers? At Artifact, Christopher Schmitt showed designers how much their peers are already doing with GitHub, and what more they can do. GitHub (and the underlying Git toolset) changes the way that all kinds of people work together.

Sharing with Git

As amazing as Linux may be, I keep thinking that Git may prove to be Linus Torvalds’ most important contribution to computing. Most people think of it, if they think of it at all, as a tool for managing source code. It can do far more, though, providing a drastically different (and I think better) set of tools for managing distributed projects, especially those that use text.

Git tackles an unwieldy problem: managing the loosely structured documents that humans produce. Text files are incredibly flexible, letting us store everything from random notes to code of all kinds to tightly structured data. As awesome as text files are (readable, searchable, relatively easy to process), they tend to become a mess when there’s a big pile of them.

A sketch based on Schmitt’s talk on the command line and Git by Ben Norris @bsndesign, used by permission.

Version control helps sort out that mess, keeping track of who’s done what to which file. Old-school version control relied on check-in and check-out, avoiding damage by limiting the number of people with access to files. Over time, that became an unnecessary constraint, and the power of diff (seeing what actually changed) became more obvious.
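To make "seeing what actually changed" concrete, here is a minimal sketch using Python's standard `difflib` module rather than Git itself. The file name `about.txt` and the two sample versions are made up for illustration; the output is in the same unified diff style that `git diff` uses.

```python
import difflib

# Two hypothetical versions of the same short text file, one line per entry.
old = ["Git is a tool for programmers.", "It manages source code."]
new = ["Git is a tool for anyone who works with text.", "It manages source code."]


def show_changes(before, after, name="about.txt"):
    """Return a unified diff of two versions of a text as a list of lines."""
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=f"{name} (old)",
            tofile=f"{name} (new)",
            lineterm="",  # our lines carry no trailing newline
        )
    )


diff_lines = show_changes(old, new)
print("\n".join(diff_lines))
```

Lines starting with `-` were removed, lines starting with `+` were added, and lines starting with a space are unchanged context. Nothing about this requires the text to be code.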

Git takes that logic further, creating a decentralized system that lets people work in loose networks. At first, that seemed like something only a few hardcore programmers working on complex systems needed (the Linux kernel, the project Git was originally written for, being the obvious example). Schmitt emphasized Git branches. There’s still a master, canonical branch, but you don’t have to disturb it. You can create your own branches, check them out, and combine them with other people’s branches through merges and pull requests.

But, but… conflict! Doesn’t it take programmers to deal with conflicts? No. It just takes someone who knows how to evaluate the files that had conflicting changes. If it’s code, yes, you need a programmer. If it’s HTML or CSS, you may need a web developer instead. If it’s text for a book, you may need an author or editor.
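A toy sketch of why conflicts are a judgment call rather than a programming problem. This is not Git's merge algorithm: it assumes all three versions have the same number of lines, and the function name `merge_lines` and the sample text are invented for illustration. The conflict markers only imitate the style Git writes into a file.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge for versions with the same number of lines.

    Returns (merged, conflicts), where conflicts lists the indexes of
    lines that both sides changed in different ways.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree, or neither touched the line
            merged.append(o)
        elif o == b:    # only their side changed this line
            merged.append(t)
        elif t == b:    # only our side changed this line
            merged.append(o)
        else:           # both changed it differently: a person must decide
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


base = ["Chapter 1", "Git is for programmers.", "The end."]

# Two people edit different lines: the merge needs no human help.
clean, clean_conflicts = merge_lines(
    base,
    ["Chapter One", "Git is for programmers.", "The end."],
    ["Chapter 1", "Git is for programmers.", "Fin."],
)

# Two people rewrite the same sentence: only an editor can pick the winner.
messy, messy_conflicts = merge_lines(
    base,
    ["Chapter 1", "Git is for designers too.", "The end."],
    ["Chapter 1", "Git is for writers too.", "The end."],
)
```

In the second case the tool can only report that line 1 is contested. Choosing between "designers" and "writers" is an editorial decision, which is exactly the point about who needs to resolve it.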

Where’s the database? Where’s the CMS?

Git has its own data structures underneath, but they don’t resemble the databases or even XML files I’m used to. Git doesn’t expect you to understand its structures, though—it just expects you to check material in that has your structures in it. It’ll deal with keeping track of what changed, when, and who changed it.

Unlike most databases, updates to existing information aren’t destructive. Git tracks the history. You may not want to go back, or need to go back, but you can go back again. You can even go back to someone else’s version.
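A rough sketch of the non-destructive idea, under loud assumptions: `TinyStore` and its methods are hypothetical names, and real Git records whole-project snapshots through trees and commits rather than a per-file list. The one piece borrowed directly from Git is the way `blob_id` hashes file content, assuming the default SHA-1 object format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash content the way Git names a blob (default SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class TinyStore:
    """Hypothetical append-only store: saving never overwrites old content."""

    def __init__(self):
        self.objects = {}   # object id -> content
        self.history = {}   # path -> list of object ids, oldest first

    def save(self, path: str, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content          # same content, same id
        self.history.setdefault(path, []).append(oid)
        return oid

    def read(self, path: str, version: int = -1) -> bytes:
        """Return a version of the file; -1 is the latest, 0 the first."""
        return self.objects[self.history[path][version]]


store = TinyStore()
store.save("chapter1.txt", b"First draft.\n")
store.save("chapter1.txt", b"Second draft, much improved.\n")

latest = store.read("chapter1.txt")
original = store.read("chapter1.txt", version=0)
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The update did not destroy anything: the first draft is still there, addressed by the hash of its content.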

Because documents don’t neatly fit in most databases, the whole field of Content Management Systems (CMS) has grown to bridge expectations. Simple ones are pretty much text blobs plus metadata, while more sophisticated ones manage other assets, workflows, relationships among parts of documents, and more. The level of detail can be impressive, but can also be constraining. Once we teach computers to demand things, it’s hard to change expectations.

For better or worse, Git is much less demanding than a CMS. It has no schemas built in. Its permission structures are less byzantine. That can be a challenge if you need to add all of that control to fit your organization, but it can also be immensely freeing.

Web code management

HTML and CSS may not be the code types Git grew up with, but they fit pretty neatly into its collections of related code. Even ‘ordinary’ web sites increasingly look like applications, so that kind of fits.

Most of the sites I’ve worked on, though, have a separation between the code that provides the structure of the site and the content that fills it. Structure is a set of files, whether HTML and CSS, PHP files, or a Rails application. Content lives elsewhere, probably in a database. Git is a great match for the files that make up the structure, but people using these approaches tend to keep their databases out of version control.

I suspect that most of the designers Schmitt was addressing today will be using Git and GitHub to manage the parts of their content that feel like code and maybe the related assets. That’s a very reasonable place to start.

Blurring the code/content barrier

Maybe I spent too long in XML, but having code and content in similar-looking files feels completely ordinary. After spending years in relational databases, I’m happy to wander away from them for pretty much all of my document-oriented use cases. My blogs are really collections of documents with simple associated metadata. My books are collections of chapters with a bit more metadata that describes how the pieces go together.

Can we shift to a much deeper Git integration? Can we, even, put all of our content into GitHub to take advantage of its many social tools that extend Git’s solid foundation of information sharing?

GitHub itself supports Jekyll, a toolset for generating blogs and other static sites from plain text files. Sure enough, you can use it with GitHub Pages to create sites built on GitHub. This is an open door that we’ve just started exploring.
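Jekyll posts are plain text files that begin with a block of metadata (front matter) between two `---` lines, which is about as simple as "documents with associated metadata" gets. Here is a minimal sketch of reading such a file. It is not how Jekyll does it: Jekyll parses the block as YAML, while the hypothetical `split_front_matter` below only understands flat `key: value` lines, and the sample post is made up.

```python
def split_front_matter(text):
    """Split a post into (metadata dict, body text).

    Simplification: only flat `key: value` lines are understood,
    not full YAML. Text without front matter is returned unchanged.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the line that closes the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


post = """---
layout: post
title: Restructuring the Web with Git
---
Can version control manage content?
"""

meta, body = split_front_matter(post)
print(meta["title"])
print(body)
```

Because the metadata and the content live in the same text file, both get versioned, diffed, and merged together, with no database involved.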

O’Reilly, the place where I work, is using Git as a foundation for publishing books. You can see some of the results, but there are many more books out there built with the technology. It lets us manage multi-author projects where the authors are going in and out of each other’s chapters, and lets authors keep content on multiple computers without worries about how it will all come together.

It’s early yet. I don’t think it’s time for you to throw away your databases or your CMS, unless you want to. It is time, though—whether you’re a programmer, a designer, a content creator, a manager, or all of the above—to ask yourself how you want to store your information. The Git Way may prove to be your way, even if you aren’t a Linux kernel hacker.
