Open Source

The open source paradigm shift transformed how software is developed and deployed. First widely recognized when the disruptive force of Linux changed the game, open source software leverages the power of network effects, enlightened self-interest, and the architecture of participation. Today, the impact of open source on technology development continues to grow, and O'Reilly Radar tracks the key players and projects. O'Reilly has been part of the open source community since the beginning--we convened the 1998 Summit at which the visionary developers who invented key free software languages and tools used to build the Internet infrastructure agreed that "open source" was the right term to describe their licenses and collaborative development process.

Fri, Nov 20, 2009

Robots.Txt and the .Gov TLD

by Carl Malamud (@CarlMalamud) | comments: 7

I'm on the board of CommonCrawl.Org, a nonprofit corporation that is attempting to provide a web crawl for use by all. An interesting report just got sent to us about the use of robots.txt files, the mechanism defined by the Robots Exclusion Standard, within the .Gov Top Level Domain.

In examining about 32,000 subdomains in .gov, it turns out that at least 1,188 of these have a robots.txt file with a "global disallow," meaning robots are excluded from indexing this content. Even more curious, on 175 of these sites, while there is a global disallow, there is a specific bypass that allows the Googlebot to index the data. You can look at the raw data on Factual.
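To make the pattern concrete, here is a minimal sketch of what such a file might look like and how a crawler would read it. The robots.txt content, the crawler name ExampleBot, and the sample path are illustrative assumptions, not taken from the report; the parsing relies on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a global disallow plus a bypass for Googlebot only.
GLOBAL_DISALLOW_WITH_BYPASS = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, agent, path="/data/index.html"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, can_crawl(GLOBAL_DISALLOW_WITH_BYPASS, agent))
```

An empty `Disallow:` line means "nothing is disallowed," so the first group admits Googlebot while the second group shuts out every other crawler.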

At Public.Resource.Org, we've always felt that the government should use a robots.txt file only for purposes of security and integrity of the site, not because some webmaster arbitrarily decides they don't want to be indexed. Indeed, on several occasions we have deliberately ignored government-imposed robots.txt files because we felt this was an arbitrary and illegal attempt to keep the public out.

And, needless to say, it doesn't make any sense at all to let in some web crawlers and not let in others. If this is a reaction to a security/integrity issue, such as limited capacity, the proper thing to do is include in the robots.txt file a comment that explains to other bots what is going on. For example, it could be perfectly reasonable for a government group faced with limited capacity to ask a robot to limit crawls to a certain number of queries per second and only whitelist crawlers that agree to that condition.
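As a rough illustration of that non-discriminatory alternative, the file below explains the capacity problem in a comment and applies one rule to every crawler. Crawl-delay is a widely recognized but non-standard extension to the Robots Exclusion Standard, and the ten-second value, the /admin/ path, and the crawler names are assumptions made for this sketch. Reading the delay needs Python 3.6 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the same rate limit for every crawler, with the reason stated.
FAIR_ROBOTS_TXT = """\
# Limited server capacity: every crawler is welcome, at most one request per 10 seconds.
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def describe(agent, path, robots_txt=FAIR_ROBOTS_TXT):
    """Return (allowed, delay_in_seconds) for `agent` fetching `path`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, path), rules.crawl_delay(agent)


if __name__ == "__main__":
    print(describe("ExampleBot", "/reports/2009.html"))
    print(describe("Googlebot", "/admin/login"))
```

Every crawler gets the same answer here: public pages are open at the stated pace, and only the administrative area is off limits.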

Government webmasters should use the robots.txt file sparingly, and should do so in a non-discriminatory fashion.

tags: gov2.0, open source, search

Fri, Nov 13, 2009

Four short links: 13 November 2009

Open Source Design, Interesting NoSQL Use, Copyright Documentary, Location Intelligence

by Nat Torkington (@gnat) | comments: 1

  1. Open Source Enters The World of Atoms -- an academic statistical analysis of open design. We indicated that, in open design communities, tangible objects can be developed in very similar fashion to software; one could even say that people treat a design as source code to a physical object and change the object via changing the source.
  2. Why I Like Redis (Simon Willison) -- coherent explanation of why Simon likes and uses a particular NoSQL system. "I can run a long running batch job in one Python interpreter (say loading a few million lines of CSV in to a Redis key/value lookup table) and run another interpreter to play with the data that’s already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don’t have to think for more than a few seconds about how I’m going to represent my data."
  3. © kiwiright (Vimeo) -- short documentary about copyright, made to raise awareness of the issues in New Zealand. (just as applicable to the rest of the world)
  4. Your Movements Speak For Themselves (Jeff Jonas) -- "Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not. If the device is GPS-enabled and you’re using a location-based service your location is accurate to somewhere between 10 and 30 meters. Using Wi-Fi? It is accurate below 10 meters." A thought-provoking roundup of the information leakage with modern locative systems. (via TomC on Twitter)

tags: collective intelligence, copyright, data mining, design, geo, location, nosql, open source

Thu, Nov 12, 2009

Four short links: 12 November 2009

CRM on Rails, Data Mining on Hadoop, Disappointing Keynotes, The Teapot Effect

by Nat Torkington (@gnat) | comments: 1

  1. Fat Free CRM -- open source (Affero GPL) Ruby on Rails CRM system.
  2. Bixo -- open source data mining toolkit that runs as a series of pipes on top of Hadoop. Built on the Cascading workflow system for Hadoop that hides MapReduce. (via kdnuggets)
  3. Andy Kessler's Keynote at Defrag Stank (Pete Warden) -- I'm sorry to hear it, because I loved Andy's book How We Got Here about the intersecting histories of economics, finance, and technology. Read the book instead of reading about the disappointing keynote.
  4. The Teapot Effect -- the thing I love about geeks is how their passion causes them to explore, ruthlessly and quantitatively, the everyday phenomena that the rest of us take for granted. Such as dribbling teapots: “Previous studies have shown that dribbling is the result of flow separation where the layer of fluid closest to the boundary becomes detached from it. When that happens, the fluid flows smoothly over the lip. But as the flow rate decreases, the boundary layer re-attaches to the surface causing dribbling.” Read the post and the research it talks about to learn how to prevent Dribbling Teapot Syndrome ....

tags: CRM, data mining, economics, finance, hadoop, history, open source, rails, research, science

Wed, Nov 11, 2009

Four short links: 11 November 2009

Participation Tools, Open Data Requests, Go Programming Language, Why Open Source is Better

by Nat Torkington (@gnat) | comments: 0

  1. ParticipateDB -- database of online tools for public participation. Closed alpha now, with 32 tools and 15 projects in the database. (via Sara Winge)
  2. DataTO -- like data.gov, but it's where users request data sets. (In this case, from the Toronto municipal government)
  3. Go -- new language from Bell Labs and Unix central figures Rob Pike and Ken Thompson, who now work at Google. Bits of C, bits of Google, it compiles to native binaries and runs nearly as fast as C. Built with concurrency and memory management as central features. Not used in production at Google yet, but grew from a 20% project to something worthy of public release.
  4. On Commit Bits (Jacob Kaplan-Moss) -- that day-one-commit-bit is one of the starkest differences between the corporate and the open source development model. [...] Granted, Django’s very conservative when it comes to granting that commit bit, but I’m not aware of a single open source project under the sun that’d give out a commit bit on a contributor’s first day. I’ve seen developers who’ve been hired to work full time on open source work for months without commit access to the project they’re paid to develop! One of several posts that Jacob's made about why open source makes for (on average) better software.

tags: gov2.0, language, multicore, open data, open source, programming, social software

Sun, Nov 8, 2009

Unlikely Group Working Happily Together To Solve Patent Problem

by Carl Malamud (@CarlMalamud) | comments: 4

People following the issue of open sourcing the U.S. Patent Database might have been surprised to read an announcement in the official business opportunities web site of the U.S. Government: Synopsis for Public Data Dissemination Sole Source Contract to Google, Inc.

While the first reaction of many might be "OMG, WTF, how could they," this is actually good news, with an unlikely cast of characters working together including Google, Intellectual Ventures, and the Internet Archive.

In September, the Patent Office announced a rather strange "Request for Information" (RFI). Under this proposed scheme, the Patent Office would receive a substantial (upwards of $10 million!) donation of equipment from a vendor. In return, the vendor would get to be the official distributor of the patent database to the public, and would get to sell "value-added products." Among other things, the vendor would get access to the patents before the public does, allowing them to mine the database, and would be allowed to sell a variety of bulk products.

While the RFI makes a nod to public access, like all these Zero-Dollar deals the government cuts, there would be a lot of limits on what is "public" data as the vendor tries to recoup their investment by selling the so-called "value-added" products. Readers may remember a similar fiasco with the Government Accountability Office where the Federal Legislative Histories were given away to Thomson West and now even the U.S. Congress has to pay to access this material.

The patent database is no ordinary database. This is the only database specifically called out in the U.S. Constitution as being the responsibility of the U.S. Executive Branch to run!  A lot of people think this Zero-Dollar deal the Patent Office is contemplating kind of stinks, and I'm really pleased to announce that a broad coalition has come together to make this data more broadly available immediately:

  • Intellectual Ventures, the IP group founded by Nathan Myhrvold, is donating several terabytes of the back file to Public.Resource.Org, the Internet Archive, and a variety of other groups to make available to everybody.
  • Google asked for permission to crawl the public application system (known as "PAIR"). The announcement by the Patent Office of a "sole source contract to Google" was the government's way of saying Google has permission to crawl their system and bypass the CAPTCHAs. This is good news, because the PAIR system contains the "binders," which hold all the material that supplements the basic applications and grants.
  • The Internet Archive has set aside a boatload of disk drives to serve this data. In addition, Public.Resource.Org will provide the usual rsync and FTP, and we expect a variety of other groups to provide mirrors both for bulk access and end-user systems.

It goes without saying that Google, the Internet Archive, and Intellectual Ventures are three groups that don't often work together, and I think this illustrates the compelling public interest in making the patent database more broadly available. We announced this Section 8 Task Force in a letter to Congressman Mike Honda. We also sent in a FOIA request to the Patent Office, putting them on notice that we expect any responses to their RFI $0 boondoggle to be made available to the public, as required by law.

In the long term, the Patent Office just needs to fix its system instead of resorting to silly $0 deals. They have 600 staff in Information Technology and spend hundreds of millions of dollars. Surely, they can find a way to serve the public as part of that? Putting a lien on the Patent database in return for $10 million in hardware instead of fixing their 1970s-era mainframes just doesn't make sense.

In the meantime, we should have the first 8 terabytes of data up pretty soon. Those interested in learning more about the issue are urged to consult the paper trail on our PTO page which includes letters to and from Congress, and pointers to the Patent Office procurement docs.

tags: gov2.0, open data, open source

Wed, Nov 4, 2009

Four short links: 4 November 2009

Electronics Hacking FAQs, Speech-To-Text Democracy, Open Source Column Database, Massive Online Analysis

by Nat Torkington (@gnat) | comments: 1

  1. ChipHacker -- collaborative FAQ site for electronics hacking. Based on the same StackExchange software as RedMonk's FOSS FAQ for open source software.
  2. Democracy Live -- BBC launch searchable coverage of parliamentary discussion, using speech-to-text. One aspect we're particularly proud of is that we've managed to deliver good results for speech-to-text in Welsh, which, we're told, is unique. I think of this as the start of a They Work For You for video coverage. I'd love to be able to scale this to local government coverage, which is disappearing as local newspapers turn into delivery mechanisms for real estate advertisements.
  3. InfiniDB: Open Source Column Database -- hooks into MySQL, uses MySQL for SQL parsing, security, etc. The commercial enterprise version has multi-server support (parallel scale-out). (via Brian Aker)
  4. Massive Online Analysis -- MOA is a framework for data stream mining. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems. (via joshua on Delicious)

tags: big data, collective intelligence, databases, democracy, gov2.0, hardware, maker, open source

Tue, Oct 27, 2009

Four short links: 27 October 2009

Digital Art Programming, DIY Construction Set, Open Source Pedant, Design Principles

by Nat Torkington (@gnat) | comments: 1

  1. Field -- a development environment for "experimental code" and digital art. We think that, for many uses, Field is a better Processing than Processing. Includes Python and Java bridges; the goal is to connect to as many different programming systems as possible. OS X only at the moment.
  2. Contraptor -- a DIY open source construction set for experimental personal fabrication, desktop manufacturing, prototyping and bootstrapping. (via Hacker News)
  3. After The Deadline -- open source contextual spelling and grammar checker. (via Hacker News)
  4. Design Principles to Choose the Right Ideas -- Often people ask me how we know which ideas to choose from all the hundreds of ideas we’ve generated during brainstorm sessions. Apart from our gut feelings and experience there’s a method that could help us decide: define design principles. Interesting for the different sets of design principles used by Google and Microsoft teams. (via egoodman on Delicious)

tags: art, design, diy, hardware, language, open source, processing, programming

Mon, Oct 26, 2009

Four short links: 26 October 2009

Data Exploration, Evidence-Based Coding, API to the English Language, Dual Licensing

by Nat Torkington (@gnat) | comments: 4

  1. Toiling in the Data Mines -- Tom Armitage describes the process that Berg calls "material exploration". Programmers very rarely talk about what their work feels like to do, and that's a shame. Material explorations are something I've really only done since I've joined BERG, and both times have felt very similar - in that they were very, very different to writing production code for an understood product. They demand code to be used as a sculpting tool, rather than as an engineering material, and I wanted to explain the knock-on effects of that: not just in terms of what I do, and the kind of code that's appropriate for that, but also in terms of how I feel as I work on these explorations. Even if the section on the code itself feels foreign, I hope that the explanation of what it feels like is understandable.
  2. Bits of Evidence -- Slides for a talk, "What we actually know about software development and why we believe it is true". (via Simon Willison)
  3. Wordnik API -- definitions, frequencies, examples APIs. See the announcement from the Web 2.0 Summit.
  4. The Peculiar Institution of Dual Licensing -- Brian Aker eloquently describes why he feels that dual licensing is anti-open source. Brian obviously has considerable experience informing this opinion--his years as Director of Technology for MySQL.

tags: apis, business, data mining, language, mysql, open source, programming, science

Fri, Oct 23, 2009

Four short links: 23 October 2009

Beautiful Information, Teen Game Designer, Creative Science Writing, Open Source Schools

by Nat Torkington (@gnat) | comments: 0

  1. Information is Beautiful -- gorgeous descriptions of the design of infographics. For once, a design discussion that might be useful to mere mortals like me.
  2. Australian Teen Crafts "Sneaky" Games -- video interview with a 16-year-old winner of the IFTF, Sun, and BoingBoing Digital Open. Great to see game design, a topic we've followed on Radar, getting uptake by the people about to enter the workforce. "I love index cards," says Harry, "And I was thinking -- hmm, how can I incorporate them into a project?" So he designed and printed these game cards, and "spread the seeds of sneakiness and espionage" into the unsuspecting pockets, math books, binders and bags and jackets of his schoolmates. (via BoingBoing)
  3. Science Writing Shortlist -- the Manhire Prize is New Zealand's most prestigious award for creative science writing. The shortlisted entries are available via this link, and make for enlightening reading. Interestingly, there are two prizes awarded: one for fiction and another for non-fiction; New Zealand has a tradition of encouraging interaction between the arts and sciences.
  4. Fedena -- an open source school management system, built in India, using Ruby on Rails. (via Brenda Wallace)

tags: design, education, games, open source, science, visualization

Wed, Oct 7, 2009

Four short links: 7 October 2009

Ongoing Palm Fail, YouTube Numbers, Plugin Patent Pain, Bivalve-Oriented Architecture

by Nat Torkington (@gnat) | comments: 1

  1. Followup to jwz's Palm App Store Fiasco -- redux: still nothing concrete from Palm, but they're saying they'll create a second-rate app store into which open source apps will go (along with apps that Palm hasn't reviewed).
  2. Schmidt on YouTube -- the interesting bit for me was "Every minute, more than 10 hours of video is uploaded to the site."
  3. Company that won $585M from Microsoft sues Apple, Google -- "The infamous '906 patent granted to Eolas and the University of California was one of the first patents to get the young online tech scene going in 1998. The patent addresses third-party browser plug-ins to run various forms of media as an "embedded program object," essentially a program that runs within another program. Eolas promptly sued Microsoft for its implementation of ActiveX in Internet Explorer, which set in motion a years-long legal battle between the two companies." Eolas won $585M, and now they're suing many large Internet companies. (via Hacker News)
  4. IBM Uses Mussels as Sensor Network -- Concerned with the environmental and revenue impacts of leaks during oil drilling, StatOil sought an innovative and automated way to detect leaks. They wanted to replace a manual process that included deep sea divers. StatOil’s innovation: they attached RFID tags to the shells of blue mussels. When the blue mussels sense an oil leak, they close, which prompts the RFID tags to emit closure events. In response to the events, the drilling line is automatically stopped. And, in case you are wondering, this is of no harm to the blue mussels. (via monkchips on Twitter)

tags: app store, google, open source, palm, patent, sensor networks, web, youtube

Tue, Oct 6, 2009

Questions (and Answers!) About the Federal Register

by Carl Malamud (@CarlMalamud) | comments: 3

When the White House retweets Cory Doctorow, you know something unusual has happened. As many of you saw, the Office of the Federal Register announced that source code for the Federal Register is now available in bulk—for free—and has been converted to XML. Ed Felten's shop at Princeton created a site called fedthread.org to see what you can do with the data and Public.Resource.Org helped the Government Printing Office in testing early stages of the XML work.

All in all, a nice piece of public-private cooperation and an important step towards open sourcing America's operating system, and I figured that was the end of that. So, imagine my surprise when I got a call from the White House saying they were making Raymond Mosley, Director of the Office of the Federal Register (OFR), and Michael L. Wash, the Chief Information Officer of the Government Printing Office (GPO), available just in case there were any technical questions from the net.

I gathered questions from a variety of sources, including on-line discussion groups and Twitter, and have been doing email back and forth with both Ray and Mike. Hope this is useful (it certainly has been fun to do)!

(continue reading)

tags: gov20, open government, open source

Mon, Oct 5, 2009

Four short links: 5 October 2009

Bozo Cloud Talk, Annotation Fail(ish), Python MySQL Slash, and Infinite Books

by Nat Torkington (@gnat) | comments: 2

  1. Brown Cloud Marketing -- advertorial "interviewing" GM of a company offering "DNS in the cloud". This might be a worthwhile service, but the way he markets it (by saying open source is "freeware" and the market leader is "legacy") reveals a rich vein of bozo. Freeware legacy DNS is the internet's dirty little secret (actually, it's the reason we have a functioning DNS), Nominum software was written 100 percent from the ground up, and by having software with source code that is not open for everybody to look at, it is inherently more secure. (security through obscurity is equating clothing with being naked yet blind). The Internet kindly did the poor man's homework: screenshot of a cross-site scripting vulnerability in their customer portal, a Nominum security advisory from 2008, and the Nominum web server is running Linux, Apache, and PHP (all legacy freeware yet apparently not the Internet's dirty little secret). (via Bert Hubert and Securosis)
  2. Public Annotations on Healthcare Bill -- using technology from SharedBook, Congressman Culberson hoped to get citizens marking up the healthcare bill. They're using the software but many are just commenting on page 1--turning the hosted annotation platform into a forum with an odd user interface. It's a UI challenge: designing a way to let focused people comment on specific things, while also permitting impatient unfocused people to comment on the general topic. It's like asking for a SmartCar that seats 80. See also OpenCongress and their annotation system which also has hundreds of comments on the first few lines of the bill (including 39 on the one line "111th Congress"--apparently more contentious than you'd think!).
  3. MyConnPy -- pure-Python MySQL client library, useful because it requires no C compilation to install (and thus can work on systems without C compilers installed, e.g. mobile). (via Simon Willison)
  4. The Infinite Book -- design concept for an ebook reader (not a product you can buy yet). Sexy. (via Gizmodo)

tags: cloud, dns, ebooks, gov2.0, marketing, mysql, open source, python, social software