Practical Tips for Government Web Sites (And Everyone Else!) To Improve Their Findability in Search

In an earlier post, I said that a key to government opening its data to citizens, becoming more transparent, and improving the relationship between citizens and government in our web 2.0 world is ensuring that content on government sites can be easily found in search engines. Architecting sites to be search engine friendly, particularly sites with as much content and legacy code as those the government manages, can be a resource-intensive process that takes careful long-term planning. But two keys are:

  • Assessing who the audience is and what they’re searching for
  • Ensuring the site architecture is easily crawlable

Crawlability Quick Wins
This post is about quick wins in crawlability. In many cases, ensuring crawlability also improves accessibility (particularly access via screen readers). From this standpoint, many government web sites have an advantage over other sites, since they already build in many accessibility features. Creating search-friendly sites also improves usability and access from mobile devices and slow connections. So forget everything you may have heard about having to sacrifice user experience for SEO. SEO done right facilitates deeper audience engagement, makes it easier for visitors to navigate and find information on the site, and opens the site to a wider variety of users.

Use XML Sitemaps
Create XML Sitemaps that list all the pages on the site and submit them to the major search engines.

Why is this important? Many government sites have poor information architecture. Ideally, every page of the site should have at least one link pointing to it; that helps users navigate the site and helps search engines find all of the pages. Long term, these sites should revamp their navigational structure so that at least one link exists to every page. Since that may take some time to implement, an XML Sitemap can serve in the meantime as a list of all pages for search engines to crawl.
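
The Sitemaps format itself is simple. Here's a minimal sketch of a sitemap.xml file (the URLs and dates are placeholders, not from any real site; see sitemaps.org for the full protocol):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <loc> is the only required child element -->
  <url>
    <loc>http://www.example.gov/</loc>
    <lastmod>2009-01-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.gov/programs/energy-grants.html</loc>
    <lastmod>2008-11-03</lastmod>
  </url>
</urlset>

Once the file is in place, it can be submitted through each engine's webmaster tools or referenced with a Sitemap: line in robots.txt.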

Government sites have already made great progress in search by using XML Sitemaps.

The Department of Energy's Office of Scientific and Technical Information (OSTI) implemented the XML Sitemaps protocol with great success. "The first day that Yahoo offered up our material for search, our traffic increased so much that we could not keep up with it," said Walt Warnick, OSTI's director.

If possible, provide an HTML sitemap as well, which gives site visitors a browsable view of the site's pages. Below is a good example of a browsable HTML sitemap on nih.gov:

[Screenshot: nih_sitemap.jpg — the browsable HTML sitemap on nih.gov]

Don’t block access to content
Make all content available outside of a login, registration form, or other input mechanism. Search engine crawlers can’t access content behind a login or registration. If the content requires the visitor to enter an email address or otherwise provide input before accessing it, it won’t show up in search results.

Avoid dead ends when moving content
When content moves, change the links within the site to point to the new location, and implement a 301 redirect from the old page to the new page. A 301 redirect is a server response code that lets browsers (and search engines) know that a page has moved permanently. Some servers send 302 codes by default instead. A 302 redirect indicates that the move is temporary, so search engines tend not to index the destination page. For instance, the Myelodysplastic Syndromes Treatment page on the NIH site isn't indexed by Google, and the NIH website doesn't appear at all on the first page of Google results for a search for [myelodysplastic syndromes].

[Screenshot: nih_noindex.jpg — the NIH treatment page missing from Google's results for [myelodysplastic syndromes]]

This could be in part because links to this page actually point to http://health.nih.gov/viewPublication.asp?disease_id=85&publication_id=869&pdf=no, which then executes a 302 redirect to the destination page. This is likely just how the site's content management system works: a database query generates the actual page that should appear.

This architecture could be made significantly more search engine friendly simply by changing that 302 to a 301.
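
How you send a 301 depends on the server and content management system. As one hedged sketch, on an Apache server a permanent redirect can be declared with mod_alias (the paths below are made up for illustration, not NIH's actual URLs):

# Tell browsers and search engines the page has moved permanently
Redirect 301 /old-treatment-page.html http://health.example.gov/new-treatment-page.html

# A bare "Redirect" defaults to 302 (temporary), which is the behavior to avoid here

Other platforms have equivalents; the important part is that the response status line says 301, not 302.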

Use descriptive ALT text for images
Of course, using the ALT attribute for images is useful for more than search engine purposes. It also improves accessibility overall and makes the site easier to use with screen readers and over slow connections. Government web sites generally do a fairly good job with this. But my earlier post on whitehouse.gov describes how full swaths of text are hidden inside images. You can see that the Texas tourism site loses its navigation entirely with images turned off:

[Screenshot: texashome.jpg — the Texas tourism home page with images turned off, missing its navigation]

As a result, the pages linked to from that navigation aren't indexed (because search engines can't see the links to follow them):

[Screenshot: texas4.jpg — the navigation's destination pages missing from the search index]
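
A hedged sketch of image-based navigation that stays crawlable: wrap each image in an ordinary HTML link and put the label in the ALT text. (The path and file name below are made up for illustration; "Cities and Regions" is one of the sections on the Texas site.)

<!-- A plain HTML link: crawlers and screen readers can follow it, and the
     ALT text stands in for the image when images are off -->
<a href="/cities-and-regions/">
  <img src="/nav/cities-regions.gif" alt="Cities and Regions" />
</a>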

Ensure all links are working and that the server is responsive
This, of course, is good practice for usability as well as for search engine optimization. You can find reports for both broken links and server issues in Google Webmaster Tools and Microsoft Live Search Webmaster Tools.

Ensure each page has a unique title and meta description that accurately describe the page
Again, looking at the Texas tourism site, you can see how the lack of these elements creates a poor user experience in the search results:

[Screenshot: texas3.jpg — Texas tourism pages listed in search results without unique titles or descriptions]

You can also see this with the HSTAT site. The title tag (and heading) of this page is “Key Steps”. Someone viewing that page in search results would have no way of knowing what that page is about. Better would be something like “Preventing Pressure Ulcers: Key Steps | National Library of Medicine”. This both describes the content and indicates the authority of the material.

[Screenshot: hstat2.jpg — the HSTAT "Key Steps" page and its title]
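
In the page markup, this is just a unique <title> element and description meta tag in each page's <head>. A sketch using the title suggested above (the description text is mine, purely for illustration):

<head>
  <title>Preventing Pressure Ulcers: Key Steps | National Library of Medicine</title>
  <meta name="description" content="Key steps for preventing pressure ulcers, from the National Library of Medicine's HSTAT collection." />
</head>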

Ensure links are functional with JavaScript disabled
For example, the links on the left side of the HSTAT pages that enable you to navigate to the content of the book aren’t functional with JavaScript disabled:

[Screenshot: hstat1.jpg — the HSTAT navigation links that require JavaScript]
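
The usual fix is progressive enhancement: put a real URL in the href so the link works as a plain link, and layer the scripted behavior on top. A hedged sketch (the URL and function name are hypothetical):

<!-- With JavaScript off, this is an ordinary link that users and search engines
     can follow; with JavaScript on, loadChapter() runs and "return false"
     cancels the default navigation -->
<a href="/books/pressure-ulcers/chapter2.html" onclick="loadChapter(2); return false;">Chapter 2</a>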


Use progressive enhancement best practices to ensure a usable experience with Flash, JavaScript, and similar elements disabled

The nih.gov home page, for instance, uses Flash to rotate through several headlines with descriptive text (such as the Salmonella description shown below) that is invisible to search engines.

[Screenshot: nih_home.jpg — the nih.gov home page, with Flash-rotated headlines such as the Salmonella item]

When you search for the exact text string from the Flash image, the nih.gov home page isn't returned. The reason becomes clear when you view the page in Google's cache, which shows the page exactly as Google crawled it.

[Screenshot: nihcache.jpg — Google's cached version of the nih.gov home page]

Instead of descriptions of articles, such as the one about Salmonella, search engines see "This feature requires Flash plugin version 8" and "This feature of NIH requires that JavaScript be enabled". This appears to be a pretty common problem.

[Screenshot: js.jpg — search results showing how many pages display these "requires Flash/JavaScript" messages]

USA.GOV uses JavaScript to render “Today’s Government News” on the home page.

[Screenshot: usagovjs.jpg — the "Today's Government News" section on the USA.GOV home page]

Since they've built it without progressive enhancement techniques, browsers without JavaScript support don't see this section, and neither do search engines. If you view the text-only version of Google's cache of the USA.GOV home page, you'll see that the entire section is missing.
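
One common progressive enhancement approach, sketched below with made-up markup and headlines, is to put the news items in the HTML as ordinary links and let JavaScript (or Flash) enhance or replace that markup after the page loads. Browsers without script support, and search engine crawlers, still get the text:

<!-- Crawlable fallback: real headlines and links in the HTML.
     A script can later swap this list for the dynamic news widget. -->
<div id="todays-government-news">
  <h2>Today's Government News</h2>
  <ul>
    <li><a href="/news/salmonella-recall.html">Salmonella outbreak: latest recall information</a></li>
    <li><a href="/news/tax-filing-deadline.html">Tax filing deadline reminders</a></li>
  </ul>
</div>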

Understand the fundamentals of search engine friendly web architecture
Lots of resources exist, including my sites Jane and Robot and Nine By Blue, and the search engines also provide much of this information for free (see, for instance, google.com/webmasters). A quick peek at the medicare.gov robots.txt file shows that they've implemented directives that the search engines don't recognize:

Hit-rate: 30
#wait 30 seconds before starting a new URL request default=30
Visiting-hours: 01:00EST-05:00EST
#index this site between 1AM – 5AM EST
Concurrent-hits: 2
#limit concurrent active URLs to 2 for each index server

If they're worried about the search engines crawling more than their servers can handle, they should use the Crawl-delay directive instead and set a slower crawl rate in Google Webmaster Tools.
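
For example, a robots.txt along these lines relies only on a directive the major crawlers actually recognize (Yahoo and Live Search honor Crawl-delay; Google ignores it, which is why its crawl rate is set in Webmaster Tools instead):

User-agent: *
# Ask crawlers that support Crawl-delay to wait 30 seconds between requests
Crawl-delay: 30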

This post provides just a few examples of how a more crawlable site architecture can make a big difference in how findable the content is. Government sites face many of the same issues that large corporations do: many sites managed by different departments, no clear process for cross-department collaboration, and a lack of global standards. But government and companies alike can evaluate their sites for crawlability as a step toward a more findable web.

  • http://radar.oreilly.com/jesse/ Jesse Robbins

    Awesome post, Vanessa!

    -Jesse

  • http://www.jhai.org Lee Thorn

    Very helpful, Vanessa. Thanks,

    Lee Thorn
    Chair, Jhai Foundation

  • Keonda

    I love your case studies!

Some of your advice didn't fall on deaf ears. The Texas tourism site has its navigation back with images turned off, and the 'Cities and Regions' page you mentioned is now indexed.

    Yee-ha Texas! Now, don’t forget to take a look at your canonical versions and decide between using TravelTex.com or TravelTexas.com (yup, both domains are indexed).

  • http://www.GreaterRaleighRealty.com Shane Pollock

    Good information all the way around. Thanks for posting. It is amazing that some sites still miss the basics like unique meta data for each page.

  • http://3donthewebcheap.blogspot.com len bullard

    These are good techniques. A quibble:

    “Make all content available outside of a login, registration form, or other input mechanism.”

Temper that advice with common sense. There are many operational web sites, particularly in health and other government services, which must explicitly be behind registration forms and hidden from search crawlers. Don't implement Fox's advice without using common sense and reading your requirements doc.

    Open systems are not the same thing as open government. Don’t be fooled.

  • http://www.subprint.com/ Joe McCann

    Nice article.

    Being a UX lead on a state government app/site currently, I'm highly familiar with pretty much everything you mentioned and I agree. One thing that should be noted is that having to actually point out using "alt" attributes in image tags or making sure links work and/or content shows up with JavaScript-enabled should be fundamental for any government website…because it's the LAW.

    If a site is not Section 508 compliant, then it is in violation of the law. Period. If your site IS Section 508 compliant, then you can't help but be SEO-friendly. This obviously doesn't have anything to do with all the dead links and server-side stuff you mentioned, but accessibility is and should be core to all government sites. Proper indexing and ranking is just a consequence of that.

    Take a look at http://www.texasonline.com.

    This site works completely without JavaScript enabled and progressively enhances the page when it is. None of the content is "unsearchable" and, moreover, the site is Section 508 compliant.

  • http://www.210v.com Ajeet

    The very fact that you have to mention seemingly obvious facts is scary. Making stuff accessible to humans can be a challenge, but to a crawling spider things are different. Seems like there are a lot of orphan pages out there.

  • http://lamammals.blogspot.com len

    Just remember that while making a page ‘sales’ and ‘search engine’ friendly, you can make it inscrutable and confusing to the person.

    Don’t confuse search and use and don’t confuse sales/transparency and operations. It’s a deadly mistake. If I open an operations page and in the first two seconds cannot spot what I am looking for, I fire the web page designer or relegate them to an editor.

    The biggest mistakes made in virtual world markets were made by people who read Neuromancer and Snow Crash thinking them prophetic instead of fantasy. The biggest mistakes made in web page design have been made by CSS experts who failed to understand B.F. Skinner and didn't use their cognitive psych books for doorstops.

  • http://www.ninebyblue.com Vanessa Fox

    Len,

    Excellent point – That sentence would be better phrased as “make all content you want your audience to find through search engines available outside of a login…”

    There is definitely lots of content that should not be made available to search engines, and lots of different ways to make sure it's kept out. Perhaps I'll talk about that in a later post.

    Len (2),

    You should enjoy my next post, which is about exactly that — the conversion workflow process from search to visit, to conversion. It does no good to rank well for something if visitors abandon the page as soon as they click over.

  • http://3donthewebcheap.blogspot.com len

    Looking forward to it, Vanessa. IME, operations pages are treated differently in many ways; although, of course, search behind the firewall is important there, it takes on less importance as the role-based view processes organize the workflow.

    Should workflow organized into real-time 3D systems be searched differently?

  • http://www.kmeinternetmarketing.com Ted

    Very good post – there has always been a significant tension on egovernment sites between the need to be legal, highly accessible and 508-compliant, and the need to employ online marketing techniques to attract more viewers…ever since my earlier days implementing sites like nyc.gov, irs.gov, maryland.gov, delaware.gov, etc., online marketing/SEO techniques were almost totally discounted, because, after all, the government really doesn't focus on "competition"…however, in these heady days of social media, search engine optimization is essential to (1) help public service offerings stand out from the noise and commercialism, and (2) deliver public service alerts and products out "in the wild", so to speak, where people are engaging in online dialogue. Government most definitely has to compete online for viewer attention, especially in search results across many types of search engines and media channels.

  • http://www.inqbation.com DC web designer

    Great topic, great article. But, I would like to suggest two things:

    First, because government websites MUST be Section 508 compliant (web accessible), they are probably 1,000 times more search engine optimized than commercial websites that rarely pay attention to web accessibility. If a site is Section 508 compliant, it is probably 80% search engine optimized already.

    Second, don't confuse openness with being found online due to a site's organic SEO. That is irrelevant. Most government websites rank high on Google simply because of the inbound links. Google "ICE", for example, and ICE.GOV is the first result on Google's SERP. Openness means that you put information online that should be made public. It doesn't matter how search engine optimized your site is; if you don't have relevant information on the site, it's not "open" and the organization is not "open".

  • http://www.osti.gov Cathey Daniels

    Thanks for an informative article, Vanessa. You quote Dr. Walter Warnick, Director of the U.S. Department of Energy Office of Scientific and Technical Information (www.osti.gov) (the name was incorrect in your post, but that happens to us a lot!) on the importance of the sitemap protocol. OSTI's implementation of the sitemap protocol revved up public access to research documents that were otherwise difficult to find on the web. Federated search engines hosted at osti.gov (ScienceAccelerator.gov, Science.gov, and WorldWideScience.org) also play a large role in transparency for government science info.

    Cathey Daniels
    Public Information/Outreach
    IIa, DOE OSTI

  • http://www.mercadeoporinternet.com Rafael Montilla

    I would say to use 0% JavaScript, 10% flash.

  • http://www.pagerank-seo.com Robert Visser

    There are over 200 criteria the Google algorithms assess when assigning value to a page. How one builds the elements, tags, and attributes contributes to their value.

    In addition to each page having a unique Title element and Description and Keywords meta tags, it's important that the long tail keyword phrases placed there are repeated in the page content. If not, their contribution will diminish. Some search engines will negate any contribution if long tail keyword phrases placed in meta data are not present in the page content.

    One of the most overlooked aspects of generating meta data is character count limits. While the SEO community continues to debate the recommended limits, there are a few ways to measure how many characters a search engine will display on a Search Engine Results Page (SERP).

    For a Title element it’s generally accepted to have 60-80 characters. However, DMOZ permits 100.

    On a Google SERP, 160 characters will be displayed. While, increasingly, I'm seeing that the text rendered is a mashup of page content, when an appropriate match is available it's pulled from the Description meta tag. The description field in Google Local Search permits 200 characters.

    On a SERP Yahoo will display 170 characters in the description. Yahoo Local does not offer a Description dialogue window into which to enter meta data.

    On a SERP Live will display 200 characters in the description. While the Live Search Local Listing Center does provide a description dialogue window in which I’ve tested entering in excess of 500 characters, I’ve not seen any instance where this is displayed on either Live Search or Live Search Maps SERPs or provided via a more info button.

    There is, of course, the question of what’s indexed vs. what is displayed.

    In the DMOZ Description dialogue window 300 characters are permitted.

    On a SERP Cuil will display 320 characters.

    All too often when clients request a gap analysis study, I find that the length of their Title element and Description meta tags is underutilized. In addition, there are just as many instances where the text entered has a poor keyword effectiveness value, having neither sufficient contextual relevance to the page content nor anything to differentiate their page from the competition. The Keyword Effectiveness Index (KEI) is a calculation based on the number of times a long tail keyword phrase has been searched in the last 90 days, squared, divided by the number of competing mentions.

    Don't place content where it won't be read by search engines. This includes arrays in both JavaScript and Java, all things Flash, and textual elements such as buttons or other link triggers that are photos or a PDF rather than text (best to use both a photo and text in a button), etc. Content is King.

    Pay attention to the naming conventions used in file and folder names. While this may be limited by the CMS, it’s best not to place images in a folder titled, “images”, (or worse, “img”). For both file and folder names use two to four word keyword phrases that are contextually relevant to the site’s content.

    Lastly, I would overwhelmingly recommend that everyone validate their pages with one of the World Wide Web Consortium’s Validation tools, http://validator.w3.org/ .

  • Tom J

    Note the first ‘TravelTex’ description in the example is from DMOZ:
    http://www.dmoz.org/Regional/North_America/United_States/Texas/Travel_and_Tourism/Guides_and_Directories/
    Otherwise it would be the same as the sub-root pages.

    The US State Department, the UK State Department, and the World Bank don’t even include their names in the site titles or descriptions (you have to figure that out from looking at the URLs). Google for any country in the world and you’ll see what I mean.

  • Lawrence

    As far as non-HTML content (e.g. docs, Flash, and apps) is concerned, I suggest that each item be presented on a separate HTML page.

    The downside is the extra click, but the benefits go beyond the obvious SEO ones:

    - Inclusion of a text blurb that describes the content before the user accesses it
    - Accessibility – Alternative formats can be offered
    - User / SEO friendly URLs that are easy to promote
    - Allow comments, voting, related content links, etc…

  • Frost

    Wonderful information. Yes, I definitely agree with you. I think a government website has to submit a perfect sitemap, formatted as an XML doc. Creating sitemap.xml makes robots' crawling and indexing faster. Also, submitting sitemap.xml could increase the total number of posts or URLs of a government website indexed by a search engine. Sitemap.xml is a sitemap booster. Also, don't forget to pay attention to the meta description and meta keywords tags, because some search engines still use them. Thanks | Blogfuel For Inspiring Blogger

  • http://www.ilrea.net John

    Man this forum is long
