Practical Tips for Government Web Sites (And Everyone Else!) To Improve Their Findability in Search
In an earlier post, I said that the key to government opening its data to citizens, being more transparent, and improving the relationship between citizens and government in our web 2.0 world is ensuring that content on government sites can be easily found in search engines. Architecting sites to be search engine friendly, particularly sites with as much content and legacy code as those the government manages, can be a resource-intensive process that takes careful long-term planning. But two keys are:
Crawlability Quick Wins
This post is about quick wins in crawlability. In many cases, ensuring crawlability also ensures accessibility (particularly access via screen readers). From this standpoint, many government web sites have an advantage over other sites since they already build in many accessibility features. Creating search-friendly sites also improves usability and user access from mobile devices and slow connections. So forget everything you may have heard about how you have to sacrifice user experience for SEO. SEO done right facilitates deeper audience engagement, makes it easier for visitors to navigate and find information on the site, and provides access to a wider variety of users.
Use XML Sitemaps
Why is this important? Many government sites have poor information architecture. Ideally each page of the site should have at least one link to it. This helps users navigate the site and helps search engines find all of the pages. Long term, these sites should revamp their navigational structure so that at least one link exists to every page. Since that may take some time to implement, an XML Sitemap can function in the meantime to provide a list of all pages for search engines to crawl.
Government sites have already made great progress in search by using XML Sitemaps. The Energy Department's Office of Science and Technology (OSTI) implemented the XML Sitemaps protocol with great success. "The first day that Yahoo offered up our material for search, our traffic increased so much that we could not keep up with it," said Walt Warnick, OSTI's director.
If possible, provide an HTML sitemap as well, which provides a browsable navigation to site visitors.
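As a rough illustration of what an XML Sitemap contains, here is a minimal Python sketch that builds one from a list of page URLs. The function name build_sitemap and the example.gov URLs are hypothetical, made up for this example; the namespace is the one defined by the sitemaps.org protocol. A real site would typically generate the URL list from its content management system.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, including one that no navigation link reaches.
pages = [
    "https://www.example.gov/",
    "https://www.example.gov/reports/2009.html",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting file is usually saved as sitemap.xml at the site root and submitted to the search engines, so that pages with no inbound links can still be discovered.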
Below is a good example of a browsable HTML sitemap on nih.gov:
Don't block access to content
Avoid dead ends when moving content
This could be in part because links to this page are actually to http://health.nih.gov/viewPublication.asp?disease_id=85&publication_id=869&pdf=no, which then executes a 302 redirect to the destination page. Likely this is the way the site's content management system works: a database query triggers the actual page that should appear. This architecture could be made significantly more search engine friendly simply by changing the 302 code to a 301 code. A 301 tells search engines the move is permanent, so they index the destination URL and credit inbound links to it, whereas a 302 signals only a temporary move.
Use descriptive ALT text for images
This has resulted in the pages linked to from the navigation not being indexed (because search engines can't access the links and follow them):
Ensure all links are working and that the server is responsive
Ensure each page has a unique title and meta description that accurately describe the page
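One way to spot pages that share a title or meta description is to parse each page's head and group URLs by the value found. The sketch below uses only Python's standard library; the class HeadParser, the function find_duplicates, and the sample pages are hypothetical names and data invented for illustration, not part of any tool mentioned in this post.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages, field):
    """Map each repeated value of `field` ('title' or 'description') to its URLs."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        seen[getattr(parser, field).strip()].append(url)
    return {value: urls for value, urls in seen.items() if len(urls) > 1}


# Hypothetical pages: two share a title, all three have distinct descriptions.
sample_pages = {
    "/a": '<html><head><title>Home</title><meta name="description" content="About A"></head></html>',
    "/b": '<html><head><title>Home</title><meta name="description" content="About B"></head></html>',
    "/c": '<html><head><title>Contact</title><meta name="description" content="About C"></head></html>',
}
duplicate_titles = find_duplicates(sample_pages, "title")
duplicate_descriptions = find_duplicates(sample_pages, "description")
print(duplicate_titles)
```

This only checks for exact repeats. Whether a title accurately describes its page still needs a human eye.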
Ensure links are functional with JavaScript disabled
When you search for the exact text string from the Flash image, the nih.gov home page with the text isn't returned. The reason why becomes clear when you view the page in Google's cache. This enables you to see the page exactly as Google has crawled it. Instead of descriptions of articles, such as the one about Salmonella, search engines see "This feature requires Flash plugin version 8" and "This feature of NIH requires that JavaScript be enabled".
This appears to be a pretty common problem. USA.GOV uses JavaScript to render "Today's Government News" on the home page. Since they've built it without progressive enhancement techniques, browsers without JavaScript support don't see this section. And neither do search engines. If you view the text-only version of Google's cache of the USA.GOV home page, you'll see that the entire section is missing.
Understand the fundamentals of search engine friendly web architecture
Hit-rate: 30
If they're worried about search engines crawling them too much for their servers to handle, they should use the crawl-delay directive and set a slower crawl rate in Google Webmaster Tools.
This post provides just a few examples of how a more crawlable site architecture can make a big difference in how findable the content is. Government sites face many of the same issues that large corporations do: many sites exist, all managed by different departments, with no clear process for cross-department collaboration and a lack of global standards. But government and companies alike can evaluate their sites for crawlability as a step towards a more findable web.
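For the crawl-delay suggestion above, here is a small Python sketch of how a well-behaved crawler might read such a rule. The robots.txt content, the delay of 10 seconds, the ExampleBot user agent, and the example.gov URLs are all assumptions made up for this illustration. Note that Crawl-delay is a non-standard extension that only some crawlers honor, which is why the crawl rate setting in Google Webmaster Tools is mentioned alongside it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow all crawlers down and keep one folder private.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the crawler is asked to wait between requests.
delay = parser.crawl_delay("ExampleBot")

# Public content stays crawlable; only the private folder is blocked.
allowed_public = parser.can_fetch("ExampleBot", "https://www.example.gov/reports/")
allowed_private = parser.can_fetch("ExampleBot", "https://www.example.gov/private/x")

print(delay, allowed_public, allowed_private)
```

The point of the design is that slowing crawlers down does not require blocking them: the content remains findable while the load on the server drops.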
Comments: 22
Jesse Robbins [15 April 2009 12:58 PM]
Awesome post, Vanessa!
-Jesse
Lee Thorn [15 April 2009 02:48 PM]
Very helpful, Vanessa. Thanks,
Lee Thorn
Chair, Jhai Foundation
Keonda [16 April 2009 02:44 AM]
I love your case studies!
Some of your advice didn't fall on deaf ears. The Texas tourism site has its navigation back with images turned off and the 'Cities and Regions' page you mentioned is now indexed.
Yee-ha Texas! Now, don't forget to take a look at your canonical versions and decide between using TravelTex.com or TravelTexas.com (yup, both domains are indexed).
Shane Pollock [16 April 2009 05:41 AM]
Good information all the way around. Thanks for posting. It is amazing that some sites still miss the basics like unique meta data for each page.
len bullard [16 April 2009 08:14 AM]
These are good techniques. A quibble:
"Make all content available outside of a login, registration form, or other input mechanism."
Temper that advice with common sense. There are many operational web sites particularly in health and other government services which must explicitly be behind registration forms and hidden from search crawlers. Don't implement Fox's advice without using common sense and reading your requirements doc.
Open systems are not the same thing as open government. Don't be fooled.
Joe McCann [16 April 2009 10:38 AM]
Nice article.
Being a UX lead on a state government app/site currently, I'm highly familiar with pretty much everything you mentioned and I agree. One thing that should be noted is that having to actually point out using "alt" attributes in image tags or making sure links work and/or content shows up with JavaScript-enabled should be fundamental for any government website...because it's the LAW.
If a site is not Section 508 compliant, then it is in violation of the law. Period. If your site IS Section 508 compliant, then you can't help but be SEO-friendly. This obviously does not have anything to do with all the dead links and server-side stuff you mentioned, but accessibility is and should be core to all government sites. Proper indexing and ranking is just a consequence of that.
Take a look at www.texasonline.com.
This site works completely without JavaScript enabled and progressively enhances the page when it is. None of the content is "unsearchable" and moreover, the site is Section 508 compliant.
Ajeet [16 April 2009 01:26 PM]
The very fact that you have to mention seemingly obvious facts is scary. Making stuff accessible to humans can be a challenge, but to a spider crawl things are different. Seems like there are a lot of orphan pages out there.
len [16 April 2009 03:00 PM]
Just remember that while making a page 'sales' and 'search engine' friendly, you can make it inscrutable and confusing to the person.
Don't confuse search and use and don't confuse sales/transparency and operations. It's a deadly mistake. If I open an operations page and in the first two seconds cannot spot what I am looking for, I fire the web page designer or relegate them to an editor.
The biggest mistakes made in virtual world markets were made by people who read Neuromancer and Snow Crash thinking them prophetic instead of fantasy. The biggest mistakes made in web page design have been made by CSS experts who failed to understand B.F. Skinner and didn't use their cognitive psych books for doorstops.
Vanessa Fox [16 April 2009 03:08 PM]
Len,
Excellent point - That sentence would be better phrased as "make all content you want your audience to find through search engines available outside of a login..."
There is definitely lots of content that should not be made available to search engines, and lots of different ways to make sure it's kept out. Perhaps I'll talk about that in a later post.
Len (2),
You should enjoy my next post, which is about exactly that -- the conversion workflow process from search to visit, to conversion. It does no good to rank well for something if visitors abandon the page as soon as they click over.
len [17 April 2009 09:55 AM]
Looking forward to it, Vanessa. IME, operations pages are treated differently in many ways. Although search behind the firewall is of course important, it takes on less importance as the role-based view-processes organize the workflow.
Should workflow organized into real-time 3D systems be searched differently?
Ted [20 April 2009 05:01 AM]
Very good post - there has always been a significant tension on egovernment sites between the need to be legal, highly accessible and 508-compliant, and the need to employ online marketing techniques to attract more viewers...ever since my earlier days implementing sites like nyc.gov, irs.gov, maryland.gov, delaware.gov, etc....online marketing/SEO techniques were almost totally discounted, because, after all, the government really doesn't focus on "competition"...however, in these heady days of social media, search engine optimization is essential to (1) help public service offerings stand out from the noise and commercialism, and (2) deliver public service alerts and products out "in the wild", so to speak, where people are engaging in online dialogue. Government most definitely has to compete online for viewer attention, especially in search results across many types of search engines and media channels.
DC web designer [21 April 2009 05:32 AM]
Great topic, great article. But, I would like to suggest two things:
First, because government websites MUST be Section 508 compliant (web accessible) they are probably 1,000 times more search engine optimized than commercial websites that rarely pay attention to web accessibility. If a site is Section 508 compliant it is probably 80% search engine optimized already.
Second, don't confuse openness with being found online due to a site's organic SEO. That is irrelevant. Most government websites rank high on Google simply because of the inbound links. Google "ICE" for example and ICE.GOV is the first result on Google's SERP. Openness means that you put information online that should be made public. It doesn't matter how search engine optimized your site is, if you don't have relevant information on the site, it's not "open" and the organization is not "open".
Cathey Daniels [21 April 2009 11:08 AM]
Thanks for an informative article, Vanessa. You quote Dr. Walter Warnick, Director of the U.S. Department of Energy Office of Scientific and Technical Information (www.osti.gov) (the name was incorrect in your post, but that happens to us a lot!) on the importance of the sitemap protocol. OSTI's implementation of the sitemap protocol revved up public access to research documents that were otherwise difficult to find on the web. Federated search engines hosted at osti.gov (ScienceAccelerator.gov, Science.gov, and WorldWideScience.org) also play a large role in transparency for government science info.
Cathey Daniels
Public Information/Outreach
IIa, DOE OSTI
Rafael Montilla [22 April 2009 01:47 PM]
I would say to use 0% JavaScript, 10% flash.
Robert Visser [25 April 2009 08:20 AM]
There are over 200 criteria the Google algorithms assess when assigning value to a page. How one builds the elements, tags, and attributes contributes to their value.
In addition to each page having a unique Title element and Description and Keywords meta tags, it's important that the long tail keyword phrases placed are repeated in the page content. If not, their contribution will diminish. Some search engines will negate any contribution if long tail keyword phrases that are placed in meta data are not present in the page content.
One of the most overlooked aspects of generating meta data is character count limits. While the SEO community continues to debate what the recommended limits are, there are a few ways to measure how many characters a search engine will display on a Search Engine Result Page (SERP).
For a Title element it's generally accepted to have 60-80 characters. However, DMOZ permits 100.
On a Google SERP 160 characters will be displayed. While increasingly, I'm seeing that the text rendered is a mashup of page content, when an appropriate match is available, it's pulled from the Description meta tag. The description field in Google Local Search permits 200 characters.
On a SERP Yahoo will display 170 characters in the description. Yahoo Local does not offer a Description dialogue window into which to enter meta data.
On a SERP Live will display 200 characters in the description. While the Live Search Local Listing Center does provide a description dialogue window in which I've tested entering in excess of 500 characters, I've not seen any instance where this is displayed on either Live Search or Live Search Maps SERPs or provided via a more info button.
There is, of course, the question of what's indexed vs. what is displayed.
In the DMOZ Description dialogue window 300 characters are permitted.
On a SERP Cuil will display 320 characters.
All too often when clients request a gap analysis study I find that the length of their Title element and Description meta tags is underutilized. In addition, there are just as many instances where the text entered has a poor keyword effectiveness value: it neither has sufficient contextual relevance to the page content nor helps to differentiate their page from the competition. The Keyword Effectiveness Index (KEI) is a calculation based on the number of times a long tail keyword phrase is searched in the last 90 days, squared, divided by the number of competing mentions.
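As a quick worked example of the KEI calculation just described, here is a short Python sketch. The function name and the sample figures (300 searches, 45,000 competing mentions) are made up for illustration and are not taken from any real keyword data.

```python
def keyword_effectiveness_index(searches_90d, competing_mentions):
    """KEI as described above: searches in the last 90 days, squared, over competing mentions."""
    if competing_mentions <= 0:
        raise ValueError("competing_mentions must be positive")
    return searches_90d ** 2 / competing_mentions


# Hypothetical phrase: 300 searches in 90 days against 45,000 competing mentions.
kei = keyword_effectiveness_index(300, 45_000)
print(kei)  # 300 squared is 90,000; divided by 45,000 that gives 2.0
```

Because the search count is squared, a phrase searched twice as often scores four times higher at the same level of competition.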
Don't place content where it won't be read by search engines. This includes arrays in both JavaScript and Java, all things Flash, and textual elements such as buttons or other link triggers that are photos or a PDF rather than text (best to use both a photo and text in a button), etc. Content is King.
Pay attention to the naming conventions used in file and folder names. While this may be limited by the CMS, it's best not to place images in a folder titled, "images", (or worse, "img"). For both file and folder names use two to four word keyword phrases that are contextually relevant to the site's content.
Lastly, I would overwhelmingly recommend that everyone validate their pages with one of the World Wide Web Consortium's Validation tools, http://validator.w3.org/ .
Tom J [12 May 2009 09:38 AM]
Note the first 'TravelTex' description in the example is from DMOZ:
http://www.dmoz.org/Regional/North_America/United_States/Texas/Travel_and_Tourism/Guides_and_Directories/
Otherwise it would be the same as the sub-root pages.
The US State Department, the UK State Department, and the World Bank don't even include their names in the site titles or descriptions (you have to figure that out from looking at the URLs). Google for any country in the world and you'll see what I mean.
Lawrence [13 December 2009 04:12 PM]
As far as non-HTML content (e.g. docs, Flash and apps) is concerned, I suggest that each item be presented on a separate HTML page.
The downside is the extra click, but the benefits go beyond the obvious SEO ones:
- Inclusion of a text blurb that describes the content before the user accesses it
- Accessibility - Alternative formats can be offered
- User / SEO friendly URLs that are easy to promote
- Allow comments, voting, related content links, etc...
indir [8 March 2010 07:26 PM]
This post is about quick wins in crawlability. In many cases, ensuring crawlability also ensures accessibility (particularly access via screen readers). From this standpoint, many government web sites have an advantage over other sites since they already build in many accessibility features. Creating search-friendly sites also improves usability and user access from mobile devices and slow connections. So forget everything you may have heard about how you have to sacrifice user experience for SEO. SEO done right facilitates deeper audience engagement, makes it easier for visitors to navigate and find information on the site, and provides access to a wider variety of users.
Frost [17 March 2010 07:24 AM]
Wonderful information. Yes, I definitely agree with you. I think a government website has to submit a proper sitemap, formatted as an XML document. Creating sitemap.xml makes robots crawl and index faster. Submitting sitemap.xml could also increase the total number of posts or URLs of a government website indexed by search engines. Sitemap.xml is a sitemap booster. Also, don't forget to pay attention to the meta description and meta keywords tags, because some search engines still use them. Thanks | Blogfuel For Inspiring Blogger
John [26 September 2010 09:15 PM]
Man this forum is long
Jerome [20 November 2011 01:15 PM]
Thank you, Vanessa, for this post.
The very fact that you have to mention seemingly obvious facts is scary. Making stuff accessible to humans can be a challenge, but to a spider crawl things are different. Seems like there are a lot of orphan pages out there.