Finding and 'understanding' a website's content
How do you know what a website is about? Simple. You just read it. You use your brain’s highly advanced information and language-processing capabilities to decide what a website is about and whether it is relevant to your needs.
Google, in spite of the hundreds of millions of dollars spent on its technology, simply cannot ‘understand’ a site in the same way that humans do. It has no eyes to pick up on how content is arranged, and no brain to read, understand and assess the words. All it can do is process what’s known as the ‘source code’ for each page that it finds, assuming it can find the page in the first place.
And in case you still think that Google has people looking at websites to decide what they are about… it doesn’t. Every single one of the billions of pages you can find in Google got there as the result of a visit by a software program, usually called a ‘spider’ or ‘crawler’. Google’s spider has its own name: Googlebot.
Google’s software, not people, identifies the basics of ‘meaning’
Looking at the source code of an individual page, Google can determine which parts make up the visible content that a human visitor will actually see, and which parts are the ‘other stuff’ that tells the browser how the page should look and behave. Having filtered content from code in this way, Google then automatically processes the language to establish the subject of each page. Beyond this, however, Google struggles to deduce much more about your site.
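To make the idea of separating visible content from the ‘other stuff’ concrete, here is a minimal Python sketch using only the standard library’s html.parser. It is not how Googlebot works: the set of ‘non-visible’ elements (head, script, style, noscript) and the names VisibleTextExtractor and visible_text are illustrative assumptions.

```python
from html.parser import HTMLParser

# Elements whose contents a visitor does not read as page text.
# This set is an illustrative assumption, not Google's actual rule set.
NON_VISIBLE = {"head", "script", "style", "noscript"}


class VisibleTextExtractor(HTMLParser):
    """Collects text a human would read, ignoring tags and non-visible elements."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # greater than 0 while inside a non-visible element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the human-visible text of an HTML page as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


page = """<html><head><title>Acme</title><style>p {color: red}</style></head>
<body><h1>Garden tools</h1><script>track();</script>
<p>Spades and <b>forks</b> for sale.</p></body></html>"""

print(visible_text(page))  # Garden tools Spades and forks for sale.
```

The styling rules and the tracking script are dropped, while the heading and paragraph text survive; that remaining text is what language processing would then work on.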
Google asks for your help
Because it struggles to really understand a site’s content, Google explicitly asks you, as a site owner, to help it understand your site better. It asks for your help in many, many areas, and provides a wealth of tools and information intended to give you insight into what Google needs and what Googlebot has found on your site.
And help you must give it. Producing a highly visible website means delivering technology that is directly aligned with Google’s advice on helping it understand content automatically: for example, adhering to Web Standards, which allow site owners to apply semantic markup to the content within a page.
Beyond its obvious need to ‘understand’ what it visits and takes away for its index, Google also needs to be able to find your pages of content in the first place. If Google never encounters a URL for a page, that page will never get into its index and will never be shown to Google’s users.
Technical Visibility is deep and complex
As an introduction to the notion of a website’s Visibility, this only scratches the surface.
Visibility incorporates dozens more issues, and our Visibility audit reports for clients frequently include in-depth investigation of subjects such as content duplication, URL and domain migration, URL rewriting, domain strategy and progressive enhancement techniques. Some examples of specific Visibility issues follow below.
Visibility example 1: Use semantics to help Google
One of the many ways to ensure that your business’s sites are ‘helping’ Google and other search engines is to make sure their source code uses what is known as semantic markup: HTML elements chosen for what the content is (a heading, a list, a quotation) rather than for how it looks. This helps Google identify the relative importance of items of content on your site. Google still doesn’t actually understand what it is indexing, but correct semantic markup helps it identify more precisely the most important content and vocabulary on your site. In turn, this helps Google determine more accurately whether your content is relevant to what someone is searching for.
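The sketch below, a hypothetical Python illustration rather than anything Google publishes, shows why semantic markup is easier for software to read. A program can pull a heading outline straight out of the h1 to h6 elements, but a page that fakes its headings with styled div elements gives it nothing to go on. The two page snippets and the names HeadingOutline and outline are invented for the example; Google does not document how it weights individual elements.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Records (level, text) for each h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self._current = None  # heading tag currently open, if any
        self._buffer = []     # text fragments inside the open heading
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            level = int(tag[1])  # "h2" -> 2
            self.outline.append((level, " ".join(self._buffer)))
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self._buffer.append(data.strip())


def outline(html: str) -> list:
    """Return the heading outline of an HTML fragment."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Same visible result in a browser, very different machine-readable structure.
semantic = "<h1>Garden tools</h1><p>Intro</p><h2>Spades</h2><p>Details</p>"
styled = ('<div class="title">Garden tools</div><p>Intro</p>'
          '<div class="subtitle">Spades</div><p>Details</p>')

print(outline(semantic))  # [(1, 'Garden tools'), (2, 'Spades')]
print(outline(styled))    # []
```

Both versions could be styled to look identical to a human visitor; only the semantic one tells software which words are the page’s main topic and sub-topic.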
Visibility example 2: Provide normal hyperlinks to all of the content on your site that you want to appear in Google
We frequently come across clients’ sites where the only way for a human visitor to find a particular page is to type some words into a search box and then select the page from a list of results. Google cannot, and does not, type queries into search boxes. Instead, Google finds content for its index primarily by following the normal HTML hyperlinks that it comes across on websites. A page that no link points to is therefore effectively invisible to Google.
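A toy crawler illustrates the problem. This minimal Python sketch follows only ordinary a href links across a hypothetical three-page site held in memory (the example.com URLs, the page contents and the names LinkCollector and crawl are invented). It ignores the search form, so the product page that can only be reached by searching is never discovered. Googlebot is far more sophisticated, but this is the link-following behaviour the paragraph above describes.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# A hypothetical three-page site held in memory instead of fetched over HTTP.
SITE = {
    "https://example.com/": (
        '<a href="/products">Products</a>'
        '<form action="/search"><input name="q"></form>'
    ),
    "https://example.com/products": '<a href="/">Home</a>',
    # Only reachable by typing a query into the search box above.
    "https://example.com/products/spade-42": "<p>Stainless spade</p>",
}


class LinkCollector(HTMLParser):
    """Gathers href values from <a> tags; forms and inputs are ignored."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url: str, site: dict) -> set:
    """Breadth-first crawl that discovers URLs only via ordinary hyperlinks."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:
            continue  # linked URL that does not exist in this toy site
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen


found = crawl("https://example.com/", SITE)
print(sorted(found))
print("https://example.com/products/spade-42" in found)  # False
```

Adding a single ordinary link to the spade page, for example from the products page, would be enough for this crawler to find it.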