Google’s name is synonymous with internet search and browsing. The Google homepage has become the gateway into the world wide web and its plainly presented, mostly blank starting screen is no accident – it is meant to represent an open page just waiting to be filled with search results based on your area of query.
Google understands that cyberspace is massive and it is expanding just about as fast as the universe itself. Just like our own cosmos there are chunks of matter out there in the form of huge planets of data (let’s call them web portals in this instance) with their own ecosystems inside of which users interact.
The problem is, no one has been able to create an accurate and definitive map of the universe, or cyberspace for that matter, due to the constantly changing dynamic shape of both worlds. But Google is there to guide us with some clever technologies and more than a couple of nifty tricks, which you may not even be aware of as you surf the web.
What’s really clever about Google’s search function is that when most of us use it we think that we’re searching the internet itself. In fact we’re not. We’re searching Google’s index of the web. Google doesn’t have connections to every single corner of the internet, but the company’s indexes are pretty darn good. In fact, they are among the biggest databases on the planet. We’re talking about many billions of webpages stored on thousands of machines around the world.
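To see why searching an index is different from searching the web itself, here is a minimal sketch of an inverted index in Python. It is only an illustration of the general idea, not Google's actual data structure; the function names and the three example pages are invented for this example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, invented purely for illustration.
pages = {
    "example.com/sandwich": "toasted cheese sandwiches made easy",
    "example.com/cheese": "a guide to cheese making",
    "example.com/toaster": "toaster buying advice",
}
index = build_index(pages)
print(search(index, "toasted cheese"))
```

The lookup never touches the pages themselves, only the index built from them, which is why a page that was never indexed cannot turn up in the results.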
But how does Google build this index, and how does it ‘populate’ it with accurate and meaningful results data that will be useful to users? Even Google has to start somewhere, so it uses software programs known as spiders, also commonly referred to as crawlers or Googlebots. These useful little crawlers are sent out initially to the most logical places on the web. If you search for ‘Marmite’, the first site the spider is likely to have visited when compiling your search results is www.marmite.com, so no rocket science as yet. This first stage of website search is known as the ‘seed’ level.
After we pass the seed level we start to branch out. The spiders then crawl further outwards, following links from the initial pages they find, and start to weave a web of interconnected websites that share relevance in terms of content. The spiders build up a pattern of pages linked to pages, which must be recursively revisited in order to ensure they still contain content relating to the original search. Pages are revisited based on frequency ‘policies’ that are set by software that resides on Google’s servers. But what we need to remember is that the web is so vast and changeable that no spider will ever capture all the information out there.
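The branching-out from seed pages can be sketched as a breadth-first walk over links. This is a simplified illustration under stated assumptions: the ‘web’ here is just a Python dictionary invented for the example, nothing is fetched over the network, and the function name and the page limit are hypothetical rather than anything Google has published.

```python
from collections import deque


def crawl(links, seeds, max_pages=100):
    """Breadth-first crawl over a link graph, starting from the seed pages.

    `links` maps a URL to the list of URLs it links to. A real crawler
    would download each page; here the whole web is a dictionary.
    """
    visited = []            # pages in the order they were crawled
    seen = set(seeds)       # pages already queued, so none is queued twice
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# A toy web, invented for illustration. Nothing links to island.example.
toy_web = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["seed.example"],
    "c.example": [],
    "island.example": ["seed.example"],
}
print(crawl(toy_web, ["seed.example"]))
```

Notice that island.example is never reached, because no crawled page links to it. That is a small-scale version of why no spider captures everything out there.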
So let’s start with a search. Say we want to look up ‘toasted cheese sandwiches’. We type in those three words and press Return. Google’s query processor software then gets to work filtering through its indexes to decide which links to present. But hang on – what’s to stop us getting results on cheese making, results on toaster-buying advice and results on the Earl of Sandwich? Well, Google asks questions. More than 200 questions in fact. You could say that Google’s software uses a little artificial intelligence here because it tries to apply human logic to the vast lumps of raw data that it has to wade through.
To decide which ‘toasted cheese sandwich’ website to present to us, Google asks whether the words appear in the website’s title or URL. Google asks how many times the words appear in the correct order on any given website. Does the page include synonyms for ‘toasted cheese sandwich’ such as ‘grilled Cheddar buttie’ or ‘hot cheesy panini’? Discussing the mechanics of how to describe a toasted cheese sandwich might sound silly, but it’s all logical to the guys who run Google’s data centre.
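A handful of those questions can be turned into a toy scoring function. This sketch covers only the signals named above (title, URL, phrase in the correct order, synonyms). The weights are arbitrary assumptions chosen for the example, and the pages are invented, so treat it as an illustration of the idea rather than Google's real formula.

```python
def score_page(page, query, synonyms=()):
    """Toy relevance score built from a few simple signals.

    The weights (3, 2, 1, 1) are arbitrary assumptions for illustration.
    """
    phrase = query.lower()
    title = page["title"].lower()
    url = page["url"].lower()
    body = page["body"].lower()

    score = 0
    if phrase in title:                      # words appear in the title
        score += 3
    if phrase.replace(" ", "-") in url:      # words appear in the URL
        score += 2
    score += body.count(phrase)              # words appear in the correct order
    score += sum(1 for s in synonyms if s.lower() in body)  # synonyms present
    return score


# Hypothetical pages, invented for illustration.
pages = [
    {"url": "example.com/toasters",
     "title": "Toaster buying advice",
     "body": "Choose a toaster with wide slots for a cheese sandwich."},
    {"url": "example.com/toasted-cheese-sandwich",
     "title": "The perfect toasted cheese sandwich",
     "body": "A toasted cheese sandwich, or hot cheesy panini, needs good bread."},
]
query = "toasted cheese sandwich"
synonyms = ["grilled Cheddar buttie", "hot cheesy panini"]

ranked = sorted(pages, key=lambda p: score_page(p, query, synonyms), reverse=True)
for page in ranked:
    print(score_page(page, query, synonyms), page["url"])
```

The toaster page mentions cheese and sandwiches but never the full phrase, so it scores nothing and drops below the page that is actually about toasted cheese sandwiches.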
As well as checking for poor design or poor content quality, Google will also check for the presence of obvious spam, viruses and malware. Our search process will then start to classify pages by their ‘page rank’, which is a formula-based score derived from Google’s own calculations. A page rank score is obtained by analysing how many external pages point to a particular website or cite it as a reference or authority on a subject. All this is done in roughly half a second and your search term results will, depending on your web-connection speed, come back to you nearly instantaneously.
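The idea of scoring a page by the pages that point to it can be sketched with a simplified version of the PageRank calculation. The assumptions here are worth stating plainly: the link graph is invented, the damping value of 0.85 is a commonly quoted default rather than anything taken from this article, and Google's production scoring is certainly more involved than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}       # start with equal scores

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# A toy link graph, invented for illustration.
toy_links = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example", "a.example"],
    "hub.example": ["a.example"],
}
ranks = pagerank(toy_links)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{value:.3f}  {page}")
```

In this toy graph hub.example comes out on top because three other pages point to it, while b.example and c.example, which nobody links to, sit at the bottom.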
Now, of course Google could be on the payroll of the international cheese sandwich society (for example) and so therefore be quite keen to present you with certain pages relating to that organisation’s own interests. But it’s not. Google’s results are impartial and the company will not take payments from companies who want to push up their page-ranking results. There will, however, be ‘Google Ads’ down the right-hand side of the page and sometimes on top, which have been classified as ‘relevant’ and ‘supporting’ to your search term.
So, just how should you read a Google search result? Is it as simple as just clicking the top result on the page? Have you actually read down below the headline to look at some of the other information that the search engine is presenting to you? Right underneath your highlighted blue link you will see a short description of the website’s content. This is part of the metadata of the website itself – or to put it another way, it is information about information. Either way you look at it, it’s your fastest route to getting a handle on what you’re likely to find if you decide to click onwards.
Right underneath the website description is a link to the site’s cached version, which you can display if you want to check exactly when the Googlebots last dipped into the site in question for an update.
This version of the page will also give you colour-coded highlighted mark-ups of your search terms showing you exactly where they have been used. The cached version of the page is actually stored on Google’s own servers and it is this content that the company uses to calculate and establish the site’s page ranking. If the web server that hosts the ‘live’ version of the site you want to visit is acting up or working too slowly, you might like to remember that Google’s servers are generally set to run pretty fast, so you could always use this version of the page instead.
Beside the ‘Cached’ link you’ll see the ‘Similar’ link and this is pretty self-explanatory. Part of the concept of web search is that we often don’t really know what we’re looking for until we find it. Google can help here with some tangential results that you may not have considered searching for. Google uses its own server-based software to apply different logic to the initial search and find words that are related to the initial terms. You can play with this function more extensively if you use Google’s Advanced search function or if you type ‘related: URL’, where URL is the full website address that you want to examine, as in related: http://www.someinterestingfacts.net.
Once you really dig into Google search you can start playing around with the user options, which are accessible right from the Google homepage. Not only can you change the language interface that is presented to you, but you can also change the native language that you are searching in and that you want results delivered back to you in.
You can also set the SafeSearch option to block pages with sexually explicit content and you can even change your search format to display preferences more suited to a mobile smartphone or PDA.