In short, a search engine works by taking a “snapshot” of each page it comes across. When you search for something, the search engine sifts through its list of snapshots to find the most relevant pages. It then sends back the results in order of relevance.

Now, if you’re searching for how search engines work, you’re probably looking for a more in-depth explanation than this.

Luckily, this is a whole article on the subject!

There are two different aspects to search engines, and this post covers each one separately. These two aspects are:

  • Crawling/indexing
  • Delivering results

There’s no one better to explain this topic to you than Google. They have created a video covering this subject!

 

[Image: Google logos, spiders and a web, illustrating how search engines crawl the web]

How search engines work: Crawling

Finding a web page is the first thing that search engines need to do. To do this, most search engines use spiders (also called crawlers). These programs travel around the web clicking links on different web pages.

The search engine’s crawler then takes a snapshot of each web page it comes across. This is how a search engine like Google records a web page.
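To make this concrete, here is a minimal sketch of a crawler in Python. It walks a toy, in-memory “web” (the page names and contents are made up for illustration) rather than fetching real pages over HTTP, but the idea is the same: follow links and snapshot every page you find.

```python
from collections import deque

# A toy, in-memory "web": URL -> (page text, links on that page).
# Real crawlers fetch pages over HTTP; this is just an illustration.
TOY_WEB = {
    "a.com": ("Welcome to A", ["b.com", "c.com"]),
    "b.com": ("All about B", ["c.com"]),
    "c.com": ("C's home page", ["a.com"]),
    "d.com": ("Nobody links here", []),
}

def crawl(start_url, web):
    """Follow links breadth-first, snapshotting each page once."""
    snapshots = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in snapshots or url not in web:
            continue  # already snapshotted, or not reachable
        text, links = web[url]
        snapshots[url] = text   # store the "snapshot" of the page
        queue.extend(links)     # schedule the links we found for crawling
    return snapshots

snapshots = crawl("a.com", TOY_WEB)
print(sorted(snapshots))  # ['a.com', 'b.com', 'c.com']
```

Note that `d.com` is never snapshotted: no page links to it, so the crawler can’t discover it. This is exactly why pages with no inbound links can be invisible to search engines.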

Why does Google crawl sites?

There are millions of web pages created every day. If Google had to search through them all in real time, it would not only be much slower but also use far more energy.

Taking snapshots of sites beforehand, and then basing the order of the search results on these snapshots is much more efficient.

So, now we know that Google and other search engines crawl web pages to discover new content. But, how do they decide where to position a web page?

 

[Graphic: the numbers 1, 2 and 3, indicating the ordering of search results]

How do search engines order search results?

Creating a computer program to view and store web pages isn’t all that difficult. But positioning web pages in the correct places is a whole other challenge.

To do this, Google and other search engines use algorithms. An algorithm is a set of rules that are followed to solve a problem.

If you take standardized steps to solve a math problem, or to get dressed, this is classed as an algorithm too.

In Google’s case, their algorithm is a set of rules to determine how to find relevant web pages, and then how to rank them in the correct order.

To understand this better, let’s go into a little more detail on the challenges search engines face.


Understanding context and errors

 

  • Spelling errors and variations

Man utd, Manchester United, Man U… you get the point

 

  • Homonyms

When Billy searches for: “How to find a date,” he probably doesn’t want suggestions about the best route to the supermarket’s fruit section. He doesn’t want the cheapest flight to the date palm fields of Egypt either.

Then, when he searches for “date palms,” he probably isn’t looking for palm reading about his future dates. He most likely is not diving into a deep soul-searching session either.

When Susan googles “sole searching,” she doesn’t want 4 YouTube videos of people playing hide and seek with their shoes… (you find the craziest stuff online nowadays!)

  • Context

When Susan and Billy are in Spain, there’s no use in showing them restaurant suggestions in Manchester when they search “best local restaurants.”
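One simple (and heavily simplified) way to handle spelling variations is a lookup table that collapses every known variant into one canonical query. The table below is hypothetical; real search engines use far more sophisticated language understanding, but the goal is the same:

```python
# Hypothetical variant table: each known spelling maps to one canonical form.
CANONICAL = {
    "man utd": "manchester united",
    "man u": "manchester united",
    "manchester united": "manchester united",
}

def normalize_query(query):
    """Collapse known spelling variants into a single canonical query."""
    cleaned = query.lower().strip()
    return CANONICAL.get(cleaned, cleaned)

print(normalize_query("Man Utd"))  # manchester united
print(normalize_query("Man U"))    # manchester united
```

Whichever variant the searcher types, the engine can now look up results for the same underlying topic.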

This is already challenging for an algorithm to deal with. But there’s more!

[Screenshot: search results for "man u", showing how search engines deal with homonyms and other language difficulties]

 

Dealing with website owners

There are a huge number of people who use search engines like Google each day. Any marketer knows that a large number of eyes means there’s money to be made.

A website can turn visitors into money in a variety of ways, such as displaying ads or selling products on the site.

If a site can get to the top of every search engine, that means millions of page views per day. That results in serious money for the website owner!

As you can tell, this brings about a conflict of interest.

Google’s main focus is providing the best results for each searcher. However, website owners’ main priority is to get the most traffic.

It’s often tempting for website owners to manipulate the search engine rankings.

For example… Say Google’s algorithm just focused on the greatest number of on-topic words on a web page. Then, the page with the most on-topic words would win the highest position.

A lot of business owners would just put their chosen words on each web page thousands of times. I don’t think the algorithm was ever quite that simple, but spamming your words onto a page used to work.
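To see why such a simple rule would fail, here is a toy scoring function that just counts keyword occurrences. A spam page that repeats the word instantly outranks an honest one (the page contents are invented for illustration):

```python
def naive_score(page_text, keyword):
    """Rank a page purely by how often the keyword appears (easily gamed)."""
    return page_text.lower().split().count(keyword)

honest_page = "how to set up and climb a ladder safely"
spam_page = "ladder " * 1000  # keyword stuffing: one word repeated 1000 times

print(naive_score(honest_page, "ladder"))  # 1
print(naive_score(spam_page, "ladder"))    # 1000
```

Under this rule the stuffed page wins every time, which is why modern algorithms weigh many other signals besides raw word counts.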

 

Sorting the relevant from the irrelevant

When you look at a basic search term, like “how to use a ladder,” you see there are more than a quarter of a billion search results.

Don’t forget: these are only the pages that Google sees as on-topic, and Google has only crawled a portion of the web.

Sorting through these pages and delivering the best results is a challenge.

Take a moment to think about it.

Search engines need an algorithm that will return the top 10 results out of almost 300 million results.

This is a huge task, especially when considering that some webpages may have malware on them, others might try to scam Google’s users and some are irrelevant to this search term.

This also comes with a lot of ethical issues, but more on the ethics of search engines in this post!

 

[Image: someone searching "solve this please"]

How do search engines solve these problems?

To start with, they need to bring down the total number of possible results. They do this by using keywords.

If a page is about computers, then it will obviously have the word “computer” or “PC” somewhere on the page.

So, by using a deduction like this, search engines like Google go from billions or trillions of pages to tens or hundreds of millions!
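A common data structure for this narrowing step is an inverted index: a map from each word to the set of pages that contain it. Here is a minimal sketch over made-up page snapshots:

```python
from collections import defaultdict

# Tiny snapshot store (hypothetical sites): URL -> page text.
PAGES = {
    "pc-shop.example": "buy a computer or pc here",
    "recipes.example": "easy pasta recipes",
    "laptop-blog.example": "is a laptop a real computer",
}

def build_index(pages):
    """Map each word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["computer"]))  # ['laptop-blog.example', 'pc-shop.example']
```

The index is built once, ahead of time, from the crawler’s snapshots. At query time, looking up a keyword instantly discards every page that never mentions it.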

(Please note that no one knows for sure what Google’s exact algorithm is. These examples are simplified so you can get a better idea of how search engines work.)

After this, the algorithm follows about 200 other rules to order the pages. One of these rules that Google came up with is “PageRank.”

PageRank was created by Google co-founder Larry Page. Simply put, it measures the number of quality websites that link to your website.

His theory: the more high-quality pages that point to your page, the more trustworthy your page is. This didn’t always work so well when it was first invented, but Google’s algorithm has improved a lot in the past 10-20 years!
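Here is a simplified sketch of the PageRank idea, computed by repeated iteration over a made-up three-page link graph. Real PageRank deals with dangling pages, spam and billions of nodes; this only shows the core recurrence, where each page passes a share of its own rank to the pages it links to:

```python
# Toy link graph (hypothetical pages): page -> pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: a page is important if important pages link to it."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base rank...
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        # ...and receives a share of the rank of every page linking to it.
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
# "c" is linked to by both "a" and "b", so it ends up with the highest rank.
print(max(ranks, key=ranks.get))  # c
```

Notice that no page can directly set its own rank; it can only earn rank through links from other pages, which is what made the scheme harder to game than simple keyword counting.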

PageRank, or at least backlinks, is still part of the Google algorithm, but many new factors have been incorporated too. As someone who works with the algorithm quite a lot, I can say that there is still a lot Google can do to prevent websites from spamming their way to the top. They have done a magnificent job up until now, though!

After all of this (this usually takes less than a second), you “finally” have your search results.

Conclusion

We take search engines for granted every day. How search engines work isn’t something that’s often thought about, but it’s an interesting topic nonetheless. Luckily, after reading this article you have a good idea of how search engines work and some of the problems that search engines face!
