Junior Data Scientist
So I decided to construct a Search Engine Optimization (SEO) handbook for beginners, with a focus on organic search results.
Why Did I Write That? Because I wanted to understand what all the hype around SEO is about. I then kept notes for myself, and finally, I thought to share!
As Seth Stephens-Davidowitz has stated:
“Google searches are the most important dataset ever collected on the human psyche.”
By that he meant that, in the context of Big Data, collecting and analyzing data derived from Google searches lets us analyze, and even predict, people’s behavior, needs, trends, motivations, and the list goes on.
- You probably won’t be able to optimize your webpage if you don’t first know which group of keywords you should be focusing on.
- If you have many impressions but a low click-through rate (CTR), this means you are showing up in the search results but viewers are not clicking through to your website.
- If the search volume is high and competition/difficulty is low, this means there is large potential traffic at the lowest levels of competition, i.e. demand is high and supply is low, a fact which may suggest a market opportunity.
- You want to focus on keywords that are relevant to your content, that have high search volume, and that have medium or low competition (since it would be difficult for you to stand out from others if there is a lot of competition around those keywords).
- If a specific keyword is being aggressively bid on in Cost-Per-Click (CPC) markets, this is an indicator of how difficult it will be to rank for it organically. In other words, aggressive bidding means that significant competition from paid ads exists, and developing an alternative keyword might be necessary.
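The rules of thumb above can be sketched in a few lines of Python. This is only an illustration: the `KeywordStats` fields, the 0 to 100 difficulty scale, the sample numbers, and the `opportunity_score` heuristic are all assumptions made for this example, not metrics defined by any particular SEO tool.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    impressions: int     # times your page appeared in the search results
    clicks: int          # times a searcher clicked through to your page
    search_volume: int   # monthly searches (made-up figure)
    difficulty: float    # 0 (easy) to 100 (hard), hypothetical scale


def click_through_rate(stats: KeywordStats) -> float:
    """CTR = clicks / impressions (0.0 when there are no impressions)."""
    if stats.impressions == 0:
        return 0.0
    return stats.clicks / stats.impressions


def opportunity_score(stats: KeywordStats) -> float:
    """Illustrative heuristic: high volume and low difficulty score higher."""
    return stats.search_volume * (1 - stats.difficulty / 100)


keywords = [
    KeywordStats("data science", 12000, 240, 90000, 85.0),
    KeywordStats("data science for beginners", 800, 64, 20000, 20.0),
]

# Rank keywords from the most to the least promising under this heuristic.
ranked = sorted(keywords, key=opportunity_score, reverse=True)
for stats in ranked:
    print(stats.keyword, round(click_through_rate(stats), 3),
          round(opportunity_score(stats)))
```

With these made-up numbers, the broad keyword has far more impressions but a lower CTR, and the narrower keyword comes out ahead once difficulty is taken into account.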
Definition: In simple terms, SEO consists of the steps and processes undertaken so that the search visibility and ranking of your website by the search engines can be increased. In other words, you can use SEO to show up in the Search Engine Result Pages (SERPs) at a high-rank position, and so your potential customers/stakeholders can find you.
Types: There are two main SEO types: Organic (natural) SEO, and Non-organic (paid / artificial) SEO.
- In the former case, you try to increase your organic search results by focusing on factors such as content creation and optimization, keyword research, link building, and meta-tag optimization. The subcategories of this SEO type are: On-Page SEO (e.g., keyword research, keyword optimization, content creation), Technical SEO (e.g., site speed, mobile-friendliness, site architecture, website’s security), and Off-Site SEO (e.g., link building, backlinks).
- Whereas via non-organic SEO, you essentially pay for ads, such as Google’s pay-per-click (PPC) advertising solution, and you rely on paid search marketing approaches. One of the basic performance metrics here is the cost-per-click (CPC). Examples include Google Ads, Google Product Listing Ads, Google Shopping Ads, and Bing Ads.
Comparison: The main advantages of organic SEO are that you don’t have to pay for ads and, at the same time, you attract relevant users, i.e. users who are genuinely searching for content, products, or services similar to yours. With PPC, on the other hand, you can see instant results by attracting “ready-to-buy” users; however, it might not be a good long-term strategy.
SEO in the context of Information Retrieval (IR)
A successful SEO strategy can be achieved by optimizing both for the search engines and for the surfers/consumers. By “search engines” I mean the technical part, which is tied to Information Retrieval. By “consumers” I mean the human perspective, which can be studied through Information Behaviour and Information Seeking theories. Specifically for the latter case, we want to answer questions like “how do people start a search”, “how do users seek information, and how do they utilize it”, and finally, “what types of search engines require different solutions”.
There is a lot to study around IR, and one can start by exploring the PageRank algorithm and Zipf’s law, and by understanding the concepts of stopwords and stemming. Here, however, I would like to point out the terms “description” vs. “discrimination” of a document, and the main evaluation metrics used in IR and recommender systems, which are precision and recall. Why is that important? Because they are tied to the website’s relevance and authenticity, the latter of which in turn affects your search engine rankings and your position in the SERPs.
Relevance is determined by various factors such as:
- Content and code implementation
- Thematic and semantic (metadata) connections between user’s query and your website’s content
A tricky concept about relevance in IR is that we want our document to have a good description of its content, but at the same time we also want that document to be discriminated from other documents. The problem here is that if we try to describe our document with ‘common sense’ (i.e. in the way everybody would describe it), then we will probably not achieve satisfactory document discrimination, because this is how everybody describes similar documents as well!
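One common way to put numbers on this description vs. discrimination tension is TF-IDF weighting. The article does not name it, so treat the sketch below as an illustration under that assumption: the three shop "documents" are invented, and the formula is the textbook term frequency times log inverse document frequency.

```python
import math
from collections import Counter

# Three invented bicycle/car shop pages, reduced to a few words each.
docs = {
    "shop_a": "cars cars bicycles sale",
    "shop_b": "cars bicycles sale discount",
    "shop_c": "cars bicycles trek mountain bikes",
}
tokenized = {name: text.split() for name, text in docs.items()}


def idf(term: str) -> float:
    """Inverse document frequency: log(N / number of docs containing term)."""
    containing = sum(1 for tokens in tokenized.values() if term in tokens)
    return math.log(len(tokenized) / containing)


def tf_idf(doc_name: str) -> dict:
    """TF-IDF weight for every term that appears in the given document."""
    tokens = tokenized[doc_name]
    counts = Counter(tokens)
    return {term: (count / len(tokens)) * idf(term)
            for term, count in counts.items()}


scores = tf_idf("shop_c")
for term, weight in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

Terms that every shop uses ("cars", "bicycles") describe the page but get a weight of zero, because they do not set it apart. Terms that only one shop uses ("trek", "mountain") are the ones that discriminate.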
Authenticity in the context of SEO is known as Domain Authority. Essentially, it is a measure of how authoritative your domain is. Contributing factors include:
- Reviews about your website
- The quantity and quality of links (hyperlinks) pointing to you from other websites (third-party domains), known as inbound links (backlinks)
Precision in the context of IR:
It tells us how useful the results are (effectiveness in terms of the given results).
- A perfect precision score of 1 means that every result retrieved was relevant, but it says nothing about whether all relevant documents were retrieved.
- You might prefer higher precision than recall, for instance, in legal and medical queries where there is a substantial need for high precision and correct results.
Recall in the context of IR:
It tells us how complete the results are (how many of the relevant documents were actually retrieved).
- A perfect recall score of 1 means that all relevant documents were retrieved, but it says nothing about how many irrelevant documents were also retrieved.
- You might prefer higher recall than precision when there is a need for a plethora of information results retrieved, even if some of them might be irrelevant to some extent. Example: YouTube recommendations, recommendations for online library collections / scientific articles.
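As a minimal sketch of the two metrics, assuming we already know which documents are relevant to a query (the document IDs below are made up):

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query.

    precision = relevant results retrieved / all results retrieved
    recall    = relevant results retrieved / all relevant documents
    """
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"doc1", "doc2", "doc3", "doc4"}   # what the engine returned
relevant = {"doc1", "doc2", "doc5"}            # what it should have returned

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67). Returning more documents tends to raise recall but lower precision.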
Ok, let’s now dive into SEO in more detail. We initially distinguished the main SEO types, and in the following part, I will be focusing on Organic SEO. It is important to first understand the ranking factors on search results. Some of them are depicted below, ordered by their importance:
During the implementation of an organic SEO strategy, you will most likely find yourself focusing on two things: keywords and content.
The main themes here are: keyword attributes, keyword research, and keyword distribution.
1. Keyword Attributes
- Relevance: You are looking for relevant and descriptive keywords. For instance, if you are selling cars or bicycles, don’t just focus on those exact keywords (this is what everybody includes in their car/bicycle websites anyway). You should write about the specific brands you sell as well as about more car/bicycle attributes.
- Search Volume: The number of searches of a particular keyword. Indicative tools: Moz Explorer, Wordstream, Ahrefs, Semrush, Google Trends.
- Competition (difficulty): If what you are selling is already on the web and sold by others as well, this inevitably means that there is already a lot of content around a group of keywords describing your product. On the other hand, when competition is low, then the keyword difficulty is low, and this means that users don’t find many available websites for their given query.
2. Keyword Research
Here you try to extract insights about your website (e.g., via Google Search Console), and you want to discover search volume metrics by answering questions such as “What is the current state of demand for my particular keywords?”. It might also help your research to categorize your keywords by clustering them into their main topics.
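Keyword categorization can start out very simply. The sketch below is a naive rule-based stand-in for real clustering: the `TOPIC_SEEDS` mapping and the sample keywords are hypothetical, and a keyword lands in the first topic whose seed words it contains.

```python
from collections import defaultdict

# Hypothetical topic -> seed words mapping; adapt it to your own site.
TOPIC_SEEDS = {
    "careers": {"salary", "jobs", "career"},
    "learning": {"course", "tutorial", "degree"},
}


def categorize(keywords):
    """Group keywords under the first topic whose seed words they contain."""
    groups = defaultdict(list)
    for keyword in keywords:
        tokens = set(keyword.lower().split())
        topic = next(
            (name for name, seeds in TOPIC_SEEDS.items() if tokens & seeds),
            "uncategorized",
        )
        groups[topic].append(keyword)
    return dict(groups)


groups = categorize([
    "data science salary",
    "data science course online",
    "data science jobs",
    "what is data science",
])
for topic, members in groups.items():
    print(topic, members)
```

Anything that ends up in "uncategorized" is a hint that a topic is missing from the mapping, or that the keyword deserves a page of its own.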
3. Keyword Distribution
This is the process of assigning and distributing your specific keywords across your website’s pages.
Let’s have a look at an example with the query “Data Science”. The following image is derived from https://answerthepublic.com/, which produces four kinds of insights: “questions”, “prepositions”, “comparisons”, and “related”. The image below shows the “questions” category. The greener the dot, the higher the search volume for those queries.
One other thing you can do is to compare multiple queries together. Below you can see the comparison between the terms “Data Science”, “Machine Learning”, and “Artificial Intelligence” generated by Google Trends. You can easily notice that, contrary to Data Science, AI was more popular at the beginning of the timeframe (the year 2004), whereas in recent years the popularity of AI has plummeted compared to “Data Science” (2nd) and “Machine Learning” (1st).
Nevertheless, you should be careful not to jump to conclusions! I couldn’t believe that the search interest in AI had dropped, especially in recent years. Although there are lots of parameters you can tweak in the Google Trends platform (location, timeframe, search type such as web, images, news, or Google Shopping), I finally found out that if you replace “Artificial Intelligence” with “AI”, you will find that “AI” has always been in first place!
Content is everywhere and you can optimize it both with On-Page and Off-Page SEO.
- Social media
- Link Building: getting backlinks from other sources (authoritative and popular)
- Creating quality and sharable content regularly
- Title tag of the page (meta title tag optimization)
- Keyword optimization
- Handle duplicate content correctly: Use a link element with rel="canonical" and an href pointing to the primary URL to resolve the issue of serving the same content under different URLs. The canonical tag indicates which is the primary URL for duplicate content across your website pages. Another way to indicate that is via the crawl URL parameters section of the Google Search Console, and also in Bing Webmaster Tools.
- Images – Audio – Video:
1. Improve your “src” and “alt” HTML attributes
- More technical part:
1. Construct HTML and XML sitemaps
1. Descriptive but short, and concise as possible
2. Fix your redirect issues (suitably use the 301 and 302 redirections)
- Header Response Code (HTTP status)
- Page speed
- Qualitative and informative content keeping it up-to-date
- Internal links
- Lastly, you can optimize the server side to improve your website’s speed, visibility, caching, and server reliability.
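To make the canonical tag from the checklist above a bit more concrete, here is a minimal sketch that checks whether a page declares one, using only Python's standard `html.parser` module. The sample page and its `example.com` URL are placeholders, and the `CanonicalFinder` class is a name made up for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical_urls.append(attributes["href"])


# Placeholder page: the same content may be reachable under several URLs,
# but the canonical link names the primary one.
page = """
<html><head>
  <title>Red bicycles</title>
  <link rel="canonical" href="https://example.com/bicycles/red">
</head><body></body></html>
"""

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical_urls)
```

A page with duplicate content should normally yield exactly one canonical URL. An empty list, or more than one entry, is worth a closer look.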
I hope your website gets search engine optimized!
Also published here