Quickly Identify Words On Large Websites That Are Throwing Off Your Pages’ Relevance

Google is usually very good at detecting the overall topic of your pages, so the following tutorial will typically apply to less than 2% of your pages. But if it only takes us 5 minutes to implement it, it could be more than worth it. Whenever we write an article, we include all kinds of keywords that are related to the topic, but sometimes (rarely) Google will get confused as to what is the overall topic of the article.

For example, if you are mentioning “Baseball steak” on a recipe page multiple time, and Google starts ranking the page for sports related “baseball +some other terms”. This may not be the best example, because in this case Google will certainly recognize that the page is about recipes and not sport, but you get the idea…

On one page (not on this site) I removed 3 instances of one word that was irrelevant to the topic (ranking around 40th position), and “pinged” the page. Google stopped ranking the page for combinations that included that word, and slowly started ranking it for the main topic of the page:
Change after removal of the words

The short version algorithm for finding those words is:
  1. For every page of your site get the title of the page
  2. For every page of your site get the query with the most search impressions
  3. Go through the list manually and see if the query is relevant to the title of the page
  4. OR…ask your favourite AI to help you with it.

Now, let see how we can quickly implement all of those steps. There are many ways to accomplish this. The fastest way to get queries with the most impressions for every page is to use Looker studio and connect it to Google Search console. This tutorial may seem long, but it should take you less than 5 minutes to finish, especially if you have any experience with Excel/Google Sheets and Looker Studio.

  1. Go to https://datastudio.google.com/ and login with your Google account
  2. You should end up on the Looker Studio homepage
    looker studio homepage
  3. Click Blank Report.
  4. You will be taken a page of data sources that you can connect to. Scroll down and click on Search Console
    Google search console connector
  5. You will be presented by a list of websites. Select the website you want to explore and click URL impression in the Tables column.
  6. Click on Web in the “Search type” column. Then click Add.
  7. You will be taken to the report page. On the right side, in Chart column under Dimension click on Add dimension.
  8. Click on Landing Page. Click on Add dimension again and add Query.
  9. Remove Google Property from that list.
  10. Under Metric section click Add Metric and then select impressions
  11. You can also remove URL clicks in this section.
  12. Your chart should now display all URLs and queries that they had impressions for. It should look something like this:
    Metricsrule queries chart
  13. You can keep the default date range of 28 days.
  14. Now we will export the chart into Excel for sorting.
  15. Right click on the table and select Export, then choose your favourite spreadsheet (I used Excel this time).
  16. In Excel load the file you just exported and click on the Impressions column, then Data tab, then Z-A under Sort.
  17. Click on Landing Page column, then Z-A under Sort.
  18. This will sort queries for every page by their impressions descending
  19. Highlight all 3 columns and click on Remove Duplicates (Data Tab)
  20. Then just select Landing page.
  21. You will end up with a list of URLS and queries with the highest number of impressions for that page. Using this site as an example, my table looks like this:
    Excel queries and their landing pages
  22. Next step is to get the title tag for each query. There are multiple ways to do this on scale, including WordPress plugins, Excel VBA, Google Sheets Command: =HYPERLINK(A1, IMPORTXML(A1,”//head//title”) ), but I used Screaming Frog since I scanned the site anyways.
  23. Export the list of URLs and their titles on the same sheet, and use =VLOOKUP to match URLs queries to their titles.
  24. Now you have a list that looks something like this:
    final table
  25. Now you can take the list to ChatGPT (GPT 4 works the best) and ask it to identify queries:
    I will give you a list of page title, queries and urls separted by ",". go through the list and print out all lines where query is not semantically related to the page title. This is the list: Prioritized SEO Audit Template,seo audit template,https://www.metricsrule.com/tips/prioritized-seo-audit-checklist/...
  26. In my example it printed out one line: How to analyze large web sites to prioritize content for update, redirection, merge or deletion,screaming frog looker studio,https://www.metricsrule.com/tips/analyze-content-of-large-web-sites-to-prioritize/. But I know that “screaming frog” and “looker studio” are closely related to the topic and need to be on that page so I won’t be removing them in this case.

You can use the same method to go through ALL of the queries for any page, but this was just a quick example.