Audit Content From A To Z

Content itself is probably no stranger to you by now.

However, many people, including SEOers and “newbie” marketers, still do not have a firm grasp of what an audit is, what audit content is, or how important it is.

In this article, I will show you the process of making a content audit for a website in the most detailed way.

The first part of the article covers how to get the data, filter it, and fill out the Excel file. The second part covers how to classify the content and provide actionable solutions.

But first, I will help you quickly understand the concepts related to audit content. Let’s start!



1/ The concept of Audit Content

As you probably know, an SEO audit is the process of checking and assessing the current state of a website.

An SEO audit helps identify problems that need to be improved, offers solutions, and guides the website’s development strategy.

A content audit, also called audit content, is the process of analyzing all of the content on a website.

Audit content helps to comprehensively improve the quality of the website’s content and provides more value for readers, while raising the overall quality of the website and improving its search engine rankings.

So, is audit content difficult? How do you audit content?

I will walk you through it step by step, in as much detail as possible.

Let’s go!

2/ Identify content that needs improvement

Depending on the product and user intent, each site will choose different content styles. In general, however, there are 5 types of content that every website should look out for:


Poor quality content

Poor quality content has one or more of the following traits:

  • The content has had no visitors for a long period (over 4 months) or does not rank for any keyword.
  • The content is duplicated, which leads to cannibalization: for example, several articles on the same topic, such as SEO services, compete with each other.
  • The content has not been well optimized because the users were not researched, the outline is weak, or the user intent was not identified correctly.
  • The content targets the wrong keyword. For example, an informational article targets service keywords.

Thin content

  • Duplicate internal content when copying one or several articles on your domain.
  • Duplicate external content when copying one or several articles on someone else’s domain.
  • Not exactly a 100% duplicate but a 70-80% duplicate.
  • The page has almost no content but only the menu, footer, and sidebar.
  • The page has more ads than content.

However, some e-commerce product pages cannot avoid duplicate or short content. In the computer market, for example, pages for mice and keyboards consist mostly of product specifications. Some business pages, such as contact and recruitment pages, are likewise largely duplicated or short by nature.

Unrelated content

Usually, the website has 3 main content types:

  • Main content: 75%
  • Additional content: 20%
  • Trending content (rising topics in the field): 5%.

For example, on Hudareview’s website, 75% of the content is main content on SEO and inbound marketing, 20% is supporting content related to social media marketing and business, and the remaining 5% covers other topics such as blockchain.

So content is unrelated when:

  • Content is not related to the topic that your business is interested in.
  • The share of additional content and trending content is too high.
  • Content does not bring value to businesses.

Underperformance content

This is content that:

  • Ranks in positions 6-20 (sometimes 6-25).
  • Previously had good traffic but has since dropped, for reasons such as Google updates or strong competitors.

High traffic content

It may seem strange: why does content with high traffic need an audit?

Simply put, good is not perfect. High-traffic content already brings in a lot of visits, and if it is optimized further, it will bring in even more.

Pages with high traffic but a high bounce rate also need some improvement.


1/ Enter the data

First, you need to purchase a Screaming Frog account to be able to use the important features that help with audit content.


Once the purchase is complete, just download Screaming Frog and log in.

To set the standard settings for Screaming Frog, select Configuration → Spider → Basic and click the settings as shown below:


On the Render tab, select Old AJAX Crawling Scheme. Continue to set the Advanced tab settings as shown below:


Leave the remaining tabs at their defaults, then adjust a few other settings:

  • Configuration → robots.txt → Settings → Respect robots.txt → Show internal URLs blocked by robots.txt → OK.


  • Configuration → User-Agent → Googlebot Smartphone (because Google prioritizes crawling the mobile version first)


So you’ve completed the basic setup of Screaming Frog.

Next, we will extend the tool’s functionality by connecting it to the Search Console API (formerly Webmaster Tools) and Google Analytics.

In case you are not familiar with it, here is a brief explanation of Google Search Console.

Search Console reports on a website’s status and search performance, while Google Analytics focuses on analyzing users and traffic.

To connect to the API, go to Configuration → API Access → Google Analytics → Enter an account in the existing account box → Connect to a new account → Select GA management account → Allow.


Screaming Frog is now connected to GA. Next, choose the project in the Property and Views sections and select organic traffic.

Do the same when you want to connect to Search Console.

In both tools, you should also set the time period in the Date Range tab. Choose a period of 3 months or more to have enough data to analyze.


That completes the connection step.

To get data from Screaming Frog, enter the website’s domain into the search bar and select Start.


You can track the progress of the crawl through the Crawl bar.

The crawl speed depends on your device configuration and Wi-Fi quality.

After the tool has finished running, you can export all of the data to an Excel file.

At this point, Excel’s filter tool will help you classify the data and narrow the scope of the audit. Filter by the following basic criteria:

  • Content column: select whether you are analyzing images or text content. For example, keep only the rows that contain text when you only want to audit written content.


  • Status column: keep the rows with status 200, because error URLs (404, 500) and 301 redirects are not the main objects of a content analysis.


  • Indexability column: delete the non-indexable rows.
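
If you prefer a script to Excel filters, the three criteria above can be sketched in Python as follows. This is only a minimal illustration: the column names (Address, Content Type, Status Code, Indexability) and the sample rows are assumptions, so check them against the header row of your own export.

```python
import csv
import io

# Hypothetical sample of a crawl export; the column names are assumptions.
SAMPLE_EXPORT = """Address,Content Type,Status Code,Indexability
https://example.com/a,text/html; charset=UTF-8,200,Indexable
https://example.com/logo.png,image/png,200,Indexable
https://example.com/old,text/html; charset=UTF-8,404,Non-Indexable
https://example.com/tag/x,text/html; charset=UTF-8,200,Non-Indexable
"""


def keep_auditable_rows(rows):
    """Keep text pages that return status 200 and are indexable."""
    kept = []
    for row in rows:
        is_text = row["Content Type"].startswith("text/html")
        is_ok = row["Status Code"].strip() == "200"
        is_indexable = row["Indexability"].strip() == "Indexable"
        if is_text and is_ok and is_indexable:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
auditable = keep_auditable_rows(rows)
print([row["Address"] for row in auditable])
```

With a real export, you would read the file with `open(...)` instead of the inline sample.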


After filtering the data, keep only the following columns:

  • Address
  • Title
  • Meta description
  • H1
  • Word count
  • GA Session
  • GA New User
  • Bounce rate
  • GA Avg Session
  • Clicks
  • Impressions 
  • Position.

Moving on to the Content Classification sheet, you need to know the following:

⁕ URL Thin Content

After filtering the data for the second time, sort the pages by Word count from low to high.

Articles with 800 words or fewer are rated as Thin Content.

That is, the content is too short to guarantee quality and needs improvement. The home page is an exception: a low word count there is not a big deal.

Note: The word count in Screaming Frog is based on the number of words in the code, so it also counts the words in the menu bar, sidebar, footer, and so on.

Therefore, for an article to have 800 or more words of unique content, its word count in Screaming Frog must be over 1000 words.

However, you also need to consider the user intent, because not every website needs long content.
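
The same sort-and-flag step can be sketched in Python. The field names and sample pages below are hypothetical, and the 800-word threshold is simply the default from this article; pass a different value (for example 1000 when using raw Screaming Frog counts) if your market calls for it.

```python
THIN_WORD_COUNT = 800  # default threshold from the article; adjust per market

# Hypothetical pages; "is_home" marks the home page, which is exempt.
pages = [
    {"address": "https://example.com/", "word_count": 350, "is_home": True},
    {"address": "https://example.com/guide", "word_count": 2400, "is_home": False},
    {"address": "https://example.com/news", "word_count": 620, "is_home": False},
]


def find_thin_pages(pages, threshold=THIN_WORD_COUNT):
    """Sort pages by word count (low to high) and keep the thin ones."""
    ordered = sorted(pages, key=lambda page: page["word_count"])
    return [
        page for page in ordered
        if page["word_count"] <= threshold and not page["is_home"]
    ]


thin = find_thin_pages(pages)
print([page["address"] for page in thin])
```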

⁕ Duplicate content

Duplicate content is one of the serious content errors affecting the SEO effectiveness of the web.

Screaming Frog can detect duplicate errors in the title, meta description, and H1.
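
If you want to double-check duplicates in the exported file yourself, grouping pages by a field is enough. The sketch below is an assumption-laden illustration (hypothetical field names and sample data), not a description of how Screaming Frog does it internally; it treats values as duplicates after trimming spaces and ignoring case.

```python
from collections import defaultdict

# Hypothetical pages taken from an export.
pages = [
    {"address": "https://example.com/seo-services", "title": "SEO Services"},
    {"address": "https://example.com/seo-service", "title": "seo services "},
    {"address": "https://example.com/about", "title": "About us"},
]


def find_duplicates(pages, field):
    """Return {normalized value: [addresses]} for values used more than once."""
    groups = defaultdict(list)
    for page in pages:
        value = page[field].strip().lower()
        if value:  # skip empty titles, descriptions, or H1s
            groups[value].append(page["address"])
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


duplicate_titles = find_duplicates(pages, "title")
print(duplicate_titles)
```

The same function can be called with a meta description or H1 field if your rows carry those columns.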


⁕Content underperformance

Another criterion to include in the data file is content underperformance, which lets you filter for articles whose keywords have good ranking potential.

This data can be exported from Ahrefs or Search Console, but I prefer Search Console.

So how is the data from Ahrefs and Search Console different?

With Ahrefs, for example, if your URL A ranks for 372 keywords, the results will only show the best-performing keyword.

Search Console, on the other hand, averages the performance of all 372 keywords to produce the page-level result, so it is more objective.


To select the top pages from Search Console, filter the data by the Position column (the last column in the sheet), keeping only rankings from 5 to 20.


Use Number Filter > Between and enter 5 and 20 to select the underperformance content.
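
The Between filter is an inclusive range check. As a rough Python equivalent, assuming each row carries the average position from Search Console (the field names and sample values are made up for illustration):

```python
# Hypothetical rows with the average position reported by Search Console.
rows = [
    {"url": "https://example.com/a", "position": 3.2},
    {"url": "https://example.com/b", "position": 7.8},
    {"url": "https://example.com/c", "position": 20.0},
    {"url": "https://example.com/d", "position": 35.1},
]


def is_underperforming(position, low=5.0, high=20.0):
    """Mirror Excel's Number Filter > Between, which includes both bounds."""
    return low <= position <= high


underperforming = [row["url"] for row in rows if is_underperforming(row["position"])]
print(underperforming)
```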

⁕The URL is trending down

After reviewing the traffic results in Ahrefs and Google Analytics, filter out the URLs whose traffic is trending down so you can analyze them further and propose improvements to bring them back to the top.

In the content audit sheet, start importing the exported and filtered data, including URL, Action, Content-Type, Title, Word count, RD, GA Session, GA Bounce rate, GA time onsite, Clicks, Impressions, and Position.

RD stands for referring domains. You can get this data from Ahrefs → Best by links → Export, then use VLOOKUP to find the RD for the corresponding URL in the content audit sheet.
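
For readers who script this step, the VLOOKUP is just a lookup by URL. The sketch below uses a dictionary for that; the data is invented, and filling in 0 for URLs missing from the Ahrefs export is my own assumption (Excel would show #N/A instead).

```python
# Hypothetical audit rows and a hypothetical referring-domains export.
audit_rows = [
    {"url": "https://example.com/a"},
    {"url": "https://example.com/b"},
]
rd_by_url = {"https://example.com/a": 12}


def add_referring_domains(rows, rd_lookup):
    """Attach an 'rd' value to each row, like VLOOKUP keyed on the URL."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        # Assumption: a URL missing from the export has 0 referring domains.
        new_row["rd"] = rd_lookup.get(row["url"], 0)
        enriched.append(new_row)
    return enriched


enriched = add_referring_domains(audit_rows, rd_by_url)
print(enriched)
```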

2/ Filtering content that needs improvement

You already know how to identify problematic content that needs improvement. But on a website with many articles, how do you filter out that content without going through each article one by one?

Please open the data file from Screaming Frog and follow the steps below!

Import data exported from Screaming Frog into the audit sheet as follows:


* Note: only analyze content pages (text) that return status 200 and are indexable.

In the audit content process, depending on each project, you have the flexibility to remove some unnecessary columns.

For example, this time, I only retained the data including the URL, action, content type, title, word count, GA session, Bounce rate, and average session duration.

As mentioned, we will have 5 types of content.

Before classifying the content, you can use the URLs alone to spot obviously poor-quality pages and handle them quickly by deleting the post, adding a 301 redirect, or setting noindex.

For paginated category pages, for example, noindex is the best option.


⁕How to filter Thin Content

From word count, you can filter out the thin content type.

However, the word count that qualifies as thin content varies by market segment.

Some projects, such as fashion, electronics, and appliances, do not need much text and rely mainly on images.

For example, if you normally treat articles under 700 words as thin content, an article of about 500 words can be acceptable in these markets.

When the thin content is a business entity page, the action will be “stay the same” or “do nothing”, because, as I said above, short content is in the nature of these pages and you cannot require them to be long.

⁕How to filter high-traffic content

Next, use the GA session column to categorize content as high traffic. What counts as high or low also depends on the field.

Then use the URLs or titles to identify content that is not related to the business.

Now I will show you two ways to filter underperformance content.

⁕How to filter Under Performance Content

Method 1:

Go to Ahrefs → Organic keywords → Movement → Export to see the full history of keywords moving up or down over time.

The exported data includes specific dates and times. Use the date column to filter for URLs recorded after the point where the website’s traffic fell or dropped sharply.

For example, before July 2019 your organic traffic was still good, but from July 1, 2019, it decreased.

So you select the URLs from July 1, 2019 onwards. After removing duplicates, you will have a list of URLs to compare against the URLs being audited in order to identify the underperformance URLs.

Don’t have Ahrefs?

Don’t worry I’ll show you another way to find underperformance.

Method 2:

Google Analytics → Acquisition → All Traffic → Channels → Organic Search.

To see which articles have declined compared to a previous period, choose two time frames: the one with the highest traffic and the one with the biggest drop.

Note: It is important that these two periods have the same number of days, for example, two 30-day or two 31-day periods.

For example, I will choose the 61 days with the biggest drop in traffic (March 1 to April 30) and the 61 days when traffic grew again (August 1 to September 30).

However, avoid time frames in which traffic in your field grows dramatically for seasonal reasons. For example, if you sell moon cakes, your traffic tends to increase sharply around August and September.
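
The equal-length rule is easy to verify before you export anything. Here is a small sketch using Python’s datetime module; the year 2019 is only an assumption for illustration, since the example periods above do not state one.

```python
from datetime import date


def period_length(start, end):
    """Number of days in a period, counting both the start and end dates."""
    return (end - start).days + 1


# The two example periods; the year is assumed for illustration only.
drop_days = period_length(date(2019, 3, 1), date(2019, 4, 30))
growth_days = period_length(date(2019, 8, 1), date(2019, 9, 30))

print(drop_days, growth_days)
if drop_days != growth_days:
    print("Periods differ in length; adjust the date range before comparing.")
```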

Based on the data in the Change section you can see how traffic is increasing and decreasing. You can easily export this data similar to Ahrefs.

For the March and April traffic figures, just use VLOOKUP to pull them in next to each URL.

To find the URLs with poor performance, use a simple IF formula such as: =IF(C2<B2; TRUE).

The URLs that return TRUE make up the list of URLs that have lost traffic, that is, the underperformance URLs.
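
The IF comparison translates directly into code. In this sketch I assume column B holds the sessions for the baseline period and column C the sessions for the period being checked; the URLs and numbers are invented.

```python
# Hypothetical sessions per URL for the two periods being compared.
baseline_sessions = {"/a": 500, "/b": 120, "/c": 80}   # assumed column B
compared_sessions = {"/a": 310, "/b": 150, "/c": 80}   # assumed column C


def dropped_urls(baseline, compared):
    """Return URLs whose sessions are strictly lower than in the baseline.

    Mirrors =IF(C2<B2; TRUE); URLs missing from either period are skipped.
    """
    return [
        url for url, before in baseline.items()
        if url in compared and compared[url] < before
    ]


underperformance = dropped_urls(baseline_sessions, compared_sessions)
print(underperformance)
```

Note that a URL with unchanged traffic is not flagged, because the comparison is strict.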

Using the VLOOKUP function, compare this list with the earlier Screaming Frog data and mark those URLs in the audit file with the content type underperformance.

Articles classified as poor quality content are those with no session, bounce rate, or duration data (usually because the URL is newly created), or with very little traffic.

So you have categorized all 5 types of content that need to be improved.

3/ Solution

After classifying content, I will give solutions to each content problem:

a/ Poor quality content

Case 1: Little traffic, no backlinks, cannibalization phenomenon

For posts with little traffic, no backlinks, and cannibalization (i.e., the same topic or the same target keyword), you can combine these articles and optimize the content into one complete post.

Case 2: Duplicate content

For content that is poor quality because it is duplicated, it’s best to delete those posts and 301 redirect them to the most relevant page. Don’t forget to update the internal links, since the deleted post would otherwise return a 404.

To detect broken internal links, you can use a powerful tool like Screaming Frog or WebSite Auditor.

In Screaming Frog, crawl the site and filter for 404 links → click on any link A → Inlinks → the tool will display the pages pointing to link A. You then just need to go through these pages and remove link A.

Case 3: The target keyword points to the wrong landing page

This happens when you targeted the wrong landing page from the beginning because the keywords were grouped incorrectly.

For example, the keyword SEO should be grouped with keywords that define SEO, so that you write an overview article introducing the topic.

Instead, you grouped it with SEO services keywords.

When the keywords were grouped incorrectly in the first place, you can delete the wrong article and rebuild the content from scratch, or consider optimizing the old article to save time and effort.

Case 4: Target is great, but it doesn’t bring in traffic and has no backlinks

Now check whether the content meets the criteria for the outline, images, and so on.

In this step, you can refer to GTV’s content video series to make sure the important factors are well optimized, such as satisfying user intent and creating content that is superior to and distinct from that of competitors.

After checking the content standards, you continue to check the on-page. If the page has been well-optimized, continue to look at the topic cluster.

If you’ve been following GTV for long enough, you’ll know I often apply a topic cluster following a silo structure to build a network of sub-posts that support the main posts at the top.

However, before starting to build supporting articles, consider whether the keyword is difficult enough to make them necessary.

When building supporting content, you also have to invest in outlining, writing, editing, linking, and so on, just as for the main article.

If you think this investment is worthwhile, do not hesitate to consult GTV’s articles about researching supporting content.

Or you can review the content already on the website to see whether any existing articles can be used in the content cluster.

Do not forget to build internal links for cluster content and optimize on-page to ensure the best results.

Once you have built content in a topic cluster, you need to wait about 5 months for Google to understand the content on your website and give better rankings.

After that time, if the results are not satisfactory, you can optimize off-page / entity to push these articles up.

b/ Thin content

Case 1: The page has no traffic, no backlinks, and no good keyword targets

If the page does not bring any value, it is best to remove the article from the website with a 301 redirect and remove the internal links pointing to it (in the same way as described above).

Case 2: No traffic, no backlinks but a good target

If the page targets good keywords with high search volume, check for cannibalization. I covered how to handle cannibalization under poor-quality content.

Merge the articles that target the same keyword into the strongest post, then optimize the outline, content, and on-page.

If the article’s target keyword is not shared with any other article on the website, optimize the content (for example, by reviewing the outline) and the on-page elements, such as internal links and related content.

Case 3: High traffic, with or without backlinks, good target

This is the simplest case: you just need to add content so the page is no longer judged as thin content, because thin content is a time bomb that could lead Google to penalize your website at any time.

Case 4: Entity

Entity content, such as contact, recruitment, referral, and privacy policy pages, cannot be too verbose. Where the content has to be short, you can keep it as is, but try to optimize it as much as possible.

For example, in galleries of business photos, you can add a few lines of text to thicken the content.

Or if you have many business entity articles on the same topic, you can merge them to give the content a reasonable length.
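
To make the four thin content cases easier to apply across a long list of URLs, they can be written as a small decision function. This is only one possible reading of the cases above: the parameter names and action labels are my own, and any combination the cases do not cover is sent to manual review rather than guessed.

```python
def thin_content_action(has_traffic, has_backlinks, good_target,
                        is_entity=False, cannibalized=False):
    """Suggest an action for a thin page, following Cases 1-4 above."""
    if is_entity:
        # Case 4: contact, recruitment, policy pages and similar.
        return "keep, optimize where possible"
    if has_traffic and good_target:
        # Case 3: add content so the page is no longer thin.
        return "expand content"
    if not has_traffic and not has_backlinks:
        if not good_target:
            # Case 1: the page brings no value.
            return "301 redirect and remove internal links"
        if cannibalized:
            # Case 2, with other posts targeting the same keyword.
            return "merge into strongest post"
        # Case 2, with no competing post on the site.
        return "optimize content and on-page"
    # Combination not described in the article.
    return "review manually"


print(thin_content_action(has_traffic=False, has_backlinks=False, good_target=False))
print(thin_content_action(has_traffic=True, has_backlinks=False, good_target=True))
```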

c/ Unrelated content

Case 1: Running ads

If the content is unrelated because it exists only for running ads, you should tag the page with noindex so that Google doesn’t index it.

Case 2: The page converts

If a page has unrelated content but still converts, meaning that users purchase through it, you should keep it and even optimize it if possible.

Case 3: No traffic, no backlinks

You can delete the post, 301 redirect it to the most relevant article, and remove the internal links as instructed above.

Case 4: There are backlinks and traffic

Some unrelated posts pull in a lot of traffic but are of poor quality.

For example, you sell watches but write about how to crack Windows 10. Many people are looking for this article, but it does not help you find customers who buy watches.

In this situation, check whether the traffic converts. If it does, rewrite the page with more relevant content.

On the other hand, if this traffic has no value, first evaluate whether you can steer visitors to other related content before deleting the post and adding a 301 redirect.

d/ Underperformance content

* Note: this only applies to posts published more than 4 months ago.

Case 1: Keywords top 6-20

With these keywords, you can find ways to promote higher rankings by optimizing on page or content.

In terms of content, you can try applying a new GTV trick called re-usage content.

Case 2: The page used to have high traffic and has backlinks

You should review the content and update it if necessary. You can also use the re-usage content process in this case.

After republishing the article, update its content date on the website to push the article to the top of the category pages.

Also, don’t forget to optimize the on-page and add internal links using relevant anchor text.

e/ High Traffic

If engagement (time on site, bounce rate) is not good, you need to find ways to improve these metrics.

To optimize the high-traffic pages even better, you can still apply the re-usage content process as above.

And finally, to sum up, here’s a Flowchart about the Audit Content Process.

On average, a website should audit its content once every 3 months to ensure comprehensive web quality.
