
How to Evaluate an XML Sitemap


By Jonny

08th November 2024


What is an XML sitemap?


An XML sitemap is a roadmap that guides search engines through your website, helping them index important pages more efficiently. It’s a file containing URLs for your site, along with information like when each page was last updated. Having a well-structured XML sitemap is essential for improving visibility in search engines and ensuring that all your important pages are indexed. Without one, search engines might overlook key content, which can hurt your site's ranking.

Regularly auditing your XML sitemap ensures search engines index your most valuable content while avoiding issues like broken links or unnecessary URLs. Follow the steps below to evaluate your website's XML sitemap. 


1. Check Sitemap Submission in Google Search Console

 

Submission Status: Ensure the sitemap has been submitted to Google Search Console and Bing Webmaster Tools. This ensures search engines are aware of your sitemap.
Check for Errors: Review any sitemap-related errors or warnings in Google Search Console’s Sitemaps section. These can include problems like unreachable URLs or parsing errors.

2. Verify Sitemap Structure

 

Proper XML Format: Ensure the sitemap follows correct XML formatting and structure. Each <url> element must contain a <loc> tag; <lastmod>, <changefreq>, and <priority> are optional.
URL count: Ensure the sitemap doesn’t exceed the maximum size limits:
- Maximum 50,000 URLs per sitemap.
- Maximum file size of 50MB (uncompressed).
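The structure and size checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a full validator: the function name `audit_structure` and the sample sitemap are made up for the example, and the size check assumes you pass in the uncompressed file contents.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed


def audit_structure(xml_bytes: bytes) -> dict:
    """Parse a sitemap and report basic structural findings."""
    # ET.fromstring raises ET.ParseError if the XML is malformed.
    root = ET.fromstring(xml_bytes)
    urls = root.findall(f"{SITEMAP_NS}url")
    # Count <url> entries whose <loc> is absent or empty.
    missing_loc = sum(
        1 for u in urls if not (u.findtext(f"{SITEMAP_NS}loc") or "").strip()
    )
    return {
        "root_is_urlset": root.tag == f"{SITEMAP_NS}urlset",
        "url_count": len(urls),
        "missing_loc": missing_loc,
        "within_url_limit": len(urls) <= MAX_URLS,
        "within_size_limit": len(xml_bytes) <= MAX_BYTES,
    }


# Hypothetical two-URL sitemap used only to exercise the function.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-11-08</lastmod>
  </url>
  <url>
    <loc>https://example.com/services/</loc>
  </url>
</urlset>"""

report = audit_structure(SAMPLE)
print(report)
```

In practice you would read the bytes from your own sitemap file rather than from an inline sample.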

3. Check for Indexed URLs

 

Compare the URLs in the sitemap with the URLs in Google’s index (using the Page indexing report in Google Search Console, previously called the Index Coverage report). Ensure that all important pages in the sitemap are indexed.
Use tools like Screaming Frog or Ahrefs to crawl the sitemap and check the status of each URL.
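Once you have both lists, the comparison itself is a simple set difference. The sketch below assumes you have already exported the sitemap URLs and the indexed URLs as plain lists; the function name and the sample URLs are hypothetical, and the match is exact, so normalise the URLs first if your exports differ in formatting.

```python
def compare_with_index(sitemap_urls, indexed_urls):
    """Return the URLs that appear in one list but not the other."""
    sitemap_set = set(sitemap_urls)
    indexed_set = set(indexed_urls)
    return {
        # Listed in the sitemap but not reported as indexed.
        "in_sitemap_not_indexed": sorted(sitemap_set - indexed_set),
        # Indexed, but absent from the sitemap.
        "indexed_not_in_sitemap": sorted(indexed_set - sitemap_set),
    }


# Hypothetical exports used only for illustration.
sitemap_urls = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/blog/xml-sitemaps/",
]
indexed_urls = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/old-page/",
]

diff = compare_with_index(sitemap_urls, indexed_urls)
print(diff)
```

The first list shows pages to investigate for indexing problems; the second shows pages that may deserve a place in the sitemap, or that should no longer be indexed at all.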


4. Ensure All Important Pages Are Included

 

All important pages should be listed: Ensure that key pages like service pages, category pages, product pages, and important blog posts are included. Check that URLs important for SEO and conversions are present.
Check for missing pages: Sometimes, important pages can be inadvertently left out of the sitemap, especially if the site has a dynamic structure (e.g., large e-commerce or blog websites).

5. Check for Unwanted or Duplicate URLs

 

Avoid inclusion of non-canonical URLs: Make sure the sitemap doesn’t include URLs with duplicate content, URLs whose canonical tag points to a different page, or multiple versions of the same page (e.g., HTTP vs. HTTPS, www vs. non-www).
Exclude low-value pages: Pages such as search results, tag pages, archive pages, login pages, or admin URLs should not be included in the sitemap.
Ensure that URLs blocked in robots.txt or marked with noindex meta tags are not included in the sitemap.
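The robots.txt and noindex checks can be approximated with Python's standard library. This is a rough sketch rather than a replacement for a crawler: `blocked_by_robots` and `has_noindex` are names invented for this example, the robots.txt rules and URLs are made up, and the noindex check only looks at the robots meta tag (not the X-Robots-Tag HTTP header).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, urls, user_agent: str = "*"):
    """Return the URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


class _RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        is_robots = attributes.get("name", "").lower() == "robots"
        if is_robots and "noindex" in attributes.get("content", "").lower():
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


# Hypothetical rules and URLs for illustration.
ROBOTS = """User-agent: *
Disallow: /admin/
Disallow: /search
"""
urls = [
    "https://example.com/",
    "https://example.com/admin/login",
    "https://example.com/search?q=shoes",
    "https://example.com/blog/post",
]
blocked = blocked_by_robots(ROBOTS, urls)
print(blocked)
```

Any URL returned by `blocked_by_robots`, or any page for which `has_noindex` is true, is a candidate for removal from the sitemap.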

6. Ensure Correct Use of HTTPS

 

If your site uses HTTPS, ensure all URLs in the sitemap use the HTTPS protocol. Replace any HTTP URLs in the sitemap with their HTTPS equivalents, and make sure the HTTP versions redirect to HTTPS on the site itself.
Mixed protocols in the sitemap can confuse search engines and create indexing issues.
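A quick way to spot mixed protocols (and mixed www and non-www hosts) is to count the schemes and hostnames that appear in the sitemap. The following is a small sketch under the assumption that the URLs have already been extracted into a list; the function names and sample URLs are illustrative only.

```python
from collections import Counter
from urllib.parse import urlsplit


def scheme_and_host_mix(urls):
    """Count how often each scheme and each hostname appears."""
    schemes = Counter(urlsplit(u).scheme for u in urls)
    hosts = Counter(urlsplit(u).netloc.lower() for u in urls)
    return schemes, hosts


def non_https(urls):
    """Return every URL that does not use the https scheme."""
    return [u for u in urls if urlsplit(u).scheme != "https"]


# Hypothetical sitemap URLs for illustration.
urls = [
    "https://example.com/",
    "http://example.com/contact",
    "https://www.example.com/blog/",
]
schemes, hosts = scheme_and_host_mix(urls)
print(schemes, hosts, non_https(urls))
```

A healthy sitemap should normally show exactly one scheme and one hostname in these counts.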


7. Check for Dynamic or Pagination Issues

 

Pagination: Ensure that if the site has paginated content (e.g., blog archives), the sitemap includes only the main pages or categories and avoids unnecessary listing of all paginated pages.
Dynamically generated URLs: Some sitemaps include unnecessary dynamic parameters (e.g., session IDs, filters, or search parameters) that don’t need to be indexed. Check if such URLs are included and exclude them if not needed.
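To find pagination and dynamic parameters in bulk, you can scan the query string of each sitemap URL. The sketch below is one possible approach; the parameter names in `SUSPECT_PARAMS` are assumptions chosen for the example and should be replaced with the parameters your own site actually generates.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed parameter names; adjust to match your own site.
SUSPECT_PARAMS = {"sessionid", "sid", "page", "q", "sort", "filter"}
SUSPECT_PREFIXES = ("utm_",)


def suspect_parameters(url: str):
    """Return the query parameter names in the URL that look unnecessary."""
    query = urlsplit(url).query
    names = [name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)]
    return [
        name
        for name in names
        if name in SUSPECT_PARAMS or name.startswith(SUSPECT_PREFIXES)
    ]


# Hypothetical URLs for illustration.
print(suspect_parameters("https://example.com/shoes?sort=price&utm_source=news"))
print(suspect_parameters("https://example.com/blog?page=3"))
print(suspect_parameters("https://example.com/shoes"))
```

URLs that return a non-empty list are worth reviewing by hand, since some parameters may be legitimately needed for indexing.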

8. Check URL Consistency and Cleanliness

 

Consistent URL structure: Ensure that all URLs in the sitemap follow a consistent URL structure. Avoid mixing URLs that end in a trailing slash with URLs that don’t, and avoid case inconsistencies (e.g., /About-Us/ vs. /about-us/).
No unnecessary query parameters: Exclude URLs with unnecessary query parameters like tracking codes, filters, or session IDs unless they are critical for indexing.
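The consistency checks in the first point above can be automated roughly as follows. This is a simplified sketch: `consistency_issues` is a name made up for the example, it treats any path containing an uppercase letter as suspect, and it does not distinguish file-like URLs (such as PDFs), which legitimately have no trailing slash.

```python
from urllib.parse import urlsplit


def consistency_issues(urls):
    """Flag uppercase paths and mixed trailing-slash styles."""
    uppercase = []
    with_slash, without_slash = [], []
    for url in urls:
        path = urlsplit(url).path
        if path != path.lower():
            uppercase.append(url)
        if path in ("", "/"):
            continue  # the site root has no meaningful slash style
        (with_slash if path.endswith("/") else without_slash).append(url)
    mixed = bool(with_slash and without_slash)
    # Report whichever style is rarer as the odd one out.
    minority = with_slash if len(with_slash) < len(without_slash) else without_slash
    return {
        "uppercase_path": uppercase,
        "mixed_slash_styles": mixed,
        "minority_slash_style": minority if mixed else [],
    }


# Hypothetical sitemap URLs for illustration.
urls = [
    "https://example.com/",
    "https://example.com/about-us/",
    "https://example.com/services/",
    "https://example.com/About-Us/",
    "https://example.com/contact",
]
issues = consistency_issues(urls)
print(issues)
```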

9. Use of Last Modified Date

 

<lastmod> Tag: This tag informs search engines when a page was last modified. Ensure that the <lastmod> tag is accurate and updated for URLs that frequently change (e.g., blog posts, product pages). This helps search engines prioritise crawling those URLs.
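A basic sanity check for <lastmod> values is to confirm that each one parses as a date and is not in the future. The sketch below is an assumption-laden example: `check_lastmod` is a hypothetical helper, it treats values without a timezone as UTC, and it covers the common date and date-time forms but not the year-only or year-and-month forms that the W3C Datetime format also allows. It cannot tell you whether the date matches the real last edit of the page; that still needs a comparison against your CMS.

```python
from datetime import datetime, timezone


def check_lastmod(value: str, now=None) -> str:
    """Return 'ok', 'invalid' or 'future' for a <lastmod> value."""
    now = now or datetime.now(timezone.utc)
    try:
        # Older Python versions do not accept a trailing 'Z', so swap it
        # for an explicit UTC offset before parsing.
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return "invalid"
    if parsed.tzinfo is None:
        # Assumption: treat values without a timezone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return "future" if parsed > now else "ok"


# A fixed reference time keeps the example reproducible.
reference = datetime(2024, 11, 8, 12, 0, tzinfo=timezone.utc)
for value in ["2024-11-08", "2024-11-08T09:30:00Z", "2030-01-01", "08/11/2024"]:
    print(value, check_lastmod(value, now=reference))
```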


10. Frequency and Priority Tags

 

<changefreq> and <priority> Tags: These tags are optional hints rather than directives, and Google has stated that it ignores both values. If you include them for other search engines, use them logically:
- <changefreq>: Ensure this tag reflects the actual update frequency of the page (e.g., daily for news articles, monthly for static pages).
- <priority>: Ensure the priority tag reflects the relative importance of the page (e.g., the home page or key landing pages should have a higher priority than less important pages).
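If you do use these tags, it is worth checking that the values are at least valid. The sitemap protocol allows seven <changefreq> values and a <priority> between 0.0 and 1.0. The helper below, `check_hints`, is a hypothetical name for a minimal validity check; it cannot judge whether the values are sensible for a given page.

```python
# The values permitted by the sitemap protocol.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_hints(changefreq=None, priority=None):
    """Return a list of problems found in the two optional hint tags."""
    problems = []
    if changefreq is not None and changefreq.strip() not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {changefreq!r}")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            problems.append(f"priority is not a number: {priority!r}")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append(f"priority out of range: {priority!r}")
    return problems


print(check_hints("daily", "0.8"))
print(check_hints("fortnightly", "0.5"))
print(check_hints("monthly", "1.5"))
```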

11. Sitemap Index File (if applicable)

 

For large websites with multiple sitemaps, there should be a sitemap index file that links to all individual sitemaps (e.g., blog, product categories, etc.). Ensure this index file is properly configured and that each listed sitemap is working and up-to-date.
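To confirm that an index file is properly configured, a reasonable first step is to list the child sitemaps it points to. The sketch below parses a made-up index file with the standard library; `list_child_sitemaps` and the sample file names are assumptions for the example. Each returned location would then need to be fetched and checked in its own right.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": NS_URI}


def list_child_sitemaps(index_xml: bytes):
    """Return (loc, lastmod) pairs for each sitemap listed in an index file."""
    root = ET.fromstring(index_xml)
    if root.tag != f"{{{NS_URI}}}sitemapindex":
        raise ValueError("not a sitemap index file")
    children = []
    for node in root.findall("sm:sitemap", NS):
        loc = (node.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        children.append((loc, lastmod))
    return children


# Hypothetical index file for illustration.
SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2024-11-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>"""

children = list_child_sitemaps(SAMPLE_INDEX)
print(children)
```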

12. Check for Broken or Redirected Links

 

Use a tool like Screaming Frog to crawl the sitemap and check for broken URLs (404 errors), redirected URLs (301/302), or server errors (5xx). All links in the sitemap should lead to valid pages without redirects.
Fix or remove any URLs that return errors or redirects to avoid wasting crawl budget.
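If you would rather script this check than use a crawler, the logic looks roughly like the sketch below. All names here are illustrative. `head_status` sends a HEAD request without following redirects, so a 301 or 302 is reported as such; some servers handle HEAD requests poorly, in which case a GET request may be needed instead. The demonstration at the end uses made-up status codes rather than live requests.

```python
import http.client
from urllib.parse import urlsplit


def classify_status(code: int) -> str:
    """Group an HTTP status code into the categories an audit cares about."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if code in (404, 410):
        return "broken"
    if 400 <= code < 500:
        return "client_error"
    if 500 <= code < 600:
        return "server_error"
    return "unknown"


def head_status(url: str, timeout: float = 10) -> int:
    """Send a HEAD request and return the status without following redirects."""
    parts = urlsplit(url)
    is_https = parts.scheme == "https"
    conn_cls = http.client.HTTPSConnection if is_https else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def audit_statuses(urls, fetch_status=head_status):
    """Group URLs by status category, using the supplied fetch function."""
    report = {}
    for url in urls:
        report.setdefault(classify_status(fetch_status(url)), []).append(url)
    return report


# Made-up responses, so the example runs without network access.
fake_statuses = {
    "https://example.com/": 200,
    "https://example.com/old-page/": 301,
    "https://example.com/missing/": 404,
    "https://example.com/error/": 503,
}
report = audit_statuses(fake_statuses, fetch_status=fake_statuses.get)
print(report)
```

Everything outside the "ok" group is a URL to fix or remove from the sitemap.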
 

13. Human-Readable Location

 

Ensure the sitemap is accessible to users and search engines at a predictable URL such as https://example.com/sitemap.xml. This is the conventional location and makes it easier for search engines to find the sitemap if it’s not submitted via Search Console.


Common Issues to Look for in an XML Sitemap:
 

  1. Missing or Incorrect URLs:
    Important pages like category pages, product pages, or blog posts missing from the sitemap.
    URLs that return a status code other than 200 (e.g., 404s or 301s).
    Non-canonical or duplicate URLs included.

  2. Including URLs with Noindex Tags:
    URLs that are marked as "noindex" in the meta tags should not be in the sitemap, as they are not intended for indexing by search engines.

  3. Inclusion of Unnecessary URLs:
    Low-value pages such as admin pages, search results, or paginated pages listed in the sitemap.

  4. Mixed Protocol Issues:
    Mixing HTTP and HTTPS URLs in the sitemap, or mixing www and non-www versions.

  5. Exceeding Sitemap Limits:
    Having a sitemap that exceeds the limit of 50,000 URLs or 50MB (uncompressed). If either limit is exceeded, the sitemap should be split into multiple smaller sitemaps, with an index file to link them.

  6. Outdated Lastmod Tags:
    Incorrect or outdated <lastmod> dates can lead search engines to deprioritise crawling important updates.
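For the sitemap limits issue above, splitting an oversized sitemap can be sketched as follows. This is a simplified illustration, not a production generator: the function names and file names are invented, only the URL-count limit is enforced (the 50MB size limit would need a separate check), and optional tags such as <lastmod> are left out.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk_urls(urls, size: int = MAX_URLS):
    """Split a list of URLs into chunks of at most `size` entries."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_urlset(urls) -> bytes:
    """Build a minimal sitemap containing only <loc> entries."""
    root = ET.Element("urlset", xmlns=NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_index(sitemap_locations) -> bytes:
    """Build a sitemap index that links to each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=NS)
    for loc in sitemap_locations:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,001 pages, which needs three sitemaps.
all_urls = [f"https://example.com/p/{i}" for i in range(120_001)]
chunks = chunk_urls(all_urls)
child_locations = [
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_index(child_locations)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk would be written to its own file with `build_urlset`, and the index file is the one you submit to search engines.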
     


Optimise Your Sitemap for Long-Term SEO Benefits

 

By following these steps, you can maintain a healthy, well-structured sitemap that helps search engines find and index your most important pages, supporting your website’s visibility in search.

At JL Creative, we specialise in ensuring your website is optimised for search engines. Contact us today to schedule a full technical SEO audit and let our experts handle the details, so your website performs at its best!
