There are multiple reasons for removing a page from Google’s index. Examples include pages with confidential, premium, or outdated info.
Here are options for removing a web page from Google.
To make a page disappear altogether, remove or delete it from your web server. Returning an HTTP status code of 410 (gone) instead of 404 (not found) makes it clear to Google that the removal is intentional. Google discourages using redirects to remove spammy pages, as a redirect would send poor signals to the surviving destination page.
Google Search Console no longer includes the URL removal tool. Once the page is removed, there’s no further required action. Allow a few days for Google to recrawl the site, discover the 410 code, and remove the page from its index.
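To illustrate the difference between 410 and 404, here is a minimal sketch of a Python WSGI app that answers with 410 for pages removed on purpose and 404 for anything else it does not know. The paths, the `REMOVED_PATHS` and `LIVE_PAGES` names, and the `status_for` helper are all hypothetical; most sites would configure this in the web server or ecommerce platform rather than in custom code.

```python
from http import HTTPStatus

# Hypothetical paths, for illustration only.
REMOVED_PATHS = {"/old-sale", "/discontinued-product"}
LIVE_PAGES = {"/": "Home", "/products": "Products"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE        # 410: removed deliberately
    if path in LIVE_PAGES:
        return HTTPStatus.OK          # 200: page exists
    return HTTPStatus.NOT_FOUND       # 404: unknown path


def app(environ, start_response):
    """Minimal WSGI app that sends the status chosen by status_for."""
    path = environ.get("PATH_INFO", "/")
    status = status_for(path)
    # Live pages return their content; other responses return the status phrase.
    body = LIVE_PAGES.get(path, status.phrase).encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.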
As an aside, Google does offer a form to remove personal info from search results.
Search engines nearly always honor the noindex meta tag. The search bots will crawl the page (especially if it’s linked or in sitemaps) but will not include it in search results.
In my experience, Google will immediately recognize a noindex tag once it crawls the page. Adding the noarchive tag instructs Google to also delete its saved cache of the page.
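The tag itself goes in the page's head, typically as `<meta name="robots" content="noindex, noarchive">`. As a rough way to confirm a page actually carries it, the sketch below uses Python's standard `html.parser` to collect robots directives from a page's HTML. The `RobotsMetaParser` class and `robots_directives` function are hypothetical names, and the check covers only meta tags, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attr_map.get("name", "").lower() in {"robots", "googlebot"}:
            for part in attr_map.get("content", "").split(","):
                directive = part.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = (
    "<html><head>"
    '<meta name="robots" content="noindex, noarchive">'
    "</head><body>Members only</body></html>"
)
print("noindex" in robots_directives(page))
```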
To keep a page without making it publicly accessible, consider adding a password. Google cannot crawl pages requiring passwords or usernames.
Adding a password will not remove the page from Google’s index. Use the noindex tag to exclude the page from search results.
Remove all internal links to non-public pages you want deindexed. Moreover, internal links to password-protected or deleted pages hurt the user experience and interrupt buying journeys. Always focus on human visitors — not just search engines.
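Finding those leftover links by hand gets tedious on a large site. The sketch below, again using only Python's standard library, lists the links on a page that point to removed paths on the same domain. The `links_to_removed` function, the sample HTML, and the `example.com` URLs are hypothetical, and a real audit would run this across every page of the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_removed(html, page_url, removed_paths):
    """Return hrefs on the page that point to removed paths on the same site."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    hits = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing.
        target = urlparse(urljoin(page_url, href))
        if target.netloc == site and target.path in removed_paths:
            hits.append(href)
    return hits


sample_html = (
    '<a href="/old-sale">Sale</a> '
    '<a href="/products">Products</a> '
    '<a href="https://other.example/old-sale">Partner</a>'
)
print(links_to_removed(sample_html, "https://example.com/", {"/old-sale"}))
```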
Many people attempt to use the robots.txt file to remove pages from Google’s index. But robots.txt only prevents Google from crawling a page (or category). It does not remove the page from the index.
Pages blocked via the robots.txt file could still be indexed (and ranked). Furthermore, since it cannot access those pages, Google will not encounter noindex or noarchive tags.
List URLs in the robots.txt file to instruct web crawlers to ignore certain pages or sections (e.g., logins, personal archives, or pages resulting from unique sorting and filtering) so they spend their crawl time on the parts you want to rank.
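As a quick sanity check on crawl rules, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser` and asks whether a given URL may be crawled. The rules, the `example.com` URLs, and the `is_crawlable` helper are hypothetical, and Python's parser may not match Google's own handling in every case (wildcard patterns in particular), so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the login page and a personal archive section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /archive/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample rules allow user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/products/shoes",
    "https://example.com/login",
    "https://example.com/archive/2019",
):
    print(url, is_crawlable(url))
```

Remember that a blocked URL here only means "not crawled." It can still appear in the index, which is why noindex, not robots.txt, is the tool for removal.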
Copyright © 2005 – 2023. Practical Ecommerce® is a registered trademark of Confluence Distribution, Inc.