Sometimes a page that used to exist returns a 404 error: page not found.
The page may have been deleted, or the site may be inaccessible. So how do you view a deleted page? I will try to answer this question and offer several ready-made ways of solving the problem.
Google Search
Browsing cached pages in Google begins the same way as any other Google search. Once you've entered a search query and see the results, click the arrow next to the URL and select Saved Copy to view Google's most recently saved version of the page.
When the page loads, Google notifies you that this is an outdated version and shows the date it was saved. There is also an option to view only the text version of the page or its source code. You cannot navigate to other pages and stay in the cached version: if you follow a link, the current version of the site opens.
How to find deleted pages linked to by other sites
Hi all! Recently, I received one rather non-trivial question by email - “How to find deleted pages of a site that are linked to by other resources?” That is, there was once a document that was referenced by another project. Then the page was deleted (accidentally or on purpose) or the URL was changed, and the link began to lead to a document with a 404 error. What to do in such a situation?
Before continuing, I want to thank everyone for the birthday wishes and comments on this post! Such kind, heartfelt wishes are very nice to read! The results of the mini-competition are summarized at the end of the article.
Young sites, as a rule, do not suffer from the problem of missing pages. This applies more to older projects where something was deleted after a long period.
So, finding deleted pages that have backlinks is useful for several reasons (especially since it costs little or nothing). First, you will find out which documents were deleted; from now on it is better not to delete such pages, since they collect links. Second, the links to these documents are most likely natural (you didn't know about them; if you had, you wouldn't have deleted anything). Third, you will find incorrect links pointing to your resource and be able to correct the situation. Fourth, you can check whether anyone is deliberately placing backlinks to non-existent pages of the project.
Finding pages with backlinks
At first I asked myself: “How do I do all this?” Obviously, you need to analyze the link mass, or rather the pages that have at least one incoming link. There are several tools for this, since doing everything by hand is not realistic.
1. Yandex.Webmaster. Go to “Indexing” -> “Incoming Links” and download the archive with data on the incoming link mass.
It lists both the donor documents and your own pages. The file is in txt format, so for convenience copy its contents into a spreadsheet such as Excel.
2. Google Webmaster. We do almost the same thing with the tools that Google provides. Go to “Search Traffic” -> “Links to your site”. Next, click “Advanced” in the “Your most frequently linked pages” block.
The list is shown 500 rows at a time. Select all the rows, copy and paste them into the same Excel file, then remove everything unnecessary. One inconvenience is that Google Webmaster shows only the path part of each URL, so the domain name has to be prepended. This can probably be done conveniently with a macro in Excel. If anyone knows how, please tell me in the comments. Thanks to Profitcore for the simple solution!
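The domain substitution can also be done outside Excel. Below is a minimal Python sketch of one way to do it; it is not the solution Profitcore suggested, and the function name `to_absolute` and the domain `https://example.com/` are placeholders chosen for illustration.

```python
from urllib.parse import urljoin


def to_absolute(base, paths):
    """Turn site-relative paths into full URLs; full URLs pass through unchanged."""
    result = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # skip blank lines left over from copy-paste
        result.append(urljoin(base, path))
    return result
```

For example, `to_absolute("https://example.com/", ["/blog/post-1/"])` returns a one-item list with the full address of that page, ready to be merged with the Yandex.Webmaster list.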
As a result, we get an Excel file with two lists. Most webmasters can stop at this point. Perfectionists can go further by paying a little for more information.
3. Ahrefs.com. I think many people are familiar with this service. Unlike the first two, it is paid. Ahrefs can provide similar information, and its database will likely contain pages that neither Yandex nor Google has shown.
4. Backlink from Page Weight. A separate blog post describes how the service works. In a nutshell, it is a budget analogue of Ahrefs for those who do not have a paid account there. The database is the same, but a standard subscription costs 500 rubles.
As a result, we get a list of pages that links from different sources point to. It will most likely contain duplicates. To avoid burdening yourself, your computer, and other people's computers with unnecessary data, remove them. Excel 2007 makes this very easy. First, select all the rows (you can use Ctrl+A). Then go to the “Data” section and click “Remove duplicates”.
In the window that appears, click “OK”. All duplicates are removed. A very useful feature.
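If the combined list lives in a text file rather than in Excel, the same clean-up can be sketched in a few lines of Python. This is only an illustration; the function name `dedupe` is hypothetical, and the comparison is on exact strings after trimming whitespace, which may differ slightly from how Excel compares cells.

```python
def dedupe(urls):
    """Remove duplicate URLs, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for raw in urls:
        url = raw.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Keeping the original order makes it easier to compare the result with the source files afterwards.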
Bulk verification of server response
Now we need to find out whether all the pages of our site that have a backlink mass work as they should, or whether there are those that give a 404 error (document not found). To do this, let’s add our list to one of the following services (thanks to the creators for their development):
- https://4seo.biz/tools/31/ - free, fast and understandable.
- https://coolakov.ru/tools/ping/ - longer.
- https://www.seolium.com/seo/tools/http-status-checker/ - also not very fast.
Using the first service, I received 1 page with a 404 status.
I open it. Indeed: “Nothing found.” From the URL I can tell which post it refers to. The link is simply wrong: it says 10,000 where it should say 1,000. Using the file from Yandex.Webmaster, I look up where this link comes from.
I get 4 backlinks from grabr.ru. I probably entered the URL incorrectly when I posted an announcement on this social network for webmasters.
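For those who prefer a script to an online service, the bulk check can be approximated with Python's standard library. This is a rough sketch, not how the services above work internally; the names `fetch_status` and `find_missing` are made up for the example, and some servers reject HEAD requests, in which case a GET request would be needed instead.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "status-check-sketch"}
    )
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx answers are raised as exceptions
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def find_missing(urls, fetch=fetch_status):
    """Return the URLs that answer with a 404 status."""
    return [url for url in urls if fetch(url) == 404]
```

Calling `find_missing(list_of_urls)` on the deduplicated list returns only the addresses that need attention. The `fetch` parameter exists so that the lookup can be replaced, for example with a slower but more polite version that pauses between requests.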
Further actions
There are several action scenarios that depend on different situations:
- In my case a 301 redirect was appropriate, pointing to the existing document at its correct URL. That is what I did.
- If the owner of the linking site made the mistake, you can write to them and ask them to change the address to the correct one.
- Restore the page (if appropriate), or create a new one at the URL that returns a 404 error (if, for example, the link comes from a quality site and its owner does not respond to emails).
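How a 301 redirect is set up depends on the platform, and the post does not say which method was used. Purely as an illustration, and assuming a site served by a Python WSGI application, the idea can be sketched as a small wrapper; the paths in `REDIRECTS` and the name `redirect_middleware` are hypothetical.

```python
# Hypothetical mapping: wrong path that other sites link to -> correct path.
REDIRECTS = {
    "/old-post/": "/new-post/",
}


def redirect_middleware(app, redirects=REDIRECTS):
    """Wrap a WSGI app so that known dead paths answer with a 301."""

    def wrapped(environ, start_response):
        target = redirects.get(environ.get("PATH_INFO", ""))
        if target is not None:
            headers = [("Location", target), ("Content-Length", "0")]
            start_response("301 Moved Permanently", headers)
            return [b""]
        # Any other path is handled by the wrapped application as usual.
        return app(environ, start_response)

    return wrapped
```

On most sites the same effect is achieved in the web server configuration or through a CMS plugin; the principle of mapping the dead address to the live one is the same.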
In this simple way you can recover some links that can be useful when promoting your site. An excellent item for the to-do list of a project that is over 2 years old, isn't it? This check can be run once every 1-2 years, both for your own resources and for client sites.
If you know a way that makes it easier to find such pages, please write in the comments. I'll be glad to read it. Or maybe I have reinvented the wheel.
Results of the birthday mini-competition
Thanks again for your comments and congratulations! I am summing up the results of the mini-competition. As many people know, the blog pre-moderates commenters who do not yet have at least one approved comment (protection from spammers). Because of this, the picture shown yesterday differs from the one shown now: today I approved all the comments on the post.
First, this kept up some intrigue. Second, it did not encourage overly persistent commenters. So, here are the winners of the competition (comment number and name):
- 13: Bulbash
- 26: Sergey
- 39: Alexander
- 52: Alexey
- 65: fktrc
- 78: albedo
I'm waiting for your R-wallets, sent from the same email address that was used to leave the comment. That's all for today, see you soon! I'm 28 already.
Wayback Machine
There are organizations that are trying to preserve the history of the Internet. The best-known is the non-profit Internet Archive, which stores websites, text, video, audio, software and images that are difficult to find elsewhere. Its Wayback Machine lets you view older versions of a website.
Enter a URL and the archive search engine will display a calendar showing when the Wayback Machine saved the page. Click on a date on the calendar to see what the site looked like on that day. The Wayback Machine is a great way to study internet history.
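The same lookup can be scripted. The Internet Archive exposes an availability endpoint at `https://archive.org/wayback/available` that returns the snapshot closest to a given date as JSON. The sketch below assumes that endpoint and its usual response shape; the function names are invented for the example.

```python
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional date such as YYYYMMDD."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the archive for the snapshot of page_url closest to timestamp."""
    query = availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=10) as response:
        return closest_snapshot(json.load(response))
```

`lookup("example.com")` returns the address of an archived copy if one exists, or `None` when the archive has nothing for that URL.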
Down Or Not
If you are looking for a cached copy because a resource is unavailable, but your searches lead nowhere, it is worth checking whether the problem is on your side. For example, your Internet provider may be carrying out maintenance or replacing outdated equipment. To find out where the fault lies, use the Down Or Not service.
Enter the address of the site you need in the search bar and press Enter. After a short analysis, the service displays the result. The word DOWN means the resource is unavailable (temporarily or permanently); if the word UP appears, the site itself is working normally.
Down Or Not acts as an unbiased third party that helps determine what exactly is causing the problem.
Browser extensions
There are browser extensions for all occasions, including for accessing a cached version of the site.
Add the Web Cache Viewer extension to Chrome and right-click on any page to view the Google or Wayback Machine version. An extension called View Page Archive & Cache for Chrome or Firefox goes even further and allows you to view cached versions of web pages from numerous search engines such as Bing, Baidu, Yandex.
Dead URL
Dead URL provides similar options. Copy the broken URL from the address bar and paste it into the input field on the site. After a short wait, the service returns several results. Some of them link to Google's copy; the others take you to the Internet Archive pages. Importantly, the cached copies are sorted by date, which is very convenient.
Browser cache when all else fails
You can’t view the entire page this way, but images and scripts from some sites are stored on your computer for a certain period of time, and they can be used to search for information. For example, using a picture from a set of instructions, you can find a similar one on another site. Here is briefly how to view cache files in different browsers:
Safari
Look for the files in ~/Library/Caches/Safari.
Google Chrome
In the address bar, type chrome://cache (available in older versions of Chrome; recent releases have removed this page).
Opera
In the address bar, type opera://cache
Mozilla Firefox
Type about:cache in the address bar and find the path to the directory with the cache files.
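Once you know the cache directory, a short script can help pick out the files most likely to be images or other large resources. This is only a sketch: cache files often have opaque names and no extensions, so it simply lists the largest files, and the function name `largest_files` is hypothetical.

```python
from pathlib import Path


def largest_files(cache_dir, limit=20):
    """Return (size_in_bytes, path) pairs for the largest files under cache_dir."""
    root = Path(cache_dir).expanduser()
    files = [path for path in root.rglob("*") if path.is_file()]
    sized = [(path.stat().st_size, path) for path in files]
    sized.sort(key=lambda pair: pair[0], reverse=True)
    return sized[:limit]
```

For Safari, for instance, the call would be `largest_files("~/Library/Caches/Safari")`; for Firefox, pass the directory shown on the about:cache page.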