Correct methods for removing duplicate pages. How to solve the duplicate-page problem and work with duplicates correctly

Duplicates are pages on the same domain with identical or very similar content. They most often appear because of the way the CMS works, errors in robots.txt directives, or mistakes in setting up 301 redirects.

Why duplicates are dangerous

1. The search robot picks the wrong page as the relevant one. Suppose the same page is available at two URLs:

https://site.ru/kepki/

https://site.ru/catalog/kepki/

You have invested money in promoting the page https://site.ru/kepki/. Thematic resources now link to it, and it has taken a position in the top 10. But at some point the robot drops it from the index and adds https://site.ru/catalog/kepki/ instead. Naturally, this page ranks worse and attracts less traffic.

2. Increased time for robots to crawl the site. Robots have a limited amount of time to scan each site. If there are many duplicates, the robot may never reach the main content, and indexing will be delayed. This problem is especially relevant for sites with thousands of pages.

3. Sanctions from search engines. Duplicates by themselves are not a reason to downgrade a site - as long as the search algorithms do not decide that you are creating them intentionally to manipulate the search results.

4. Problems for the webmaster. If the work of eliminating duplicates is put off for too long, so many can accumulate that it becomes physically difficult to process the reports, systematize the causes of the duplicates and make corrections. A large amount of work increases the risk of errors.

Duplicates are conventionally divided into two groups: explicit and implicit.

Explicit duplicates (the page is available at two or more URLs)

There are many variants of such duplicates, but they are all similar in essence. Here are the most common.

1. URLs with and without a trailing slash

https://site.ru/list/

https://site.ru/list

What to do: configure the server to respond "HTTP 301 Moved Permanently" (a 301 redirect).

How to do it:

    • find the .htaccess file in the root folder of the site and open it (if it does not exist, create a plain-text file, name it .htaccess and place it in the site root);
    • add rules to the file that redirect URLs with a trailing slash to URLs without it:

# 301 redirect from the URL with a trailing slash to the URL without it
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} ^(.+)/$
RewriteRule ^(.+)/$ /$1 [R=301,L]

    • for the reverse operation (adding the trailing slash):

# 301 redirect from the URL without a trailing slash to the URL with it
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*[^/])$ $1/ [R=301,L]

    • if the file is created from scratch, all redirect rules must be written inside the following lines:
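Presumably the standard mod_rewrite wrapper is meant here - a minimal sketch, assuming an Apache server with mod_rewrite enabled:

<IfModule mod_rewrite.c>
RewriteEngine On
# the redirect rules from the steps above go here
</IfModule>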



Configuring 301 redirects via .htaccess only works for sites on Apache. For NGINX and other servers, redirects are configured differently.
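For NGINX, for example, the same trailing-slash redirect could look roughly like this (a sketch, assuming the rule is added to the server block of the site's configuration):

server {
    # ... the rest of the site's server configuration ...
    # permanently (301) redirect URLs with a trailing slash to the version without it
    rewrite ^/(.*)/$ /$1 permanent;
}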

Which URL is preferable: with or without the trailing slash? Purely technically there is no difference. Look at the situation: if more pages are indexed with the slash, keep that option, and vice versa.

2. URLs with and without www

https://www.site.ru/1

https://site.ru/1

What to do: specify the main mirror of the site in the webmaster panel.

How to do this in Yandex:

    • go to Yandex.Webmaster;
    • in the panel, select the site from which the redirect will go (most often the redirect points to the URL without www);
    • go to the "Indexing / Site move" section, uncheck the "Add www" item and save the changes.

Within 1.5-2 weeks Yandex will glue the mirrors and re-index the pages, after which only URLs without www will appear in the search results.

Important! Previously, the main mirror was specified in the robots.txt file with the Host directive, but it is no longer supported. Some webmasters still add this directive "just in case" and, for extra certainty, also set up a 301 redirect - this is not necessary; it is enough to configure the mirror gluing in the webmaster panel.
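For reference, the deprecated directive in robots.txt looked roughly like this (site.ru here stands for the main mirror):

User-agent: Yandex
Host: site.ru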

How to glue mirrors in Google:

    • in Search Console, select the site from which the redirect will go;
    • click the gear icon in the upper right corner, open "Site Settings" and select the preferred domain.

As with Yandex, additional manipulations with 301 redirects are not needed, although mirror gluing can also be implemented that way.

To compare which pages are indexed by Yandex and by Google, do the following:

    • export the list of indexed URLs from Yandex.Webmaster;
    • upload this list into the Seopult list analysis tool, either as a list or as an XLS file (detailed instructions for using the tool are available);

    • run the analysis and download the result.

In this example, the pagination pages are indexed by Yandex but not by Google. The reason is that they are closed from indexing in robots.txt only for the Yandex bot. The solution is to set up canonicalization for the pagination pages.

Using the Seopult parser, you will see whether pages are duplicated in both search engines or only in one. This will help you choose the best tools for solving the problem.

If you do not have the time or experience to deal with duplicates, order an audit - in addition to a list of duplicates you will get a lot of useful information about your resource: errors in the HTML code, headings, meta tags, structure, internal linking, usability, content optimization and so on. As a result you will have ready-made recommendations that will make the site more attractive to visitors and raise its position in the search results.

The site owner may not even suspect that some pages on the site have copies, yet this happens all the time. The pages open, their content is fine, but if you pay attention to the URLs you will notice that different addresses serve the same content. What does this mean? For live users, nothing, since they are interested in the information on the pages, but the soulless search engines perceive this phenomenon quite differently: for them these are entirely different pages with the same content.

Are duplicate pages harmful?

Even if an ordinary user does not notice the duplicates on your site, the search engines will detect them immediately. What reaction should you expect? Since the search robots essentially see different pages, their content stops being unique, and that already has a negative effect on ranking.

Duplicates also dilute the link weight that the optimizer tried to concentrate on the target page. Because of a duplicate, that weight may not even end up on the page he wanted to promote. In other words, the effect of internal linking and external links can be reduced many times over.

In the overwhelming majority of cases the CMS is to blame for duplicates: because of wrong settings and a lack of attention from the optimizer, exact copies are generated. Many CMSs sin in this way, Joomla for example. It is hard to offer a universal recipe for the problem, but you can try one of the plugins designed to delete copies.

Fuzzy duplicates, whose content is not fully identical, usually appear through the fault of the webmaster. Such pages are common on online store sites, where product pages differ only in a few sentences of description, while the rest of the content, made up of repeated blocks and other elements, is the same.

Many specialists argue that a small number of duplicates will not hurt a site, but if they exceed 40-50% of its pages, the resource may face serious difficulties. In any case, even if there are not many copies, it is worth eliminating them - that way you are guaranteed to avoid problems with duplicates.

Searching for page copies

There are several ways to search for duplicate pages, but first you should ask the search engines themselves how they see your site: just compare the number of pages in the index of each. This is quite simple and needs no extra tools: in Yandex or Google, enter host:yoursite.ru in the search box and look at the number of results.

If after such a simple check the numbers differ greatly, by 10-20 times, this quite likely points to duplicates in one of the indexes. Page copies may not be to blame for such a difference, but it still gives a reason for a more thorough search. If the site is small, you can manually count the number of real pages and compare it with the search engines' figures.

You can also look for duplicates by the URLs shown in the search results. If the site uses human-readable URLs, pages with URLs made of incomprehensible characters, such as "index.php?s=0F6B2903D", will immediately stand out from the general list.

Another way to detect duplicates using the search engines themselves is to search for text fragments. The procedure is simple: enter a fragment of 10-15 words from a page into the search box and analyze the result. If two or more pages appear in the results, copies exist; if there is only one result, the page has no duplicates and there is nothing to worry about.

Naturally, if the site consists of a large number of pages, such a check can turn into an impracticable routine for the optimizer. To minimize the time spent, you can use special programs. One such tool, probably familiar to experienced specialists, is Xenu's Link Sleuth.

To check a site, open a new project by selecting "Check URL" in the "File" menu, enter the address and click "OK". The program will then start processing all of the site's URLs. When the check is finished, export the received data into any convenient editor and start looking for duplicates.

In addition to the methods above, the Yandex.Webmaster panel and Google Webmaster Tools provide page-indexing reports that can also be used to find duplicates.

Methods for solving the problem

Once all the duplicates have been found, they need to be eliminated. This can also be done in several ways, and each specific case needs its own method; it is quite possible you will have to use all of them.

Copy pages can be deleted manually, but this method is really only suitable for duplicates that were created manually through the webmaster's carelessness.

A 301 redirect is great for gluing together copy pages whose URLs differ by the presence or absence of www.

Duplicate problems can also be solved with the canonical tag, which is used for fuzzy copies - for example, for product categories in an online store that have duplicates differing only in sort parameters. Canonical also suits print versions of pages and other similar cases. It is used quite simply: the rel="canonical" attribute is specified on all the copies and not on the main, most relevant page. The code should look something like this: <link rel="canonical" href="http://yoursite.ru/osnovnaya-stranica" />, where the href contains the address of the main page, and it must be placed inside the head tag.

Configuring the robots.txt file can also help in the fight against duplicates. The Disallow directive lets you close duplicates off from search robots. You can read more about the syntax of this file in issue №64 of our newsletter.
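For example, a minimal robots.txt sketch, assuming the duplicates are print versions living under a hypothetical /print/ path and URLs generated by a hypothetical sort parameter:

User-agent: *
Disallow: /print/
Disallow: /*?sort=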

Conclusions

While users perceive duplicates as a single page with different addresses, for spiders these are different pages with duplicated content. Page copies are one of the most common pitfalls that newcomers cannot get around. Having them in large numbers on a site being promoted is unacceptable, as they put serious obstacles in the way of reaching the top of the search results.

Duplicate pages on sites and blogs: where they come from and what problems they can create.
That is what this post is about: we will try to make sense of this phenomenon and find ways to minimize the potential trouble that duplicate pages can bring to a site.

So, let's continue.

What are duplicate pages?

Duplicate pages on any web resource mean that the same information is available at different addresses. Such pages are also called the site's internal duplicates.

If the text on the pages is completely identical, the duplicate is called full, or exact. If the text coincides only partially, the duplicate is called partial, or fuzzy.

Partial duplicates are category pages, product list pages and similar pages containing announcements of the site's materials.

Full duplicates are print versions of pages, pages with different extensions, archive pages, site search pages, pages with comments, and so on.

Sources of duplicate pages

At the moment most duplicate pages are generated by modern CMSs - content management systems, also called site engines.

These include WordPress, Joomla, DLE and other popular CMSs. The phenomenon seriously annoys site optimizers and webmasters and causes them extra trouble.

In online stores duplicates can appear when goods are displayed sorted by various attributes (manufacturer, purpose, date of manufacture, price, etc.).

You also need to remember the notorious www prefix and decide whether to include it in the domain name when creating, developing and promoting the site.

As you can see, the sources of duplicates can vary; I have listed only the main ones, but they are all well known to specialists.

Duplicate pages and their negative effects

Although many people pay no particular attention to the appearance of duplicates, they can create serious problems for site promotion.

A search engine may regard duplicates as spam and, as a result, seriously lower the positions of both these pages and the site as a whole.

When promoting a site with links, the following can happen: at some point the search engine decides that the most relevant page is the duplicate, not the one you are promoting with links, and all your efforts and expenses will have been in vain.

There are also people, however, who deliberately try to use duplicates to channel weight onto the pages they need - the most important ones, for example, or any others.

Methods of dealing with duplicate pages

How can you avoid duplicates, or at least reduce the negative effects when they do appear?
And is it worth fighting them at all, or should you leave everything to the mercy of the search engines? Let them sort it out, since they are so smart.

Using robots.txt

Robots.txt is a file placed in the root directory of the site that contains directives for search robots.

In these directives we indicate which pages of our site to index and which not to. We can also specify the site's main domain and the file containing the sitemap.

The Disallow directive is used to prohibit page indexing. It is what webmasters use to close duplicate pages from indexing - and not only duplicates, but any other information not directly related to the content of the pages. For example:

Disallow: /search/   # close the site search pages
Disallow: /*?        # close pages whose URLs contain a question mark "?"
Disallow: /20*       # close the archive pages

Using the .htaccess file

The .htaccess file (it has no extension) is also placed in the root directory of the site. To combat duplicates, a 301 redirect is configured in it.
This method helps preserve the site's metrics when changing the site's CMS or restructuring it. The result is a correct redirect without loss of link weight: the weight of the page at the old address is passed to the page at the new address.
A 301 redirect is also used when defining the main domain of the site - with or without www.
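A minimal sketch of such a www gluing redirect in .htaccess (assuming Apache with mod_rewrite; site.ru is a placeholder domain):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.site\.ru$ [NC]
# 301 redirect all requests from www.site.ru to site.ru
RewriteRule ^(.*)$ http://site.ru/$1 [R=301,L]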

Using the rel="canonical" tag

With this tag the webmaster tells the search engine which page is the original source, that is, the page that should be indexed and take part in ranking. Such a page is called canonical. The entry in the HTML code looks like this:
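A typical entry (the URL here is just a placeholder for the address of your canonical page):

<link rel="canonical" href="http://site.ru/kanonicheskaya-stranitsa/" />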

When using the WordPress CMS, this can be done in the settings of a useful plugin such as All in One SEO Pack.

Additional measures against duplicates for WordPress

Even after applying all the methods described above to the duplicate pages on my blog, I kept feeling I had not done everything I could. So, after digging around the Internet and consulting with professionals, I decided to do something more. I will describe it now.

I decided to eliminate the duplicates that are created on the blog when anchors are used; I wrote about them in the article "HTML Anchors". On blogs running WordPress, anchors are formed by the "#more" tag and by comments. Whether to use them is rather debatable, but they clearly produce duplicates.
Now for how I eliminated the problem.

First, let's deal with the #more tag.

I found the file where it is generated - or rather, guessed it.
It is ../wp-includes/post-template.php
Then I found the relevant fragment of code (in the WordPress version I was using, it looked approximately like this):

$output .= apply_filters( 'the_content_more_link', ' <a href="' . get_permalink() . "#more-{$post->ID}\" class=\"more-link\">$more_link_text</a>", $more_link_text );

From this line I removed the fragment shown below:

#more-{$post->ID}

And in the end got a line of this kind:

$output .= apply_filters( 'the_content_more_link', ' <a href="' . get_permalink() . "\" class=\"more-link\">$more_link_text</a>", $more_link_text );

Removing the #comment anchors from comments

Now let's turn to comments. This part I figured out myself.
The file in question is ../wp-includes/comment-template.php.
Find the needed fragment of code:

return apply_filters( 'get_comment_link', $link . '#comment-' . $comment->comment_ID, $comment, $args );
}

Similarly, I removed the fragment shown below - very neatly and carefully, down to every dot:

. '#comment-' . $comment->comment_ID

As a result we get the following line of code:

return apply_filters( 'get_comment_link', $link, $comment, $args );
}

Naturally, all of this was done after first copying the files in question to my computer, so that in case of failure it would be easy to restore everything to its previous state.

As a result of these changes, when I click the "Read the rest of this entry..." link I now get a page with the canonical address, without the "#more-..." tail appended to it. Likewise, when clicking on comments I get a normal canonical address without the "#comment-..." suffix.

Thus the number of duplicate pages on the site has decreased somewhat. But I cannot yet say what else our WordPress will generate; we will keep tracking the problem.

In conclusion, I would like to share a very good and informative video on the topic. I strongly recommend watching it.

Health and success to everyone. Until next time.


Duplicate pages are one of the many reasons for losing positions in the search results and even falling under a filter. To prevent this, you need to keep them out of the search engine's index.

You can detect duplicates on a site and get rid of them in various ways, but the seriousness of the problem is that duplicates are not always useless pages - they simply should not be in the index.

We will solve this problem now; but first let's find out what duplicates are and how they arise.

What are duplicate pages

A duplicate page is a copy of the content of the canonical (main) page, but with a different URL. It is important to note that duplicates can be either full or partial.

A full duplicate is an exact copy with its own address; the difference can show up in a trailing slash, the www prefix, or substituted parameters such as index.php?, page=1, page/1 and so on.

A partial duplicate is an incomplete copy of the content and is tied to the site structure: article announcements in the catalog, archives, sidebar content, pagination pages and other repeated elements of the resource that are also present on the canonical page get indexed. This is inherent in most CMSs and in online stores, where the catalog is an integral part of the structure.

We have already talked about the consequences of duplicates: the link mass is spread between the duplicates, extra pages appear in the index, the content loses its uniqueness, and so on.

How to find duplicate pages on a site

The following methods can be used to search for duplicates:

  • the Google search box. The query site:myblog.ru, where myblog.ru is your URL, shows the pages in the main index. To see the duplicates, go to the last page of the search results and click the "Show hidden results" link;
  • the "Advanced search" command in Yandex. Enter your site's address in the special field and, in quotes, one sentence from an indexed article you are checking; there should be only one result. If there are more, duplicates exist;
  • the webmaster toolbars of the search engines;
  • manually, by substituting in the address bar a slash, www, html, asp, php, or upper- and lower-case letters. In all cases a redirect must occur to the page with the main address;
  • special programs and services: Xenu, MegaIndex, etc.

Removing duplicate pages

There are also several ways to remove duplicates. Each has its own impact and consequences, so there is no point in calling any one of them the most effective. Remember that physically destroying an indexed duplicate is no way out: the search engines will still remember it. Therefore, the best method of dealing with duplicates is to prevent their appearance by configuring the site correctly.

Here are some of the ways to eliminate duplicates:

  • configuring robots.txt. This lets you close specific pages from indexing. But while Yandex robots respect this file, Google picks up even closed pages without paying much attention to its recommendations. Besides, removing already indexed duplicates with robots.txt is very difficult;
  • a 301 redirect. It glues a duplicate to the canonical page. The method works, but it is not always appropriate: it cannot be used when the duplicates must remain independent pages yet should not be indexed;
  • assigning a 404 error to detected duplicates (see the sketch after this list). The method is very good for removing them, but it takes some time before the effect shows.
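A minimal sketch of the 404 option via .htaccess (assuming Apache; the path /catalog/duplicate-page/ is a hypothetical example):

# make the server answer 404 for a specific duplicate URL
Redirect 404 /catalog/duplicate-page/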

When there is nothing to glue or delete, but you do not want to lose page weight or be punished by the search engines, the rel="canonical" attribute is used.

The rel="canonical" attribute in the fight against duplicates

Let me start with an example. An online store has two pages with identical product cards, but on one the goods are sorted alphabetically and on the other by price. Both are needed, and a redirect is not allowed. At the same time, for the search engines this is an exact duplicate.

In this case it is rational to use the link rel="canonical" tag: it points to the canonical page, which is the one that gets indexed, while the page itself remains available to users.

This is done as follows: in the head block of the duplicate pages, the link <link rel="canonical" href="http://site.ru/osnovnaya-stranitsa" /> is specified, where http://site.ru/osnovnaya-stranitsa is the address of the canonical page.

With this approach, a user can freely visit any page of the site, but a robot, having read the rel="canonical" attribute in the code, will index only the address specified in the link.

This attribute can also be useful for pagination pages. In that case, a "Show all" page (a kind of single long page) is created and taken as canonical, and the pagination pages point the robot to it via rel="canonical".
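A minimal sketch of that setup (all URLs here are placeholders): each pagination page, such as http://site.ru/catalog/page/2/, would carry in its head block the line

<link rel="canonical" href="http://site.ru/catalog/show-all/" />

pointing to the "Show all" page chosen as canonical.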

Thus, the choice of method for combating page duplication depends on how the duplicates arose and whether they need to remain on the site.

Quite often a site contains copies of pages, and its owner may not even suspect it. When they are opened everything displays correctly, but if you look at the addresses you will notice that different addresses can correspond to the same content.

What does this mean? For ordinary users, nothing, because they came to your site not to look at page names but because they were interested in the content. But the same cannot be said of the search engines: they perceive this state of affairs in a completely different light - they see pages with identical content that are different from each other.

Even if ordinary users do not notice the duplicated pages on a site, they will definitely not escape the attention of the search engines. What can this lead to? The search robots will treat the copies as different pages, and as a result will stop perceiving their content as unique. If you care about promoting the site, know that this will certainly affect its ranking. In addition, duplicates dilute the link weight that the optimizer built up through considerable effort while trying to highlight the target page. Page duplication can lead to a completely different page being highlighted instead, which can reduce the effectiveness of external links and internal linking many times over.

Can duplicate pages do harm?

Often the culprit behind duplicates is the CMS: wrong settings, or a lack of attention from the optimizer, can lead to the generation of exact copies. Content management systems such as Joomla often sin in this way. Let us note right away that there is simply no universal means of combating this phenomenon, but you can install one of the plugins designed to find and delete copies. However, fuzzy duplicates, whose content does not fully coincide, may also appear. This most often happens through the webmaster's oversight. Such pages are common in online stores, where product cards differ only in a few sentences of description while the rest of the content, made up of various elements and repeated blocks, is the same. Experts often agree that a few duplicates will not harm a site, but if they amount to about half of its pages or more, promoting the resource will run into a lot of problems. Still, even when there are only a few copies on the site, it is better to find and eliminate them - that way you will certainly get rid of duplicates on your resource.

Finding duplicate pages

You can find duplicate pages in several ways, but before the search itself it is a good idea to look at your site through the eyes of the search engines: how do they picture it? To do this, simply compare the number of your pages with the number in their index. To see it, just enter the query host:yoursite.ru into the Google or Yandex search box and evaluate the results.

If such a simple check yields figures that differ by 10 times or more, there is reason to believe your resource contains duplicates. Although this is not always the fault of duplicate pages, the check serves as a good starting point for the search. If your site is small, you can count the real pages yourself and compare the result with the search engines' figures. You can also look for duplicates among the URLs offered in the search results: if you use human-readable URLs, pages with incomprehensible symbols in the URL, such as "index.php?c=0F6B3953D", will immediately attract your attention.

Another way to determine the presence of duplicates is to search for text fragments. To perform such a check, enter a few words of text from each page into the search box and analyze the result. When two or more pages appear in the results, copies clearly exist; if only one page comes up, it has no duplicates. Of course, this method only suits a small site consisting of a few pages. When a site contains hundreds of them, the optimizer can use special programs such as Xenu's Link Sleuth.

To check a site, open a new project, go to the "File" menu, choose "Check URL", enter the address of the site you are interested in and click OK. The program will then start processing all the URLs of the resource. When the work is finished, open the received data in any convenient editor and look for duplicates. The search methods do not end there: the Google Webmaster and Yandex.Webmaster toolbars offer tools for checking page indexing that can also be used to find duplicates.

Towards solving the problem

Once you have found all the duplicates, your task will be to eliminate them. There are several possibilities here and various methods for removing duplicate pages.

Gluing copies together can be done with a 301 redirect. This is effective when the URLs differ by the presence or absence of www. Copy pages can also be deleted manually, but this works only for duplicates that were created by hand in the first place.

The duplicate problem can also be solved with the canonical tag, which is used for fuzzy copies. In an online store, for example, it can be applied to product categories that have duplicates differing only in sort parameters. The canonical tag also suits print versions of pages and similar cases. Using it is not difficult: the rel="canonical" attribute is set on each copy and is not specified on the promoted page with the most relevant characteristics. The code looks approximately like this: <link rel="canonical" href="http://site.ru/osnovnaya-stranica" />, where the href contains the address of the main page. It should be located within the head tag.

A properly configured robots.txt file will also bring success in the fight against duplicates. Using the Disallow directive, you can block search robots' access to all the duplicate pages.

Even professional site development will not get a resource into the top if it contains duplicate pages. Today page copies are one of the frequent pitfalls that newcomers suffer from. A large number of them on your site will create significant difficulties in getting it to the top, or even make that impossible altogether.