Allow or block all crawlers except one

Once you understand the terminology, you can set up rules by entering /robots.txt after your domain address and editing the file there. First, let me introduce some terms.

robots.txt directives:
user-agent: the name of the crawler to which the rule applies.
disallow: blocks the user agent from crawling a directory or page.
allow: allows the user agent to crawl a directory or page. Please note that this directive only applies to Googlebot.
sitemap: a file that lists all the resources on a website.

Crawling bot names: Googlebot (Google), Slurp (Yahoo).

robots.txt rules

Let's go through some rules using robots.txt examples.

Block crawling: If you want to block crawlers from all content on your site, you can set it up as follows.
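A minimal sketch of the block-everything rule, in standard robots.txt syntax (the original screenshot of this file was not preserved):

    # Applies to every crawler; / blocks the whole site
    User-agent: *
    Disallow: /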


As explained earlier, user-agent refers to a crawling bot, and * means the rule applies to every crawling bot. Disallow prevents pages from being crawled, and marking it with / blocks crawling of all pages, including the homepage, for all crawling bots.

Allow crawling: Conversely, there are also settings that grant crawling bots access. If there is no / after disallow, crawling is allowed: all crawling bots may crawl all pages, including the homepage. The rule looks like this.
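A sketch of the allow-everything form; leaving the Disallow value empty blocks nothing:

    # Applies to every crawler; an empty Disallow allows all crawling
    User-agent: *
    Disallow: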
Block specific crawlers – specific folders: If you want to block a specific crawler from a specific folder, just enter that bot's name in user-agent. Googlebot, for example, is the name of Google's crawling bot, and it can be blocked from a folder as follows.
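A sketch of the rule; /example-folder/ is a hypothetical directory name, since the original example path was not preserved:

    # Blocks only Google's crawler from one (hypothetical) folder
    User-agent: Googlebot
    Disallow: /example-folder/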
Block crawling – specific locations: If you want to block crawling bots only from certain locations, you can list those paths. This blocks crawling of every URL under the listed directories, and it is possible to block several directories at the same time.
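A sketch blocking two directories at once, using the calendar and junk directories discussed next:

    # Applies to every crawler; each Disallow line blocks one directory
    User-agent: *
    Disallow: /calendar/
    Disallow: /junk/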
With the file above, crawlers cannot access the calendar and junk directories.

Block specific crawlers – specific web pages: You can also block crawlers from accessing individual web pages. For example, Naver's crawler Yeti can be blocked from crawling one specific page.
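A sketch of a single-page block; /private-page.html is a hypothetical path standing in for the example URL, which was not preserved:

    # Blocks Naver's crawler from one (hypothetical) page
    User-agent: Yeti
    Disallow: /private-page.html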
The settings introduced above can also be combined.

Allow all crawlers except one: To block a single crawler while keeping the site open to all others, set it up as follows.
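A sketch of the combined rule, with Unnecessarybot as the crawler to shut out:

    # Block one crawler completely
    User-agent: Unnecessarybot
    Disallow: /

    # Every other crawler may access everything
    User-agent: *
    Allow: /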
In this case, only Unnecessarybot is blocked, and the remaining crawlers can access the site. Conversely, if you want to allow only one crawler, reverse the rules.
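A sketch of the reversed rule, allowing only Googlebot-news:

    # Only this crawler may access the site
    User-agent: Googlebot-news
    Allow: /

    # Every other crawler is blocked
    User-agent: *
    Disallow: /

With this setting, only Googlebot-news is allowed, and access for all other crawlers is blocked.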
