In my robots.txt I have this: "Disallow: /lo". Here lo is a directory containing a script I want blocked. The problem is that "Disallow: /lo" also blocks a post of mine:
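The behaviour comes from prefix matching: a Disallow value matches every path that starts with it, so "/lo" also matches any URL beginning with "/lo". Below is a minimal sketch using Python's standard urllib.robotparser that compares "Disallow: /lo" with "Disallow: /lo/". The paths "/lo/script.php" and "/lovely-post" and the agent name "ExampleBot" are hypothetical stand-ins for the real script and post.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "/lo" is a plain prefix, so it matches any path that merely starts with "/lo".
PREFIX_RULE = "User-agent: *\nDisallow: /lo\n"
# The trailing slash limits the rule to paths inside the /lo/ directory.
DIRECTORY_RULE = "User-agent: *\nDisallow: /lo/\n"

# Hypothetical paths: a script inside the directory and an unrelated post.
for path in ("/lo/script.php", "/lovely-post"):
    print(path, is_allowed(PREFIX_RULE, path), is_allowed(DIRECTORY_RULE, path))
```

With the trailing slash, the directory's contents stay blocked while the post becomes crawlable. Note that the bare URL "/lo" (without a slash) is no longer matched by "Disallow: /lo/".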
Is it better to use meta tags* or the robots.txt file to tell spiders/crawlers to include or exclude a page?
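The two mechanisms do different jobs. robots.txt stops a compliant crawler from fetching a URL at all, while a meta robots tag such as "noindex" is only read if the page is actually fetched. As a result, a page blocked in robots.txt can still show up in search results (for example, because other sites link to it) without its noindex ever being seen. The sketch below uses only Python's standard html.parser to show how a crawler might pull those directives out of a fetched page. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html: str) -> set[str]:
    """Return the lowercased robots meta directives found in an HTML document."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_meta_directives(page)))
```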
I want to block search engine (SE) bots from all files and folders on my site, except for one special folder and the files in it.
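One common pattern is an Allow line for the folder followed by "Disallow: /" for everything else. Allow was not part of the original robots.txt convention, but it is defined in RFC 9309 and honored by the major search engines. The sketch below checks such a file with Python's urllib.robotparser. The folder name "/special/" and agent name "ExampleBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder name. Allow is listed before Disallow because
# urllib.robotparser applies the first matching rule; RFC 9309 crawlers use the
# longest matching rule instead, so this order gives the same result for both.
ROBOTS_TXT = """\
User-agent: *
Allow: /special/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/special/page.html", "/index.html", "/special"):
    print(path, parser.can_fetch("ExampleBot", path))
```

The bare path "/special" (no trailing slash) falls through to "Disallow: /" and stays blocked. Add a separate "Allow: /special" line if that URL should be crawlable too.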
How is it possible that my page /admin/login.asp is found in Google with the query "inurl:admin/login.asp" but not with the query "site:www.domain.xx"?
How can I disallow indexing of the following pages in robots.txt? http://example.net/something,category1.php and http://example.net/something,category2.php
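Because Disallow values are prefixes, one line covering the shared start of both URLs is enough. The comma needs no special treatment. Below is a minimal check with Python's urllib.robotparser. The agent name "ExampleBot" and the control URL "/something.php" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One prefix rule covers both category pages (and any other URL starting this way).
ROBOTS_TXT = """\
User-agent: *
Disallow: /something,category
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://example.net/something,category1.php",
    "http://example.net/something,category2.php",
    "http://example.net/something.php",  # hypothetical page that should stay crawlable
):
    print(url, parser.can_fetch("ExampleBot", url))
```

The prefix also matches URLs such as /something,category10.php. If only those two pages should be blocked, list each one on its own Disallow line instead. Keep in mind that Disallow stops crawling; a blocked URL that is linked from elsewhere can still appear in results.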
My URL structure is set up as two parallel schemes (both lead to the same place): www.example.com/subname and www.example.com/123
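If the numeric form is the duplicate you want crawlers to skip, a plain prefix trick works without wildcards: one Disallow line for each leading digit blocks every path that starts with 0 to 9. This is a sketch under that assumption, checked with Python's urllib.robotparser. The agent name "ExampleBot" is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One Disallow line per leading digit: /0 ... /9. Any path starting with a
# digit (e.g. /123) is blocked; name-based paths such as /subname are not.
ROBOTS_TXT = "User-agent: *\n" + "".join(f"Disallow: /{d}\n" for d in "0123456789")

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str) -> bool:
    return parser.can_fetch("ExampleBot", url)


for url in ("http://www.example.com/subname", "http://www.example.com/123"):
    print(url, is_allowed(url))
```

The catch is that any other page whose path starts with a digit (say, a hypothetical /2024-recap) would be blocked as well. For duplicate content, a rel="canonical" link on the numeric pages pointing at the name-based URL is the usual alternative.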
I want to know how to parse robots.txt in Java. Is there already any code for this? Heritrix is an open-source web crawler written in Java. Looking through their Javadoc, I see that they have
I've been asked to work out how to maximize the visibility of an upcoming web application that will initially be available in multiple languages, specifically French and English.