Some things on the web are so common that nobody questions them. One example is the robots.txt file, which can be found on almost every webspace. With it, you give search engine crawlers instructions and specify which files or folders should not be indexed. These and other directives for search engines can be written to the file quickly and easily, and crawlers generally recognize them.
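A minimal robots.txt might look like the sketch below; the paths and the sitemap URL are placeholders for illustration, not recommendations:

```
User-agent: *        # rules apply to all crawlers
Disallow: /private/  # do not index anything under /private/
Allow: /             # everything else may be crawled

Sitemap: https://example.com/sitemap.xml
```

The file simply lives at the root of the domain, e.g. `https://example.com/robots.txt`, where crawlers fetch it before visiting other pages.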
The robots.txt is a real veteran. The format behind this file was introduced back in 1994 and has not really changed since then. To this day, nobody has taken care of standardization. As a result, crawlers interpret some of the directives given in the file differently, which in the worst case leads to misunderstandings. Usually this is harmless, but sometimes it causes problems for the operator of the website. A standard that specifies exactly which command leads to which result could simplify the situation for everyone.
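One example of such an inconsistency is the non-standard `Crawl-delay` directive: some crawlers honor it, while others, Googlebot among them, simply ignore it, so the site operator cannot rely on a uniform behavior:

```
User-agent: *
Crawl-delay: 10   # respected by some crawlers, silently ignored by others
```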
That’s exactly what Google has recognized. Their experience with crawling is of course extensive, and they strive to find the best possible solution for everyone. For this reason, two initiatives are now intended to make robots.txt easier to use. On the one hand, standardization should be promoted, so that in the future misunderstandings can no longer occur and innovations such as character-set handling or caching can be used. On the other hand, Google has now released its robots.txt parser as open source. It can be downloaded and used as a test tool: misinterpretations or unwanted crawl behavior can now be detected in advance by the website operator. Unfortunately there is no API for it. Google explains that parts of the parser date back to the 90s and are written in C++. However, the functionality is by no means impaired – and even without an API, the tool does a great job!
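Even without Google's C++ tool, the expected crawl behavior can be checked in advance. As a minimal sketch, Python's standard library ships a robots.txt parser; the rules, the bot name `FooBot`, and the URLs below are hypothetical examples, not part of Google's tooling:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content to test against
rules = """
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a given user agent may fetch a given URL
print(parser.can_fetch("FooBot", "https://example.com/private/data.html"))  # False
print(parser.can_fetch("FooBot", "https://example.com/index.html"))         # True
```

Running such a check before deploying a changed robots.txt makes it easy to spot rules that accidentally block (or expose) more than intended.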