Sorry folks, I'm not sure this is the best place for this report, but I thought it was important enough to bring to the attention of the dev team, and posting here makes me reasonably sure that somebody will see it...
The French National Library (http://www.bnf.fr) uses the Heritrix robot to collect web pages as part of its "legal deposit" mission defined by French law (a recent French law requires that the French National Library get a copy of each and every website published in France...)
The French National Library has set up a web page for enquiries about its robot's activity here:
http://bibnum.bnf.fr/robot/
That page states that the BNF has some kind of partnership with archive.org and uses its robot. It also states that the BNF robot uses archive.org's User-Agent string: "Mozilla/5.0 (compatible; archive.org_bot)"
...which is partly inaccurate, as my webserver logs show that the actual User-Agent string the robot uses is: "Mozilla/5.0 (compatible; heritrix/1.10.1 +http://bibnum.bnf.fr/robot/)"
I noticed this afternoon (2007/08/28) that the BNF's robot was getting _many_ pages from my webserver, and was CLEARLY VIOLATING my webserver's robots.txt instructions: it was accessing a number of pages in directories that my robots.txt specifically denies to all robots.
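For anyone who wants to double-check this kind of claim on their own server, here is a minimal sketch of testing a requested path against robots.txt rules with Python's standard urllib.robotparser. The robots.txt content and the /private/ directory below are made-up examples, not my actual file; only the User-Agent string is the one from my logs.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string as it appears in the webserver logs.
BNF_AGENT = "Mozilla/5.0 (compatible; heritrix/1.10.1 +http://bibnum.bnf.fr/robot/)"

# Hypothetical robots.txt: /private/ is an assumed directory name.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/page.html"):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, BNF_AGENT, path) else "DENIED"
        print(path, verdict)
```

Any logged request whose path comes back as denied is one a well-behaved robot should never have made.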
Seeing that the robot was violating robots.txt, I used internal redirects to send it to a robot trap that generates an endless supply of dummy pages linking to each other, each carrying a <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> tag.
However, the BNF's Heritrix bot has ignored that tag as well and has crawled more than 5,000 of these pages from my server since this afternoon.
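To make the idea of such a trap concrete, here is a rough sketch in Python. It is not the code running on my server: the /trap/ prefix, the function names and the five links per page are all assumptions for illustration. Each page carries the NOINDEX, NOFOLLOW tag and links to further pages whose names are derived by hashing, so the link graph never runs out and nothing has to be stored.

```python
import hashlib

# The tag a well-behaved robot is expected to honour.
META_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'


def child_tokens(token, count=5):
    """Derive `count` new page names from the current one (deterministic)."""
    return [
        hashlib.sha1(f"{token}:{i}".encode("utf-8")).hexdigest()[:12]
        for i in range(count)
    ]


def trap_page(token, prefix="/trap/"):
    """Build one dummy HTML page that links to `count` further dummy pages.

    `prefix` is a hypothetical URL path, not the one used on the real server.
    """
    links = "\n".join(
        f'<a href="{prefix}{child}.html">{child}</a>'
        for child in child_tokens(token)
    )
    return (
        f"<html><head>{META_TAG}<title>{token}</title></head>\n"
        f"<body>\n{links}\n</body></html>"
    )


if __name__ == "__main__":
    print(trap_page("start"))
```

A robot that honours NOFOLLOW stops at the first page; one that ignores it can keep following links indefinitely.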
Since the BNF states it has some sort of partnership with archive.org, I would like to know whether archive.org will ever receive a feed of pages from the BNF robot's collection. The BNF is collecting pages and elements in violation of robots.txt, which of course means I wouldn't want those pages made public on archive.org. Well, I thought that robots.txt was for exactly that.
archive.org has always been well behaved AFAIK, so I'm rather pissed off seeing the French National Library use archive.org's bot to Break the Rules. Whether this is intentional or not, I don't know...
So this issue is closed as far as I am concerned.