How does it fare against a site with about 200K files or more? Will it crash or run into performance issues before it finishes?
Also, does it honor robots meta tags in the page? Example HTML from one such page:
<title>Printer friendly output for Sky Dancer 2</title>
<meta name="ROBOTS" content="NOINDEX">
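
To be clear about what I mean by honoring the tag, here is a rough Python sketch of the kind of check I would expect a crawler to make. This is only my assumption of how such a check could look, not how the tool actually works; `RobotsMetaParser` and `robots_directives` are hypothetical names I made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated, case-insensitive list.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = (
    "<title>Printer friendly output for Sky Dancer 2</title>\n"
    '<meta name="ROBOTS" content="NOINDEX">'
)
directives = robots_directives(sample)
# "none" is commonly treated as shorthand for "noindex, nofollow".
skip_page = "noindex" in directives or "none" in directives
```

With the example above, `skip_page` ends up true, so I would expect a page like this to be left out of the output.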