The WebMoose FAQ

Q: What is WebMoose?

WebMoose is a web robot. WebMoose visits WWW sites and downloads their HTML pages. It processes the downloaded HTML and generates a database of statistics, including:

   - HTML keyword usage frequency
   - page size
   - server name and version

When WebMoose notices a link, it tosses it in its database and visits the link sometime later. WebMoose tries not to flail on HTTP server sites, but because of the algorithm used, WebMoose might hit a particular site several times in a row before wandering off to another site.
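The visit-tally-enqueue loop described above can be sketched roughly as follows. This is a hypothetical illustration, not WebMoose's actual code; the Crawler type and its members are made up, and real fetching and HTML parsing are omitted:

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>

// Hypothetical sketch of a robot's bookkeeping: a frontier of links waiting
// to be visited, a set of URLs already seen, and a tally of statistics.
struct Crawler {
    std::queue<std::string> frontier;          // links waiting to be visited
    std::set<std::string> seen;                // avoids re-queueing a URL
    std::map<std::string, int> keywordCounts;  // e.g. "<title>" -> uses seen

    void enqueue(const std::string& url) {
        if (seen.insert(url).second)           // insert() is true on first sighting
            frontier.push(url);
    }

    // "Visit" a page: record one HTML keyword occurrence and toss one
    // discovered link into the database for a later visit.
    void visit(const std::string& keyword, const std::string& link) {
        ++keywordCounts[keyword];
        enqueue(link);
    }
};
```

Because discovered links simply go to the back of one queue, links from the same site tend to cluster, which is one way a robot ends up hitting a site several times in a row.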

Q: How Can I See the Statistics WebMoose has Generated?

You can't, just yet. This is mainly because WebMoose is still under development. Therefore, I haven't let it run quite long enough to get any meaningful results.

Q: Who Runs WebMoose?

I do. My name is Mike Blaszczak. Visit my web page, if you're curious, or write to me directly.

Q: When Will WebMoose be Done?

I don't know. I'm writing it in my spare time, and I have plenty of other things to do. Some of those other things are actually more interesting than WebMoose, so it's possible that WebMoose may never be finished.

Q: Where Can I Get the WebMoose Source Code?

You can't. WebMoose isn't done, and even when it is done (if it is ever done), I might not release its source because there's just too much chance for abuse.

Q: What Makes WebMoose Go?

WebMoose itself runs on a 200 MHz Pentium Pro® system via an ISDN connection to The Microsoft Network. WebMoose was written using MFC 4.2 and Microsoft Visual C++ 4.2. WebMoose runs under Windows 95. WebMoose talks over a local Ethernet connection to a 90 MHz Pentium system running Windows NT Server 4.0. This box runs Microsoft SQL Server Version 6.0, and stores information about everything that WebMoose has found lately.

Q: How Often Does WebMoose Run?

I generally run WebMoose for a few hours late on weekend evenings. I run WebMoose against a local web server to test it, so it doesn't often get out in public.

Q: Does WebMoose Follow the Standard for Robot Exclusion?

The Standard for Robot Exclusion gives webmasters a chance at having web robots, like WebMoose, completely pass their site by. The standard is simple and flexible: it affords the server administrator a way to exclude robots by name, and to exclude robots from certain parts of the server. For now, WebMoose doesn't follow the standard, but I'm working on it. (This is why, for now, I don't let the moose roam very far.) I'll probably implement handling of this standard before going much further with the development of the tool. The presence or absence of a ROBOTS.TXT file, and the proper response to a request for such a file, are other statistics that WebMoose will keep.
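For reference, a ROBOTS.TXT file at the root of a server uses the standard's User-agent and Disallow records, and might look like this (the paths shown are made up for illustration):

```
# Keep WebMoose out of the scripts directory...
User-agent: WebMoose
Disallow: /cgi-bin/

# ...and keep every robot out of the private area.
User-agent: *
Disallow: /private/
```

A robot that honors the standard requests /robots.txt before anything else on the server, then skips any path matching a Disallow record addressed to it.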

Q: How do I Know that WebMoose is Visiting My Site?

In all HTTP requests it makes, WebMoose identifies itself with a User-Agent: header that looks like this:

   User-Agent: WebMoose/h.kk.bbbb

where h is a digit identifying the major version of the moose, kk is a pair of digits identifying the minor version of the moose, and bbbb is a string of four digits identifying the build number of the moose. The build number increments each and every time the moose is recompiled, and that happens very often since WebMoose is still under development. At the exact time of this writing, WebMoose uses the string:

   User-Agent: WebMoose/0.01.0077

to identify itself. Undoubtedly, the last four digits have been incremented since, because I just thought of something else to fix.
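Assembling a version string in that h.kk.bbbb shape comes down to zero-padding the three numbers. A minimal sketch, assuming a hypothetical userAgent helper (this is not WebMoose's source):

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the "WebMoose/h.kk.bbbb" User-Agent value
// described in the FAQ. The minor version is padded to two digits and the
// build number to four.
std::string userAgent(int major, int minor, int build) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "WebMoose/%d.%02d.%04d", major, minor, build);
    return std::string("User-Agent: ") + buf;
}
```

For example, version 0, minor 1, build 77 yields the "User-Agent: WebMoose/0.01.0077" string quoted above.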