illogical downloading...

continuing the trend from yesterday and posting more about annoying things ™, my friend pointed out that in the awstats report for one of my sites, i had an abnormally large amount of reported download bandwidth for one of the days (~3tb, when i usually average under 50gb per day for that particular server). of course this is unrealistic (and, in fact, impossible, because the box is on a 10mbps connection, which caps it at no more than ~108gb/day…).

so i did some investigation…

[ahmedre@cafesalam a]$ ls -alh download*21
-rw-r--r-- 1 ahmedre ahmedre 36M Jan 22 11:29 download-access_log.01.06.2008_21

i opened this file and noticed that a particular ip was repeated a huge number of times, mostly with 200 and 206 (partial content) return codes - the ranged requests that download managers typically make.
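the quickest way i know to surface that kind of culprit is a one-liner like this (assuming a standard access log layout with the client ip in the first field - the filename is just the one from above):

```shell
# count requests per client ip, most talkative first
awk '{print $1}' download-access_log.01.06.2008_21 | sort | uniq -c | sort -rn | head -5
```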

[ahmedre@cafesalam a]$ wc -l download*21
158618 download-access_log.01.06.2008_21
[ahmedre@cafesalam a]$ grep -c &lt;offending ip&gt; download*21
152514
[ahmedre@cafesalam ~]$ echo 152514/158618 | bc -l
.96151760...

so ~96% of that hour's requests came from a single ip.

amazing… turns out that when i swapped servers, i forgot to re-enable mod_limitipconn in the apache configuration. the thing is, this isn't the primary download server - it's just one of the mirrors that the download rotates to. so it's easy enough to figure out, but you need some computer knowledge to get to it, because the webpage doesn't currently link directly to the files; instead, it links to a php script that figures out which server to serve the file from (and it so happens that both servers usually have all the files, but that's a different story).
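for reference, re-enabling it is just a few lines in the apache config - this is a sketch from memory, and the path and limit here are made up for illustration:

```apache
# mod_limitipconn needs ExtendedStatus to see per-connection info
ExtendedStatus On
<IfModule mod_limitipconn.c>
    <Location /downloads>
        # cap simultaneous connections per client ip (pick a sane number)
        MaxConnPerIP 3
        # don't count inline images against the limit
        NoIPLimit image/*
    </Location>
</IfModule>
```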

so what i don’t understand is, if you have somewhat of an understanding of computers, why would you go about initiating 152,000 requests within one hour (from some place in new york, i should add…)? how can you possibly expect to download that much at once? i have nothing against people using download managers - i use them all the time - but at least be reasonable about it and download no more than a handful of files at once… for what it’s worth, the ua string was: “Mozilla/4.0 (compatible; MSIE 5.00; Windows 98).” i tried to see if this matches some particular download manager, but i’m not sure… either way, it pretty much has to be either a download manager or a script to hit that many files in an hour…

this really bothers me… not to mention that on the primary site, the error log grows to gbs in size from all the 403s generated by people with download managers constantly trying to leech a huge number of files at a time… i need to find lighttpd/apache modules that progressively block people who hit the site an exorbitant number of times…
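on the apache side, something like mod_evasive is one candidate i'd want to evaluate for that - a sketch of what its config looks like (the numbers below are placeholders, not tuned values):

```apache
<IfModule mod_evasive20.c>
    DOSHashTableSize    3097
    # block an ip that requests the same page more than 5 times in 1 second
    DOSPageCount        5
    DOSPageInterval     1
    # …or more than 100 objects site-wide in 1 second
    DOSSiteCount        100
    DOSSiteInterval     1
    # how long (seconds) the resulting 403s last once an ip is blocked
    DOSBlockingPeriod   60
</IfModule>
```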
