Apache-Better-Blocking with common rules.txt
Following on from other Gists I have posted, this one shows a neat way of using Includes to centralise general blocking rules for bad bots, creepy crawlers and irritating IPs.
See the full post at http://technology.blue-bag.com
blocked-addresses.conf
## A list of known problem IPs
## Patterns are anchored with ^ and $ so that, for example,
## "^54\.189\.47\.213$" cannot also match 154.189.47.213

# pen test on FCKeditor
SetEnvIfNoCase REMOTE_ADDR "^175\.44\.30\.180$" BlockedAddress
SetEnvIfNoCase REMOTE_ADDR "^175\.44\.29\.92$" BlockedAddress
SetEnvIfNoCase REMOTE_ADDR "^174\.139\.240\.74$" BlockedAddress

# looking for backups
SetEnvIfNoCase REMOTE_ADDR "^192\.99\.12\.128$" BlockedAddress

# Bad crawler
SetEnvIfNoCase REMOTE_ADDR "^144\.76\.195\.72$" BlockedAddress
SetEnvIfNoCase REMOTE_ADDR "^54\.189\.47\.213$" BlockedAddress

# Java scraper
SetEnvIfNoCase REMOTE_ADDR "^62\.116\.110\.111$" BlockedAddress

# Big hitter - known spammer
SetEnvIfNoCase REMOTE_ADDR "^109\.201\.137\.166$" BlockedAddress
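## Sketch, commented out: a pattern that ends in a dot matches a whole /24,
## e.g. every address from 203.0.113.0 to 203.0.113.255.
## 203.0.113.0/24 is a reserved documentation range (RFC 5737), used here only
## as a hypothetical placeholder; substitute the range you actually want to block.
# SetEnvIfNoCase REMOTE_ADDR "^203\.0\.113\." BlockedAddress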
blocked-agents.conf
# list obtained from 3rd party

# blank user-agents (Apache does not allow a comment on the same line as a directive)
SetEnvIfNoCase User-Agent ^$ BlockedAgent
SetEnvIfNoCase User-Agent "Jakarta" BlockedAgent
SetEnvIfNoCase User-Agent "User-Agent" BlockedAgent
SetEnvIfNoCase User-Agent "libwww," BlockedAgent
SetEnvIfNoCase User-Agent "lwp-trivial" BlockedAgent
SetEnvIfNoCase User-Agent "Snoopy" BlockedAgent
SetEnvIfNoCase User-Agent "PHPCrawl" BlockedAgent
SetEnvIfNoCase User-Agent "WEP Search" BlockedAgent
SetEnvIfNoCase User-Agent "Missigua Locator" BlockedAgent
SetEnvIfNoCase User-Agent "ISC Systems iRc" BlockedAgent
SetEnvIfNoCase User-Agent "GbPlugin" BlockedAgent
SetEnvIfNoCase User-Agent "Wget" BlockedAgent
SetEnvIfNoCase User-Agent "EmailSiphon" BlockedAgent
SetEnvIfNoCase User-Agent "EmailWolf" BlockedAgent
SetEnvIfNoCase User-Agent "libwww-perl" BlockedAgent
## end of 3rd party list (note: these could also be disallowed in robots.txt, see the article)
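## Sketch of the robots.txt alternative mentioned above (hypothetical entry).
## Only crawlers that honour robots.txt will obey it, so it complements rather
## than replaces the SetEnvIf rules in this file:
##   User-agent: Wget
##   Disallow: /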
## List derived from actual activity
# Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
SetEnvIfNoCase User-Agent "BLEXBot" BlockedAgent
# Mozilla/5.0 (compatible; 007ac9 Crawler; http://crawler.007ac9.net/)
SetEnvIfNoCase User-Agent "007ac9 Crawler" BlockedAgent
#Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)
SetEnvIfNoCase User-Agent "MJ12bot" BlockedAgent
# Fetchbot (https://github.com/PuerkitoBio/fetchbot)
SetEnvIfNoCase User-Agent "Fetchbot" BlockedAgent
#Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
SetEnvIfNoCase User-Agent "SISTRIX" BlockedAgent
vhost-sample.conf
<VirtualHost *:80>
## Note this is heavily reduced just to show the relevant lines
## Expires and security options have been removed
## Don't just paste this - but refer to it along with your customisations
ServerName www.example.com
DocumentRoot /var/www/example.com/live/htdocs
<Directory /var/www/example.com/live/htdocs>
Options +FollowSymLinks
# Disable .htaccess files (remember to account for any rules they implement)
AllowOverride None
# Include our blocked lists
Include /etc/apache2/blocked-addresses.conf
Include /etc/apache2/blocked-agents.conf
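# Sketch (hypothetical directory): Include also accepts shell-style wildcards,
# so every list dropped into one directory can be picked up by a single line.
# On Apache 2.4, IncludeOptional behaves the same way but does not fail when
# the wildcard matches no files.
# Include /etc/apache2/blocked/*.conf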
Order allow,deny
Allow from all
# Deny from our blocked lists
deny from env=BlockedAddress
deny from env=BlockedAgent
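# Apache 2.4 sketch (assumes mod_authz_core): the Order/Allow/Deny lines above
# only work on 2.4 through mod_access_compat. A native equivalent follows; use
# it instead of, not alongside, those lines, since mixing the old and new
# access-control styles in one block can give unexpected results.
# <RequireAll>
#     Require all granted
#     Require not env BlockedAddress
#     Require not env BlockedAgent
# </RequireAll>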
<IfModule mod_rewrite.c>
RewriteEngine on
# Intercept Microsoft Office Protocol Discovery
# OPTIONS requests from these clients were hitting the site regularly
RewriteCond %{REQUEST_METHOD} ^OPTIONS
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ Office\ Protocol\ Discovery [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ Office\ Existence\ Discovery [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\-WebDAV\-MiniRedir.*$
RewriteRule .* - [R=405,L]
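# Note on the conditions above: mod_rewrite groups [OR] conditions before the
# implicit AND, so they read as OPTIONS AND (Protocol Discovery OR Existence
# Discovery OR WebDAV-MiniRedir). A sketch of the same test using a single
# alternation instead of [OR] flags (untested; keep only one version active):
# RewriteCond %{REQUEST_METHOD} ^OPTIONS
# RewriteCond %{HTTP_USER_AGENT} ^(Microsoft\ Office\ (Protocol|Existence)\ Discovery|Microsoft-WebDAV-MiniRedir)
# RewriteRule .* - [R=405,L]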
##### Security hardening ####
## DENY REQUEST BASED ON REQUEST METHOD ###
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK|OPTIONS|HEAD)$ [NC]
RewriteRule ^.*$ - [F]
</IfModule>
</Directory>
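## Sketch: for TRACE, the core TraceEnable directive is an alternative to the
## method-blocking rewrite rule above. It is only valid in server config or
## virtual host context, so it would go here rather than inside the Directory section.
# TraceEnable off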
## The following log details are included to show
## how to use SetEnvIf to include/exclude certain requests for images etc.
## robots.txt requests are deliberately still logged, to check robots' behaviour.

## Custom logging in a combined-style format, filtered so that images, the favicon, CSS and JS are not logged
UseCanonicalName On
LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" vcommon
ErrorLog /var/www/log/customer-error.log
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel warn
## We aren't logging images, CSS, JS etc.

## flag robots.txt requests: the value 0 keeps them in the log so robot behaviour
## can be checked (change it to 1 to exclude them like the requests below)
SetEnvIf Request_URI "^/robots\.txt$" robots-request=0
## flag favicon requests
SetEnvIf Request_URI "^/favicon\.ico$" favicon-request=1
## flag image requests
SetEnvIf Request_URI "(\.gif|\.png|\.jpg)$" image-request=1
## flag CSS and JS requests ($ anchors the extension, so .json and .jsp are not caught)
SetEnvIf Request_URI \.css$ css-request=1
SetEnvIf Request_URI \.js$ js-request=1
## set do_not_log if any of the above flags are set
SetEnvIf robots-request 1 do_not_log=1
SetEnvIf favicon-request 1 do_not_log=1
SetEnvIf image-request 1 do_not_log=1
SetEnvIf css-request 1 do_not_log=1
SetEnvIf js-request 1 do_not_log=1
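## Sketch of a more compact alternative (keep only one version active): a single
## case-insensitive, anchored pattern can set do_not_log directly, and also
## catches .jpeg and upper-case extensions. The extension list is an assumption;
## adjust it to suit the site.
# SetEnvIfNoCase Request_URI "(^/favicon\.ico|\.(gif|png|jpe?g|css|js))$" do_not_log=1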
## only log if do_not_log is not set
CustomLog /var/www/log/customer-access.log vcommon env=!do_not_log
</VirtualHost>