
Removing my blog's cache


Puer Aeternus

Member
Joined
Sep 23, 2001
Location
In your head (Ottawa.Canada)
How do I prevent my blog from appearing in search engines?
I am reading about "removing the cached version of a page," but it's nothing I easily understand.
I am using Blogger, and what I've done so far is check the setting so search engines will not find the blog, and change the URL to a different name. I'd like to keep the blog's content but remove it from the public eye. It still shows up in Google, but the content is not accessible.
 
It isn't specific to the site. You just create a robots.txt in the root directory of the website with the information in it.

Here is the example on my site: http://thideras.com/robots.txt

I appreciate your help; however, I am not doing something right. I tried a robots.txt from a few sources and nothing seems to work... I get an error on my blog when I enter it into the HTML. This is completely new territory for me.
Now, here is a question: with my blog's new settings (not visible to search engines, and set to private), how many days will it take for it to stop being seen in searches?
At the moment, when you type in a keyword, the blog shows in search results and there is a snapshot of previous pages, even though clicking the link says the blog is removed.

Again... much thanks!
 
Wait, enter what HTML? There is nothing you need to edit in the website itself except the robots.txt file directly. Can you link your robots.txt file?

The searches will update the next time they scan.
 

Oh... umm... where do I find the robots.txt file? That's the issue I am having: the instructions on many of the sites are not clear to someone like me who has absolutely no experience with this sort of thing. Even the Google Webmaster Tools verification is not clear. When I said I needed hand-holding, I am embarrassed to say that's exactly what I need. I am a close cousin to that guy who asks "but where is the ANY KEY on my keyboard?"... maybe not that bad, but you know what I mean.
 
You will likely have to create the robots.txt file; it may not exist yet. Here is what you need in that file to block everything:

Code:
User-agent: *
Disallow: /
If there is anything already in this file, delete it and put what I have above.
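If you want to sanity-check that file before uploading it, Python's standard-library robot parser can evaluate it locally. This is just an illustrative sketch; the `example.com` URL is a placeholder for your own blog address.

```python
from urllib.robotparser import RobotFileParser

# Parse the two-line blocking robots.txt locally, before uploading it anywhere.
# "example.com" stands in for your own blog's address.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# With "Disallow: /" under "User-agent: *", every crawler is refused every path.
print(parser.can_fetch("Googlebot", "http://example.com/any-post.html"))  # prints False
```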
 

I totally get that part, but I have no idea where to put it in Blogger; that's what's tripping me up. I've looked through Blogger's settings and assumed I had to add it to the HTML... obviously not. Every site I've visited gives me the .txt file instructions and says to add it to your site, but I've no idea where exactly it goes.
I just want to say, I appreciate your patience :)
 
You need to edit this file directly, not through your website. The easiest route is to connect to the server with an FTP client, pull the file to your computer, edit the contents, and re-upload it. The way you connect the FTP client will be determined by your hosting provider.
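As a rough sketch of that round trip, assuming your host offers plain FTP (the host, user, and password below are placeholders, not real values):

```python
from ftplib import FTP
from io import BytesIO

# The two-line blocking file from earlier in the thread.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

def upload_robots(host: str, user: str, password: str) -> None:
    """Upload robots.txt into the FTP account's root directory.

    host/user/password are placeholders; use the connection details
    from your hosting provider's control panel.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        # STOR writes the file into the current working directory,
        # which is the web root on most shared hosts.
        ftp.storbinary("STOR robots.txt", BytesIO(ROBOTS_TXT.encode("ascii")))
```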
 
Yeah, that robots.txt line should tell every crawler to drop any search-engine results you may have. If you really wanted to get into it, you could also add some ".htaccess" rules.
(The file below also adds some security features most people ignore; they are extremely helpful and important, though!)


A good start would be a modified version of this. The .htaccess file below is a snipped version of the one provided with the forum software I use (IcyPhoenix).

Code:
##################################
#      Errors Pages - BEGIN      #
##################################
##################################
# Uncomment these lines to enable error document management.
# You can use an absolute path if you always want the correct path to be parsed.
# Something like:
# ErrorDocument 400 http://www.icyphoenix.com/errors.php?code=400
##################################
#ErrorDocument 400 /errors.php?code=400
#ErrorDocument 401 /errors.php?code=401
#ErrorDocument 403 /errors.php?code=403
#ErrorDocument 404 /errors.php?code=404
#ErrorDocument 500 /errors.php?code=500
##################################
#       Errors Pages - END       #
##################################


<IfModule mod_deflate.c>
    <IfModule mod_setenvif.c>
        BrowserMatch ^Mozilla/4 gzip-only-text/html
        BrowserMatch ^Mozilla/4\.0[678] no-gzip
        BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
        BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
    </IfModule>
    <IfModule mod_headers.c>
        Header append Vary User-Agent env=!dont-vary
    </IfModule>
    <IfModule mod_filter.c>
        AddOutputFilterByType DEFLATE text/css application/x-javascript text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon
    </IfModule>
</IfModule>
<FilesMatch "\.(css|js|CSS|JS)$">
    FileETag None
    <IfModule mod_headers.c>
         Header set X-Powered-By "W3 Total Cache/0.9.2.3"
    </IfModule>
</FilesMatch>
<FilesMatch "\.(html|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml|HTML|HTM|RTF|RTX|SVG|SVGZ|TXT|XSD|XSL|XML)$">
    FileETag None
    <IfModule mod_headers.c>
         Header set X-Powered-By "W3 Total Cache/0.9.2.3"
    </IfModule>
</FilesMatch>
<FilesMatch "\.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|swf|tar|tif|tiff|wav|wma|wri|xla|xls|xlsx|xlt|xlw|zip|ASF|ASX|WAX|WMV|WMX|AVI|BMP|CLASS|DIVX|DOC|DOCX|EXE|GIF|GZ|GZIP|ICO|JPG|JPEG|JPE|MDB|MID|MIDI|MOV|QT|MP3|M4A|MP4|M4V|MPEG|MPG|MPE|MPP|ODB|ODC|ODF|ODG|ODP|ODS|ODT|OGG|PDF|PNG|POT|PPS|PPT|PPTX|RA|RAM|SWF|TAR|TIF|TIFF|WAV|WMA|WRI|XLA|XLS|XLSX|XLT|XLW|ZIP)$">
    FileETag None
    <IfModule mod_headers.c>
         Header set X-Powered-By "W3 Total Cache/0.9.2.3"
    </IfModule>
</FilesMatch>

RewriteEngine On
#This may cause issues with subdirs and so it is not enabled by default.
RewriteBase /

#Make sure the whole site goes to www.mysite.com instead of mysite.com. This is good for the search engines
#Edit and uncomment the below lines for your own site.
#Make sure to replace icyphoenix.com with your site address.
#RewriteCond %{HTTP_HOST} ^yourdomain.com
#RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]

#Permanent redirection (the first line is the old domain, the second one is the new domain)
#RewriteCond %{HTTP_HOST} ^yourdomain.com [NC]
#RewriteCond %{HTTP_HOST} ^www.yourdomain.com [NC]
#RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]

########## Rewrite rules to block out some common exploits - BEGIN
#
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Answer all blocked requests with a 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
#
########## Rewrite rules to block out some common exploits - END


# Block if useragent and referer are unknown.
# the referer string can cause some problems with mozilla so it has been disabled
#RewriteCond %{HTTP_REFERER} ^.*$ [OR]
#RewriteCond %{HTTP_REFERER} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]

# You may want to enable these lines below to disallow php and perl scripts to access your site
#RewriteCond %{HTTP_USER_AGENT} ^.*PHP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww.* [NC]
RewriteRule .* - [F,L]

#SetEnvIfNoCase User-Agent "^libwww-perl*" block_bad_bots
#Deny from env=block_bad_bots

### VIRUS - EXPLOITS - BEGIN
# SANTY
RewriteCond %{HTTP_REFERER} ^.*$
RewriteRule ^.*%27.*$ http://127.0.0.1/ [redirect,last]
RewriteRule ^.*%25.*$ http://127.0.0.1/ [redirect,last]
RewriteRule ^.*rush=.*$ http://127.0.0.1/ [redirect,last]
RewriteRule ^.*echr.*$ http://127.0.0.1/ [redirect,last]
RewriteRule ^.*esystem.*$ http://127.0.0.1/ [redirect,last]
RewriteRule ^.*wget.*$ http://127.0.0.1/ [redirect,last]
RewriteCond %{HTTP_COOKIE}% s:(.*):%22test1%22%3b
RewriteRule ^.*$ http://127.0.0.1/ [R,L]

# Prevent perl user agent (most often used by santy)
RewriteCond %{HTTP_USER_AGENT} ^lwp.* [NC]
RewriteRule ^.*$ http://127.0.0.1/ [R,L]

# This ruleset is to "stop" stupid attempts to use MS IIS exploits on us
# NIMDA
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC]
RewriteRule !(error\.php|robots\.txt) /error.php?mode=nimda [L,E=HTTP_USER_AGENT:NIMDA_EXPLOIT,T=application/x-httpd-cgi]


# User-Agents with no privileges (mostly spambots/spybots/offline downloaders that ignore robots.txt)
# These bots are annoying website-harvesting tools, web downloaders, and a few misc annoyances.

# Rude Bots - BEGIN
### All bots removed to speed up things in htaccess...
# Rude Bots - END

# SPAM Referers - BEGIN
### All bots removed to speed up things in htaccess...
# SPAM Referers - END

# IE's "make available offline" mode
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [OR]

# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|cgi\-bin/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]

# Cyveillance is a spybot that scours the web for copyright violations and "damaging information" on
# behalf of clients such as the RIAA and MPAA. Their robot spoofs its User-Agent to look like Internet
# Explorer, and it completely ignores robots.txt. So it has been banned by IP address.
RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[34][0-9]|5[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^63\.226\.3[34]\. [OR]
RewriteCond %{REMOTE_ADDR} ^63\.212\.171\.161$ [OR]
RewriteCond %{REMOTE_ADDR} ^65\.118\.41\.(19[2-9]|2[01][0-9]|22[0-3])$ [OR]

# NameProtect peddles their "online brand monitoring" to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [NC,OR]

# This ruleset is for formmail script abusers...
# We don't use Perl for Postnuke so this is not really needed.
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC]

# Used to send these bots someplace else; you can change the URL to whatever you would like
#RewriteRule .* http://www.facebook.com/ [F,R,L]
#RewriteRule /* http://www.geocities.com/WestHollywood/Heights/3204/1home.html [L,R]
#RewriteRule !(errors\.php|robots\.txt) /errors.php?code=404 [L,E=HTTP_USER_AGENT:BAD_USER_AGENT]
#RewriteRule !(errors\.php|robots\.txt) /errors.php?code=404 [L,E=HTTP_USER_AGENT:FORMMAIL_EXPLOIT,T=application/x-httpd-cgi]
# This could also be used to simply deny access to your site instead of the one above
#RewriteRule .* - [F,L]

You can customize that to fit your needs a little more, but it should be a drop-in addition to whatever you already do or don't have.
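As a quick way to see what the exploit-blocking query-string rules above would catch, here is a rough Python mirror of those RewriteCond patterns. This is only an illustration for testing; the .htaccess rules remain the real enforcement.

```python
import re

# Rough Python equivalents of the query-string RewriteCond patterns above.
EXPLOIT_PATTERNS = [
    re.compile(r"mosConfig_[a-zA-Z_]{1,21}(=|%3D)"),          # mosConfig injection
    re.compile(r"base64_encode.*\(.*\)"),                     # base64-encoded payloads
    re.compile(r"(<|%3C).*script.*(>|%3E)", re.IGNORECASE),   # <script> tag in URL
    re.compile(r"GLOBALS(=|\[|%[0-9A-Z]{0,2})"),              # PHP GLOBALS override
    re.compile(r"_REQUEST(=|\[|%[0-9A-Z]{0,2})"),             # _REQUEST override
]

def is_blocked(query_string: str) -> bool:
    """Return True if any exploit pattern matches the query string."""
    return any(p.search(query_string) for p in EXPLOIT_PATTERNS)
```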
 
Don't do it; it's not your job to clean up the mess the search engines make when they cache everything under the sun. If you don't want to be indexed, then go with the robots.txt disallow; otherwise it ain't your pig and it ain't your farm. You are not the manager of, or responsible for, the actions of any search engine.

You can also use this part of the person's request to prove the BS in their legal claim, as any attorney they would hire would immediately see the futility of making such a request. They're just trying to scare you, and apparently you scare easily. I'm watching you. Boooo!! :p

I'd just ignore them, and if you get any paperwork, just write "return to sender" on the envelope.
 
I agree with Pinky, and this happens a lot! Just don't reproduce (i.e. copy/paste) anything, be careful of trademarks, and you will be fine.

I actually know someone who was scared out of a domain name due to alleged trademark infringement. That was part of the reason my first domain name is pretty much my name. Trademark 1983, LOL :b
 