SpiderBomb Read Me

Contents

Directories/Folders and Files
Installation
Web Admin
Working With Your Data
.htaccess
Misc.

Pay Per Click Affiliate programs-Be sure to visit the DomBomber Forum for the latest news and insights into tons of PPC affiliate programs. You'll find free and low cost resources to add revenue producing content to your pages.

We'll show you how to add free and low cost pay per click links so you can profit from your SpiderBomb Built directory traffic.


Directories/Folders:

libwww - All needed perl modules are included!
cache - Used by SpiderBomb to store info it needs
data -
output - Where your text file databases are stored
templates - Used only for admin. No need to alter
XIT -
dmoz_cache - Stores ODP/DMOZ results to increase speed
Files:
spider.cgi - The actual SpiderBomb script
odpbomb.cgi - Script to add ODP/DMOZ links to your site.
test.shtml - Use this to test whether SpiderBomb results are being pulled correctly.
spiderbomb-readme.html (this file)
.htaccess -  Allows you to use .html instead of .shtml for your pages. Needs to be uploaded to the folder where your web pages are stored on your host.


Installation:

    1. Open spider.cgi in a text editor
    2. Check/change the path to perl - if you don't know your path to perl, ask your host what it is.
    3. Set your password
    4. Save
    5. Repeat steps 1 and 2 with odpbomb.cgi
    6. Upload the spiderbomb cgi folder using the ASCII/text mode of your FTP program.
    7. CHMOD all directories (folders) to 777
    8. CHMOD spider.cgi to 755 (some servers need 775 or 777) - see the example commands after this list.
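    If your FTP program doesn't let you set permissions, the same thing can be done from a telnet/SSH shell prompt. A rough sketch, assuming the folder was uploaded as cgi-bin/spiderbomb and that odpbomb.cgi needs the same execute permission as spider.cgi:

    cd cgi-bin/spiderbomb
    chmod 777 libwww cache data output templates XIT dmoz_cache
    chmod 755 spider.cgi odpbomb.cgi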


    Using a web browser, open www.yourdomain.com/cgi-bin/spiderbomb/spider.cgi and enter the password you chose to enter the admin interface. (Note: Your URL may be different from the example due to different server setups.)

    Newbies: For help setting permissions, please visit our:
    DomBom Newbie Install Guide.
    (Scroll down the page to: Using LeechFTP to install DomBom ProfitBombs, which explains how to "set permissions".)
     
     

Web Admin
 
Starting URL:
Tell SpiderBomb EXACTLY where to start.

This is a very important feature, the very foundation of the quality of your resources. The better you target your starting points, the higher the quality of resources you will have.

Some common starting points are Google, the Open Directory Project, or another major search engine or directory.

There are also a few little tricks you can use to improve efficiency. For example, some engines such as Google allow you to set how many results are shown for each search, or filter out sites with certain types of language.

For Google, you do it from their preferences:
http://www.google.com/preferences?hl=en

Be sure to set the "Number of Results" to 100 (the max.). This allows SpiderBomb to get more links from Google (and others) each time SpiderBomb asks them for a page of results.

When you select a starting point, spend five minutes doing a little research to find a good, relevant starting point based on keywords relevant to your site.

For example, if finding a starting place for "television shows", you might dig down the Google Directory until you get to this page:
http://directory.google.com/Top/Arts/Television/Programs/

Then, just enter the url into SpiderBomb's "Starting Point".

Or, you can enter the exact URL from a search engine's search results. For example, if you go to Google.com and search for:
television shows

The URL for that page looks like:
http://www.google.com/search?num=100&hl=en&lr=&ie=UTF-8&oe=UTF-8&q=television+shows&btnG=Google+Search

Note: Sometimes URLs become too long for SpiderBomb. This is easy to shorten, as much of the stuff above isn't needed. Only the num= and q= parts of the URL are needed.

So if we strip out all the junk, we have:

http://www.google.com/search?num=100&q=television+shows
(you can just paste this into SpiderBomb, remembering to change the keywords at the end)

Just change television+shows to whatever keywords you want to find content for. Remember to use a + (plus) sign in place of blank spaces between words.
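For instance (hypothetical keywords), a starting point for content about reality TV would look like:
http://www.google.com/search?num=100&q=reality+tv+shows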


Caution: Continual spidering of Google and other search engines could get your site banned.

Please read the section on Exclude URL's from Domains for info on how to be responsible, and safe.


Output File:
Lets you easily reuse an already existing list of links, or create an entirely new database.

If it's your first time, you will want to create a new file. Just pick a name that you can remember and enter it into the first Output File box.

It's probably best to use the .txt extension, for example:
television-shows.txt

If you've already created a database, just select it from the drop-down list on the right.

If Output File Exists:
SpiderBomb gives you two choices: Overwrite or Append. If you've already created a database of links, you can either "start from scratch", erasing your old data and building a database of freshly indexed content, or you can add to an existing file, building an ever larger supply of content.

Include URL's Only from Domains:
You can limit which domains you want to include. This is great for building a site map of your pages. And since you can enter a number of domains, if you have more than one website you can even include all of your domains.

Exclude URL's from Domains:
You also have the control to exclude whatever domains you want. Maybe it's your "evil competition", or maybe a domain is giving irrelevant results; it's your party and you're the one sending out the invitations.

Warning: You don't want to keep spidering Google and other engines over and over. If you use Google (for example) as a starting point, it is probably best to use a search results page as your starting point, but ALSO add Google.com to Exclude URL's from Domains:

SpiderBomb will then use this Google page as a starting point, but it won't follow any more links to Google.com that may be on that page.

For example:
Starting Point = http://directory.google.com/Top/Arts/Television/Programs/
Exclude URLS from Domains = google.com

Again, this tells SpiderBomb to go to the starting point page and only follow links that are NOT ON GOOGLE.COM.

Remember, spidering pages and crawling links is exactly what the major engines do to your site. You are doing nothing immoral or unethical. The search engines also spider each other's pages for links.

If a search engine doesn't want its pages spidered, it has the option of using a robots.txt file that tells SpiderBomb and other spiders not to include its pages in their results. Which brings us to....

Respect Robots Rules: Yes No
Be a good Net Citizen. Some webmasters don't want their pages indexed by search engines for a variety of reasons. SpiderBomb gives you the option to respect or ignore a webmaster's wishes. Please note: The only time you should EVER ignore the NO ROBOTS rules is WITH YOUR OWN PAGES, or with pages you have permission to spider.
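For reference, a robots.txt file that asks all spiders to stay out of an entire site is just two lines (this is the standard robots exclusion format, not something specific to SpiderBomb):
User-agent: *
Disallow: /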

Remember, spidering pages and crawling links is exactly what the major engines do to your pages.

Follow CGI Queries: Yes No
Note: CGI Queries are generally defined as URLs containing a ? (question mark).
SpiderBomb gives you the ability to include or ignore dynamic pages; just set "Follow CGI Queries" to "Yes" or "No". Note: Be sure to limit the total number of requests, as you don't want to get into a never ending "loop". The default of 10,000 is probably fine in most cases, but this depends on many factors, such as your particular hosting situation.
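For example, a dynamic (CGI query) URL typically looks something like this made-up address:
http://www.example.com/forum/index.cgi?board=television&topic=42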

Spidering CGI Queries is good for finding dynamic content, such as a forum or DomBom content, etc.

There are times when you may not want to follow CGI Queries. For example, many banner and other advertising systems use CGI queries in their links, so by not following CGI queries you can limit the number of affiliate links and other advertising media that end up in your results.

Depth Limit: 1 - 10 levels
This tells SpiderBomb how "deep" you want to crawl pages. Simply put: find links on this page, follow the links found there, then follow the links found on those pages, and so on. The "deeper" you crawl, the more results you will find; however, your results will also tend to be less relevant. On the other hand, if you're spidering your own sites, set it to the "max".

Max. Number of Results:
This is the total number of spidered links that you want included in this session. If you're on a discount hosting service, this is one way to limit SpiderBomb.

If you have a discount web host, you may want to limit this number to 1000.

Max. Number of Requests:
This is the total number of times you want to "crawl" or visit web pages. This isn't always the same as "Results" because some pages that SpiderBomb visits won't be indexed (due to robots.txt, a domain you've entered in the "exclude" form, etc.).

If you have a discount web host, you may want to limit this number to 1000.

Max. Number of Parallel Requests:
SpiderBomb gives you the power to visit more than one page at a time. The higher you set this, the faster you can index pages. Warning: The higher you set this, the more server power you will need.

Max. CPU Time (seconds):
This lets you control exactly how much of the server's CPU you use, allowing fine-grained control for a wide range of hosting solutions. Please note that each host has its own quota.

If you have a discount web host, you may want to monitor this closely.

Max. Absolute Time (seconds):
This is the "elapsed time", or how much "real time" has gone by since you started the last session.

Delay Between Requests (seconds):
It is proper etiquette not to just "pound away" at others' webpages, sucking in data (and their resources) as fast as you can. Plus, you can be more "polite" to your own server by using a delay between your requests. Of course, if the situation warrants it, you can always set this option to "0", which will give you the fastest results.

Password Protected Admin
Set/reset the password by opening spider.cgi in a text editor. You'll see:
$PASSWORD = 'test';

Just change whatever is here to whatever you want. Be sure not to delete the ' (apostrophes).
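For example, to change the password to tvrocks (a made-up password; choose your own), the line would become:
$PASSWORD = 'tvrocks';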
 


Working With Your Data
 

SpiderBomb has two ways for you to view your data:

Results and Search - Both are available from the SpiderBomb Web Admin.

Results - First choose a file from the drop-down menu, then select the number of results you want per page. This will show all results in the order they were found.

Search - This allows you to scour your data to see if you have enough quality and quantity of relevant content for particular keywords and phrases.

Warning: SpiderBomb's search feature is for your use and not your visitors'. Since your goal is not to have a huge database, you probably won't have enough content to justify a full-fledged search engine for your own data.

If you want a search engine, we recommend using one of the pay per click solutions (see the DomBom Forum), where not only are the results much more varied than your own, they can also earn you cash.
 

How to add SpiderBomb results directly to html pages using a simple tag:

Here's the tag you need to paste into your html page where you want the SpiderBomb results to be displayed:
 

<!--#include virtual="cgi-bin/spiderbomb/spider.cgi/fast_search?keywords=sit coms&file=television.txt&results=12" -->
Important Note: When pasting these SSI tags into your pages, be sure that the entire tag is on ONE line, and not broken.
 
There are three parameters, or controls, that you need to adjust (a filled-in example follows this list):

1. keywords=your+subject   Just insert the keywords for the content you want inserted in your pages. Remember to include a + (plus sign) in place of blank spaces between words.

2. file=television.txt   Replace television.txt with the name of any data file that you've created.

3. results=12   Simply change 12 to any number of results you wish to display and SpiderBomb will insert this many links into your page, assuming there are enough search matches.
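For example (hypothetical keywords and file name), a tag that pulls 8 results about reality TV from a database named television-shows.txt could look like this:
<!--#include virtual="cgi-bin/spiderbomb/spider.cgi/fast_search?keywords=reality+tv&file=television-shows.txt&results=8" -->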
 

ODPBomb:
Inserting ODP/DMOZ content into html page uses a similar tag:
<!--#include virtual="cgi-bin/spiderbomb/odpbomb.cgi?keywords=sit coms&results=12" -->
The main difference is that odpbomb.cgi doesn't need the "file=equal"...So Just paste the above line into the html of a webpage, making sure it is all on one line.

You can mix and match these tags. You can include SpiderBomb tags using different keywords and/or text files, along with different combinations of odpbomb results.
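For instance (again with hypothetical keywords and file names), a single page could carry both of these tags, each on its own line:
<!--#include virtual="cgi-bin/spiderbomb/spider.cgi/fast_search?keywords=sit+coms&file=television-shows.txt&results=10" -->
<!--#include virtual="cgi-bin/spiderbomb/odpbomb.cgi?keywords=reality+tv&results=5" -->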

Be creative and experiment with different combinations.
 

.htaccess
 
Important note: .htaccess - If you want to use .html instead of .shtml for your pages, .htaccess needs to be uploaded to the folder where your web pages are stored on your host.

There are two different types of .htaccess files included with SpiderBomb:

1. Used to tell the server to show SSI on .html pages (instead of only .shtml pages). This htaccess file needs to be uploaded ONLY to directories containing web pages.

Entire contents of the .htaccess html version:
AddHandler server-parsed .html
 

2. To restrict access to your SpiderBomb Admin and data. These .htaccess files are in the folders that go in your cgi-bin.

Entire contents of the .htaccess cgi/admin versions:
Deny from all
 

If you're in doubt about a particular .htaccess file, just open it with a text editor and look at it. If you look at the contents, you should be able to tell that one deals with "html" and the other "denies".

Sometimes you can't see .htaccess files when they are on your server. If you believe you may be having trouble due to an .htaccess file, just upload another one. Even though you can't see it, it will overwrite (replace) the existing .htaccess file.
 

Misc.
 
Be Legal...
SpiderBomb indexes Websites Instead of Stealing Content-There are some programs that merely go to a search engine and copy its search results. Reprinting those exact search results, in the same order, is probably illegal, and at the least immoral. SpiderBomb doesn't steal content. It uses "starting points" you define, then wanders the web collecting data. This is exactly what Google, Inktomi and others do, and it is how SpiderBomb collects data. Note: It's rumored that Google may use ODP as its "starting point".

Be Respectful...
Uses Title tags and Meta Description Tags-After SpiderBomb visits a page, it will extract each page's Title and Meta Description, and use these for your search results. It is traditional on the web that the purpose of these two tags is to give other sites a way to describe the page. By using the Title and Meta Description of each page, you are respecting the wishes of each webmaster.

Be Honest...SpiderBomb content pages don't need to hide links or play other search engine tricks.

Create Site Maps-Just set SpiderBomb to include only links from your domain(s) and it will create a site map just for you. Site maps increase your usability, as well as greatly increasing your ability to get indexed by the major engines.
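A site map run could be set up something like this (a sketch, with yourdomain.com standing in for your own domain):
Starting Point = http://www.yourdomain.com/
Include URL's Only from Domains = yourdomain.com
Depth Limit = 10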

Spy On Your Competition- Discover your competition's link strategy, hidden pages, external links and more...Set their page as the "starting point" and put their domain name in the "Include URL's Only from Domains" and you'll be able to see their link structure.

Check your "Indelibility Ability"- Can't get in the engines? Use SpiderBomb to find your pages. If SpiderBomb can't find them, then chances are, no one can.

Perl Modules Built In-All required perl modules are included so you don't need to ask your host if they have them installed.
 

For more help, tips and secrets visit the DomBomber Forum

Remember...SpiderBomb works great with all other DomBom stuff:
 

PowerBomb-Incoming! Finds the most explosive (and profitable) keywords.

CherryBomb-The Mother of all Bombs. Adds revenue streams to your pages by adding any pay per click affiliate that uses XML feeds. Flexible, versatile and profitable.

DomBom-The Original Mix and Match Cluster Bomb system exploits a vast array of money making content.

PageBomb-Weapon of Mass Production. Creates massive amounts of DomBom pages, exploiting the power of the other bombs by creating real web pages really fast.