Thursday 30 May 2013

What is Robots.txt

It is great when search engines frequently visit your site and index your content, but there are often cases when indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk incurring a duplicate-content penalty. Also, if you have sensitive data on your site that you do not want the world to see, you will prefer that search engines do not index those pages (although in this case the only sure way to keep sensitive data from being indexed is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.
One way to tell search engines which files and folders on your Web site to avoid is the Robots metatag. But since not all search engines read metatags, the Robots metatag can simply go unnoticed. A better way to inform search engines of your preferences is to use a robots.txt file.

What Is Robots.txt?
Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection); putting up a robots.txt file is more like putting a "Please, do not enter" note on an unlocked door – you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
The location of robots.txt is very important. It must be in the root directory of your domain, because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the root directory, and if they don't find it there, they simply assume that the site does not have a robots.txt file and therefore index everything they find along the way. So if you don't put robots.txt in the right place, do not be surprised when search engines index your whole site.
The concept and structure of robots.txt were developed more than a decade ago, and if you are interested in learning more about it, visit http://www.robotstxt.org/ or go straight to the Standard for Robot Exclusion, because in this article we will deal only with the most important aspects of a robots.txt file. Next we will continue with the structure of a robots.txt file.

Structure of a Robots.txt File
The structure of a robots.txt file is pretty simple (and not very flexible) – it is a list of user agents and disallowed files and directories. Basically, the syntax is as follows:

User-agent:
Disallow:

"User-agent:" names the search engine crawler a record applies to, and "Disallow:" lists the files and directories to be excluded from indexing. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines – just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
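Tying this back to the scenarios from the introduction, a slightly fuller robots.txt might look like the sketch below. The directory names (/print/, /images/ and so on) are hypothetical – substitute the paths your own site actually uses:

```
# Keep all crawlers away from printable page versions and site assets.
User-agent: *
Disallow: /print/
Disallow: /images/
Disallow: /cgi-bin/

# A record for one specific crawler. Note that it must repeat any
# shared rules it should still obey (more on this in the next section).
User-agent: Googlebot
Disallow: /print/
Disallow: /temp/
```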

The Traps of a Robots.txt File
When you start making complicated files – i.e. you decide to allow different user agents access to different directories – problems can arise if you do not pay special attention to the traps of a robots.txt file. Common mistakes include typos and contradictory directives. Typos are misspelled user agents or directories, missing colons after User-agent and Disallow, etc. Typos can be tricky to find, but in some cases validation tools help.
The more serious problem is with logical errors. For instance:
User-agent: *
Disallow: /temp/
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
The above example is from a robots.txt that allows all agents to access everything on the site except the /temp directory, and then adds a second, more restrictive record for Googlebot. The trap is that crawlers do not simply read the file top to bottom and combine everything they find. According to the exclusion standard, a robot should obey only the record that matches its own name and fall back to the "*" record when no specific record exists. So once Googlebot has its own record, it ignores the "*" record entirely – which is why the shared /temp/ rule must be repeated in the Googlebot record. Conversely, a naive crawler that stops at the first matching record would take only the "*" rules and happily index /images/ and /cgi-bin/, which you think you have told it not to touch. You see, the structure of a robots.txt file is simple, but serious mistakes can still be made easily.
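One way to catch such logical errors before crawlers do is to test the file programmatically. As a sketch, Python's standard urllib.robotparser module can parse the rules above and answer per-agent allow/disallow queries (example.com is just a placeholder domain):

```python
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.modified()                      # mark the rules as freshly loaded
rp.parse(ROBOTS_TXT.splitlines())  # feed the file line by line

# Googlebot matches its own, more specific record ...
print(rp.can_fetch("Googlebot", "http://example.com/images/"))     # False
# ... while any other crawler falls back to the "*" record.
print(rp.can_fetch("SomeOtherBot", "http://example.com/images/"))  # True
print(rp.can_fetch("SomeOtherBot", "http://example.com/temp/"))    # False
```

Note that robotparser resolves records by user-agent name rather than by reading order, modelling a well-behaved crawler; a sloppier crawler may not be as careful.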


Sunday 3 March 2013

Google vs Yahoo

If one looks around, one can see two different styles of working – two different ways of getting into the market, which is partly what sometimes makes Google look superior to Yahoo!. One approach is gathering everything under one roof – everything in a big pile, leaving it to you to sort out and find what you need. The other approach is gathering neat piles of everything and putting in front of you what you actually want. Google and Yahoo! work in these two styles respectively. Along with this, Google and Yahoo! have entirely different strategies. Many of the Yahoo! fans won't like this, I know! :P
I briefly studied the statistics, the proactive approaches Google and Yahoo! take, and the way they market their firms. While studying it all, I found one thing very clear everywhere – Google is a developer and Yahoo! is a user. Yes.
Yahoo's Strategy So Far:

Let's talk about Yahoo!'s side of the story first. Yahoo! has always believed in partnerships in order to drive traffic. In 2006, as we all know, Yahoo! made a $1+ billion bid for Facebook. It was one of the most crucial steps for Yahoo! to drive the public through a social networking site. But unfortunately, Yahoo! didn't make it in the end, as Microsoft had offered a more powerful amount of money ($40 billion, that is) :P
Yahoo! had always been aware that it could not develop a social media website to counter Google's products, and that is why Yahoo! created partnerships with social media sites. In 2009, Yahoo! announced a sort-of connection with Facebook Connect by integrating it into its popular web properties. Through this, Yahoo! would be able to get traffic generated by sharing articles and other things on Facebook.
Moreover, Yahoo! has also been a partner of Twitter, giving users the ability to access Twitter from within Yahoo!. So we see that Yahoo!, instead of competing with social media, believes in partnering with it so as to generate traffic.
Google's Strategy So Far:
Now let's discuss Google in detail. Google has always focused on developing things on its own, e.g. Google Buzz. Google has bet on technology, unlike Yahoo!, which believes in partnership. Doesn't it look more like Development vs Partnership? Mind taking a look?
In October 2006, Google bought YouTube. The same year, 2006, Yahoo! and eBay formed a partnership. In 2008, Google launched Google Chrome. The very next year, 2009, Yahoo! formed a search partnership with Microsoft. Later, in February 2010, Google came up with Google Buzz. The same year, Yahoo! and Twitter formed a partnership. Moreover, Google is launching an application, Google Wallet, this summer, while there are no comparable Yahoo! updates (for this summer) that I know of.
So you see how well-planned Google has always been – always taking a very calculated step in developing applications that grabbed people's attention, while Yahoo! continued with partnerships. Google's strategy has kept succeeding since the launch of Google Buzz, because it is one of the most complete products Google has presented up till now. Compared to Yahoo!, Google has developed better technical products in a short time (Buzz, Chrome, Wallet, Android, etc.). Plus, Google is now working more on smartphones, which are going to be the next biggest target for search.
To conclude, I would say that comparing the two firms is really difficult. Neither I nor you can judge, just on the basis of search engine results, whether Yahoo! is better or Google is better. It all relates to the strategies they follow, and they have entirely different strategies. Neither company's strategy or direction is wrong. It's just that they need different social strategies to succeed.


Friday 22 February 2013

Web Directories and Specialized Search Engines

SEO experts spend most of their time optimizing for Google and occasionally for one or two other search engines. There is nothing wrong with that, and it is quite logical, having in mind that topping Google is the lion's share of Web popularity. But very often, no matter what you do, topping Google does not happen. Or sometimes the price you need to pay (not literally, but in terms of effort and time) to top Google and stay there is too high. Maybe we should also mention the ultimate SEO nightmare – being banned from Google, when you simply can't use Google (at least not until you are readmitted to the club) and, whether you like it or not, you need to have a look at possible alternatives.
What Are Google Alternatives?

The first alternative to Google is obvious – optimize for the other major search engines, if you have not done so already. Yahoo! and MSN (to a lesser degree) can bring you enough visitors, though sometimes it is virtually impossible to optimize for all three of them at the same time because of the differences in their algorithms. You could also optimize your site for (or at least submit it to) some of the other search engines (Lycos, Excite, Netscape, etc.), but having in mind that all of them together hardly account for more than 3-5% of Web search traffic, do not expect much.

Naming all the Google alternatives would make for a long list that is outside the scope of this article, but just to be a little more precise about what alternatives exist, we cannot skip SEO instruments like posting to blogs and forums or paid advertisements.

Web Directories

What is a Web Directory?

Although many Web directories offer a search functionality of some kind (otherwise it would be impossible to browse thousands of pages for, let's say, Computers), search directories are fundamentally different from search engines in two ways – most directories are edited by humans, and URLs are not gathered automatically by spiders but submitted by site owners. The main advantage of Web directories is that, no matter how clever spiders become, when a human views and checks the pages, there is less chance that pages will be classified in the wrong categories. The downsides of human editing are that the listings in Web directories are sometimes outdated if no human was available to do the editing and checking for a while (though this is not that bad, because search engines also deliver pages that no longer exist) and that you might sometimes have to wait half a year before being included in a search directory.

The second difference – no spiders – means that you must go and submit your URL to the search directory rather than sit and wait for a spider to come to your site. Fortunately, this is done only once per directory, so it is not that bad.

Once you are included in a particular directory, in most cases you can stay there as long as you wish and wait for people (and search engines) to find you. The fact that a link to your site appears in a respectable Web directory is good because, first, it is a backlink and, second, it increases your visibility to spiders, which in turn raises your chances of being indexed by them.

Examples of Web Directories

There are hundreds and thousands of search directories, but undoubtedly the most popular one is DMOZ. It is a general-purpose search directory and accepts links to all kinds of sites. Other popular general-purpose search directories are the Google Directory and the Yahoo! Directory. The Best of the Web is one of the oldest Web directories, and it still keeps high standards in selecting sites.

Besides general-purpose Web directories, there are incredibly many topical ones. For instance, The Environment Directory lists links to environmental sites only, while The Radio Directory lists thousands of radio stations worldwide, arranged by country, format, etc. There are also many local and national Web directories, which accept links only to sites about a particular region or country and which can be great if your site targets a local or national audience. It is not possible to mention even just the topics of the specialized search directories, because the list would get incredibly long. Using Google and specialized search resources like The Search Engines Directory, you can find many directories related to your area of interest on your own.

Specialized Search Engines

What is a Specialized Search Engine?

Specialized search engines are one more tool to include in your SEO arsenal. Unlike general-purpose search engines, specialized search engines index pages on particular topics only, and very often there are many pages that cannot be found in general-purpose search engines but only in specialized ones. Some specialized search engines are huge sites that actually host the resources they link to, or used to be search directories but have evolved to include links beyond the sites originally submitted to them. There are specialized search engines for every imaginable topic, and it is always wise to be aware of the ones for your niche. The examples in the next section are by no means a full list but are meant to give you an idea of what is available. If you search harder on the Web, you will find many more resources.

Examples of Specialized Search Engines



Sunday 13 January 2013

What is SEO


Whenever you enter a query in a search engine and hit 'enter' you get a list of web results that contain that query term. Users normally tend to visit websites that are at the top of this list as they perceive those to be more relevant to the query. If you have ever wondered why some of these websites rank better than the others then you must know that it is because of a powerful web marketing technique called Search Engine Optimization (SEO).
SEO is a set of techniques that helps search engines find your site and rank it higher than the millions of other sites in response to a search query. SEO thus helps you get traffic from search engines.

This SEO tutorial covers all the essential information you need to know about Search Engine Optimization – what it is, how it works, and the differences in the ranking criteria of the major search engines.

How Search Engines Work

The first basic truth you need to know to learn SEO is that search engines are not humans. While this might be obvious to everybody, the differences between how humans and search engines view web pages are not. Unlike humans, search engines are text-driven. Although technology advances rapidly, search engines are far from intelligent creatures that can feel the beauty of a cool design or enjoy the sounds and movement in movies. Instead, search engines crawl the Web, looking at particular site items (mainly text) to get an idea of what a site is about. This brief explanation is not the most precise, because, as we will see next, search engines perform several activities in order to deliver search results – crawling, indexing, processing, calculating relevancy, and retrieving.

First, search engines crawl the Web to see what is there. This task is performed by a piece of software called a crawler or a spider (or Googlebot, as is the case with Google). Spiders follow links from one page to another and index everything they find on their way. Having in mind the number of pages on the Web (over 20 billion), it is impossible for a spider to visit a site daily just to see if a new page has appeared or an existing page has been modified; sometimes crawlers may not visit your site for a month or two.

What you can do is check what a crawler sees on your site. As already mentioned, crawlers are not humans, and they do not see images, Flash movies, JavaScript, frames, or password-protected pages and directories, so if you have tons of these on your site, you'd better run a spider simulator to see whether these goodies are viewable by a spider. If they are not viewable, they will not be spidered, not indexed, not processed, etc. – in a word, they will be non-existent to search engines.
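To get a rough feel for what a crawler "sees", you can strip a page down to its text and links with Python's standard html.parser module. This is only a minimal sketch of a spider simulator with a made-up sample page, not what any real search engine runs:

```python
from html.parser import HTMLParser

class SpiderSimulator(HTMLParser):
    """Collects the plain text and hyperlinks a simple spider would see."""
    def __init__(self):
        super().__init__()
        self.text, self.links = [], []
        self._skip = 0  # depth inside <script>/<style>, which spiders ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())

page = """<html><body>
<script>var hiddenFromSpiders = 1;</script>
<h1>Welcome</h1>
<a href="/about.html">About us</a>
</body></html>"""

sim = SpiderSimulator()
sim.feed(page)
print(sim.text)   # ['Welcome', 'About us']
print(sim.links)  # ['/about.html']
```

Notice that the JavaScript never shows up in the collected text – exactly the kind of content that is invisible to a text-driven crawler.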

After a page is crawled, the next step is to index its content. The indexed page is stored in a giant database, from which it can later be retrieved. Essentially, indexing is the process of identifying the words and expressions that best describe the page and assigning the page to particular keywords. For a human it would not be possible to process such amounts of information, but search engines generally deal with this task just fine. Sometimes they might not get the meaning of a page right, but if you help them by optimizing it, it will be easier for them to classify your pages correctly – and for you to get higher rankings.
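Conceptually, the database works like an inverted index: a map from each word to the pages that contain it. A toy sketch in Python (the page names and texts are made up for illustration):

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

pages = {
    "page1.html": "cheap laptop deals",
    "page2.html": "laptop repair guide",
    "page3.html": "gardening guide",
}
index = build_index(pages)
print(sorted(index["laptop"]))  # ['page1.html', 'page2.html']
print(sorted(index["guide"]))   # ['page2.html', 'page3.html']
```

Looking up a query word is then a single dictionary access instead of scanning every page – which is what makes answering searches over billions of pages feasible at all.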

When a search request comes in, the search engine processes it – i.e. it compares the search string in the request with the indexed pages in the database. Since it is likely that more than one page (in practice, millions of pages) contains the search string, the search engine starts calculating the relevancy of each page in its index to the search string.

There are various algorithms for calculating relevancy. Each algorithm assigns different relative weights to common factors like keyword density, links, or metatags, which is why different search engines return different result pages for the same search string. What is more, it is a known fact that all the major search engines – Yahoo!, Google, Bing, etc. – periodically change their algorithms, and if you want to stay at the top, you also need to adapt your pages to the latest changes. This is one reason (the other is your competitors) to devote ongoing effort to SEO if you'd like to be at the top.
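As a toy illustration of one such factor, keyword density is simply how often a term occurs relative to the total number of words on the page. A real engine combines dozens of weighted signals, but the basic arithmetic looks like this (the sample text is made up):

```python
def keyword_density(text, keyword):
    """Occurrences of the keyword as a fraction of all words on the page."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

page = "seo tips and seo tricks for better rankings"
print(keyword_density(page, "seo"))  # 0.25  (2 occurrences out of 8 words)
```

A relevancy algorithm would fold a score like this together with link counts, metatags and other factors, each with its own weight.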

The last step in a search engine's activity is retrieving the results. Basically, this is nothing more than displaying them in the browser – the endless pages of search results, sorted from the most relevant to the least relevant sites.

Differences Between the Major Search Engines

Although the basic principle of operation is the same for all search engines, the minor differences between them lead to major differences in result relevancy. Different factors matter to different search engines. There were times when SEO experts joked that the algorithms of Bing were intentionally made just the opposite of Google's. While this may hold a grain of truth, it is a matter of fact that the major search engines like different things, and if you plan to conquer more than one of them, you need to optimize carefully.

There are many examples of the differences between search engines. For instance, for Yahoo! and Bing, on-page keyword factors are of primary importance, while for Google links are very, very important. Also, for Google, sites are like wine – the older, the better – while Yahoo! generally has no expressed preference for sites and domains with tradition (i.e. older ones). Thus you might need more time for your site to mature enough to be admitted to the top in Google than in Yahoo!.
