Thursday, January 26, 2012

Title, Meta Tags and Keyword Analysis

Title

How can you view your page's title, meta tags and keywords, as in the image below?

  1. Open your website.
  2. Right-click anywhere on the page.
  3. Select View Page Source.


You'll see the complete HTML source of your page, including its title and meta tags.
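
If you prefer to inspect these tags programmatically rather than through View Page Source, a small script can pull them out of the HTML. Below is a minimal sketch using only Python's standard library; the URL is a placeholder, not a real site you need to query.

    # Minimal sketch: fetch a page and print its title and meta tags.
    # Uses only the Python standard library; the URL below is just an example.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class MetaTagParser(HTMLParser):
        """Collects the <title> text and all <meta> name/content pairs."""

        def __init__(self):
            super().__init__()
            self.in_title = False
            self.title = ""
            self.meta = {}  # e.g. {"description": "...", "keywords": "..."}

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "title":
                self.in_title = True
            elif tag == "meta" and "name" in attrs:
                self.meta[attrs["name"].lower()] = attrs.get("content", "")

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False

        def handle_data(self, data):
            if self.in_title:
                self.title += data

    html = urlopen("http://example.com/").read().decode("utf-8", "ignore")
    parser = MetaTagParser()
    parser.feed(html)
    print("Title      :", parser.title.strip())
    print("Description:", parser.meta.get("description", "(none)"))
    print("Keywords   :", parser.meta.get("keywords", "(none)"))
    print("Robots     :", parser.meta.get("robots", "(none)"))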

Meta Tags



Meta tags are a very important part of the HTML code of your web page. They are read by the search engines but are not displayed as part of your web page design. They usually include a concise summary of the page content, and you should include your relevant keywords in them. Most meta tags belong within the head section of a page's HTML. The most important tags are the title, description, keywords and robots tags.

Keyword

How to optimize meta tags? The title tag and the meta description and keywords tags should include keywords relevant to the content of the web page they describe. Besides that, you should consider the length and the order of the characters/words included in each of the meta tags. Note that the search engine robots read from left to right, so the words that come first are more important than those that come towards the end.

Title tag: It could be said that the title is one of the most important factors for a successful search engine optimization of your website. Located within the head section, right above the description and keywords tags, it provides summarized information about your website. Besides that, the title is what appears on the search engine results page (SERP). The title tag should be between 10 and 60 characters. This is not a law but a rough guideline; a few extra characters are not a problem. You won't get penalized for a longer title tag, but the search engine will simply ignore the extra part. (Use Microsoft Office Word for easy character counting.)

Meta description tag: The description tag should be written in such a way that it shows what information your website contains or what your website is about. Write short, clear sentences that will not confuse your visitors. The description tag should be less than 200 characters. The meta description tag is also important for the SEO of your page; it matters most to the prospective visitor looking at the search engine results page, since this tag is often displayed there and helps distinguish your site from the others in the list.

Meta keywords tag: Lately, the meta keywords tag has become the least important tag for the search engines, and especially Google. However, it is an easy way to reinforce your most important keywords once again. We recommend its usage as we believe that it may help the SEO process, especially if you follow the rules mentioned below. The keywords tag should contain between 4 and 10 keywords. They should be separated by commas and should correspond to the major search phrases you are targeting. Every word in this tag should appear somewhere in the body, or you might get penalized for irrelevance. No single word should appear more than twice, or it may be considered spam.

Meta robots tag: This tag helps you specify how your website will be crawled by the search engines. There are four common value combinations:
  • Index, Follow - The search engine robots will start crawling your website from the main/index page and will then continue to the rest of the pages.
  • Index, NoFollow - The search engine robots will start crawling your website from the main/index page but will NOT continue to the rest of the pages.
  • NoIndex, Follow - The search engine robots will skip the main/index page but will crawl the rest of the pages.
  • NoIndex, NoFollow - None of your pages will be crawled by the robot and your website will not be indexed by the search engines.
If you want to be sure that all robots will crawl your website, we advise you to add an "Index, Follow" meta robots tag. Please note that most search engine crawlers will index your pages starting from the index page and continuing to the rest of the pages, even if you do not have a robots tag. So if you wish your page not to be crawled, or to be crawled differently, use the appropriate robots tag.
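
To apply the character and keyword guidelines above without counting by hand, a short script can check them for you. This is a rough sketch of the rules described in this section (title 10-60 characters, description under 200 characters, 4-10 keyword phrases, no word repeated more than twice); the tag values below are placeholders, not recommendations.

    # Rough sketch: check a page's tags against the guidelines in this section.
    # The tag values below are placeholders; real ones would come from your page.
    title = "Example Widgets - Hand-made Widgets and Widget Repair"
    description = "Example Widgets sells hand-made widgets and offers widget repair."
    keywords = "widgets, hand-made widgets, widget repair, widget store"

    problems = []

    if not 10 <= len(title) <= 60:
        problems.append(f"Title is {len(title)} characters; aim for 10-60.")

    if len(description) >= 200:
        problems.append(f"Description is {len(description)} characters; keep it under 200.")

    phrases = [k.strip() for k in keywords.split(",") if k.strip()]
    if not 4 <= len(phrases) <= 10:
        problems.append(f"Keywords tag has {len(phrases)} phrases; aim for 4-10.")

    # No single word should appear more than twice across the keyword phrases.
    counts = {}
    for phrase in phrases:
        for word in phrase.lower().split():
            counts[word] = counts.get(word, 0) + 1
    repeated = [w for w, c in counts.items() if c > 2]
    if repeated:
        problems.append("Words used more than twice: " + ", ".join(repeated))

    print("\n".join(problems) if problems else "All meta tag guidelines satisfied.")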

How to edit meta tags? You can edit your meta tags through the File Manager in the cPanel of your hosting account. You need to edit the file of each web page. The file contains the HTML code of the page.

Search Engine Basics



A very basic search engine includes a number of processing phases.
  • Crawling: discover the web pages on the internet
  • Indexing: build an index to facilitate query processing
  • Query Processing: extract the most relevant pages based on the user's query terms
  • Ranking: order the results based on relevancy

[Diagram: search engine architecture, showing the processing units and data stores]
Notice that each element in the above diagram reflects a logical functional unit, not a physical boundary. For example, the processing unit in each orange box is in fact executed across many machines in parallel. Similarly, each of the data store elements is spread physically across many machines based on key partitioning.

Vector Space Model

Here we use the "Vector Space Model" where each document is modeled as a multi-dimensional vector (each word represents a dimension). If we put all documents together, we form a matrix where the rows are documents and columns are words, and each cell contains the TF/IDF value of the word within the document.

To determine the similarity between two documents, we can take the dot product of their vectors; the result represents their degree of similarity.
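
A minimal sketch of this model in Python follows. The toy documents, the raw-count term frequency, and the particular IDF formula (log of total documents over document frequency) are illustrative assumptions, not a prescription.

    import math
    from collections import Counter

    # Toy corpus; each document is modeled as a vector of TF/IDF weights.
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "the lazy dog sleeps all day",
        "a quick brown rabbit jumps over the fence",
    ]
    tokenized = [d.split() for d in docs]

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n_docs = len(docs)

    def tfidf_vector(tokens):
        """Map each word (dimension) to its TF * IDF weight for one document."""
        tf = Counter(tokens)
        return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

    vectors = [tfidf_vector(t) for t in tokenized]

    def dot(u, v):
        """Dot product of two sparse vectors: the degree of similarity."""
        return sum(u[w] * v.get(w, 0.0) for w in u)

    print(dot(vectors[0], vectors[1]))  # doc 0 vs doc 1 share "lazy dog"
    print(dot(vectors[0], vectors[2]))  # doc 0 vs doc 2 share "quick brown ... jumps over"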

Crawler

The crawler's job is to collect web pages from the internet. This is typically done by a farm of crawlers, which do the following.
Start from a set of seed URLs, then repeat:
  1. Pick the URL that has the highest traversal priority.
  2. Download the page content from the URL into the content repository (which can be a distributed file system, or a DHT), and update the corresponding entry in the doc index.
  3. Discover new URL links from the downloaded pages. Add the link relationships to the link index and add these links to the traversal candidates.
  4. Prioritize the traversal candidates.
The content repository can be any distributed file system; here let's say it is a DHT. A minimal sketch of this crawl loop appears after the considerations below.
There are a number of considerations.
  • How do we make sure different crawlers work on different sets of content (rather than crawling the same page twice)? When a crawler detects overlap (the URL already exists in the page repository with a fairly recent timestamp), it skips this URL and picks the next best URL to crawl.
  • How does the crawler determine the next candidate to crawl? We can use a heuristic algorithm based on some utility function (e.g. pick the URL candidate with the highest page rank score).
  • How frequently do we re-crawl? We can track the rate of change of each page to determine its crawl frequency.
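
Here is the promised sketch of the crawl loop, as a single-process toy. The priority function, the in-memory stand-ins for the content repository and link index, the 10-page limit, and the regex-based link extraction are all simplifying assumptions; a real crawler farm distributes this work across many machines.

    import heapq
    import re
    import time
    from urllib.parse import urljoin
    from urllib.request import urlopen

    seed_urls = ["http://example.com/"]     # assumption: placeholder seeds
    content_repository = {}                  # url -> (fetch_time, html); stands in for the DHT
    link_index = []                           # (from_url, to_url) pairs

    def priority(url):
        """Heuristic traversal priority; a real system might use page rank scores."""
        return len(url)                       # toy utility function: prefer short URLs

    frontier = [(priority(u), u) for u in seed_urls]
    heapq.heapify(frontier)

    while frontier and len(content_repository) < 10:
        _, url = heapq.heappop(frontier)                      # 1. pick the highest-priority URL
        if url in content_repository:                         # skip overlapping work
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except (OSError, ValueError):
            continue
        content_repository[url] = (time.time(), html)         # 2. store content, update doc index
        for href in re.findall(r'href="([^"]+)"', html):      # 3. discover new links
            link = urljoin(url, href)
            link_index.append((url, link))
            heapq.heappush(frontier, (priority(link), link))  # 4. re-prioritize candidates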

Indexer

The Indexer's job is to build the inverted index for the query processor to serve the online search requests.
First, the indexer builds the "forward index":
  1. The indexer parses the documents from the content repository into a token stream.
  2. It builds a "hit list" that describes each occurrence of a token within the document (e.g. position in the doc, font size, whether it is a title or anchor text, etc.).
  3. It applies various "filters" to the token stream (such as a stop-word filter to remove words like "a" and "the", or a stemming filter to normalize "happy", "happily" and "happier" into "happy").
  4. It computes the term frequency within the document.
From the forward index, the indexer proceeds to build the inverted index (typically through a Map/Reduce mechanism). The result is keyed by word and stored in a DHT.
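
A minimal in-memory sketch of these indexing steps follows. The tokenizer, the stop-word list, the crude suffix-stripping "stemmer", and the simplified hit list (token and position only) are assumptions for illustration; a real indexer would run this as a Map/Reduce job over the content repository.

    from collections import Counter, defaultdict

    # Toy content repository: doc_id -> document text.
    content_repository = {
        1: "The happy dog jumps over the lazy dog",
        2: "A happy cat sleeps all day",
    }

    STOP_WORDS = {"a", "the", "over", "all"}

    def tokenize(text):
        """Token stream with stop-word filtering and a trivial 'stemmer'."""
        tokens = [w.lower() for w in text.split()]
        tokens = [w for w in tokens if w not in STOP_WORDS]
        return [w.rstrip("s") for w in tokens]   # crude stemming stand-in

    # Forward index: doc_id -> list of (token, position) hits plus term frequencies.
    forward_index = {}
    for doc_id, text in content_repository.items():
        hits = [(tok, pos) for pos, tok in enumerate(tokenize(text))]
        forward_index[doc_id] = {"hits": hits, "tf": Counter(t for t, _ in hits)}

    # Inverted index: word -> list of (doc_id, term_frequency) postings.
    inverted_index = defaultdict(list)
    for doc_id, data in forward_index.items():
        for word, tf in data["tf"].items():
            inverted_index[word].append((doc_id, tf))

    print(dict(inverted_index))
    # e.g. {'happy': [(1, 1), (2, 1)], 'dog': [(1, 2)], ...}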

Ranker

The ranker's job is to compute the rank of a document, based on how many in-links point to the document as well as the ranks of the referrers (hence a recursive definition). Two popular ranking algorithms are "PageRank" and "HITS".
  • Page Rank Algorithm
Page rank is a global rank mechanism. It is precomputed upfront and is independent of the query; a minimal iterative sketch follows.
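
The sketch below runs the classic power-iteration form of PageRank on a toy link graph. The graph, the damping factor of 0.85, and the fixed 50 iterations are illustrative assumptions.

    # Toy link graph: page -> pages it links to.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    damping = 0.85
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):                       # iterate until ranks roughly converge
        new_rank = {}
        for p in pages:
            # Sum the rank flowing in from every page that links to p.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank

    print(sorted(rank.items(), key=lambda kv: -kv[1]))   # "c" should rank highest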

  • HITS Algorithm
In HITS, every page plays a dual role: a "hub" role and an "authority" role, with a corresponding rank for each. The hub rank measures the quality of a page's outlinks: a good hub is one that points to many good authorities. The authority rank measures the quality of a page's content: a good authority is one that many good hubs point to.

Notice that HITS doesn't pre-compute the hub and authority scores. Instead it invokes a regular search engine (which only does TF/IDF matching, not ranking) to get a set of initial results (typically of a predefined fixed size) and then expands this set by tracing the outlinks into an expanded result set. It also incorporates a fixed-size sample of inlinks to the initial result set into the expanded result set. After this expansion, it runs an iterative algorithm to compute the authority ranks and hub ranks, and uses a combination of these two ranks to calculate the ultimate rank of each page; usually a high hub rank weighs more than a high authority rank.
Notice that the HITS algorithm is performed at query time and is not pre-computed upfront. The advantage of HITS is that it is sensitive to the query (as compared to PageRank, which is not). The disadvantage is that it performs ranking per query and is hence expensive.
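
A minimal sketch of the hub/authority iteration on a toy expanded result set follows. The link graph, the fixed 20 iterations, and the simple Euclidean normalization are assumptions; building the expanded set from a first-round TF/IDF search is omitted.

    import math

    # Toy expanded result set: page -> pages it links to (within the set).
    links = {
        "p1": ["p3", "p4"],
        "p2": ["p3"],
        "p3": ["p4"],
        "p4": [],
    }
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(20):
        # A page's authority is the sum of the hub scores of pages pointing to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # A page's hub score is the sum of the authority scores of pages it points to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalize so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values()))
        h_norm = math.sqrt(sum(v * v for v in hub.values()))
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}

    print("authorities:", sorted(auth.items(), key=lambda kv: -kv[1]))
    print("hubs       :", sorted(hub.items(), key=lambda kv: -kv[1]))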

Query Processor

When a user inputs a search query (containing multiple words), the query is treated as a "query document". Relevancy is computed and combined with the rank of each document to return an ordered list of results.
There are many ways to compute relevancy. We can consider only the documents that contain all the terms specified in the query. In this model, we look up a list of document ids for each term in the query and then intersect the lists. If we keep each document list sorted by document id, the intersection can be computed quite efficiently.
Alternatively, we can return the union (instead of the intersection) of all documents and order them by a combination of page rank and TF/IDF score. Documents that have more terms in common with the query will have a higher TF/IDF score.
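
A minimal sketch of the sorted posting-list intersection described above (the doc-id lists are illustrative; in practice they would come from the inverted index):

    def intersect(postings_a, postings_b):
        """Merge-style intersection of two doc-id lists sorted in ascending order."""
        result = []
        i = j = 0
        while i < len(postings_a) and j < len(postings_b):
            if postings_a[i] == postings_b[j]:
                result.append(postings_a[i])
                i += 1
                j += 1
            elif postings_a[i] < postings_b[j]:
                i += 1
            else:
                j += 1
        return result

    # Doc-id lists for the query terms "search" and "engine" (illustrative values).
    search_docs = [2, 5, 8, 13, 21, 34]
    engine_docs = [3, 5, 13, 20, 34, 55]
    print(intersect(search_docs, engine_docs))   # [5, 13, 34]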
In some cases, an automatic query result feedback loop can be used to improve the relevancy, as sketched after the steps below.
  1. In the first round, the search engine performs a search (as described above) based on the user query.
  2. It constructs a second-round query by expanding the original query with additional terms found in the highly ranked documents returned in the first round.
  3. It performs a second round of querying and returns the result.
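
Here is the promised sketch of that feedback loop. The search function is a placeholder standing in for the real first-round engine, and picking the most frequent new terms from the top documents is just one simple expansion heuristic.

    from collections import Counter

    def search(query_terms):
        """Placeholder for the first-round engine: returns top documents as text.

        A real implementation would use the inverted index and ranking described above.
        """
        return [
            "search engine optimization improves crawler rankings",
            "search engine crawlers build an inverted index",
        ]

    def expand_query(query_terms, extra_terms=2):
        """Expand the query with the most frequent new terms from top first-round docs."""
        top_docs = search(query_terms)                     # 1. first-round search
        counts = Counter(
            word
            for doc in top_docs
            for word in doc.lower().split()
            if word not in query_terms
        )
        additions = [w for w, _ in counts.most_common(extra_terms)]
        return list(query_terms) + additions               # 2. expanded second-round query

    expanded = expand_query(["search", "engine"])
    print(expanded)                                        # e.g. ['search', 'engine', 'crawlers', ...]
    second_round_results = search(expanded)                # 3. run the second round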

Outstanding Issues

Fighting spammers is a continuous battle for search engines. Because of the financial value of showing up on the first page of search results, many spammers try to manipulate their pages. An early technique was to modify a page to repeat terms many times (trying to increase the TF/IDF score). The evolution of page rank has mitigated this to some degree, because page rank is based on "out-of-page" information that is much harder for the site owner to manipulate.
But people use link farms to game the page rank algorithms. The idea is to trade links between different domains. There is active research in this area on how to catch these patterns and discount their ranks.

Thanks : horicky


Introduction to Search Engine Optimization




Search engines are one of the primary ways that Internet users find Web sites. That's why a Web site with good search engine listings may see a dramatic increase in traffic.
Everyone wants those good listings. Unfortunately, many Web sites appear poorly in search engine rankings or may not be listed at all because they fail to consider how search engines work.
In particular, submitting to search engines  is only part of the challenge of getting good search engine positioning. It's also important to prepare a Web site through "search engine optimization."
Search engine optimization means ensuring that your Web pages are accessible to search engines and are focused in ways that help improve the chances they will be found.

This next section provides information, techniques and a good grounding in the basics of search engine optimization. By using this information where appropriate, you may tap into visitors who previously missed your site.
The guide is not a primer on ways to trick or "spam" the search engines. In fact, there are not any "search engine secrets" that will guarantee a top listing. But there are a number of small changes you can make to your site that can sometimes produce big results.
Let's go forward and first explore the two major ways search engines get their listings; then you will see how search engine optimization can especially help with crawler-based search engines.

How Search Engines Work


The term "search engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
"Hybrid Search Engines" Or Mixed Results
In the web's early days, it used to be that a search engine either presented crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it does also present crawler-based results (as provided by Inktomi), especially for more obscure queries.

The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.
Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant. You can learn more about how search engine software ranks web pages on the aptly-named How Search Engines Rank Web Pages page.

Major Search Engines: The Same, But Different
All crawler-based search engines have the basic parts described above, but there are differences in how these parts are tuned. That is why the same search on different search engines often produces different results. Some of the significant differences between the major crawler-based search engines are summarized on the Search Engine Features Page. Information on this page has been drawn from the help pages of each search engine, along with knowledge gained from articles, reviews, books, independent research, tips from others and additional information received directly from the various search engines.
Now let's look more closely at how crawler-based search engines rank the listings that they gather.

Whenever you enter a query in a search engine and hit 'enter' you get a list of web results that contain that query term. Users normally tend to visit websites that are at the top of this list as they perceive those to be more relevant to the query. If you have ever wondered why some of these websites rank better than the others then you must know that it is because of a powerful web marketing technique called Search Engine Optimization (SEO).
SEO is a technique which helps search engines find and rank your site higher than the millions of other sites in response to a search query. SEO thus helps you get traffic from search engines.
This SEO tutorial covers all the necessary information you need to know about Search Engine Optimization: what it is, how it works, and the differences in the ranking criteria of the major search engines.

1. How Search Engines Work

The first basic truth you need to know to learn SEO is that search engines are not humans. While this might be obvious to everybody, the differences between how humans and search engines view web pages aren't. Unlike humans, search engines are text-driven. Although technology advances rapidly, search engines are far from intelligent creatures that can feel the beauty of a cool design or enjoy the sounds and movement in movies. Instead, search engines crawl the Web, looking at particular site items (mainly text) to get an idea of what a site is about. This brief explanation is not the most precise because, as we will see next, search engines perform several activities in order to deliver search results: crawling, indexing, processing, calculating relevancy, and retrieving.
First, search engines crawl the Web to see what is there. This task is performed by a piece of software called a crawler or a spider (or Googlebot, as is the case with Google). Spiders follow links from one page to another and index everything they find on their way. Bearing in mind the number of pages on the Web (over 20 billion), it is impossible for a spider to visit a site daily just to see if a new page has appeared or if an existing page has been modified; sometimes crawlers may not visit your site for a month or two.
What you can do is check what a crawler sees of your site. As already mentioned, crawlers are not humans and they do not see images, Flash movies, JavaScript, frames, password-protected pages or directories, so if you have tons of these on your site, you'd better run the Spider Simulator below to see if these goodies are viewable by the spider. If they are not viewable, they will not be spidered, not indexed, not processed, etc. - in a word, they will be non-existent for search engines.

After a page is crawled, the next step is to index its content. The indexed page is stored in a giant database, from where it can later be retrieved. Essentially, the process of indexing is identifying the words and expressions that best describe the page and assigning the page to particular keywords. For a human it will not be possible to process such amounts of information but generally search engines deal just fine with this task. Sometimes they might not get the meaning of a page right but if you help them by optimizing it, it will be easier for them to classify your pages correctly and for you – to get higher rankings.
When a search request comes, the search engine processes it, i.e. it compares the search string in the search request with the indexed pages in the database. Since it is likely that more than one page (in practice, millions of pages) contains the search string, the search engine starts calculating the relevancy of each of the pages in its index with respect to the search string.
There are various algorithms to calculate relevancy. Each of these algorithms has different relative weights for common factors like keyword density, links, or metatags. That is why different search engines give different search results pages for the same search string. What is more, it is a known fact that all major search engines, like Yahoo!, Google, Bing, etc. periodically change their algorithms and if you want to keep at the top, you also need to adapt your pages to the latest changes. This is one reason (the other is your competitors) to devote permanent efforts to SEO, if you'd like to be at the top.
The last step in search engines' activity is retrieving the results. Basically, it is nothing more than simply displaying them in the browser – i.e. the endless pages of search results that are sorted from the most relevant to the least relevant sites.

2. Differences Between the Major Search Engines


Although the basic principle of operation of all search engines is the same, the minor differences between them lead to major changes in results relevancy. For different search engines, different factors are important. There were times when SEO experts joked that the algorithms of Bing were intentionally made just the opposite of those of Google. While this might have a grain of truth, it is a matter of fact that the major search engines like different things, and if you plan to conquer more than one of them, you need to optimize carefully.
There are many examples of the differences between search engines. For instance, for Yahoo! and Bing, on-page keyword factors are of primary importance, while for Google links are very, very important. Also, for Google sites are like wine: the older, the better, while Yahoo! generally has no expressed preference towards sites and domains with tradition (i.e. older ones). Thus you might need more time for your site to mature enough to be admitted to the top in Google than in Yahoo!.

