| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
| <html><head>
|
|
|
|
|
| <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
| <meta name="Generator" content="NetObjects Fusion 5.0.2 for Windows"><title>Pagerank Explained Correctly with Examples</title>
|
|
|
| <script>
|
| <!--
|
| function F_loadRollover(){} function F_roll(){}
|
| //-->
|
| </script>
|
| <script language="JavaScript1.2" src="pagerank_files/rollover.js"></script></head><body nof="(MB=(Freebies, 68, 50, 140, 15), L=(pagerankLayout, 577, 16252))" topmargin="0" leftmargin="0" alink="#000099" bgcolor="#ffffff" link="#000099" marginheight="0" marginwidth="0" text="#000000" vlink="#003399">
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0" width="614">
|
| <tbody><tr align="left" valign="top">
|
| <td height="8" width="14"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="14"></td>
|
| <td></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td height="60"></td>
|
| <td width="600"><img id="Banner1" src="pagerank_files/pagerank_NBanner_1.gif" alt="Page Rank Explained" nof="B_H" border="0" height="60" width="600"></td>
|
| </tr>
|
| </tbody></table>
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0" width="717">
|
| <tbody><tr align="left" valign="top">
|
| <td>
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0" width="133">
|
| <tbody><tr align="left" valign="top">
|
| <td height="125" width="13"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="13"></td>
|
| <td width="120">
|
| <table id="NavigationBar1" nof="NB_UYVP" border="0" cellpadding="0" cellspacing="0" width="120">
|
| <tbody><tr align="left" valign="top">
|
| <td height="25" width="120"><a href="http://www.iprcom.com/index.html" onmouseover="F_roll('NavigationButton1',1)" onmouseout="F_roll('NavigationButton1',0)"><img id="NavigationButton1" name="NavigationButton1" src="pagerank_files/_Np1_11.gif" onload="F_loadRollover(this,'_NRp2_7.gif')" alt="Home" border="0" height="25" width="120"></a></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td height="25" width="120"><a href="http://www.iprcom.com/services/index.html" onmouseover="F_roll('NavigationButton2',1)" onmouseout="F_roll('NavigationButton2',0)"><img id="NavigationButton2" name="NavigationButton2" src="pagerank_files/_Np1_12.gif" onload="F_loadRollover(this,'_NRp2_8.gif')" alt="Services" border="0" height="25" width="120"></a></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td height="25" width="120"><a href="http://www.iprcom.com/portfolio/index.html" onmouseover="F_roll('NavigationButton3',1)" onmouseout="F_roll('NavigationButton3',0)"><img id="NavigationButton3" name="NavigationButton3" src="pagerank_files/_Np1_13.gif" onload="F_loadRollover(this,'_NRp2_9.gif')" alt="Portfolio" border="0" height="25" width="120"></a></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td height="25" width="120"><a href="http://www.iprcom.com/downloads/index.html" onmouseover="F_roll('NavigationButton4',1)" onmouseout="F_roll('NavigationButton4',0)"><img id="NavigationButton4" name="NavigationButton4" src="pagerank_files/_Np1_14.gif" onload="F_loadRollover(this,'_NRp2_10.gif')" alt="Downloads" border="0" height="25" width="120"></a></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td height="25" width="120"><a href="http://www.iprcom.com/papers/index.html" onmouseover="F_roll('NavigationButton5',1)" onmouseout="F_roll('NavigationButton5',0)"><img id="NavigationButton5" name="NavigationButton5" src="pagerank_files/_Np1_15.gif" onload="F_loadRollover(this,'_NRp2_11.gif')" alt="White Papers" border="0" height="25" width="120"></a></td>
|
| </tr>
|
| </tbody></table>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0">
|
| <tbody><tr align="left" valign="top">
|
| <td height="13" width="16"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="16"></td>
|
| <td></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td></td>
|
| <td width="115">
|
|
| </td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td><img src="pagerank_files/clearpixel_002.gif" border="0" height="15" width="1"></td>
|
| </tr>
|
| <tr>
|
| <td></td>
|
| <td>
|
| </td>
|
| </tr>
|
|
|
| </tbody></table>
|
| </td>
|
| <td>
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0" width="584">
|
| <tbody><tr align="left" valign="top">
|
| <td height="11" width="7"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="7"></td>
|
| <td width="577"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="577"></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td></td>
|
| <td width="577">
|
| <p align="center"> </p>
|
| <h1 align="center">The Google Pagerank Algorithm<br>and How It Works</h1>
|
| <p align="center">Ian Rogers<br>IPR Computing Ltd.<br><a href="mailto:ian@iprcom.com">ian@iprcom.com</a></p>
|
| <h3>Introduction</h3>
|
| <p>PageRank is a topic much discussed by Search Engine Optimisation (SEO)
|
| experts. At the heart of PageRank is a mathematical formula that seems
|
| scary to look at but is actually fairly simple to understand.</p>
|
| <p>Despite this, many people seem to get it wrong! In particular, “Chris Ridings of
|
| www.searchenginesystems.net” has written a paper entitled “PageRank
|
| Explained: Everything you’ve always wanted to know about PageRank”,
|
| pointed to by many people, that contains a <a href="http://www.iprcom.com/papers/pagerank/altered_equation.html">fundamental mistake</a> early on in the explanation! Unfortunately this means some of the recommendations in the paper are not quite accurate.
|
| </p>
|
| <p>By showing code to correctly calculate real PageRank I hope to achieve several things in this response:</p>
|
| <ol>
|
| <li>Clearly explain how PageRank is calculated.</li>
|
| <li>Go
|
| through every example in Chris’ paper, and add some more of my own,
|
| showing the correct PageRank for each diagram. By showing the code used
|
| to calculate each diagram I’ve opened myself up to peer review - mostly
|
| in an effort to make sure the examples are correct, but also because
|
| the code can help explain the PageRank calculations.</li>
|
| <li>Describe some principles and observations on website design based on these correctly calculated examples.</li>
|
| </ol>
|
| <p>Any
|
| good web designer should take the time to fully understand how PageRank
|
| really works - if you don’t then your site’s layout could be seriously
|
| hurting your Google listings!</p>
|
| <p>[Note: I have nothing in particular against Chris. If I find any other papers on the subject I’ll try to comment evenly]</p>
|
| <h3>How is PageRank Used?</h3>
|
| <p>PageRank
|
| is one of the methods Google uses to determine a page’s relevance or
|
| importance. It is only one part of the story when it comes to the
|
| Google listing, but the other aspects are discussed elsewhere (and are
|
| ever changing) and PageRank is interesting enough to deserve a paper of
|
| its own.</p>
|
| <p>PageRank is also displayed on the toolbar of your browser if you’ve installed the Google toolbar (<a href="http://toolbar.google.com/">http://toolbar.google.com/</a>). But the Toolbar PageRank only goes from 0 – 10
|
| and seems to be something like a logarithmic scale:</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center">
|
| <table id="Table1" border="0" cellpadding="1" cellspacing="3" width="68%">
|
| <tbody><tr valign="top">
|
| <td align="center" width="49%">
|
| <p align="center"><b>Toolbar PageRank<br>(log base 10)</b></p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center"><b>Real PageRank</b></p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="center" width="49%">
|
| <p align="center">0</p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center">0 - 100</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="center" width="49%">
|
| <p align="center">1</p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center">100 - 1,000</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="center" width="49%">
|
| <p align="center">2</p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center">1,000 - 10,000</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="center" width="49%">
|
| <p align="center">3</p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center">10,000 - 100,000</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="center" width="49%">
|
| <p align="center">4</p>
|
| </td>
|
| <td align="center" width="50%">
|
| <p align="center">and so on...</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p>We
|
| can’t know the exact details of the scale because, as we’ll see later,
|
| the maximum PR of all pages on the web changes every month when Google
|
| does its re-indexing! If we presume the scale is logarithmic (although
|
| there is only anecdotal evidence for this at the time of writing) then
|
| Google could simply give the highest actual PR page a toolbar PR of 10
|
| and scale the rest appropriately. </p>
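| <p align="left">As a rough illustration of that scaling idea, here is a minimal Python sketch. It assumes the scale really is base-10 logarithmic and that the highest real PR on the web is pinned to a Toolbar PR of 10; the function name <code>toolbar_pr</code> and the <code>top_pr</code> parameter are made up for this example, and none of this is confirmed Google behaviour.</p>

```python
import math


def toolbar_pr(real_pr: float, top_pr: float) -> int:
    """Map a real PR to a 0-10 toolbar value on an assumed base-10 log scale.

    The page with the highest real PR (top_pr) is pinned to 10, and every
    further full factor of ten below it costs one toolbar point.
    """
    if real_pr <= 0 or top_pr <= 0:
        return 0
    factors_of_ten_below_top = math.floor(math.log10(top_pr / real_pr))
    # Clamp to the 0-10 range that the toolbar can display.
    return max(0, min(10, 10 - factors_of_ten_below_top))


# Hypothetical numbers: suppose the top page on the web has a real PR of 1e12.
for pr in (1e12, 5e10, 1000.0, 0.15):
    print(pr, "->", toolbar_pr(pr, top_pr=1e12))
```

| <p align="left">Because the top value moves with every re-index, the same real PR could map to a different Toolbar PR from month to month under this assumption.</p>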
|
| <p>Also
|
| the toolbar sometimes guesses! The toolbar often shows me a Toolbar PR
|
| for pages I’ve only just uploaded and cannot possibly be in the index
|
| yet! </p>
|
| <p>What seems to be happening
|
| is that the toolbar looks at the URL of the page the browser is
|
| displaying and strips off everything down to the last “/” (i.e. it goes to
|
| the “parent” page in URL terms). If Google has a Toolbar PR for that
|
| parent then it subtracts 1 and shows that as the Toolbar PR for this
|
| page. If there’s no PR for the parent it goes to the parent’s parent’s
|
| page, but subtracting 2, and so on all the way up to the root of your
|
| site. If it can’t find a Toolbar PR to display in this way, that
|
| is if it doesn’t find a page with a real calculated PR, then the bar is
|
| greyed out.</p>
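| <p align="left">The guessing behaviour described above can be sketched in Python as follows. This is only a model of what the toolbar appears to do: the function <code>guessed_toolbar_pr</code>, the <code>known</code> dictionary of already-indexed URLs, and the example.com addresses are all hypothetical.</p>

```python
from typing import Dict, Optional


def guessed_toolbar_pr(url: str, known: Dict[str, int]) -> Optional[int]:
    """Return the Toolbar PR to display for url, or None for a greyed-out bar.

    known maps indexed URLs (directories end in "/") to their Toolbar PR.
    """
    if url in known:
        return known[url]            # a real, calculated value exists
    scheme, sep, rest = url.partition("://")
    if not sep:                      # no scheme given: treat it all as host + path
        scheme, rest = "", url
    path = rest.rstrip("/")
    levels_up = 0
    while "/" in path:
        path = path.rsplit("/", 1)[0]    # strip everything after the last "/"
        levels_up += 1
        parent = f"{scheme}{sep}{path}/"
        if parent in known:
            # Subtract one point for every level we had to climb.
            return max(0, known[parent] - levels_up)
    return None                      # nothing found all the way up to the root


known = {"http://example.com/": 5}
print(guessed_toolbar_pr("http://example.com/papers/pagerank/index.html", known))
```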
|
| <p>Note
|
| that if the Toolbar is guessing in this way, the Actual PR of the page
|
| is 0 - though its PR will be calculated shortly after the Google spider
|
| first sees it.</p>
|
| <p>PageRank says
|
| nothing about the content or size of a page, the language it’s written
|
| in, or the text used in the anchor of a link!</p>
|
| <h3>Definitions</h3>
|
| <p>I’ve
|
| started to use some technical terms and shorthand in this paper. Now’s
|
| as good a time as any to define all the terms I’ll use:</p>
|
| <p>
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center">
|
| <table id="Table2" border="0" cellpadding="1" cellspacing="3" width="77%">
|
| <tbody><tr>
|
| <td valign="top" width="21%">
|
| <p><b>PR: </b></p>
|
| </td>
|
| <td width="78%">
|
| <p>Shorthand
|
| for PageRank: the actual, real, page rank for each page as calculated
|
| by Google. As we’ll see later this can range from 0.15 to billions.</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td valign="top" width="21%">
|
| <p><b>Toolbar PR:</b></p>
|
| </td>
|
| <td width="78%">
|
| <p>The PageRank displayed in the Google toolbar in your browser. This ranges from 0 to 10.</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td valign="top" width="21%">
|
| <p><b>Backlink:</b></p>
|
| </td>
|
| <td width="78%">
|
| <p>If page A links out to page B, then page B is said to have a “backlink” from page A.</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p>That’s enough of that, let’s get back to the meat…</p>
|
| <h3><a name="what"></a>So what is PageRank?</h3>
|
| <p>In
|
| short PageRank is a “vote”, by all the other pages on the Web, about
|
| how important a page is. A link to a page counts as a vote of support.
|
| If there’s no link there’s no support (but it’s an abstention from
|
| voting rather than a vote against the page). </p>
|
| <p>Quoting from the original Google paper, PageRank is defined like this:</p>
|
| <ul>
|
| <p align="left"><i>We
|
| assume page A has pages T1...Tn which point to it (i.e., are
|
| citations). The parameter d is a damping factor which can be set
|
| between 0 and 1. We usually set d to 0.85. There are more details about
|
| d in the next section. Also C(A) is defined as the number of links
|
| going out of page A. The PageRank of a page A is given as follows:</i> </p>
|
| <p align="left"><i>PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))</i> </p>
|
| <p align="left"><i>Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.</i></p>
|
| <p><i>PageRank
|
| or PR(A) can be calculated using a simple iterative algorithm, and
|
| corresponds to the principal eigenvector of the normalized link matrix
|
| of the web.</i></p>
|
| </ul>
|
| <p>but that’s not too helpful so let’s break it down into sections.</p>
|
| <ol>
|
| <li><b>PR(Tn)</b>
|
| - Each page has a notion of its own self-importance. That’s “PR(T1)”
|
| for the first page in the web all the way up to “PR(Tn)” for the last
|
| page</li>
|
| <li><b>C(Tn)</b> - Each page
|
| spreads its vote out evenly amongst all of its outgoing links. The
|
| count, or number, of outgoing links for page 1 is “C(T1)”, “C(Tn)” for
|
| page n, and so on for all pages. </li>
|
| <li><b>PR(Tn)/C(Tn)</b> - so if our page (page A) has a backlink from page “n” the share of the vote page A will get is “PR(Tn)/C(Tn)”</li>
|
| <li><b>d(...</b>
|
| - All these fractions of votes are added together but, to stop the
|
| other pages having too much influence, this total vote is “damped down”
|
| by multiplying it by 0.85 (the factor “d”)</li>
|
| <li><b>(1 - d)</b> - The (1 – d) bit at the beginning is a bit of probability math magic so the “<i>sum of all web pages' PageRanks will be one</i>”: it adds in the bit lost by the <b>d(...</b>.
|
| It also means that if a page has no links to it (no backlinks) even
|
| then it will still get a small PR of 0.15 (i.e. 1 – 0.85). (Aside: the
|
| Google paper says “the sum of all pages” but they mean “the
|
| normalised sum” – otherwise known as “the average” to you and me.)</li>
|
| </ol>
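| <p align="left">Put together, the five pieces above amount to a single line of arithmetic per page. The Python sketch below applies the quoted formula to one page; the function name <code>page_rank_of</code> and the three-page network are invented for illustration, and every page is simply assumed to have a current PR of 1.0.</p>

```python
from typing import Dict, List


def page_rank_of(page: str,
                 pr: Dict[str, float],
                 links: Dict[str, List[str]],
                 d: float = 0.85) -> float:
    """PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    votes = 0.0
    for source, targets in links.items():
        if page in targets:                       # source is a backlink of page
            votes += pr[source] / len(targets)    # PR(T) / C(T)
    return (1 - d) + d * votes


# Hypothetical network: A -> B, B -> A and C, C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
pr = {"A": 1.0, "B": 1.0, "C": 1.0}

# A receives half of B's vote and all of C's vote.
print(page_rank_of("A", pr, links))
```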
|
| <h3>How is PageRank Calculated?</h3>
|
| <p>This
|
| is where it gets tricky. The PR of each page depends on the PR of the
|
| pages pointing to it. But we won’t know what PR those pages have until
|
| the pages pointing to <b>them</b> have their PR calculated and so on…
|
| And when you consider that page links can form circles it seems
|
| impossible to do this calculation!</p>
|
| <p>But actually it’s not that bad. Remember this bit of the Google paper:</p>
|
| <ul>
|
| <p align="left"><i>PageRank
|
| or PR(A) can be calculated using a simple iterative algorithm, and
|
| corresponds to the principal eigenvector of the normalized link matrix
|
| of the web.</i></p>
|
| </ul>
|
| <p>What that means to us is that we can just go ahead and calculate a page’s PR <b>without knowing the final value of the PR of the other pages</b>.
|
| That seems strange but, basically, each time we run the calculation
|
| we’re getting a closer estimate of the final value. So all we need to
|
| do is remember each value we calculate and repeat the calculations
|
| lots of times until the numbers stop changing much.</p>
|
| <p>Let’s take the simplest example network: two pages, each pointing to the other:</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture1" src="pagerank_files/image001.gif" border="0" height="51" width="227"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and C(B) = 1).</p>
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Guess 1</font></b></p>
|
| <p align="left">We don’t know what their PR should be to begin with, so let’s take a guess at 1.0 and do some calculations:</p>
|
| <p align="left">
|
| <table id="Table3" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" width="24%">
|
| <p align="right">d</p>
|
| </td>
|
| <td width="75%">
|
| <p>= 0.85</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= (1 – d) + d(PR(B)/1)</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= (1 – d) + d(PR(A)/1)</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">i.e.</p>
|
| <p align="left">
|
| <table id="Table4" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="23%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="76%">
|
| <p align="left">= 0.15 + 0.85 * 1<br>= 1</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="23%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="76%">
|
| <p align="left">= 0.15 + 0.85 * 1<br>= 1</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Hmm, the numbers aren’t changing at all! So it looks like we started out with a lucky guess!!! </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Guess 2</font></b></p>
|
| <p align="left">No,
|
| that’s too easy, maybe I got it wrong (and it wouldn’t be the first
|
| time). Ok, let’s start the guess at 0 instead and re-calculate:</p>
|
| <p align="left">
|
| <table id="Table5" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td valign="top" width="33%">
|
| <p align="left">= 0.15 + 0.85 * 0<br>= 0.15</p>
|
| </td>
|
| <td width="41%">
|
| <p> </p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td valign="top" width="33%">
|
| <p align="left">= 0.15 + 0.85 * 0.15<br>= 0.2775</p>
|
| </td>
|
| <td valign="top" width="41%">
|
| <p align="left">NB. we’ve already calculated a “next best guess” at PR(A) so we use it here</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">And again:</p>
|
| <p align="left">
|
| <table id="Table6" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 0.2775<br>= 0.385875</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 0.385875 <br>= 0.47799375</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">And again</p>
|
| <p align="left">
|
| <table id="Table7" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 0.47799375<br>= 0.5562946875</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 0.5562946875 <br>= 0.622850484375</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">and
|
| so on. The numbers just keep going up. But will the numbers stop
|
| increasing when they get to 1.0? What if a calculation over-shoots and
|
| goes above 1.0?</p>
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Guess 3</font></b></p>
|
| <p align="left">Well let’s see. Let’s start the guess at 40 each and do a few cycles:</p>
|
| <ul>
|
| <p align="left">PR(A) = 40<br>PR(B) = 40</p>
|
| </ul>
|
| <p align="left">First calculation</p>
|
| <p align="left">
|
| <table id="Table8" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 40<br>= 34.15</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 34.15 <br>= 29.1775</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">And again</p>
|
| <p align="left">
|
| <table id="Table9" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(A)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 29.1775<br>= 24.950875</p>
|
| </td>
|
| </tr>
|
| <tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(B)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= 0.15 + 0.85 * 24.950875 <br>= 21.35824375</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Yup, those numbers are heading down alright! It sure looks like the numbers will get to 1.0 and stop.</p>
|
| <p align="left"><a name="ex0"></a>Here’s the code used to calculate this example starting the guess at 0: <a href="http://www.iprcom.com/papers/pagerank/src_0">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr0.pl">Run the program</a></p>
|
| <ul>
|
| <li><b>Principle:</b> it doesn’t matter where you start your guess, once the PageRank calculations have settled down, the “<i>normalized probability distribution</i>” (the average PageRank for all pages) will be 1.0
|
| </li>
|
| </ul>
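| <p align="left">For readers who would rather check the hand calculations than trust them, here is a small Python sketch of the two-page network. It is not the program linked above, just a re-creation under the same rules: d = 0.85, PR(A) is updated first, and PR(B) then uses the freshly calculated PR(A). The function name <code>two_page_cycles</code> is made up for this example.</p>

```python
from typing import List, Tuple


def two_page_cycles(start: float, cycles: int,
                    d: float = 0.85) -> List[Tuple[float, float]]:
    """Iterate the two-page network (A links to B, B links to A).

    Returns the (PR(A), PR(B)) pair after each cycle.
    """
    pr_a = pr_b = start
    history = []
    for _ in range(cycles):
        pr_a = (1 - d) + d * pr_b / 1   # C(B) = 1
        pr_b = (1 - d) + d * pr_a / 1   # uses the PR(A) just calculated
        history.append((pr_a, pr_b))
    return history


for guess in (0.0, 1.0, 40.0):
    first = two_page_cycles(guess, 1)[0]
    settled = two_page_cycles(guess, 200)[-1]
    print(f"start {guess}: first cycle {first}, after 200 cycles {settled}")
```

| <p align="left">Each full cycle shrinks the distance from 1.0 by a factor of 0.85 * 0.85, which is why any starting guess ends up in the same place.</p>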
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Getting the answer quicker</font></b></p>
|
| <p align="left">How
|
| many times do we need to repeat the calculation for big networks?
|
| That’s a difficult question; for a network as large as the World Wide
|
| Web it can be many millions of iterations! The “damping factor” is
|
| quite subtle. If it’s too high then it takes ages for the numbers to
|
| settle, if it’s too low then you get repeated over-shoot, both above
|
| and below the average - the numbers just swing about the average like a
|
| pendulum and never settle down.</p>
|
| <p align="left">Also
|
| choosing the order of calculations can help. The answer will always
|
| come out the same no matter which order you choose, but some orders
|
| will get you there quicker than others.</p>
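| <p align="left">To make the point about ordering concrete, the Python sketch below counts how many passes a four-page loop needs to settle when the pages are updated in two different orders. The network, the tolerance of 0.000001 and the function name <code>sweeps_to_settle</code> are assumptions chosen for the demonstration; each page is updated in place, so later pages in a pass see the values already recalculated earlier in that pass.</p>

```python
from typing import Dict, List, Tuple


def sweeps_to_settle(links: Dict[str, List[str]], order: List[str],
                     d: float = 0.85,
                     tolerance: float = 1e-6) -> Tuple[int, Dict[str, float]]:
    """Update pages in the given order until no PR moves by more than tolerance."""
    pr = {page: 0.0 for page in links}
    for sweep in range(1, 10_000):
        biggest_change = 0.0
        for page in order:
            votes = sum(pr[src] / len(out)
                        for src, out in links.items() if page in out)
            new_value = (1 - d) + d * votes
            biggest_change = max(biggest_change, abs(new_value - pr[page]))
            pr[page] = new_value            # in place: visible to later pages
        if biggest_change < tolerance:
            return sweep, pr
    raise RuntimeError("PageRank did not settle")


# Hypothetical loop: A -> B -> C -> D -> A.
ring = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}

with_the_links, _ = sweeps_to_settle(ring, ["A", "B", "C", "D"])
against_the_links, _ = sweeps_to_settle(ring, ["D", "C", "B", "A"])
print("following the links:", with_the_links, "passes")
print("against the links:  ", against_the_links, "passes")
```

| <p align="left">Updating pages in the direction the links point lets a fresh value travel all the way round the loop in one pass, so that order should need noticeably fewer passes while arriving at the same answer.</p>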
|
| <p align="left">I’m sure there have been several Master’s theses on how to make this
|
| calculation as efficient as possible, but, in the examples below, I’ve
|
| used very simple code for clarity and roughly 20 to 40 iterations were
|
| needed!</p>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex1"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 1</font></b></p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture2" src="pagerank_files/image002.gif" border="0" height="229" width="319"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">I’m
|
| not going to repeat the calculations here, but you can see them by
|
| running the program (yes, if you click the link the program really is
|
| re-run to do the calculations for you).</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_1">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr1.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">So the correct PR for the example is:</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture3" src="pagerank_files/image003.gif" border="0" height="229" width="428"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">You can see it took about 20 iterations before the network began to settle on these values!</p>
|
| <p align="left">Look at Page D though - it has a PR of 0.15 even though no-one is voting for it (i.e. it has no incoming links)! Is this right?</p>
|
| <p align="left">The first part, or "term" to be technical, of the PR equation is doing this:</p>
|
| <ul>
|
| <p align="left"><i>PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))</i></p>
|
| </ul>
|
| <p align="left">So, for Page D, no backlinks means the equation looks like this:</p>
|
| <p align="left">
|
| <table id="Table10" border="0" cellpadding="1" cellspacing="3" width="94%">
|
| <tbody><tr>
|
| <td align="right" valign="top" width="24%">
|
| <p align="right">PR(D)</p>
|
| </td>
|
| <td width="75%">
|
| <p align="left">= (1-d) + d * (0)<br>= 0.15</p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">no matter what else is going on or how many times you do it.</p>
|
| <ul>
|
| <p align="left"><b>Observation: </b>every
|
| page has at least a PR of 0.15 to share out. But this may only be in
|
| theory - there are rumours that Google undergoes a post-spidering phase
|
| whereby any pages that have no incoming links at all are completely
|
| deleted from the index...</p>
|
| <p align="left"> </p>
|
| </ul>
|
| <p align="left"><a name="ex2"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 2</font></b></p>
|
| <p align="left">A simple hierarchy with some outgoing links</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture4" src="pagerank_files/image004.gif" border="0" height="235" width="429"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_2">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr2.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture5" src="pagerank_files/image005.gif" border="0" height="235" width="493"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">As
|
| you’d expect, the home page has the most PR – after all, it has the
|
| most incoming links! But what’s happened to the average? It’s only
|
| 0.378!!! That doesn’t tie up with what I said earlier so something is
|
| wrong somewhere!</p>
|
| <p align="left">Well
|
| no, everything is fine. But take a look at the “external site” pages –
|
| what’s happening to their PageRank? They’re not passing it on, they’re
|
| not voting for anyone, they’re wasting their PR like so much pregnant
|
| chad!!! (NB, a more accurate description of this issue can be found in
|
| this <a href="http://www.marketpositiontalk.com/forums/Index.cfm?CFApp=11&Message_ID=37917" target="_blank">thread</a>)
|
| </p>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex3"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 3</font></b></p>
|
| <p align="left">Let’s link those external sites back into our home page just so we can see what happens to the average…</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_3">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr3.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture6" src="pagerank_files/image006.gif" border="0" height="312" width="527"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">That’s
|
| better - it does work after all! And look at the PR of our home page!
|
| All those incoming links sure make a difference – we’ll talk more about
|
| that later.</p>
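| <p align="left">The drop in the average in Example 2, and its recovery in Example 3, can be reproduced with a short Python sketch. The link structure below is my reading of the diagrams (a home page, three child pages, and a “Links” page that points at four external pages), so treat the page names and the exact links as assumptions; the <code>pagerank</code> function is a plain repeat-until-settled loop, not the program linked above.</p>

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             iterations: int = 200) -> Dict[str, float]:
    """Repeat the PR formula over every page until the values have settled."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        for page in links:
            votes = sum(pr[src] / len(out)
                        for src, out in links.items() if page in out)
            pr[page] = (1 - d) + d * votes
    return pr


def average(pr: Dict[str, float]) -> float:
    return sum(pr.values()) / len(pr)


externals = ["Site A", "Site B", "Site C", "Site D"]

# Assumed reading of Example 2: only the "Links" page points off-site.
example2 = {
    "Home": ["About", "Product", "Links"],
    "About": ["Home"],
    "Product": ["Home"],
    "Links": ["Home"] + externals,
    **{site: [] for site in externals},   # external pages vote for nobody
}

# Example 3: the same network, but every external page links back to Home.
example3 = {**example2, **{site: ["Home"] for site in externals}}

print(f"Example 2 average: {average(pagerank(example2)):.3f}")
print(f"Example 3 average: {average(pagerank(example3)):.3f}")
```

| <p align="left">Pages with no outgoing links receive PR but hand none of it on, so the total in the network shrinks; once every page votes for somebody, the average returns to 1.0.</p>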
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex4"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 4</font></b></p>
|
| <p align="left">What happens to PR if we follow a suggestion about writing page reviews?</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture8" src="pagerank_files/image007.gif" border="0" height="370" width="459"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_4">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr4.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture7" src="pagerank_files/image008.gif" border="0" height="370" width="457"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left"> </p>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex5"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 5</font></b></p>
|
| <p align="left">A simple hierarchy</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_5">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr5.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture9" src="pagerank_files/image009.gif" border="0" height="202" width="236"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Our home page has 2 and a half times as much PR as the child pages! Excellent!</p>
|
| <ul>
|
| <li><b>Observation</b>: a hierarchy concentrates votes and PR into one page</li>
|
| </ul>
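| <p align="left">For a hierarchy this small the settled values can also be worked out directly rather than by iteration. The sketch below assumes the diagram shows a home page linking to each of n child pages, with every child linking only back to the home page; the function name <code>hierarchy_pr</code> is made up for this example.</p>

```python
from typing import Tuple


def hierarchy_pr(children: int, d: float = 0.85) -> Tuple[float, float]:
    """Settled PR of (home page, each child page) in a simple hierarchy.

    Each child gets an equal share of the home page's vote:
        child = (1 - d) + d * home / children
    The home page collects the full vote of every child:
        home  = (1 - d) + d * children * child
    Substituting the first line into the second and solving for home gives
        home  = (1 + d * children) / (1 + d)
    """
    home = (1 + d * children) / (1 + d)
    child = (1 - d) + d * home / children
    return home, child


home, child = hierarchy_pr(children=3)
print(f"home = {home:.2f}, child = {child:.2f}, ratio = {home / child:.2f}")
```

| <p align="left">Because no page in this network wastes its vote, the home and child values still average out to 1.0.</p>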
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex6"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 6</font></b></p>
|
| <p align="left">Looping</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_6">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr6.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture10" src="pagerank_files/image010.gif" border="0" height="177" width="242"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">This
|
| is what we’d expect. All the pages have the same number of incoming
|
| links, all pages are of equal importance to each other, all pages get
|
| the same PR of 1.0 (i.e. the “average” probability).</p>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex7"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 7</font></b></p>
|
| <p align="left">Extensive Interlinking – or Fully Meshed</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_7">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr7.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture11" src="pagerank_files/image011.gif" border="0" height="177" width="241"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Yes, the results are the same as the Looping example above and for the same reasons.</p>
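|
| <p align="left">The loop and the mesh can be compared side by side with a short Python sketch. The link tables below are assumptions based on the two diagrams (a single ring of four pages, and four pages that all link to each other), and d = 0.85 is assumed as the damping factor.</p>
|
```python
def pagerank(links, d=0.85, iterations=100):
    """Repeat PR(p) = (1 - d) + d * sum(PR(q) / C(q)) for a fixed number of rounds."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


pages = ["Home", "About", "Product", "More"]

# Example 6 (assumed): each page links only to the next, the last links back to the first.
loop = {page: [pages[(i + 1) % len(pages)]] for i, page in enumerate(pages)}

# Example 7 (assumed): every page links to every other page.
mesh = {page: [other for other in pages if other != page] for page in pages}

loop_ranks = pagerank(loop)
mesh_ranks = pagerank(mesh)
print("loop:", {page: round(value, 2) for page, value in loop_ranks.items()})
print("mesh:", {page: round(value, 2) for page, value in mesh_ranks.items()})
```
|
| <p align="left">In both layouts every page receives exactly as much vote as it gives away, so every page stays at 1.0.</p>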
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex8"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 8</font></b></p>
|
| <p align="left">Hierarchical – but with a link in and one out.</p>
|
| <p align="left">We’ll
|
| assume there’s an external site that has lots of pages and links with
|
| the result that one of the pages has the average PR of 1.0. We’ll also
|
| assume the webmaster really likes us – there’s just one link from that
|
| page and it’s pointing at our home page.</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_8">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr8.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture12" src="pagerank_files/image012.gif" border="0" height="202" width="485"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">In
|
| example 5 the home page only had a PR of 1.92 but now it is 3.31!
|
| Excellent! Not only has site A contributed 0.85 PR to us, but the
|
| raised PR in the “About”, “Product” and “More” pages has had a lovely
|
| “feedback” effect, pushing up the home page’s PR even further!</p>
|
| <ul>
|
| <li><b>Principle</b>: a well structured site will amplify the effect of any contributed PR</li>
|
| </ul>
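|
| <p align="left">The feedback effect can be checked with a small Python sketch. Several things here are assumptions rather than the linked Perl program: d = 0.85, the external page Site A being held at a PR of 1.0 (the hypothetical pinned argument), and the single outbound link leaving from the “Product” page to a Site B. That link table was chosen because it reproduces the 3.31 quoted above.</p>
|
```python
def pagerank(links, d=0.85, iterations=100, pinned=None):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)); pinned pages keep a fixed PR."""
    pinned = pinned or {}
    pr = {page: pinned.get(page, 1.0) for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        new.update(pinned)  # pages supported from outside our model keep their given PR
        pr = new
    return pr


# Assumed structure for Example 8: the hierarchy plus one link in and one link out.
site = {
    "Site A": ["Home"],             # external page with a single link, pointing at us
    "Home": ["About", "Product", "More"],
    "About": ["Home"],
    "Product": ["Home", "Site B"],  # the one outbound link (assumed to be on this page)
    "More": ["Home"],
    "Site B": [],
}

ranks = pagerank(site, pinned={"Site A": 1.0})
for page, value in ranks.items():
    print(f"{page:8s} {value:.2f}")
```
|
| <p align="left">With these assumptions the home page comes out at about 3.31, compared with 1.92 for the same hierarchy with no external links.</p>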
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex9"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 9</font></b></p>
|
| <p align="left">Looping – but with a link in and a link out</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_9">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr9.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture13" src="pagerank_files/image013.gif" border="0" height="177" width="489"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Well, the PR of our home page has gone up a little, but what’s happened to the “More” page? </p>
|
| <p align="left">The vote of the “Product” page has been split evenly between the “More” page and the external site. We now value the external Site B equally with our “More” page. The “More” page is getting only half the vote it had before – this is good for Site B but very bad for us!</p>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex10"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 10</font></b></p>
|
| <p align="left">Fully meshed – but with one vote in and one vote out</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_10">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr10.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture14" src="pagerank_files/image014.gif" border="0" height="177" width="489"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">That’s much better. The “More” page is still getting a smaller share of the vote than in example 7 of course, but now the “Product” page has kept three quarters of its vote within our site - unlike example 9 where it was giving away fully half of its vote to the external site!</p>
|
| <p align="left">Keeping just this small extra fraction of the vote within our site has had a very nice effect on the Home Page too – PR of 2.28 compared with just 1.66 in example 9.</p>
|
| <ul>
|
| <li><b>Observation:</b>
|
| increasing the internal links in your site can minimise the damage to
|
| your PR when you give away votes by linking to external sites.</li>
|
| <li><b>Principle:</b> </li>
|
| <ul type="1">
|
| <li>If a particular page is highly important – use a hierarchical structure with the important page at the “top”.</li>
|
| <li>Where a group of pages may contain outward links – increase the number of internal links to retain as much PR as possible.</li>
|
| <li>Where a group of pages do not contain outward links – the number of internal links in the site has <b>no</b> effect on the site’s average PR. You might as well use a link
|
| structure that gives the user the best navigational experience.</li>
|
| </ul>
|
| </ul>
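|
| <p align="left">The difference between the looped and the fully meshed site, each with one link in and one link out, can be sketched in Python as follows. The link tables are assumptions reconstructed from the diagrams (Site A held at PR 1.0 and linking to Home, “Product” carrying the link out to Site B), as are d = 0.85 and the hypothetical pinned argument.</p>
|
```python
def pagerank(links, d=0.85, iterations=100, pinned=None):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)); pinned pages keep a fixed PR."""
    pinned = pinned or {}
    pr = {page: pinned.get(page, 1.0) for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        new.update(pinned)
        pr = new
    return pr


pages = ["Home", "About", "Product", "More"]

# Looping site (assumed): Home -> About -> Product -> More -> Home.
loop_site = {page: [pages[(i + 1) % len(pages)]] for i, page in enumerate(pages)}
# Fully meshed site (assumed): every page links to the other three.
mesh_site = {page: [other for other in pages if other != page] for page in pages}

for site in (loop_site, mesh_site):
    site["Product"].append("Site B")  # the vote we give away
    site["Site A"] = ["Home"]         # the vote we receive
    site["Site B"] = []

loop_ranks = pagerank(loop_site, pinned={"Site A": 1.0})
mesh_ranks = pagerank(mesh_site, pinned={"Site A": 1.0})
print(f"Home in the loop: {loop_ranks['Home']:.2f}")
print(f"Home in the mesh: {mesh_ranks['Home']:.2f}")
```
|
| <p align="left">Under these assumptions the home page scores about 1.66 in the loop and about 2.28 in the mesh, because “Product” gives away half of its vote in the first case and only a quarter in the second.</p>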
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Site Maps</font></b></p>
|
| <p align="left">Site maps are useful in at least two ways:</p>
|
| <ul type="1">
|
| <li>If a user types in a bad URL most websites return a really unhelpful “404 – page not found” error page. This can be discouraging. Why not configure your server to return a page that shows an error has been made, but also gives the site map? This can help the user enormously.</li>
|
| <li>Linking to a site map on each page increases the number of internal links in the site, spreading the PR out and protecting you against your vote “donations” to external sites.</li>
|
| </ul>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex11"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 11</font></b></p>
|
| <p align="left">Let’s try to fix our site to artificially concentrate the PR into the home page.</p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture15" src="pagerank_files/image015.gif" border="0" height="178" width="302"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">That looks good, most of the links seem to be pointing up to page A so we should get a nice PR.</p>
|
| <p align="left"> </p>
|
| <p align="left">Try to guess what the PR of A will be before you scroll down or run the code.</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_11">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr11.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left"> </p>
|
| <p align="left"> </p>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture16" src="pagerank_files/image016.gif" border="0" height="178" width="302"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Oh
|
| dear, that didn’t work at all well – it’s much worse than just an
|
| ordinary hierarchy! What’s going on is that pages C and D have such
|
| weak incoming links that they’re no help to page A at all!</p>
|
| <ul>
|
| <li><b>Principle</b>: trying to abuse the PR calculation is harder than you think.</li>
|
| </ul>
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex12"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 12</font></b></p>
|
| <p align="left">A
|
| common web layout for long documentation is to split the document into
|
| many pages with a “Previous” and “Next” link on each plus a link back
|
| to the home page. The home page then only needs to point to the first
|
| page of the document.</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_12">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr12.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture17" src="pagerank_files/image017.gif" border="0" height="178" width="302"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">In this simple example, where there’s only one document, the first page of the document has a higher PR than the Home Page! This is because page B is getting all of the vote from page A, but page A is only getting fractions of the votes of pages B, C and D.</p>
|
| <ul>
|
| <li><b>Principle</b>:
|
| in order to give users of your site a good experience, you may have to
|
| take a hit against your PR. There’s nothing you can do about this - and
|
| neither should you try to or worry about it! If your site is a pleasure
|
| to use lots of other webmasters will link to it and you’ll get back
|
| much more PR than you lost.</li>
|
| </ul>
|
| <p align="left">Can
|
| you also see the trend between this and the previous example? As you
|
| add more internal links to a site it gets closer to the Fully Meshed
|
| example where every page gets the average PR for the mesh.</p>
|
| <ul>
|
| <li><b>Observation</b>: as you add more internal links in your site, the PR will be spread out more evenly between the pages.</li>
|
| </ul>
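|
| <p align="left">The “Previous”/“Next” layout of Example 12 can be tried out with the following Python sketch. The link table is an assumption built from the description (home page A pointing only at the first page B, and each document page linking to its neighbours and back to A), and d = 0.85 is assumed as before.</p>
|
```python
def pagerank(links, d=0.85, iterations=100):
    """Repeat PR(p) = (1 - d) + d * sum(PR(q) / C(q)) for a fixed number of rounds."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


# Assumed structure for Example 12: A is the home page, B, C and D are the document.
document = {
    "A": ["B"],            # home page points only at the first page
    "B": ["A", "C"],       # Home, Next
    "C": ["A", "B", "D"],  # Home, Previous, Next
    "D": ["A", "C"],       # Home, Previous
}

ranks = pagerank(document)
for page, value in ranks.items():
    print(f"{page} {value:.2f}")
print("first page outranks home page:", ranks["B"] > ranks["A"])
```
|
| <p align="left">With this link table page B ends up above page A, and the four values still add up to 4.0, so the internal links have only moved PR around and not created any.</p>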
|
| <p align="left"> </p>
|
| <p align="left"><a name="ex13"></a><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Example 13</font></b></p>
|
| <p align="left">Getting high PR the wrong way and the right way.</p>
|
| <p align="left">Just as an experiment, let’s see if we can get 1,000 pages pointing to our home page, but only have one link leaving it…</p>
|
| <ul>
|
| <p align="left"><a href="http://www.iprcom.com/papers/pagerank/src_13">Show the code</a> | <a href="http://www.iprcom.com/papers/pagerank/pr13.pl">Run the program</a></p>
|
| </ul>
|
| <p align="left">
|
| <table nof="TE" border="0" cellpadding="0" cellspacing="0" width="100%">
|
| <tbody><tr>
|
| <td align="center"><img id="Picture18" src="pagerank_files/image018.gif" border="0" height="320" width="511"></td>
|
| </tr>
|
| </tbody></table>
|
| </p><p align="left">Yup, those spam pages are pretty worthless but they sure add up!</p>
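|
| <p align="left">A Python sketch of this experiment is given below. The exact layout is an assumption: the home page has one link to a page called B, B links to each of 1,000 spam pages, and every spam page links only to the home page. The damping factor d = 0.85 and the page names are likewise assumed.</p>
|
```python
def pagerank(links, d=0.85, iterations=200):
    """Repeat PR(p) = (1 - d) + d * sum(PR(q) / C(q)) for a fixed number of rounds."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new = {page: 1 - d for page in links}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        pr = new
    return pr


spam_pages = [f"Spam {i}" for i in range(1, 1001)]

# Assumed structure for Example 13.
site = {"Home": ["B"], "B": list(spam_pages)}
site.update({page: ["Home"] for page in spam_pages})

ranks = pagerank(site)
average = sum(ranks.values()) / len(ranks)
print(f"Home {ranks['Home']:.1f}")
print(f"B {ranks['B']:.1f}")
print(f"each spam page {ranks['Spam 1']:.2f}")
print(f"site average {average:.2f}")
```
|
| <p align="left">Under these assumptions the home page reaches a PR of roughly 331 while each spam page sits at about 0.39, and the average over all 1,002 pages is still 1.0.</p>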
|
| <ul>
|
| <li><b>Observation</b>:
|
| it doesn’t matter how many pages you have in your site, your average PR
|
| will always be 1.0 at best. But a hierarchical layout can strongly
|
| concentrate votes, and therefore the PR, into the home page!</li>
|
| </ul>
|
| <p align="left">This
|
| is a technique used by some disreputable sites (mostly adult content
|
| sites). But I can’t advise this - if Google’s robots decide you’re
|
| doing this there’s a good chance you’ll be banned from Google! Disaster!</p>
|
| <p align="left">On the other hand there are at least two right ways to do this:</p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">1. Be a Mega-site</font></b></p>
|
| <p align="left">Mega-sites, like <a href="http://news.bbc.co.uk/">http://news.bbc.co.uk</a>
|
| have tens or hundreds of editors writing new content – i.e. new pages -
|
| all day long! Each one of those pages has rich, worthwhile content of
|
| its own and a link back to its parent or the home page! That’s why the
|
| Home page Toolbar PR of these sites is 9/10 and the rest of us just get
|
| pushed lower and lower by comparison…</p>
|
| <ul>
|
| <li><b>Principle</b>: Content Is King! There really is no substitute for lots of good content…</li>
|
| </ul>
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">2. Give away something useful</font></b></p>
|
| <p align="left"><a href="http://www.phpbb.com/">www.phpbb.com</a> has a Toolbar PR of 8/10 (at the time of writing) and it has no big money or marketing behind it! How can this be?</p>
|
| <p align="left">What
|
| the group has done is write a very useful bulletin board system that is
|
| becoming very popular on many websites. And at the bottom of every
|
| page, in <b>every</b> installation, is this HTML code:</p>
|
| <blockquote align="LEFT">Powered by <a href="http://www.phpbb.com/" target="_blank">phpBB</a></blockquote>
|
| <p align="left">The administrator of each installation can remove that link, but most don’t because they want to return the favour…</p>
|
| <p align="left">Can you imagine all those millions of pages giving a fraction of a vote to <a href="http://www.phpbb.com/">www.phpbb.com</a>? Wow!</p>
|
| <ul>
|
| <li><b>Principle</b>:
|
| Make it worth other people’s while to use your content or tools. If
|
| your give-away is good enough other site admins will gladly give you a
|
| link back.</li>
|
| <li><b>Principle</b>:
|
| it’s probably better to get lots (perhaps thousands) of links from
|
| sites with small PR than to spend any time or money desperately trying
|
| to get just the one link from a high PR page.</li>
|
| </ul>
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">A Discussion on Averages</font></b></p>
|
| <p align="left">From the Brin and Page paper, the average Actual PR of all pages in the index is 1.0! </p>
|
| <p align="left">So
|
| if you add pages to a site you’re building the total PR will go up by
|
| 1.0 for each page (but only if you link the pages together so the
|
| equation can work), but the average will remain the same. </p>
|
| <p align="left">If you want to concentrate the PR into one, or a few, pages then hierarchical linking will do that. If you want to average out the PR amongst the pages then "fully meshing" the site (lots of evenly distributed links) will do that - see examples 5, 6 and 7 above. (NB. this is where Ridings goes wrong: in his MiniRank model, feedback loops will increase PR - indefinitely!)</p>
|
| <p align="left">Getting
|
| inbound links to your site is the only way to increase your site's
|
| average PR. How that PR is distributed amongst the pages on your site
|
| depends on the details of your internal linking and which of your pages
|
| are linked to.</p>
|
| <p align="left">If
|
| you give outbound links to other sites then your site's average PR will
|
| decrease (you're not keeping your vote "in house" as it were). Again
|
| the details of the decrease will depend on the details of the linking.</p>
|
| <p align="left">Given that the average over all pages is 1.0 we can see that for every site that has an actual ranking in the millions (and there are some!) there must be lots and lots of sites whose Actual PR is below 1.0 (particularly because the absolute lowest Actual PR available is (1 - d)).</p>
|
| <p align="left">It may be that Toolbar PRs of 1 and 2 correspond to Actual PRs lower than 1.0! E.g. the log base for the Toolbar may be 10 but the Actual PR sequence could start quite low: 0.01, 0.1, 1, 10, 100, 1,000 etc... </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Finally</font></b></p>
|
| <p align="left">PageRank
|
| is, in fact, very simple (apart from one scary looking formula). But
|
| when a simple calculation is applied hundreds (or billions) of times
|
| over the results can <b>seem</b> complicated.</p>
|
| <p align="left">PageRank
|
| is also only part of the story about what results get displayed high up
|
| in a Google listing. For example there’s some evidence to suggest that
|
| Google is paying a lot of attention these days to the text in a link’s
|
| anchor when deciding the relevance of a target page – perhaps more so
|
| than the page’s PR…</p>
|
| <p align="left">PageRank <b>is</b> still part of the listings story though, so it’s worth your while as a good designer to make sure you understand it correctly.</p>
|
| <p align="left"> </p>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">Links</font></b></p>
|
| <ul>
|
| <li>The original PageRank paper by Google’s founders Sergey Brin and Lawrence Page<b> </b>- <a href="http://www-db.stanford.edu/%7Ebackrub/google.html" target="_blank">http://www-db.stanford.edu/~backrub/google.html</a>
|
| </li>
|
| <li>Chris Ridings’ “PageRank Explained” paper which, as of April 2002, contains one major mistake/misunderstanding - <a href="http://www.goodlookingcooking.co.uk/PageRank.pdf" target="_blank">http://www.goodlookingcooking.co.uk/PageRank.pdf</a> (archived copies: <a href="http://web.archive.org/web/*/http://www.goodlookingcooking.co.uk/PageRank.pdf">http://web.archive.org/web/*/http://www.goodlookingcooking.co.uk/PageRank.pdf</a>)</li>
|
| <li>Phil Craven’s <a href="http://webworkshop.net/pagerank_calculator.html" target="_blank">PageRank Calculator</a> (fortunately his figures agree with mine)</li>
|
| <li>A detailed explanation of how easy it is to <a href="http://www.iprcom.com/papers/pagerank/altered_equation.html">alter the PageRank equation by mistake</a></li>
|
| <li>An excellent discussion on chad-jams (including “pregnant chad”) by Douglas W. Jones - <a href="http://www.cs.uiowa.edu/%7Ejones/cards/chad.html">http://www.cs.uiowa.edu/~jones/cards/chad.html</a> - I don’t think many people know the
|
| United States’ voting system is this flawed!!!</li>
|
| <li>Discussion forums on this topic:</li>
|
| <ul>
|
| <li><a href="http://www.marketpositiontalk.com/forums/index.cfm?cfapp=11" target="_blank">MarketPositionTalk - PageRank updates</a></li>
|
| <li><a href="http://searchengineforums.com/Forum28/HTML/002922.html" target="_blank">SearchEngineForums - PR documents and calculator</a></li>
|
| <li><a href="http://www.webmasterworld.com/forum3/3199.htm" target="_blank">WebmasterWorld - PR document and calculator</a></li>
|
| </ul>
|
| </ul>
|
| <p align="left"><b><font face="Arial,Helvetica,Univers,Zurich BT,sans-serif" size="+1">About the Author</font></b></p>
|
| <p align="left"><a href="http://www.ianrogers.net/">Ian Rogers</a>
|
| first used the Internet in 1986 sending email on a University VAX
|
| machine! He first installed a webserver in 1990 and taught himself HTML
|
| and Perl CGI scripting. Since then he has been a Senior Research Fellow
|
| in User Interface Design and a consultant in Network Security and
|
| Database Backed Websites. He has had an informal interest in topology
|
| and the mathematics and behaviour of networks for years and has also
|
| been known to do a little Jive dancing.</p>
|
| <p align="left">This paper was sponsored by <a href="http://www.iprcom.com/index.html">IPR Computing Ltd – specialists in Secure Networks and Database Backed Websites</a></p>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| <table nof="LY" border="0" cellpadding="0" cellspacing="0">
|
| <tbody><tr align="left" valign="top">
|
| <td height="10" width="7"><img src="pagerank_files/clearpixel.gif" border="0" height="1" width="7"></td>
|
| <td></td>
|
| </tr>
|
| <tr align="left" valign="top">
|
| <td></td>
|
| <td nof="NB_UYHT" nowrap="nowrap">
|
|
| </td>
|
| </tr>
|
| </tbody></table>
|
| </td>
|
| </tr>
|
| </tbody></table>
|
|
|
| </body></html> |