Archive for Google Search
The Life and Near Death of DMOZ
The casket was all but closed on the venerable Open Directory Project (ODP, or dmoz.org). A December 16 blog post by an ODP founder, Rich Skrenta, “DMOZ had 9 lives. Used up yet?”, suggested that the directory at DMOZ was now, like Marley’s ghost, deader than a doornail. DMOZ was down and, for over a month and a half, it looked like it was down for the count.
In reality, DMOZ is not dead, though the rumours of its demise were not exactly exaggerated either. Because this six-week unscheduled outage followed several years of consumer dissatisfaction, lagging editorial energy, and layoffs at AOL, many made the logical assumption that the plug had been pulled.
While the website still functions as a searchable directory, its editing functions have only just been restored after six weeks of downtime. Since the last week in October, editors and submitters have been greeted by versions of a customized DMOZ 404 page. DMOZ was basically a dead directory referencing over 4 million websites spanning nearly 600,000 categories. Though editing has been restored, it is still not possible to submit new sites.
Even if webmasters could submit new sites, chances are they would not receive timely editorial attention. For the last few years, webmasters have complained about the now legendary backlog of sites awaiting review and inclusion. It can take months or even years for spelling mistakes to be corrected, and an enormous number of the 590,000 categories that make up the directory do not even have editors. Though many webmasters consider the Open Directory useless because of that backlog, it still carries a great deal of weight in the search sector.
The greatest success of the Open Directory Project stems from the free database access offered to any other search entity. The majority of search engines and directories use the ODP’s RDF-esque data-dump to help populate their databases. As every ODP listing is human edited, Google and other search engines have tended to treat ODP references as trustable sites. Carrying a PageRank of 8, links from the ODP continue to be considered Google-Gold by SEOs. Other search engines receiving results from the DMOZ directory include Ask, Yahoo and AOL. Clearly the ODP remains an important entity in the search space.
It has certainly earned its status as an important entity. The Open Directory Project has a long history that dates back to 1998. Since the day it went online as GnuHoo in June 1998 it has played a crucial, defining role in the evolution of the search sector and of the Internet.
GnuHoo appeared on June 5, 1998 in response to the rapid growth of the web. The number of new sites coming online in 1998 far exceeded the capacity of Yahoo’s editorial staff, which was rumoured to number fewer than 200. GnuHoo co-founders Richard Skrenta and Bob Truel believed they could create a better directory using an unlimited supply of volunteer editors than Yahoo could with its limited team of professional editors.
They were right. NewHoo grew faster than Yahoo did in the last half of 1998. Less than a year after it went online, the all-volunteer project had acquired 8,000 editors and over 430,000 websites. By then it had undergone two name changes and had been acquired by one of the largest emerging online entities.
Within days of going online, GnuHoo had attracted enough attention to force a rapid succession of name changes. First the Free Software Foundation objected to the use of the term GNU after a Slashdot article mistakenly connected the two projects. GnuHoo was thus renamed NewHoo. A few days later, Yahoo raised issues about the use of the suffix “Hoo”. At the same time, Netscape Communications Corporation opened a dialogue with Skrenta about acquiring the upstart directory project as a major asset during its competitive phase with Microsoft.
Promising to respect the founders’ original intentions to keep the site a non-commercial entity, Netscape acquired the directory for $1 million in October 1998 and renamed it the Open Directory Project. ODP data was released freely under the Open Directory Licence. A month after Netscape bought ODP, America Online (AOL) purchased Netscape. AOL agreed to honour the Open Directory Licence, formalizing it in a Social Contract with the web community. This marks the real start of the ODP’s rise. By early spring 1999, most of the major search engines were pulling data from the ODP.
1998 and 1999 were a special time in the history of the Internet. Billions of dollars were invested as eager speculators and venture capitalists moved to cash in on the promise of instant riches. Start-up companies with no functional business plans became multi-million dollar concerns overnight. The first generation of instant online millionaires was spawned and talk of breaking the traditional business cycle was taken seriously. The bottom was about to fall out of what had become a stratospheric marketplace but at the time, very few saw the danger through the haze of the hype. When the sky fell, it fell hard. In a tangential way, the ODP was involved. Though it is technically a non-profit society, ownership of the ODP is considered a business asset.
The trigger event that led to the crash of 2000 was the most significant deal in the history of global publishing. In January 2000, less than a year after it had acquired Netscape and DMOZ, AOL purchased the Time Warner media empire for approximately $160 billion in an all-stock deal. The excess of that deal, one in which an upstart tech firm absorbed the largest brick-and-mortar information and entertainment business in the world, made a number of analysts look at the silliness of it all. Within three months, the shares AOL used to buy Time Warner would be worth a fraction of their value when the deal was struck.
The tech crash of 2000 had a cascading effect across the web. Most, if not all, of those new businesses without business plans were quickly put out of business as the value of those firms declined and no new sources of investment were forthcoming. Online properties supported by shareholders, such as Yahoo and AOL/Time Warner, were in sudden desperate trouble. Eighteen months of tech sector doldrums set in as the investment world started looking for a revenue source that could sustain the staggering costs of the sector.
A new search engine appeared on the scene around this time. It had a funny name and appeared to disregard the dominant portal or directory structure favoured by most search engines. Hidden behind its sparse front page and childish logo was a revolutionary way of producing what everyone agreed at the time were extraordinarily accurate search results. The age of Google began in late 2000. A year later, the power of viral marketing had propelled Google into the big leagues, making it a serious challenger to AltaVista, Lycos and Yahoo.
Google populated itself in part by using DMOZ data. In its earliest years, Google used DMOZ as its directory, displaying virtually mirrored results. Google’s unique method of judging page content by the number and value of incoming links made a listing at the Open Directory critically important for SEOs and webmasters. As Google’s popularity and reach grew, the value of a DMOZ link grew. Because ODP listings are human reviewed, Google has traditionally tended to trust them, thus producing stronger placements faster. Between 2001 and into 2005, Google was responsible for over 80% of all organic search listings either directly or through feeding competitors such as Yahoo and MSN.
When Google figured out how to make the paid-advertising system Overture was using make oodles of money, all hell broke loose again and we rapidly advanced to where we are today.
When Google became the most important search engine, search marketers began targeting the Open Directory with site submissions, often with several sites for the same company. As one ODP editor put it, “We never asked to be used by Google like this.” As the decade progressed, new methods of creating web documents (HTML editors, CMS, blogs, etc.) spurred another period of extraordinary growth that far surpassed the ability of DMOZ editors to keep up.
A classic dilemma existed. A link from DMOZ could mean the difference between waiting weeks and waiting months for a good placement at Google. The ODP was never supposed to have such influence. The relationship Google’s algorithm created between itself, the Open Directory and webmasters wanting a DMOZ listing ended up threatening the open editorial policies originally envisioned. It was difficult to enlist new editors when many applicants were primarily motivated by the ability to insert their own sites.
Though the ODP boasts almost 75,000 editors, it also contains over 590,000 directory categories and sub-categories. The Open Directory is enormous and continues to be driven by volunteers. One of two things happens: either its volunteer editors deal with an average of 8 categories each, or some categories go unedited. The latter tends to happen more often than the former, and the public and search engines are left with a less than complete directory to draw from. Such has been the case for the past two or three years.
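The "8 categories each" figure follows directly from the two numbers quoted above. A quick check in Python, assuming every listed editor is active and the work is spread evenly (neither of which the article claims is true in practice):

```python
# Figures quoted in the article; both are approximate.
categories = 590_000
editors = 75_000

# Average load if every editor were active and categories were shared evenly.
categories_per_editor = categories / editors
print(round(categories_per_editor, 1))  # prints 7.9
```

Any editor who is inactive, or who looks after fewer than eight categories, pushes the remaining load onto the others or leaves categories unedited.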
In their defence, the ODP editorial staff would suggest that the majority of sites they continue to see are junk advertisement pages designed for SEO or PPC purposes. Similar comments appear in any number of threads found at the Open Directory Resource Zone, a public chat forum designed to promote communication between editors and users.
With a massive backlog of unreviewed submissions and a huge demand from search marketers hungry for the rankings boost expected from a DMOZ listing, many felt the ODP was becoming an elite, secretive society. Editorial applicants reported their requests were going unanswered and allegations of corruption amongst rogue editors emerged. By the end of 2005, the ODP appeared to be in total disarray with more sites in the review process than were actually in the directory. Throughout 2006, the ODP became less and less relevant to the search marketing community until, towards the end of the year, it was gone.
Most of the directory appears to be functioning again, though it is likely a version cobbled together using data from the last RDF file. When the server at AOL crashed, it took most of the current directory and all of its records with it. A number of meta editors have spent the past six weeks rebuilding the directory with the help of a few friendly AOL techs. The submit-a-site feature is, as of this time, not functioning.
Outwardly, the importance of the Open Directory was obvious but the greatest contributions to the Internet from the Open Directory team come from the people involved with the movement and the open-source philosophy that has descended from them.
When Netscape absorbed it, the Open Directory Project became part of an amazingly influential environment. Co-founded by the legendary Marc Andreessen, Netscape was already part of the Open Source movement. Netscape launched the Mozilla project in early 1998, months before it acquired DMOZ. The Mozilla Foundation, which grew out of that project, later introduced and marketed the Firefox browser.
The ODP was arguably the first successful long-term project that could fall under the general heading Web2.0. Its philosophy set the stage for the Wikipedia and other community based websites. Unlike other collaborative projects that predate it, the ODP was a truly grassroots endeavour. Participants didn’t need to be extraordinary technicians; they just had to be able to understand the editing techniques used by their community.
Though rumours of its death are obviously exaggerated, complaints about its decline are not. The ODP is a wonderful entity, but the power it inadvertently exerts is far greater than its ability to edit itself. Many have suggested the ODP should shut its doors for good, but perhaps this downtime has given its meta-editorial collective a chance to consider its role in the search community.
About The Author
Search marketing expert Jim Hedger is one of the most prolific writers in the search sector with articles appearing in numerous search related websites and newsletters, including SiteProNews, Search Engine Journal, ISEDB.com, and Search Engine Guide.
He is currently Senior Editor for the Jayde Online news sources SEO-News and SiteProNews. You can also find additional tips and news on webmaster and SEO topics by Jim at the SiteProNews blog.

Google’s Tag To Remove Content Spamming
Posted by: admin
Content spamming, in its simplest form, is the taking of content from other sites that rank well on the search engines, and then either using it as-is or using utility software like Articlebot to scramble the content to the point that it can’t be detected with plagiarism software. In either case, your good, search-engine-friendly content is stolen and used, often as part of a doorway page, to draw the attention of the search engines away from you.
Everyone has seen examples of this: the page that looks promising but contains lists of terms (like term – term paper – term papers – term limits) that link to other similar lists, each carrying Google advertising. Or the site that contains nothing but content licensed from Wikipedia. Or the site that plays well in a search but contains nothing more than SEO gibberish, often ripped off from the site of an expert and minced into word slaw.
These sites are created en masse to provide a fertile ground to draw eyeballs. It seems a waste of time when you receive a penny a view for even the best-paying ads – but when you put up five hundred sites at a time, and you’ve figured out how to get all of them to show up on the first page or two of a lucrative Google search term, it can be surprisingly profitable.
The losers are the people who click on these pages, thinking that there is content of worth on these sites – and you. Your places are stolen from the top ten by these spammers. Google is working hard to lock them out, but there is more that you can do to help Google.
Using The Antispam Tag
But there is another loser. One of the strengths of the Internet is that it allows for two-way public communication on a scale never seen before. You post a blog, or set up a wiki; your audience comments on your blog, or adds and changes your wiki.
The problem? While you have complete control over a website and its contents in the normal way of things, sites that allow for user communication remove this complete control from you and give it to your readers. There is no way to prevent readers of an open blog from posting unwanted links, except for manually removing them. Even then, links can be hidden in commas or periods, making it nearly impossible to catch everything.
This leaves you open to the accusation of link spam – for links you never put out there to begin with. And while you may police the most recent several blog entries you’ve posted, no one polices the ones from several years ago. Yet Google still looks at them and indexes them. By 2002, bloggers everywhere were begging Google for an ignore tag of some sort to prevent its spiders from indexing comment areas.
Not only, they said, would bloggers be grateful; everyone with two-way uncontrolled communication – wikis, forums, guest books – needed this service from Google. Each of these types of sites has been inundated with spam at some point, forcing some to shut down completely. And Google itself needed it to help prevent the rampant spam in the industry.
In 2005, Google finally responded to these concerns. Though their solution is not everything the online community wanted (for instance, it leads to potentially good content being ignored as well as spam), it does at least allow you to section out the parts of your blog that are public. It is the “nofollow” attribute.
“Nofollow” allows you to mark links on your web page, whether you’re running a blog or you want to section out paid advertising, as links that Google spiders should ignore. The great thing about it is that not only does it keep your rankings from suffering from spam, it also discourages spammers from wasting your valuable comments section with their junk text.
The most basic part of this attribute involves embedding it into a hyperlink. This allows you to manually flag links, such as those embedded in paid advertising, as links Google spiders should ignore. But what if the content is user-generated? It’s still a problem because you certainly don’t have time to go through and mark all those links up.
Fortunately, blogging systems have been sensitive to this new development. Whether you use WordPress or another blogging system, most have implemented either automated “nofollow” links in their comment sections, or have issued plugins you can implement yourself to prevent this sort of spamming.
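To picture what such an automated step does, here is a minimal sketch in Python that stamps rel="nofollow" onto every link in a piece of user-submitted comment HTML. The function name add_nofollow is hypothetical and is not taken from WordPress or any other blogging system; the regular expressions are a simplification that assumes reasonably well-formed markup, and a production system would more likely use a real HTML parser.

```python
import re

# Opening <a ...> tags only; assumes simple, well-formed comment markup.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# An existing rel="..." or rel='...' attribute inside the tag.
REL_ATTR = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" on every link (hypothetical helper)."""

    def fix_tag(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs + ' rel="nofollow">'
        # Keep any existing rel values and add nofollow if it is missing.
        values = rel.group(2).split()
        if "nofollow" not in [value.lower() for value in values]:
            values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return A_TAG.sub(fix_tag, comment_html)


comment = 'Nice post! <a href="http://example.com">cheap pills</a>'
print(add_nofollow(comment))
```

Running the filter when a comment is saved, rather than when the page is displayed, means older comments stay covered without anyone having to go back and mark them up by hand.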
This does not solve every problem. But it’s a great start. Be certain you know how your user-generated content system provides this service to you. In most cases, a software update will implement this change for you.
Is This Spamming And Will Google Block Me?
There’s another problem with the spamming crowd. When you’re fighting search engine spam and start seeing the different forms it can take – and, disturbingly, realizing that some of your techniques for your legitimate site are similar – you have to wonder: Will Google block me for my search engine optimization techniques?
This happened recently to BMW’s corporate site. Their webmaster, dissatisfied with the dealership’s position when web users searched for several terms (such as “new car”), created and posted a gateway page – a page optimized with text that then redirects searchers to an often graphics-heavy page.
Google found it and, rightly or wrongly, promptly dropped their page rank manually to zero. For weeks, searches for their site turned up plenty of spam and dozens of news stories – but to find their actual site, it was necessary to drop to the bottom of the search, not easy to do in Googleworld.
This is why you really need to understand what Google counts as search engine spam, and adhere to their restrictions even if everyone else doesn’t. Never create a gateway page, particularly one with spammish data. Instead, use legitimate techniques like image alternate text and actual text in your page. Look for ways to get other pages to point to your site – article submission, for instance, or directory submission. And keep your content fresh, always.
While duplicated text is often a sign of serious spammage, the Google engineers realize two things: first, the original text is probably still out there somewhere, and it’s unfair to drop that person’s rankings along with those who stole it from them; and second, certain types of duplicated text, like articles or blog entries, are to be expected.
Their answer to the first issue is to credit the site first catalogued with a particular text as the creator, and to drop sites obviously spammed from that one down a rank. The other issue is addressed by looking at other data around the questionable data; if the entire site appears to be spammed, it, too, is dropped. Provided you are not duplicating text on many websites to fraudulently increase your ranking, you’re safe. Ask yourself: are you using the same content on several sites registered to you in order to maximize your chances of being read? If the answer is yes, this is a bad idea and will be classified as spamdexing. If your content would not be useful to the average Internet surfer, it is also likely to be classed as spamdexing.
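The "first catalogued" rule described above can be pictured with a toy sketch in Python. This is not Google's algorithm, which is not public; the function name credit_originals, the tuple layout, and the exact-match hashing are all assumptions made for illustration. It groups pages whose text is identical after normalizing case and whitespace, then labels the earliest-catalogued page in each group as the original.

```python
import hashlib
from collections import defaultdict
from datetime import date


def credit_originals(pages):
    """Label each URL 'original' or 'duplicate' (hypothetical illustration).

    pages: iterable of (url, first_catalogued, text) tuples.
    """
    groups = defaultdict(list)
    for url, first_catalogued, text in pages:
        # Normalize case and whitespace so trivial edits still match.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append((first_catalogued, url))

    labels = {}
    for entries in groups.values():
        entries.sort()  # earliest catalogue date first
        for position, (_, url) in enumerate(entries):
            labels[url] = "original" if position == 0 else "duplicate"
    return labels


pages = [
    ("http://scraper.example/copy", date(2006, 5, 2), "Ten tips for  better rankings"),
    ("http://author.example/tips", date(2006, 3, 14), "Ten tips for better rankings"),
    ("http://other.example/news", date(2006, 4, 1), "Unrelated article text"),
]
labels = credit_originals(pages)
print(labels)
```

Exact hashing only catches verbatim copies; scrambled text of the kind produced by article-spinning tools would need a fuzzier comparison, which this sketch does not attempt.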
There is a very thin line between search engine optimization and spamdexing. You should become very familiar with it. Start with understanding hidden/invisible text, keyword stuffing, metatag stuffing, gateway pages, and scraper sites.
About The Author
Article by Danny Wirken
http://www.chauy.com/2006/07/googles-tag-to-remove-content-spamming/