THE OTHER SIDE OF THE SEARCH GODS ABRACADABRA!!
By Liji Elizabeth Thomas
Thousands of servers, billions of web pages... the possibility
of individually sifting through the WWW is nil. The search engine
gods cull the information you need from the Internet, from tracking
down an elusive expert you want to reach to presenting the most
unconventional views on the planet. Name it and click it. Beyond
all the hype created about the web heavens they rule, let's
attempt to keep the argument balanced.
From Google to Voice of the Shuttle (for humanities research), these
ubiquitous gods that enrich the net can be unfair, and they do have
pitfalls. And considering the rate at which the Internet continues
to grow, the problems of these gods will only get worse.
Primarily, what you need to digest is the fact that search engines
fall short of Mandrake's magic mechanism! They simply don't
create URLs out of thin air but instead send their spiders crawling
across those sites that have rendered prayers (and expensive offerings!)
to them for consideration. Even when a site like Google claims to
have a massive 3 billion web pages in its database, a large portion
of the web nation is invisible to these spiders. To think they are
simply ignorant of the Invisible Web! This invisible web holds the
content that normal search engines can't index, because the information
on many web sites sits in databases that are only searchable within
that site.
Sites like www.imdb.com (The Internet Movie Database),
www.incywincy.com (IncyWincy, the invisible web search engine) and
www.completeplanet.com (The Complete Planet) that cover this area
are perhaps the only way you can access content from that portion
of the Internet that is invisible to the search gods.
Here, you don't perform a direct content search but search
for the resources that may give access to the content. (Meaning: be
sure to set aside considerable time for digging.) None of the search
engines indexes everything on the Web (I mean none). Tried looking
for research literature on popular search engines? From AltaVista to
Yahoo, they will list thousands of sources on education, human
resource development and so on, but mostly from magazines, newspapers
and various organizations' own Web pages rather than from research
journals and dissertations, the main sources of research literature.
That's because most of the journals and dissertations are not yet
publicly available on the Web. Thought they'd get you all that's
hosted on the web? Think again.
The Web is huge and growing exponentially. Simple searches, using
a single word or phrase, will often yield thousands of "hits",
most of which will be irrelevant. A layman going to the Internet
for a piece of info has to deal with a more severe issue: too
much information! And if you don't learn how to control the
information overload from the websites returned by a search,
roll out the red carpet for some frustration. A very common problem
results from sites that have a lot of pages with similar content.
For example, if a discussion thread (in a forum) goes on for a
hundred posts, there will be a hundred pages, all with similar
titles, each containing a wee bit of information. Now instead of
just one link, all hundred of those darn pages will crop up in your
search results, crowding out other relevant sites. Regardless of all
the sophistication technology has brought in, many well thought-out
search phrases produce list after list of irrelevant web pages.
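One common remedy for the forum-thread crowding described above is to collapse hits that come from the same host. What follows is only a minimal sketch, assuming the hits arrive as a ranked list of URLs. The function name `collapse_by_host`, the `per_host` parameter and the sample URLs are made up for illustration and are not any particular engine's method.

```python
from urllib.parse import urlparse

def collapse_by_host(ranked_urls, per_host=1):
    """Keep at most `per_host` results per host, preserving rank order."""
    seen = {}   # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        if seen.get(host, 0) < per_host:
            kept.append(url)
            seen[host] = seen.get(host, 0) + 1
    return kept

# Hypothetical hit list: one forum thread floods the top of the results.
hits = [
    "http://forum.example.com/thread/42?page=1",
    "http://forum.example.com/thread/42?page=2",
    "http://forum.example.com/thread/42?page=3",
    "http://guide.example.org/search-tips",
]
print(collapse_by_host(hits))
```

With `per_host=1` the three forum pages shrink to a single link, so the unrelated guide is no longer pushed down the list.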
The typical search still requires sifting through dirt to find
the gold. If you are not specific enough, you may get too many
irrelevant hits. Remember, these search engines do not actually
search the web directly but a database held on their own centralized
servers. And unless this database is updated continually to index
modified, moved, deleted or renamed documents, you will land yourself
amidst broken links and stale copies of web pages. And if they do a
poor job of handling dynamic web pages, whose content changes
frequently, chances are the information they reference will quickly
go out of date. After they wage their never-ending war with
over-zealous promoters (spamdexers, rather), where do they have time
to keep their databases current and their search algorithms tuned?
No surprise, then, if a perfectly worthwhile site goes unlisted!
Similarly, many of the Web search engines are undergoing rapid
development and are not well documented. You will have only an
approximate idea of how they work, and unknown shortcomings may cause
them to miss desired information. Not to mention that, amongst the
first-class information, the web also houses false, misleading,
deceptive and dressed-up information produced by charlatans. The Web
itself is unstable, and tomorrow they may not find you the site they
found you today. Well, if you could predict them, they would not
be gods... would they?!
The syntax (word order and punctuation) for various types of complex
searches varies somewhat from search engine to search engine, and
small errors in the syntax can seriously compromise the search. For
instance, try the same phrase search on different search engines and
you'll know what I mean. Novices, read this line: using search engines
does involve a learning curve. Many beginning Internet users, because
of these disadvantages, become discouraged and frustrated.
As a journalist put it, "Not showing favoritism to its business
clients is certainly a rare virtue in these times." Search
engines have increasingly turned to two significant revenue streams.
Paid placement: In addition to the main editorial-driven search
results, the search engines display a second, and sometimes a
third, listing that's usually commercial in nature.
The more you pay, the higher you'll appear in the search results.
Paid inclusion: An advertiser or content partner pays the search
engine to crawl its site and include the results in the main editorial
listing. So? You are more likely to be in the hit list, but then
again, no guarantees. Of course, among those refusing to favor
certain devotees are industry leaders like Google, which publishes
paid listings but clearly marks them as 'Sponsored Links.' The
possibility of these for-profit search gods (which haven't yet made
much profit) taking fees to skew their searches can't be ruled
out. But for you as a searcher, the hit list the engine provides
should obviously be ranked in order of relevance and interest.
Search command languages can often be complex and confusing, and
the ranking algorithm is unique to each god. It may be based on the
number of occurrences of the search phrase in a page, on whether the
phrase appears in the page title, a heading, the URL itself or the
meta tags, or on a weighted average of a number of these relevance
scores. For example, Google (www.google.com) uses its patented
PageRank(TM) and ranks the importance of search results by examining
the links that lead to a specific site. Broadly, the more links that
lead to a site, and the more important the pages those links come
from, the higher the site is ranked. Pop on popularity! AltaVista,
HotBot, Lycos, Infoseek and MSN Search use keyword indexes for fast
access to millions of documents.
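To make the "popularity by links" idea concrete, here is a simplified, textbook-style iteration in the spirit of PageRank. It is a sketch only and not Google's actual implementation. The function name `pagerank`, the damping value of 0.85, the iteration count and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "hub".
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

In this toy web, "hub" comes out on top because three pages point to it, and "a" comes second because its single incoming link is from the highly ranked "hub".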
The lack of an index structure for the WWW, and of any accurate
measure of its size, will not make searching any easier. With such a
large number of sites indexed, keyword searching can be difficult to
get right. In reality, the prevalence of a certain keyword is not
always in proportion to the relevance of a page. Take this example.
A search on "sari", the national costume of India, in a popular
search engine returned among its top sites the following links:
www.scri.sari.ac.uk/ - the Scottish Crop Research Institute
www.ubudsari.com/ - a health resort in Indonesia
www.sari-energy.org/ - The South Asia Regional Initiative for Energy
Cooperation and Development
Pretty useful sites for someone very much interested in knowing how
to drape a sari or in its tradition?! (Well, no prayer goes
unanswered, whether you like the answer or not!) Rather than simply
counting the number of instances of a word on a page, search engines
are attempting to make the rankings better by assigning more weight
to keywords that appear in things like titles, subheadings, and so on.
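The weighting idea can be sketched in a few lines. This is an illustration under stated assumptions, not how any named engine scores pages: the field names, the weights in `FIELD_WEIGHTS`, the helper functions and the two sample pages are all invented for the example.

```python
import re

# Illustrative weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "url": 2.0, "body": 1.0}

def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of `term` in `text`."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))

def score(term, page):
    """Weighted sum of keyword occurrences across the page's fields."""
    return sum(
        weight * count_term(term, page.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )

# Hypothetical pages: one is about saris, the other merely repeats the word.
pages = [
    {
        "url": "draping.example.com/sari",
        "title": "How to drape a sari",
        "heading": "Sari draping styles",
        "body": "The sari is worn across India.",
    },
    {
        "url": "resort.example.com",
        "title": "Health resort",
        "heading": "Rooms",
        "body": "sari sari sari sari sari sari",
    },
]
for page in sorted(pages, key=lambda p: score("sari", p), reverse=True):
    print(page["url"], score("sari", page))
```

A plain word count would favor the second page, which repeats the keyword six times in its body. With the weights above, the page that carries the keyword in its title, heading and URL scores higher.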
Now, unless you have a clear idea of what you're looking for, it
may be difficult or impossible to use a keyword search, especially
if the vocabulary of the subject is unfamiliar. Similarly, the
concept-based search of Excite (instead of matching individual words,
it groups the words you enter and attempts to determine their
meaning) is a difficult task and yields inconsistent results. Besides,
who reviews or evaluates these sites for quality or authority? They
are simply compiled by a computer program.
These active search engines rely on computerized retrieval mechanisms
called "spiders", "crawlers", or "robots"
to visit Web sites on a regular basis and retrieve relevant keywords
to index and store in a searchable database. And this huge
database often yields unmanageably large results... results
whose relevance is determined by their computers. The irrelevant
sites (a high percentage of "noise", as it's called), questionable
ranking mechanisms and poor quality control may be the result of
too little human involvement to weed out junk. Thought human
intervention would solve all problems? Read on. From one of the
earliest search sites, Yahoo, to about.com, Snap.com, Magellan,
NetGuide, Go Network, LookSmart, NBCi and Starting Point, all subject
directories index and review documents under categories, making them
more manageable.
Unlike active search engines, these passive or human-selected
search engines don't roam the web directly and are human
controlled, relying on individual submissions. They are perhaps the
easiest to use in town, but their indexes
cover only a small portion of the actual number of WWW sites and
thus are certainly not your best bet if you are after specific,
narrow or complex topics. Subject designations may be arbitrary,
confusing or wrong. A search looks for matches only in the
descriptions submitted. They never contain the full text of the web
pages they link to; you can only search what you see: titles,
descriptions, subject categories, etc. The human-labor-intensive
process limits database currency, size, rate of growth and
timeliness. You may have to branch through the categories
repeatedly before arriving at the right page. They may be several
months behind the times because of the need for human organization.
Try looking for some obscure topic... chances are the people
who maintain the directory have excluded those pages. Obviously,
machines can blindly count keywords but they can't make common-sense
judgements as humans can. But then why do human-edited directories
respond with all this junk?!
And here's the word on those meta search
engines. A comprehensive search on the entire WWW using The Big
Hub, Dogpile, Highway61, Internet Sleuth or SavvySearch, covering
as many documents as possible, may sound as good an idea as one-stop
shopping. Meta search engines do not create their own databases.
They rely on existing active and passive search engine indexes to
retrieve search results. And the very fact that they query multiple
keyword indexes at once cuts down the time a search takes you.
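Since a meta search engine keeps no database of its own, its main job is to merge the ranked lists that other engines hand back. The sketch below shows one simple way to do that, scoring each URL by the reciprocal of its rank in each list. The scoring rule, the function name `merge_results` and the engine names and URLs are assumptions for illustration, not the method of any engine named above.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL earns 1/rank from every engine that returned it (rank starts
    at 1), so a URL found by several engines, or placed near the top, rises.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical engines and results, for illustration only.
results = {
    "engine_a": [
        "http://x.example/1",
        "http://y.example/2",
        "http://z.example/3",
    ],
    "engine_b": [
        "http://y.example/2",
        "http://w.example/4",
    ],
}
merged = merge_results(results)
for url in merged:
    print(url)
```

Duplicates collapse into a single entry, which deals with the redundancy, but the merged list is only as good as the lists it is built from.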
It sure does save your time by searching several search engines
at once, but at the expense of redundant, unwanted and overwhelming
results... and, much more important, misses. The default search
mode differs from search site to search site, so the same search
is not always appropriate in different search engine software. The
quality and size of the databases vary widely. Weighted search engines
like Ask Jeeves and RagingSearch allow the user to type queries
in plain English without advanced searching knowledge, again at
the expense of accuracy and detail. Then there are review or ranking
sources like Argus Clearinghouse (www.clearinghouse.net), eBlast
(eblast.com) and Librarian's Index to the Internet (lii.org). They
evaluate the quality of websites they find or accept as submissions,
but cover a minimal number of sites. As a webmaster, registering your
site with the biggest billboards in Times Square can get
you closer to "bingo!" for the searcher.
Those who didn't even know you existed before are in your
living room in a New York minute! Registering your URL is a
no-brainer, considering the traffic it can send flocking to your
site. It is certainly a quick and inexpensive method, yet it is only
one component of the overall marketing strategy, and in itself it
offers no guarantees and no instant results, and it demands continued
effort from the webmaster. Commerce rules the web. As a notable
Internet caveman put it, "Web
publishers also find dealing with search engines to be a frustrating
pursuit. Everybody wants their pages to be easy for the world to
find, but getting your site listed can be tough. Search sites may
take a long time to list your site, may never list it at all, and
may drop it after a few months for no reason.
If you resubmit often, as it is very tempting to do, you may even
be branded a spamdexer and barred from a search site. And as for
trying to get a good ranking, forget it! You have to keep up with
all the arcane and ever-changing rules of a dozen different search
engines, and adjust the keywords on your pages just so... all the
while fighting against the very plausible theory that in fact none
of this stuff matters, and the search sites assign rankings at random
or by whim." To make the best use of Web search engines, to
find what you need and avoid an avalanche of irrelevant hits, pick
search engines that are well suited to your needs. And lest you'd
want to cry "Ye immortal gods! Where in the world are we?",
spend a few hours becoming moderately proficient with each.
Each works somewhat differently, most importantly in respect to
how you broaden or narrow a search. Finding the appropriate search
engine for your particular information need can be frustrating.
To use these search engines effectively, it is important to understand
what they are, how they work, and how they differ. For example, while
using a meta search engine, remember that each engine has its own
methods of displaying and ranking results. Remember, search strategies
affect the results. If the user is unaware of basic search strategies,
results may be spotty. Quoting Charlie Morris (the former editor
of the Web Developer's Journal): "Search engines and
directories survive, and indeed flourish, because they're all we've
got. If you want to use the wealth of information that is the Web,
you've got to be able to find what you want, and search engines
and directories are the only way to do that. Getting good search
results is a matter of chance.
Depending on what you're searching for, you may get a meaty list
of good resources, or you may get page after page of irrelevant
drivel. By laboriously refining your search, and using several different
search engines and directories (and especially by using appropriate
specialty directories), you can usually find what you need in the
end." Search engines are very useful, no doubt. Right from
getting a quick view of a topic to finding expert contact info...
verily, certain issues lie in their lap. Now the very reason we
bother about these search engines so much is because they're all
we've got! Though there sure is a lot of room for improvement, the
need of the hour is to not get caught in the middle of the road. By
simply understanding what, how and where to seek, you'd spare
yourself the fate of chanting that old Jewish proverb, "If God lived
on earth, people would break his windows." Happy searching!