Monday, January 3, 2011

Intext Pivoting: Marking points of interest while reading an article

We've learnt from our research that user attention while reading an article is substantially greater than attention to anything in the sidebar, footer, or banner. In this context, even though attention is at its highest, users' willingness to click away and wander is at its lowest. Yet the experience of reading the article largely determines what kind of content they'd love to explore after they have read it. How can a content recommendation service balance attention and non-intrusion?

Here are some numbers: on sites that had our Keyhole widget (which automatically identifies topics that have good coverage on a site, and shows related content on hover), we found that 30-50% of pageviews result in hovers over intext links. About 5% of these hovers result in users scrolling through more nuggets. However, very few users want to click and go away to other articles while they are reading one.

The KeyHole widget in action. Notice how nuggets about Facebook from other articles are shown on hover.
Moving over to our footer widget, FishEye: we notice that only about 20-30% of users ever get to see this widget. Even among those, at best 5% interact with it. About 70% of these interactions are click-throughs to articles.
The FishEye widget. It allows users to explore article recommendations best suited to the context of the page.

We noticed that interaction and curiosity are highest while reading the article, and click actions are highest after reading it. So we decided to merge the two: take inputs from users while they read, and let them explore and click after they are done with the article.

We call this Intext Pivoting - letting users mark their points of interest while reading an article, and then explore, pivot and read them after they are done. Intext Pivoting does not disrupt the user's normal reading behavior, ensures he has fantastic recommendations waiting for him when needed, and increases time spent on both the context and the target pages!
Intext pivoting: A small popup asks if you want to dive into the selected text later


Intext pivots form the central view while looking at related recommendations.

When users finish reading the article and get to our recommendations widget, we've remembered all their points of interest. The widget interactively lets them drill down into each one, explore article recommendations, and discover the concepts hidden in those articles.
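For the technically curious, here is a minimal sketch of how such selection capture could work in a browser. It is purely illustrative - hypothetical function names, not our actual widget code:

```javascript
// Minimal sketch of intext pivoting (illustrative, not Dhiti's actual code).
// While reading: remember what the user selects. After reading: replay the
// pivots inside the recommendations widget.

var pivots = []; // points of interest marked while reading

document.addEventListener('mouseup', function () {
  var text = String(window.getSelection()).trim();
  if (text.length < 3) return; // ignore stray clicks
  // The real widget shows a small "dive into this later?" popup here;
  // this sketch just records the selection.
  pivots.push({ text: text, at: Date.now() });
});

// Hypothetical hook, called when the reader reaches the end-of-article widget.
function renderPivots(widgetEl) {
  pivots.forEach(function (p) {
    var item = document.createElement('li');
    item.textContent = p.text; // each pivot seeds a set of recommendations
    widgetEl.appendChild(item);
  });
}
```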

Dhiti Dive is exploratory search technology for publishers that allows readers to discover and pivot around their interests.

Friday, September 24, 2010

Pick nuggets while surfing. Put the joy back into browsing.

Update: The bookmarklet mentioned in this article has now moved to driLLL. Here's a video that shows the benefits of driLLL.



Original article is retained here:

How often have you landed on a page filled with links, and wondered which of them to follow?

How often have you wondered if there's something useful behind a link?

We've just launched a new bookmarklet today that changes the way you read, research and browse the web!

Let's say you are searching on a topic, prisms, and land on the wikipedia page, Prism (Optics). You read most of the text there. You are overwhelmed by the rich linking on the page, and wonder which of those links hold useful information for you. You may want to know about "total internal reflection", or about "dispersion", or about how eye care professionals use prisms. Information related to these is all hidden away behind the links. A few are obvious to click. A few are not. The ones that you want could be two links away. How do you get there? You forget, or you keep clicking. Worse, you may hit the back button, go to Google and search again.

All that changes from today.

You'll first install our bookmarklet. Go to http://nuggetize.com, and drag our "Nuggetize on the go" button to your browser's bookmarks or favorites bar. This is just javascript. It's perfectly safe. We don't even require you to log in or register to use us.
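For the curious: a bookmarklet is nothing but a javascript: URL that runs in the context of the page you are on. A minimal sketch of what such a bookmarklet could look like - the loader URL and parameter names here are illustrative, not the actual Nuggetize code:

```javascript
// Illustrative bookmarklet skeleton (not the actual "Nuggetize on the go" code).
javascript:(function () {
  var s = document.createElement('script');
  // Hypothetical loader URL: pass along the current page (and any selected
  // text) so the service can analyse it and build the overlay.
  s.src = 'http://nuggetize.com/loader.js' +
          '?url=' + encodeURIComponent(location.href) +
          '&sel=' + encodeURIComponent(String(window.getSelection()));
  document.body.appendChild(s);
})();
```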



Browse to a page like Prism (Optics).

Press the "Nuggetize on the go" button on your bookmarks.

Lo! In a few seconds, we analyse this page and its link neighborhood, and keep it ready for your exploration. See how easy it now becomes to inspect the information in all those references.



You want something specific. Just close our overlay, look at the article and select any piece of text. Say, "Ray angle deviation and dispersion through a prism can be determined by tracing a sample ray through the element and using Snell's law at each interface."



Now hit "Nuggetize on the go" again. See how the nuggets picked from this page's neighborhood just got contextualized.



You like a nugget and want to see more similar ones? Just click on "Show similar" and get there.

This works on most of the public web!

Enjoy!

Sunday, September 12, 2010

In online publishing, yesterday is not history

Almost every successful publisher and blogger I've met says this:

"Search referrals drive 50-70% of my traffic."

Though this does not raise eyebrows anymore, publishers have mostly done nothing different for these search referrals. Direct traffic (from home page hits, RSS readers, Twitter and Facebook) differs from search engine referral traffic in visitor patterns - all very visible in your favorite analytics platform. Yet publishers do almost nothing to treat these visitors differently.

  1. Direct referrals:
    • readers loyal to your website.
    • readers hooked onto RSS feeds.
    • readers tracking you on Twitter or Facebook.
  2. Search referrals:
    • readers coming from search engines, with specific query-intent
    • readers coming from news and blogs aggregators - like Google News, Digg etc.
If you are a publisher or a blogger, look out for the following differences between the two in your analytics dashboard:

Direct traffic:
  • Traffic pattern: Direct traffic starts arriving seconds after your new post is written (or posted on Twitter, Facebook etc.). The traffic for a page peaks a few minutes, hours or days after writing. It dies a fast death thereon. Yesterday's news is history.
  • Visitor type: Most of your visitors are already known to you. They've been to your site earlier.
Search referral traffic:
  • Traffic pattern: Search referral traffic takes a while to build up. You often don't know why some pages get more search traffic than others. Traffic does not peak based on your posting time, but is often leveled out over a longer period. Bumps in traffic depend on macro-scale patterns of querying.
  • Visitor type: Most visitors from search are visiting your site for the first time.
Your direct traffic is newsy. Your search referral traffic is looking for information. Why are you being newsy to your search visitor?

Blogging was invented as a simple way for people to publish articles. We saw a tool that could surface the journalist in us. We expected the same publishing lifecycle to hold: we wrote new articles, and pushed the older ones into a stack - making them harder to find. This was fine until search engines found a way to unearth articles buried six feet under, and show them as search results. With search engines around (and they are going to be around for a long time), the half-life of an article you write is much longer than you think. We should not serve visitors old and dated content for their current queries. We have to keep it refreshed.

Let me exemplify the problem and potential solutions:
  • android phone price india: An extremely popular query. Look at the first result, from androidos.in. It's kept thoroughly updated, and the page monetizes the user intent extremely well!
  • iphone price india: Another extremely popular query. Look at the first result, from pluggd.in. The article is dated June 30, 2008. There are no updates, save for a link to a newer article. Visitors miss the link. This is a lost opportunity to harvest user intent.
  • deal flow: Another query, popular in VC circles. It looks like an entire site has been dedicated to this. The wikipedia page on the topic has been kept refreshed and up to date. The third result, from Bill's blog, is an excellent article, but has had no updates since 2005. It has failed to harvest user intent here and engage the visitor.
I think you are getting the picture. Publishers can easily track the leading queries that bring search referral traffic to a page, and satisfy that user intent by converting the page into a wikipedia-like reference page. Blogs and content sites must take a two-fold approach: publish new content to stay ahead of the curve and influence opinion, and keep their best intent-satisfying pages up to date. Publishing sites must be a mix of blogging and wikipedia. How do you do this? Let me shamelessly promote our product Nuggetize.com. Nuggetize can be useful in two ways to keep older pages current and reference-like.

  1. Our Landing Lights widget: You can nuggetize your blog, and install our Landing Lights widget. It fires up only for search engine referrals, especially when you have several articles closely matching your visitor's intent. The visitor is matched better early on, and you present nuggets from across the site that match the user intent. This is like generating a hub page pointing to relevant information - on demand. The widget takes some of the burden of maintaining search-engine-popular pages off the publisher, and automatically picks up other relevant content from across articles for the visitor. (A sketch of the referral detection involved follows this list.)
  2. Size up your real competition, and update your page: A page like Bill's Deal Flow Is Dead, Long Live Thesis Driven Investing gets traffic from many different search queries. "Deal flow", "Thesis driven investing" and "Thesis driven investment" are all relevant. This page is competing with all its peers in the search engine results - especially the ones above it. It's long been suggested that entire blogs are to be considered as competition. Eg: AVC.com may be considered competition to Bill's blog. However, in the search engine world, the competition is mostly between pages - Bill's article does not have to worry about everything else AVC.com writes about to keep visitors happy on this page. How does Bill improve his content? He has to size up his competition fast, and see what areas they cover that he doesn't, and what areas he covers that they don't. Nuggetize.com is an excellent tool for this. Try nuggetizing the topics deal flow and thesis driven investment. Nuggetize organizes the facts hidden away in all the competitors alongside relevant categories, which helps you survey the breadth and depth fast. Nuggets your competition has covered but you haven't are indicators of the direction in which you have to update your blog.
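Here, roughly, is what the referral detection behind a widget like Landing Lights involves - a minimal sketch with illustrative names, assuming the circa-2010 behavior where search engines still passed the query in the referrer:

```javascript
// Sketch: detect a search-engine referral and recover the visitor's query.
// (Illustrative only; assumes engines pass the query in the referrer, as
// they did circa 2010: ?q= for Google and Bing, ?p= for Yahoo.)
function searchQueryFromReferrer() {
  var ref = document.referrer;
  if (!/\b(google|bing|yahoo)\./i.test(ref)) return null; // not a search referral
  var m = ref.match(/[?&][qp]=([^&]*)/);
  return m ? decodeURIComponent(m[1].replace(/\+/g, ' ')) : null;
}

var query = searchQueryFromReferrer();
if (query) {
  // A Landing Lights-style widget would now pull nuggets from across the
  // site that match this intent, instead of leaving the visitor on a
  // possibly dated page.
  console.log('search visitor looking for: ' + query);
}
```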

Moving away from self-promotion: integrating analytics data, the publishing tool, and web research will help publishers keep their old gold mine relevant!

Friday, August 27, 2010

The conflict of interest in content sites

Getting a page like this, even from highly respectable sites like pluggd.in, is a common affair these days.


What you see is the entire view above the fold on my high-resolution laptop. Other than the title, I don't see anything relevant to me above the fold.

Pluggd.in is not alone in subordinating the interests of its visitors to its monetary compulsions. Most often such sites don't have a choice, since serving ads is their only monetization model.

These sites:

  1. Need their readers to be satisfied with their content, presentation and experience - for them to come back, spread word-of-mouth, and be influenced.
  2. Need their readers to click on those mostly intrusive and out of context ads - for them to make money.
This is a conflict of interest - a death spiral. Readers develop ad-blindness, install tools like Adblock, or just skim-read. The publisher responds by increasing the size of the ad widgets and placing them higher in the article, to make them more prominent. The people who make pluggd.in a success through its quality are most likely not the people helping it make money.

Publishers need a better monetization model than this. Readers deserve a better experience than this.

This article details, using eye-tracking studies, how users tend to skim over ad regions and look carefully for regions of text. The same study also details how reading behavior differs on search engine results pages.

Advertising drives the publishing industry

Nitin Srivastava, an expert in online publishing, had an interesting thought about the publishing world. He said, "They deliberately package the content unattractively (fonts, visuals) so that the ads get better attention." This came as a surprise to me. I remember a session with an editor from a leading newspaper during my Google News days. He had said, "The ads first fill out the newspaper, and then it comes to us editors. We then find appropriate content to fill the gaps."

The online publishing industry may not be far behind. The content farms are hard at work analyzing "tail queries" that have "ad potential" and a long shelf life, and then getting contractors to write skeletal, often mediocrely researched articles on those topics. They concentrate on massive SEO to lure unsuspecting searchers. Now, is it surprising that they monetize very well? Their content is worse than the ads Google AdSense serves up on their pages. Visitors are happy to click away!

Yearning for a better reading experience

In my first meeting with Ashish Gupta, he made an insightful observation that has lingered with me. He said (rephrased), "A successful product ensures that its monetization model does not disrupt the normal usage model." This is deep! Imagine:


  1. What is the primary user behavior on a search results page? To inspect links, decide on a relevant one based on the search query and snippets, and lo, click! Is there anything different you do with AdWords? You look at an ad, and you click. In fact, in the spammy world, many ads are more relevant than the top results. Have you tried a query like "car insurance" lately? AdWords is useful, and does not disrupt what the user expects to do.
  2. What is the primary user behavior when he goes shopping? To inspect a product, compare it with others, pick up related products, and always feel in control of decision-making. Amazon allows exactly that - it shows competing products and related products. Amazon makes most of its revenue by cross-selling: a customer who came for one product ends up doing collateral shopping and buys many.
  3. What is the primary user expectation when he watches TV? He wants to be entertained. He watches sports, sitcoms, comedies, movies - all just to be entertained. Now, if these are interspersed with ads that are entertaining, will he complain? I've often found kids keep busy with toys while the sitcom runs, and watch TV when the ads start. What's so interesting? They find the colors vivid, the sounds diverse, and the pace racy. They are hooked - until the same ad repeats a zillion times, making you grab that remote - and bounce!
Now, what is the user expectation when he lands on an article page?
  • He wants to read the article! Everything you saw in that pluggd.in snapshot was preventing me from reading. No wonder click-ad monetization is not the best.
  • He wants to drop off to something else if he finds the article boring. He still expects to read something else.
  • After reading, he wants to share - probably a snippet, or an annotation - with his network on facebook or twitter.
Can we have a monetization model for content sites that aligns with the user expectation?
  • Sponsored articles - Outbrain, Zemanta and Lijit, three companies in the content recommendation and site search space, have been promoting a business model around pushing sponsored articles into relevant contexts in their "inventory".
  • Pay per share - How about pay-per-share, instead of pay-per-click? The advertiser pays the ad network only if his tweet, snippet, nugget, or article title gets shared to the reader's network.
Both these models preserve user expectations. The first conforms to the reader's quest to read; the second to the best expression of a satisfied reader. There is no conflict of interest between monetization and experience.

In our startup, Dhiti, we are gung-ho about the latter. Pay-per-share will make advertisers write useful content that users deem share-worthy. Eg: Toyota will invest in writing about their latest hybrid technology breakthrough - with many nuggets that people will love and share. Success and leads come not just from link-baiting users, but from actually writing stuff they love to read and share. Won't that put sanity back into our beloved web? A pipe dream!
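Mechanically, pay-per-share is easy to sketch. In the hypothetical snippet below (the endpoints and field names are invented for illustration, not a real ad network's API), the share button reports the billable event to the network before opening the reader's share dialog:

```javascript
// Sketch of pay-per-share accounting (all names and endpoints hypothetical).
// The advertiser is charged when a reader actually shares the sponsored
// nugget - not when it is merely displayed or clicked.
function shareNugget(nugget) {
  // 1. Report the billable share event to the ad network via an image beacon.
  var beacon = new Image();
  beacon.src = 'http://adnetwork.example.com/share?ad=' +
               encodeURIComponent(nugget.adId) + '&t=' + Date.now();
  // 2. Open the reader's network with the nugget pre-filled.
  window.open('http://twitter.com/share?text=' +
              encodeURIComponent(nugget.text) +
              '&url=' + encodeURIComponent(nugget.url));
}
```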

Saturday, May 1, 2010

Nuggetize: Faceted search for the web through dynamic categorization

The web has immense data on any topic, but traditional search engines only return lists of links. It's up to the user to open those links and look for information. This causes search fatigue. An approach to search retrieval called faceted search aims to reduce this pain.

What is faceted search?

Also called faceted navigation, exploratory search, faceted browsing, guided navigation and sometimes parametric search, it refers to a search engine showing its results from multiple points of view.

Faceted search is already wildly popular on eCommerce sites - thanks to companies like Endeca, and open source implementations like Solr, Sphinx and Lucene. Try a search on Amazon for "baby", and you'll notice the search results neatly classified into Baby, Clothing & Accessories, Toys & Games, Health & Personal Care, Books, etc.


This is so intuitive and useful that users hardly made any noise about the feature; they just embraced it. eCommerce sites have been very happy, since it helped users reach products much more effectively - in the absence of search ranking systems like PageRank. It also contributed to a better browsing experience, and exposed more of the store's products to users.

Having been wildly successful in eCommerce, faceted search hit hurdles in general web search. Traditional algorithms exploited the structured metadata available in product catalogs to provide good results. On the web, where unstructured data rules, it is a challenge to present the available information and pages as well-organized categories. There have been several efforts at this, using different approaches. I'll use sample queries - michael jackson, and hurricane proof housing - to compare the results and posit on the approaches taken by the different products.


  • Clusty, probably one of the first on the web to concentrate on exploratory search, organized search results by clustering the content inside pages. They had the hard problem of naming a cluster, and mined for keywords to name them appropriately.
  • Kosmix, a hyper-aggregator for the web, classifies and organizes content around popular queries. Their recall is quite low, and performance drops when the user tries tail queries.
  • Cuil uses wikipedia for extensive query analysis, matches concepts related to the query to content available in search results, and presents facets. Since they start with the query, and not the content, they work well when the query alternatives match the concepts available in the content. If the match is not strong, there's topic drift. There also seem to be issues in allotting weightage between query terms.
  • Nuggetize, a learning and discovery engine, presents the information available on the web for a topic as nuggets - it uses wikipedia's ontology and classifies nuggets into categories dynamically. Nuggetize's facets lead to nuggets, not pages - since a given document can hold facts along several dimensions. The dynamic categorization is also driven by the content, not the query.

Here are comparative results. You are invited to try more among these, or suggest other faceted search products on the web.

Michael Jackson:

i) Clusty


ii) Cuil:


iii) Kosmix:


iv) Nuggetize:


Hurricane proof housing:

i) Clusty:


ii) Cuil:

iii) Kosmix: (no results)

iv) Nuggetize:


Do you find the results from Nuggetize more contextual and appropriate? The main reason is that Nuggetize relies on the page content, and marries the concepts present in the pages onto the wikipedia ontology. From there, a proprietary tree aggregation algorithm figures out the most relevant and informative categorization to display. The dynamic categorization does not stop there: you can click on any category or any topic, or even search within these nuggets, and see the categories change!
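The exact tree aggregation algorithm is proprietary, but the general shape of dynamic categorization is easy to sketch: map each nugget to its wikipedia categories, count categories over whatever subset of nuggets is currently in view, and surface the best-covering ones. A toy version (a deliberate simplification, not the real algorithm):

```javascript
// Toy dynamic categorization (a simplification, not the real algorithm).
// Each nugget carries the wikipedia categories of the concepts found in it.
// Facets are recomputed from the nuggets currently in view - which is why
// drilling down changes the categories.
function facetsFor(nuggets, topK) {
  var counts = {};
  nuggets.forEach(function (n) {
    n.categories.forEach(function (c) {
      counts[c] = (counts[c] || 0) + 1;
    });
  });
  return Object.keys(counts)
    .sort(function (a, b) { return counts[b] - counts[a]; })
    .slice(0, topK);
}

// Drilling down on a category filters the nuggets and recomputes the facets.
function drillDown(nuggets, category, topK) {
  var subset = nuggets.filter(function (n) {
    return n.categories.indexOf(category) >= 0;
  });
  return { nuggets: subset, facets: facetsFor(subset, topK) };
}
```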

Notice the categories returned when we drill down on the category, "Weather hazards", for hurricane proof housing.

Next, the categories when I drill down along "Design".

The categories you see are entirely decided by the context of results (or sub-results).

Faceted search promises to be a great way of managing overload on the web, and of exposing multiple points of view - dimensions of facts - to users. Getting to those facts by opening links one after the other takes a considerable amount of time.

Here's another example: for the query, Yosemite winter activities.


Notice how Skiing, Snow, Travel and Roads come up. A winter tourist to Yosemite is obviously very interested in knowing which roads are open and which are closed.

Now contrast this to the categories that come up for Yosemite summer activities

Do you see how Locomotion, Exercise and Wilderness (involving rock climbing, swimming, running) show up? And skiing is pushed down? The content has this information, and the dynamic categorization gets its pulse!

So what is the catch? The catch is that all this depends on the wikipedia ontology, and can only be as good as that ontology. The wikipedia ontology also has quite a bit of noise, and needs a lot of pruning before it can be used for such purposes. But as long as we have people committing extraordinary work to wikipedia, Information Retrieval scientists can deliver relevance to users on the web. Did you donate to wikipedia at the end of the year? ;)

Tuesday, April 27, 2010

Thank you, Google and Facebook, we'll take it from here

At this stage in the web's magnificent story, getting and staying informed on the web has splintered into many products, and processes. Search engines match a user's intent with content. Social bookmarking tools allow users to tag and share their information consumption experience. Social networks have taken over the dissemination of interesting and relevant information. This has resulted in content consumption islands. Mapping user intent to socially situated, relevant, interesting and experiential content seems like Utopia. We'll reach there when people, situated in social networks, actively map their search intent to their consumption experience, and expose it for others' benefit.

Ideas around this notion are surfacing under different names, with slight variations - some call it a database of intentions, some call it content curation. I believe that for the web to continue to be a valuable and smooth experience, we the people - embedded in our networks - have to actively begin mapping intent to content. We thank Google for an excellent search service, and Facebook for mapping our network. We should use Google to create an annotated and filtered web of our intent for information, mapped to good content, and embed it into our networks on Facebook or Twitter.

Let me introduce this using an example:

Let's consider an intent among us: to learn about the foods that affect moods, and how they do it.

i) Matching intent to content: Using a search engine, we see:


ii) Documenting information consumption: However, several hundreds of people have looked around for this topic, and have bookmarked useful links already on Delicious.


iii) Dissemination through social networks: A casual reader who discovers an interesting article on this topic may share it with his social network:

Now compare the above to a carefully chosen list of nuggets and links that I found relevant on this topic:


From a set of pages, I chose some useful nuggets that introduced many facts on the topic, and collated them together to make a report. The nuggets make it easier for readers to judge the quality of the content in the links.

Further, I can share this filtered list of documents - a gist of my information consumption experience - with the public world, or within my social network, as Foods affecting moods.

In this exercise, I mapped an intent, "foods affecting moods", to a list of documents that have been filtered through my experience. The mapping is then shared with my social network. This process differs from, and adds value over, each of the experiences we normally go through (a sketch of such a shareable record follows this list):

i) When we use a search engine, we rely on the search engine's ranking, and the snippets it provides.
ii) We bookmark only individual pages on Delicious, do not document our intent (tags are too fine-grained), and do not attach good snippets. [Some social bookmarking tools like Diigo do allow selection of snippets from the pages].
iii) We share individual links on Twitter and Facebook, not a chosen list of documents mapped to an intent.
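Concretely, an intent-content mapping can be a small, shareable record. One plausible shape for it (a sketch - the field names and the example link are hypothetical):

```javascript
// One plausible shape for a shareable intent-content record (names are
// illustrative; the link and nugget are made-up examples).
var intentReport = {
  intent:  "foods affecting moods",        // the question or query
  curator: "some_user",                    // whose consumption experience this is
  shared:  ["facebook", "twitter"],        // networks the mapping is embedded in
  items: [
    {
      url:    "http://example.com/food-and-mood",  // hypothetical link
      nugget: "A chosen snippet that justifies why this link made the list."
    }
    // ... more chosen links, each with the nugget that earned its place
  ]
};
```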

Can we get to an intent-content web that looks like:



Each user (a node in this graph) publishes a set of chosen links and nuggets for his search intentions (shown as blips). We'll see such intent-content mappings proliferate through social networks once they become easy to create. The next time you want to know something, the best human-chosen content will be right in your neighborhood.

Intent, not content, drives the web these days

The web has a tendency to bring about massive changes in culture, in rather subtle ways. One thing leads to another, one system exploits another, and very soon we are going down a path not comprehended. Here are some of my thoughts on the cause-effect cycles in the content-web:

  1. Hyperlinking makes it easy for content to be discovered and browsed.
  2. Content authoring takes off. Web sites are created that have hyperlinked content.
  3. Directories come up and organize the web's content into categories - to make the content explosion tractable.
  4. Search engines exploit the link structure, and bring relevance to search.
  5. Directories are not relevant any more. People use search engines for everything.
  6. Blogging software makes content creation go through the roof.
  7. Spammers understand how search engines work, know most people only use search engines, and make use of blogging software to game them.
  8. Search engines perennially fight spam.
  9. Content advertising models provide payoffs for people with good content, and traffic.
  10. Spammers and marketers discover that writing "content" is the best way to attract search traffic.
  11. New businesses come up that analyse what people are searching for (intent), and write content to suit it. These are called content farms. Demand Media is the biggest example. Here's a piece from Time on this phenomenon. I collated a reasonably comprehensive list of opinions on the topic of content farms.

We are living in a time when companies study our intent, write content tailor-made to it, and then make money through ads. While this may not be as bad as it sounds, they work symbiotically with search engines, and often compromise on the quality of content that matches user intent. They mostly do just enough to get to the top of the search engine listing and get your click. This spiral will make search engines weaker, and hurt our information consumption experience - unless we take it from here on our own!

The importance of us being good hubs

The amount of time and skill people have for writing fantastic content probably follows a skewed normal distribution: most of us are average writers, a few are fantastic, and plenty are bad. The fantastic writers command a huge readership. Page quality on the web follows an even more skewed distribution - tending towards a power law: a few fantastically written pages, many average ones, and plenty of useless ones.

In stark contrast to our producing ability, most of us are excellent consumers - with a great eye for identifying quality. If you spend more time consuming than producing, it follows that you are a better consumer than a producer. Unfortunately, this ability has not left its footprint on the web. As discerning consumers, able to tell good from bad, we have to document our consumption experience - and map intent to good content.

Recent trends in Web 2.0 are clearly directed at gathering more data from the consumption experience. This started with rating systems, and now we even have products that document our geographic visits. Facebook's recent announcements make the whole web "likeable". We need to extend this to intent-content mapping.

A good hub is one that points to good content. Being good hubs is just a natural extension of us putting our best skills (of consuming) to use!

Convinced? Want to get started with the intent-content web?

Mass adoption of an idea depends on extremely easy tools that facilitate its spread. We saw an explosion of content on the web when blogging tools took content authoring away from webmasters and gave it to common people. Good content started getting disseminated when social networks provided an organized mechanism for spreading it. Users started voicing their preferences when web sites made it easy to gather public opinion. The intent-content web is also looking for that spark in a tool.

I'd like to talk about Diigo, and Nuggetize (something I've been working on myself). These tools follow different approaches, but can contribute to the intent-content web.
  1. Diigo: Diigo is probably the most fully-featured bookmarking tool on the web. After installing their toolbar, you can not only bookmark and tag web sites, but also select snippets inside them that you found relevant. Diigo organizes these quite well into a library. Further, you can create lists, and group related links into a list. If you map your intent to a list, then all the useful content can be gathered into it. You can share these lists with your networks, and help others benefit from your consumption experience. Some examples of Diigo lists are:
  2. Nuggetize: Nuggetize is designed to ease the process of choosing the right set of documents to read. Nuggetize mines interesting nuggets from web pages that match user intent, and organizes them into appropriate categories dynamically. It learns from user preferences and creates an intent-content report. A user can start with a query, and end with a list of chosen nuggets that lead to useful content on the topic.  Nugget reports can easily be published into a social network, or incorporated into any blog. Some examples of Nugget reports are:

Tuesday, April 6, 2010

The cause and effect of short attention spans

  • 80% of the readers of this article will not read further than 3 sentences! They'll quickly scroll through the page, look for interesting pictures, and then move on.
  • Have you noticed that you normally spend more time on a Wikipedia page than other pages - even if they had the same content?
  • Do you find it painful to read through the snippets and page titles beyond the top 3 search results?
  • Do you recall the earlier days of the Internet, when you started with a directory, and spent hours clicking through the listings, reading through the articles, digesting the information available?
  • What happened? Were you not able to read a book for hours? Were you not able to sit in examination halls for two hours or more? What on the Internet changed you, and why?

The attention span of the average Internet user is now on the order of nine seconds per page. An entire industry of Search Engine Optimization thrives on getting pages into the top 3 results of a search engine - lest the content get lost in oblivion. Information architects suggest page layouts and designs that account for visitors with short attention spans, and try to trap them into staying longer.

Comprehensive usability research by Weinreich et al. suggests:

"Our results confirm that browsing is a rapidly interactive activity. Even new pages with plentiful information and many links are regularly viewed only for a brief period – an interesting background for Web designers, who could focus on offering concise pages that load fast. The analysis of link click positions shows that users scroll regularly – even on navigation pages. Still, about 45% of selected links reside in the upper left quarter of the browser window."

Why?

A short answer, for the attention span challenged.

Search. Information overload. No quality guarantees. Caution. Skim reading. Short attention spans.

For the few others:

Search: Before search engines took over our lives, the only way we could remember extremely good resources on the web was to list them on our home pages, or in directory listings. Thus a mass movement of directory listings started. Personal home pages had lists of good sites. Netscape started its directory service (which became the Open Directory Project). Yahoo published and curated a set of links about everything. In such a world, being linked to was great. If a page was linked to by several pages, it was surely worth something. Search engines exploited this - first by simple in-degree counts, then by hubs and authorities, followed by PageRank. A search engine made it easy to try out some keywords and quickly browse to a page. If that was not the page you wanted, you'd come back, move to the next link, try something else and test your luck. Gone was the paradigm of contextual recommendations that the listings provided. People were on their own, and had to determine a link's relevance themselves. Search engines took over the world so completely that people stopped creating those wonderful lists. This implied a degradation in the quality of search results. This implied a higher responsibility on the web searcher to choose what's relevant.

Information overload: With a search engine, any page is just a search away (especially if someone types that unique set of words which will qualify any page). This suddenly provided a great incentive for people to publish more content, especially with blogging software around. More content came through. More results appeared on search engines. The web searcher had to choose what's good for him.

No quality guarantees: Every page comes with its own layout, content, writing style, and information. Unlike your schooling days, when you'd be recommended a textbook and knew what standard to expect, the web is full of free information that makes no quality assurances. Within months of using a search engine, we all realized that we have to put in the due diligence and evaluate individual pages.

Caution: Web pages are unlike books. They have ads, navigation links, logos, banners, and pictures, with text left to fill the empty spaces. It takes quite some visual effort to identify the content area on each page, and then to judge the content for its utility. Hence, when users find an unfamiliar page, they get cautious! Slowly, users have trained themselves to find ways to identify a page with worthy content: Wikipedia. New York Times. BBC. About.com. Not only can I expect some basic standard in the content there, I am also familiar with the visual layouts of those sites, having been a returning visitor. For everything else, I skim-read.

Skim reading: Quoting wikipedia, "Skimming is a process of speed reading that involves visually searching the sentences of a page for clues to meaning. For some people, this comes naturally, and usually can not be acquired by practice. Skimming is usually seen more in adults than in children. It is conducted at a higher rate (700 words per minute and above) than normal reading for comprehension (around 200-230 wpm), and results in lower comprehension rates, especially with information-rich reading material.
Another form of skimming is that commonly employed by readers on the Web. This involves skipping over text that is less interesting or relevant. This form of reading is not new but has become increasingly prevalent due to the ease with which alternative information can be accessed online."

Short attention spans: The shortened attention span on the Internet is not a disorder, but a human reaction to the abundance of low-quality content online, which requires weeding and mining. However, there's a danger of these short attention spans crossing over into real life.