<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Moreover Technologies - Company Blog &#187; aggregation</title>
	<atom:link href="http://blog.moreover.com/tag/aggregation/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.moreover.com</link>
	<description>Premier Purveyor of Real-Time Web Services</description>
	<lastBuildDate>Thu, 26 Nov 2009 14:50:25 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='blog.moreover.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/ccd640139219b25688e060ce1e341d6b?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>Moreover Technologies - Company Blog &#187; aggregation</title>
		<link>http://blog.moreover.com</link>
	</image>
			<item>
		<title>New language auto-detection over Blogs</title>
		<link>http://blog.moreover.com/2009/02/19/new-auto-language-detection-over-blogs/</link>
		<comments>http://blog.moreover.com/2009/02/19/new-auto-language-detection-over-blogs/#comments</comments>
		<pubDate>Thu, 19 Feb 2009 17:27:51 +0000</pubDate>
		<dc:creator>brianmackie</dc:creator>
				<category><![CDATA[aggregation services]]></category>
		<category><![CDATA[aggregator]]></category>
		<category><![CDATA[blogs aggregation]]></category>
		<category><![CDATA[search engine products]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[blog languages]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.moreover.com/?p=241</guid>
		<description><![CDATA[We are pleased to announce that improved language detection for blogs will launch in the UGC Metabase in two weeks. We&#8217;re also introducing new blog lists sorted by language, so you can see all the English, French, German or Chinese blogs, and so on, in our index.
And we&#8217;re adding a new date field, showing the time we indexed a particular post. This [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>We are pleased to announce that improved language detection for blogs will launch in the UGC Metabase in two weeks. We&#8217;re also introducing new blog lists sorted by language, so you can see all the English, French, German or Chinese blogs, and so on, in our index.</p>
<p>And we&#8217;re adding a new date field, showing the time we indexed a particular post. This is in addition to the publish date already provided, as copied from the original XML/RSS feed.</p>
<p> </p>
<p><strong><span style="font-size:small;">1. Improved language detection at post level</span></strong> </p>
<p>Blog feeds normally state which language they are in. However, this isn&#8217;t always reliable &#8211; blog publishing platforms typically have a default language setting, and bloggers do not always change it to their local language. As a result, a significant portion of blog feeds declare the wrong language.</p>
<p>We&#8217;ve been working hard in the background to produce a more reliable approach to language detection. We&#8217;ll be rolling this out next month as the basis for setting the post&#8217;s language, as provided in the <strong>&lt;language&gt;</strong> tag. Only when this approach cannot determine the language with confidence will we fall back to the language tag provided in the original XML.</p>
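<p>As a rough illustration of the fallback rule described above, the following Python sketch prefers the detected language and reverts to the declared one otherwise. The function name <code>choose_post_language</code> and its parameters are hypothetical; how detection confidence is measured is not described here, so it is reduced to a simple flag.</p>

```python
def choose_post_language(detected, is_confident, declared):
    """Pick the language to report for a post.

    detected     -- language guessed from the post text, or None
    is_confident -- True when the detector trusts its own guess (assumed flag)
    declared     -- language tag taken from the original XML feed
    """
    # Prefer the detected language only when the detector is confident.
    if detected and is_confident:
        return detected
    # Otherwise fall back to the language declared by the original feed.
    return declared


print(choose_post_language("French", True, "English"))
print(choose_post_language("French", False, "English"))
```

<p>The first call reports French because detection was confident; the second reverts to the declared English.</p>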
<p> </p>
<p><strong><span style="font-size:small;">2. New language tagging at feed level</span></strong></p>
<p>In addition, we are adding a new <strong>&lt;feedLanguage&gt;</strong> tag, showing the language of the blog <em>feed</em>. This complements the existing <strong>&lt;language&gt;</strong> tag referred to above, which applies at <em>post</em> level.</p>
<p>Adding language categorisation at feed level makes it possible to better organise the index by language &#8211; for example, we can identify exactly which blogs are in French, which are in English, and so on, and provide and manage them as lists.</p>
<p>The new language tag will appear in the UGC XML as follows:</p>
<blockquote><p><span style="color:#0000ff;"><span style="color:#000000;">&lt;feedLink&gt;http://blog.moreover.com/feed/&lt;/feedLink&gt; </span><br />
</span><span style="color:#3333ff;"><strong>&lt;feedLanguage&gt;English&lt;/feedLanguage&gt;</strong></span><br />
<span style="color:#000000;">&lt;generator&gt;<span class="tx">http://wordpress.org/?v=MU</span>&lt;/generator&gt;</span></p></blockquote>
<p> </p>
<p><strong><span style="font-size:small;">3. Introducing a new Harvest Date field</span></strong></p>
<p>Lastly, we&#8217;re adding a new <strong>&lt;itemHarvestDate&gt;</strong> field to the feed. This gives the time Moreover actually indexed the item. We already pass on the publish date of the post, as provided in the original XML/RSS feed. The new harvest date complements it and can show, for example, how indexing latency varies across feeds.</p>
<p>The new harvest date tag will appear in the UGC XML as follows:</p>
<blockquote><p><span class="135592920-12022009">&lt;pubDate&gt;2009-02-11 14:26:06.0&lt;/pubDate&gt;<br />
</span><strong><span style="color:#0000ff;">&lt;itemHarvestDate&gt;2009-03-13 18:38:21.0&lt;/itemHarvestDate&gt;</span></strong><br />
<span style="color:#000000;">&lt;validDate&gt;2009-03-13 18:37:18.0&lt;/validDate&gt;</span></p></blockquote>
<p>All times are shown in GMT.</p>
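<p>For readers who want to measure indexing latency themselves, here is a minimal Python sketch that subtracts the publish date from the harvest date. It assumes both values use the timestamp layout shown in the sample above; the helper name <code>indexing_latency</code> is illustrative and not part of the UGC feed.</p>

```python
from datetime import datetime

# Timestamp layout taken from the sample above, e.g. "2009-03-13 18:38:21.0".
UGC_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def indexing_latency(pub_date, item_harvest_date):
    """Return harvest time minus publish time as a timedelta (both in GMT)."""
    published = datetime.strptime(pub_date, UGC_TIME_FORMAT)
    harvested = datetime.strptime(item_harvest_date, UGC_TIME_FORMAT)
    return harvested - published


latency = indexing_latency("2009-02-11 14:26:06.0", "2009-03-13 18:38:21.0")
print(latency)
```

<p>With the sample values above, this prints <code>30 days, 4:12:15</code>.</p>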
<p> </p>
<p><em>We believe in being open and transparent about our crawling performance, and are confident about our technology. We invite comparison with other, similar services (for example, see <a href="http://www.readwriteweb.com/archives/technorati_retiring_old_crawle.php" target="_self">Technorati and a recent comment on ReadWriteWeb</a>), and welcome any feedback you, as customers and users, have.</em></p>
Posted in aggregation services, aggregator, blogs aggregation, search engine products, social media Tagged: aggregation, blog languages, blogs, XML</div>]]></content:encoded>
			<wfw:commentRss>http://blog.moreover.com/2009/02/19/new-auto-language-detection-over-blogs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">mackieb</media:title>
		</media:content>
	</item>
	</channel>
</rss>