I really didn't mean for this to take 3 posts ... but I guess Live Writer wants it that way ... I think I need to download a newer version.
The last entry left out 2 pieces: creating and saving the OPML object hierarchy, and saving the not-found list to an HTML file.
To get the OPML to save, I first needed to create the object and its hierarchy, add the list of found entries to it, and then save it out to a new file. Here is that code:
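private static void SaveSubscriptions()
{
    // Build the OPML hierarchy: an empty head plus a body holding the found subscriptions
    Opml o = new Opml();
    o.head = new Head();
    Body b = new Body();
    b.outline = _subscriptions;
    o.body = b;
    // Serialize the hierarchy to a new file (FileMode.Create overwrites an existing one)
    using (FileStream fs = new FileStream(@"c:\_projects\subscriptions.opml", FileMode.Create))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Opml), "");
        serializer.Serialize(fs, o);
    }
    // Save the raw list of not-found urls, then build the html version of that list
    string[] notfound = new string[_notFound.Count];
    _notFound.CopyTo(notfound);
    File.WriteAllLines(@"c:\_projects\notfound.txt", notfound);
    GetOriginalLinkHtml(_notFound);
}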
And here is the GetOriginalLinkHtml() function. I edited it a few times and ended up with a function that doesn't just save the not-found items, but also queries the database to get the original link text that I had put in a past entry.
private static void GetOriginalLinkHtml(List<string> urls)
{
    int currentCallIndex = 0;
    string sqlText = @"select dbo.RegExCompiledMatchText('(<a\s+href=""http://{0}(.*?)"">\s*((\n|.)+?)\s*</a>)', Text, 1) as Link from blog_Content where Title like '%Interesting%' and len(dbo.RegExCompiledMatchText('(<a\s+href=""http://{0}(.*?)"">\s*((\n|.)+?)\s*</a>)', Text, 1) ) > 0";
    Console.WriteLine("Total of " + urls.Count);
    // Start the file fresh so a rerun doesn't append a second document to the old one
    File.WriteAllText(@"c:\_projects\notfound.html", "<html><body>");
    using (SqlConnection conn = new SqlConnection())
    {
        conn.ConnectionString = @"Server=...;Database=blog;Integrated Security=SSPI;";
        conn.Open();
        // loop through the urls list
        foreach (string url in urls)
        {
            // Reset per url so a url with no matches doesn't rewrite the previous url's links
            string linksHtml = string.Empty;
            Console.WriteLine("Looking up entry " + currentCallIndex.ToString() + " " + url);
            try
            {
                // The using blocks dispose the command and reader even if the query throws
                using (SqlCommand cmd = new SqlCommand(string.Format(sqlText, url), conn))
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Append so that every matching row is kept, not just the last one
                        linksHtml += reader.GetString(0) + "<br/>";
                    }
                }
            }
            catch (SqlException e)
            {
                Console.WriteLine("Error looking up " + url + ": " + e.Message);
            }
            currentCallIndex++;
            // Since so slow, go ahead and write to file
            File.AppendAllText(@"c:\_projects\notfound.html", linksHtml);
        }
    }
    File.AppendAllText(@"c:\_projects\notfound.html", "</body></html>");
}
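One thing worth watching in the function above: string.Format drops each url straight into both the SQL text and the regex, so a url containing a quote, or a regex character like . or ?, can break the query or match the wrong link. Here is a rough sketch of one way around that (it is not what the code above does): escape the url with Regex.Escape and hand the finished pattern to SQL as a parameter. The names LinkLookup, BuildLinkPattern, CreateLookupCommand and @pattern are made up for illustration, and the sketch assumes dbo.RegExCompiledMatchText accepts its pattern as an ordinary string argument.

```csharp
using System.Data;
using System.Data.Common;
using System.Text.RegularExpressions;

public static class LinkLookup
{
    // Same regex as the original query, but with the url escaped so that
    // characters like '.' and '?' are matched literally.
    public static string BuildLinkPattern(string url)
    {
        return "(<a\\s+href=\"http://" + Regex.Escape(url)
             + "(.*?)\">\\s*((\\n|.)+?)\\s*</a>)";
    }

    // Hypothetical helper: builds the lookup command with the pattern passed
    // as a parameter instead of being formatted into the SQL text.
    // A SqlConnection can be passed in because it derives from DbConnection.
    public static DbCommand CreateLookupCommand(DbConnection conn, string url)
    {
        DbCommand cmd = conn.CreateCommand();
        cmd.CommandType = CommandType.Text;
        cmd.CommandText =
            "select dbo.RegExCompiledMatchText(@pattern, Text, 1) as Link " +
            "from blog_Content where Title like '%Interesting%' " +
            "and len(dbo.RegExCompiledMatchText(@pattern, Text, 1)) > 0";

        DbParameter pattern = cmd.CreateParameter();
        pattern.ParameterName = "@pattern";
        pattern.DbType = DbType.String;
        pattern.Value = BuildLinkPattern(url);
        cmd.Parameters.Add(pattern);
        return cmd;
    }
}
```

Inside the foreach loop this would replace the string.Format call, something like using (DbCommand cmd = LinkLookup.CreateLookupCommand(conn, url)), with the rest of the reader loop left as it is.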
As I'm sure you can guess, the GetOriginalLinkHtml() function is not the fastest function in the world. I originally tried to do it all in memory, but quickly (or maybe not so quickly) found that to be a problem on my laptop, so I moved it to this more I/O-intensive way of accomplishing the same thing.
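A possible middle ground between holding everything in memory and reopening the file for every url would be to keep a single StreamWriter open and flush it after each lookup. This is only a sketch of that idea: NotFoundPage, WritePage and the lookup delegate are hypothetical stand-ins (the delegate takes the place of the database query), not code from the project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NotFoundPage
{
    // Writes the html page through one open writer. 'lookup' is a stand-in
    // for the per-url database query and returns the link html for that url.
    public static void WritePage(string path, IEnumerable<string> urls,
                                 Func<string, IEnumerable<string>> lookup)
    {
        // 'false' means overwrite, so each run starts with a fresh file
        using (StreamWriter writer = new StreamWriter(path, false))
        {
            writer.Write("<html><body>");
            foreach (string url in urls)
            {
                foreach (string link in lookup(url))
                {
                    writer.Write(link + "<br/>");
                }
                // Empty the writer's buffer after each url so little is lost
                // if a later lookup blows up
                writer.Flush();
            }
            writer.Write("</body></html>");
        }
    }
}
```

Memory use stays small because only one url's worth of links is held at a time, and the file is opened once instead of once per url.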
That is it ... all in all it was an interesting exercise, and one that could be done at least 5 other ways (as with most things).