How I created my new subscription list: Part 3

I really didn't mean for this to take three posts, but Live Writer apparently wants it that way. I think I need to download a newer version.

 

The last entry left out two pieces: creating and saving the OPML object hierarchy, and saving the not-found list to an HTML file.

In order to get the OPML to save, I first needed to create the object and its hierarchy, add the list of found entries to it, and then save it out to a new file.  Here is that code:

        private static void SaveSubscriptions()
        {
            // Build the OPML hierarchy: opml -> head + body -> outline entries.
            Opml o = new Opml();
            o.head = new Head();
            Body b = new Body();
            b.outline = _subscriptions;
            o.body = b;

            // Serialize the hierarchy to a new .opml file.
            using (FileStream fs = new FileStream(@"c:\_projects\subscriptions.opml", FileMode.Create))
            {
                XmlSerializer serializer = new XmlSerializer(typeof(Opml), "");
                serializer.Serialize(fs, o);
            }

            // Dump the URLs that were not found, one per line.
            string[] notfound = new string[_notFound.Count];
            _notFound.CopyTo(notfound);
            File.WriteAllLines(@"c:\_projects\notfound.txt", notfound);

            // Also write an HTML file with the original link markup for those URLs.
            GetOriginalLinkHtml(_notFound);
        }
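For readers who have not seen the earlier posts, here is a minimal sketch of what classes like these might look like and what the serializer does with them. Only the names Opml, Head, Body and the head/body/outline members come from the code above; the Outline class, the attribute names, the version field and the OpmlSketch helper are assumptions for illustration, not the actual classes from this series. It serializes to a string instead of a file so it can be run anywhere.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class shapes; the real ones are defined elsewhere in this series.
[XmlRoot("opml")]
public class Opml
{
    [XmlAttribute("version")]
    public string version = "1.1";
    public Head head;
    public Body body;
}

public class Head
{
    public string title;
}

public class Body
{
    // Each list item becomes a repeated <outline> element directly under <body>.
    [XmlElement("outline")]
    public List<Outline> outline = new List<Outline>();
}

public class Outline
{
    // Assumed attributes; OPML feed entries usually carry text and xmlUrl.
    [XmlAttribute] public string text;
    [XmlAttribute] public string xmlUrl;
}

public static class OpmlSketch
{
    // Same serializer call as in SaveSubscriptions, but targeting a string.
    public static string SerializeToString(Opml o)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Opml), "");
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, o);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Opml o = new Opml();
        o.head = new Head();
        o.head.title = "Subscriptions";
        o.body = new Body();

        Outline feed = new Outline();
        feed.text = "Example Blog";
        feed.xmlUrl = "http://example.com/rss.xml";
        o.body.outline.Add(feed);

        Console.WriteLine(SerializeToString(o));
    }
}
```

The point of the sketch is that XmlSerializer walks the public members on its own, so building the hierarchy in code is all the "mapping" there is.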

And here is the GetOriginalLinkHtml() function. After a few rounds of editing, it no longer just saves the not-found items; it also queries the database to get the original link text that I had put in a past entry.

        private static void GetOriginalLinkHtml(List<string> urls)
        {
            int currentCallIndex = 0;
            // {0} is replaced with each URL; the regex pulls the original <a> tag out of matching posts.
            string sqlText = @"select dbo.RegExCompiledMatchText('(<a\s+href=""http://{0}(.*?)"">\s*((\n|.)+?)\s*</a>)', Text, 1) as Link from blog_Content where Title like '%Interesting%' and len(dbo.RegExCompiledMatchText('(<a\s+href=""http://{0}(.*?)"">\s*((\n|.)+?)\s*</a>)', Text, 1) ) > 0";
            Console.WriteLine("Total of " + urls.Count);

            // Start a fresh file so a rerun does not append to old output.
            File.WriteAllText(@"c:\_projects\notfound.html", "<html><body>");

            using (SqlConnection conn = new SqlConnection())
            {
                conn.ConnectionString = @"Server=...;Database=blog;Integrated Security=SSPI;";
                conn.Open();

                // Loop through the urls list.
                foreach (string url in urls)
                {
                    // Reset per URL so a miss or an error does not rewrite the previous URL's links.
                    string linksHtml = string.Empty;

                    Console.WriteLine("Looking up entry " + currentCallIndex.ToString() + " " + url);

                    using (SqlCommand cmd = new SqlCommand())
                    {
                        cmd.Connection = conn;
                        cmd.CommandType = CommandType.Text;
                        cmd.CommandText = string.Format(sqlText, url);

                        try
                        {
                            // The using block closes the reader even if ExecuteReader or Read throws.
                            using (SqlDataReader reader = cmd.ExecuteReader())
                            {
                                while (reader.Read())
                                {
                                    // Accumulate every matching row, not just the last one.
                                    linksHtml += reader.GetString(0) + "<br/>";
                                }
                            }
                        }
                        catch (SqlException e)
                        {
                            Console.WriteLine("Error looking up " + url + ": " + e.Message);
                        }
                    }
                    currentCallIndex++;

                    // Since the lookups are so slow, write each result to the file right away.
                    File.AppendAllText(@"c:\_projects\notfound.html", linksHtml);
                }
            }
            File.AppendAllText(@"c:\_projects\notfound.html", "</body></html>");
        }
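One thing worth noting about the query above: the URL is dropped straight into both the regular expression and the SQL text, so a dot in the URL matches any character and a single quote would break the statement. Below is a rough sketch of one way around that, assuming the URLs are stored without the http:// prefix and that dbo.RegExCompiledMatchText uses .NET regex syntax (neither is confirmed here). BuildLinkPattern and ParameterizedSql are hypothetical names introduced just for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPatternSketch
{
    // Same SQL as above, but the pattern arrives as a parameter instead of via string.Format.
    // Attach it with: cmd.Parameters.AddWithValue("@pattern", BuildLinkPattern(url));
    public const string ParameterizedSql =
        "select dbo.RegExCompiledMatchText(@pattern, Text, 1) as Link " +
        "from blog_Content where Title like '%Interesting%' " +
        "and len(dbo.RegExCompiledMatchText(@pattern, Text, 1)) > 0";

    // Builds the same regex as the original query, with the URL escaped
    // so characters like '.' are matched literally.
    public static string BuildLinkPattern(string url)
    {
        return @"(<a\s+href=""http://" + Regex.Escape(url) + @"(.*?)"">\s*((\n|.)+?)\s*</a>)";
    }

    public static void Main()
    {
        string pattern = BuildLinkPattern("example.com/blog");
        string html = "<p><a href=\"http://example.com/blog/rss.xml\">Example Blog</a></p>";

        // Try the pattern locally before sending it to the database.
        Match m = Regex.Match(html, pattern);
        Console.WriteLine(m.Success ? m.Value : "no match");
    }
}
```

Testing the pattern locally like this is also a cheap way to debug the regex without waiting on the database.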

As I'm sure you can guess, the GetOriginalLinkHtml() function is not the fastest function in the world.  I originally tried to do it all in memory, but quickly (or maybe not so quickly) found that to be a problem on my laptop, so I moved to this more I/O-intensive way of accomplishing the same thing.

That is it. All in all, it was an interesting exercise, and one that could be done at least five other ways (as with most things).
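As one of those other ways, here is a minimal sketch of the same write-as-you-go idea using a single StreamWriter that stays open for the whole run, instead of reopening the file for every URL. The NotFoundReport class and the lookup delegate are hypothetical; the delegate stands in for the database query so the sketch runs without SQL Server. This probably only trims the file overhead, since the query itself is likely where most of the time goes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NotFoundReport
{
    // lookup is a hypothetical stand-in for the database query:
    // given a URL, it returns the link HTML found for it (possibly none).
    public static void Write(string path, IEnumerable<string> urls,
                             Func<string, IEnumerable<string>> lookup)
    {
        // false = overwrite, so each run starts with a clean file.
        using (StreamWriter writer = new StreamWriter(path, false))
        {
            writer.Write("<html><body>");
            foreach (string url in urls)
            {
                foreach (string link in lookup(url))
                {
                    writer.Write(link + "<br/>");
                }
                // Flush after each URL so partial results survive a crash or a cancel.
                writer.Flush();
            }
            writer.Write("</body></html>");
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "notfound-sketch.html");
        Write(path,
              new[] { "example.com" },
              url => new[] { "<a href=\"http://" + url + "\">Example</a>" });
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Memory use stays flat because nothing is accumulated beyond the current link, which was the problem with the all-in-memory version.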

posted on Saturday, January 20, 2007 6:00 PM
