Monday, April 28, 2008

HTML Comments and JavaScript

So let's say that you want to get at the comments within an HTML document. Why, you ask, would that be useful? Well, for several reasons. Perhaps you have settings, variables, or other values stored in comments that you want to gather for processing. Perhaps you have trained your website maintenance staff to put identifying information (like the date the page was last updated) in comments.

But if you're a Dreamweaver user, comments can mark things like the start and end of template regions. That's no small thing, so I want to say it again. Web pages built from Dreamweaver templates contain little HTML comments that identify where each editable region begins and ends -- like this:
<!-- InstanceBeginEditable name="PageContent" -->
<!-- InstanceEndEditable -->

These are just comments, like any other comment. They don't affect the layout or rendering of the page in the web browser. They're just there to provide a little meta information.

Now let's say that you want to grab the headline of a page -- but it could be styled any number of ways. If the page is based on a Dreamweaver template, one approach is to grab the relevant editable region.

If you read the DOM documentation, you'll discover that there is a node type for comments. So you might think that you could walk the DOM tree and look for nodes of that type:
node.nodeType == Node.COMMENT_NODE
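For browsers that do keep comment nodes, that walk might look something like the sketch below. It relies only on the standard `nodeType`, `nodeValue`, and `childNodes` properties. `collectComments` is a hypothetical name, and the literal 8 (the DOM's value for `COMMENT_NODE`) stands in for `Node.COMMENT_NODE` because some older browsers don't expose the global `Node` object.

```javascript
// Numeric value of Node.COMMENT_NODE in the DOM specification.
var COMMENT_NODE = 8;

// Recursively walk a DOM-like tree and return the text of every comment node.
// Works on real DOM nodes or on any object exposing nodeType, nodeValue,
// and an array-like childNodes.
function collectComments(node, found) {
  found = found || [];
  if (node.nodeType === COMMENT_NODE) {
    found.push(node.nodeValue); // the text between <!-- and -->
  }
  var children = node.childNodes || []; // text/comment nodes may have none
  for (var i = 0; i < children.length; i++) {
    collectComments(children[i], found);
  }
  return found;
}
```

In a browser you would call `collectComments(document.documentElement)`. If the browser has stripped the comments, you'll simply get an empty array back.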

The problem is that support for this is spotty, even in the best browsers. For the most part, it looks like comments are simply removed from the page before it is processed. Check your favorite browser and you'll see that there are no comments in the body's HTML:
document.body.outerHTML

Before we can even begin to figure out how to read the comments, we have to find them. As it turns out, you can do that by using document.documentElement. The complete pre-processed HTML for the page, including comments, can be found by looking at
document.documentElement.outerHTML

Now all you have to do is match the comments using a regular expression or a couple of indexOf calls, whichever method you prefer or understand better.*
var exp = "<" + "!" + "--" + "([\\s\\S]+?)" + "--" + ">";
var rexComment = new RegExp(exp);
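One catch: that expression has no `g` flag, so it only ever finds the first comment. A minimal sketch for collecting all of them, assuming the page source is already in a string (for example, from `document.documentElement.outerHTML`), runs a global copy of the same pattern through an `exec` loop. An `indexOf` version is included for comparison. Both function names are hypothetical.

```javascript
// Same pattern, built in pieces, but with the global ("g") flag so that
// exec() resumes from lastIndex on each call.
var rexAllComments = new RegExp("<" + "!" + "--" + "([\\s\\S]+?)" + "--" + ">", "g");

// Return the inner text of every comment in an HTML source string (regex version).
function findComments(html) {
  var results = [];
  var match;
  rexAllComments.lastIndex = 0; // start from the beginning on every call
  while ((match = rexAllComments.exec(html)) !== null) {
    results.push(match[1]); // group 1 is the text between <!-- and -->
  }
  return results;
}

// Same job using plain indexOf calls.
function findCommentsIndexOf(html) {
  var open = "<" + "!--";
  var close = "--" + ">";
  var results = [];
  var start = html.indexOf(open);
  while (start !== -1) {
    var end = html.indexOf(close, start + open.length);
    if (end === -1) break; // unterminated comment: stop looking
    results.push(html.substring(start + open.length, end));
    start = html.indexOf(open, end + close.length);
  }
  return results;
}
```

One small difference: because the pattern requires at least one character between the markers, an empty `<!---->` comment will make the regex version run on into the next comment, while the `indexOf` version returns it as an empty string.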

And the really big win is that you can identify data based on its placement within a template. That's really important. Templates are built to control the display of content within a website and keep the pages consistent -- sure. But they also allow flexibility, so editors can change only the parts of the page that are relevant to their work -- headlines, bylines, and stories. So naturally, if we're interested in the parts of the page that carry the content, then we'll be interested in what's inside the template's editable regions.
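As a sketch of that idea, the helper below pulls out the markup between a Dreamweaver `InstanceBeginEditable` comment with a given name and the next `InstanceEndEditable` comment, using `indexOf`. The name `getEditableRegion` is hypothetical, and it assumes the markers are written exactly as in the example at the top of this post (double-quoted name, single spaces) and that regions aren't nested.

```javascript
// Return the markup inside the editable region with the given name,
// or null if either marker is missing. Assumes markers written exactly like:
//   <!-- InstanceBeginEditable name="PageContent" -->
//   <!-- InstanceEndEditable -->
function getEditableRegion(html, name) {
  var begin = "<" + "!-- InstanceBeginEditable name=\"" + name + "\" --" + ">";
  var end = "<" + "!-- InstanceEndEditable --" + ">";
  var start = html.indexOf(begin);
  if (start === -1) return null;
  start += begin.length; // skip past the opening marker itself
  var stop = html.indexOf(end, start);
  if (stop === -1) return null;
  return html.substring(start, stop);
}
```

In a browser, `getEditableRegion(document.documentElement.outerHTML, "PageContent")` would return whatever the editors put in that region, no matter how it is styled.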

This really expanded the way I think about templates. They're not just about looking pretty -- they're about data structure.

* A Note on "exp="
You'll notice that I broke the expression apart into pieces. That's to keep a browser from mistaking the string for a real comment tag, which could happen if it were written all in one piece. This
exp="<!--([\\s\\S]+?)-->"
is the same as this
exp="<"+"!"+"--"+"([\\s\\S]+?)"+"--"+">"
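A quick check of that note: the pieced-together string is identical to the pattern written out in one piece, and the lazy `+?` quantifier is what keeps a match from running across several comments. The sample markup and variable names below are illustrative only.

```javascript
// Written in one piece here only for comparison.
var onePiece = "<!--([\\s\\S]+?)-->";
var inPieces = "<" + "!" + "--" + "([\\s\\S]+?)" + "--" + ">";
console.log(onePiece === inPieces); // true

var sample = "<!-- first --><p>text</p><!-- second -->";

// Lazy (+?): stops at the first "-->" it reaches.
var lazy = new RegExp(inPieces).exec(sample)[1];
console.log(lazy); // " first "

// Greedy (+): grabs as much as it can, so the match ends at the LAST "-->".
var greedy = new RegExp("<" + "!--([\\s\\S]+)--" + ">").exec(sample)[1];
console.log(greedy); // " first --><p>text</p><!-- second "
```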

The Transition is On

OK, so after some pretty extensive discussions and testing, I have decided not to host my own blog. The blog accounted for some of the largest amounts of traffic while I was regularly updating it. I plan to move the old blog entries -- at least the ones that are relevant to our ongoing discussion -- into the new blog. This will probably take some time and will have to wait until I complete my active development work.

I'm using Blogger now. I chose it mostly because it integrates with Gmail, which I use for my personal email, and I can post via email from my iPhone on the road. This should make it easy for me to keep content flowing while keeping the tools I need minimal and highly portable.

And honestly, I like posting by email -- even from my desktop. I guess I just grew up with email. It is part of the way that I work and think. I feel pretty comfortable composing my thoughts and sending them off to the world. Email is already part of my workflow -- and it takes much less effort than launching Dreamweaver and tossing off a page (yes, even with templates).

I liked my old blogging method because it was a showcase for what you could do with RSS DreamFeeder. Both RSS and Atom feeds were generated from the pages I built in Dreamweaver. I am also a very firm believer in eating your own dog food -- that is, using the tools you build and sell to others for your own work as well. I often find some of the worst bugs myself while I'm using them. And I curse about them just as much as any customer might, perhaps even more.

So making this switch was a tough choice, but the advantages of posting more often will more than offset the drawback of using tools that are not of my own making (and selling). Some of you may be interested to know, though, that I am using RSS Replay (actually an early beta of version 2) to display this blog on the blog page at RNSoft.

On a programming note: from now on I am going to post a mix of personal and business stuff here. I tried running two blogs and it just doesn't work for me. Many of the folks who read my blog are interested in both topics. And if you're not, well, feel free to skip the personal stuff. There are categories listed on this blog's main page (at the bottom right), and if you click a category you'll see only content related to that topic. You can also subscribe to the category from that page.