Stripping HTML Tags from Textboxes Using JavaScript

One of the applications I support has a recurring problem: users copy and paste text containing HTML tags from other sources into the text boxes on its entry forms. Most of the time, this is harmless. However, if users do not select all of the text, they may unwittingly leave the closing tags behind. When that happens, all sorts of fun things can happen to the app's main page, which shows recent entries.

I decided to remove all HTML except line breaks, which I’d like to retain. Based partially on a solution I found on Stack Overflow, I wrote this function.

this.fixHtml =
	function(strHtml)
	{
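		var div = document.createElement('div');
		// Turn line breaks into newlines so they survive the stripping below.
		strHtml = strHtml.replace(/<br>/gi,'\n');
		strHtml = strHtml.replace(/<br \/>/gi,'\n');
		// Let the browser parse the markup, then read back only the text.
		div.innerHTML = strHtml;
		strHtml = div.innerText || div.textContent;
		// Restore the line breaks as <br /> tags.
		strHtml = strHtml.replace(/\n/g,'<br />');
		div.innerHTML = strHtml;
		return div.innerHTML;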
	};
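
The two replace calls above cover <br> and <br /> but not <br/> written without a space, which would be stripped along with the other tags. A minimal sketch of a single pattern that covers all three spellings follows. The name normalizeLineBreaks is a hypothetical helper, not part of the original function, and the pattern does not handle line breaks that carry attributes.

```javascript
// Hypothetical helper, not part of the original fixHtml function.
function normalizeLineBreaks(strHtml) {
	// "<br", optional whitespace, optional slash, then ">"; case-insensitive.
	return strHtml.replace(/<br\s*\/?>/gi, '\n');
}

var sample = normalizeLineBreaks('one<br>two<BR/>three<br />four');
console.log(JSON.stringify(sample));
```

If this fits the application, the single call could stand in for the two replace lines at the top of fixHtml.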

I call the fixHtml function on every non-empty input of type text and on every non-empty div element with the class “mimicTextArea” (a div that looks like a text box):

if(oForm!=null)
{
	// Clean every non-empty text input on the form.
	// self is assumed to reference the object that defines fixHtml.
	var oInputs = oForm.getElementsByTagName("input");
	for(var x=0;x<oInputs.length;x++)
	{
		if(oInputs[x].type=="text" && oInputs[x].value.length > 0)
		{
			oInputs[x].value = self.fixHtml(oInputs[x].value);
		}
	}

	// Clean every non-empty div that mimics a text area.
	var oDivs = oForm.getElementsByTagName("div");
	for(var x=0;x<oDivs.length;x++)
	{
		if(oDivs[x].className == "mimicTextArea" && oDivs[x].innerHTML.length > 0)
		{
			oDivs[x].innerHTML = self.fixHtml(oDivs[x].innerHTML);
		}
	}
}

This seems to serve my purpose well. Should I wish to retain other specific HTML tags, the same technique (swap the tag for a placeholder before stripping, then swap it back) should work for tags that have no closing tag, such as the line break. Tags that require both an opening and a closing tag would need further development.
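
As a rough illustration of extending the technique to other tags without a closing tag, here is a minimal sketch. Everything in it is an assumption for illustration: stripKeeping and crudeStrip are hypothetical names, the [[tag]] placeholder is assumed never to appear in real user text, tag names are assumed to be plain letters, and any attributes on a retained tag are dropped. The stripping step is passed in as a function, so in the browser it could be a DOM-based stripper like the one above, while the crude regular expression here only exists so the sketch runs anywhere.

```javascript
// Hypothetical helper: carry a whitelist of void tags through a stripping pass.
// stripFn is whatever removes the markup (for example a DOM-based stripper).
function stripKeeping(strHtml, tagNames, stripFn) {
	var i, re;
	// 1. Swap each whitelisted tag for a plain-text placeholder such as [[hr]].
	for (i = 0; i < tagNames.length; i++) {
		// Matches <hr>, <hr/>, <hr /> and <hr class="x">, but not <hrx>.
		re = new RegExp('<' + tagNames[i] + '(\\s[^>]*)?\\/?>', 'gi');
		strHtml = strHtml.replace(re, '[[' + tagNames[i] + ']]');
	}
	// 2. Strip everything else.
	strHtml = stripFn(strHtml);
	// 3. Put the whitelisted tags back in a normalized form.
	for (i = 0; i < tagNames.length; i++) {
		strHtml = strHtml.split('[[' + tagNames[i] + ']]').join('<' + tagNames[i] + ' />');
	}
	return strHtml;
}

// Stand-in stripper so the sketch runs outside a browser; it is cruder than
// letting the browser parse the markup in a div.
function crudeStrip(s) {
	return s.replace(/<[^>]*>/g, '');
}

var result = stripKeeping('one<HR>two <b>three<hr class="x" />four', ['hr'], crudeStrip);
console.log(result);
```

Paired tags such as bold or links are not covered by this sketch, since their opening and closing halves would each need a placeholder and an unmatched half would still have to be dealt with.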
