Using the Bing API to Send Pings to Ping-O-Matic


Those who use WordPress may be familiar with Ping-O-Matic for keeping blog search engines updated whenever a new post is written.

I discovered that Bing had more of my posts indexed than any other search engine. I could manually submit the link for each post to Ping-O-Matic, but that would be no fun. Instead, I decided to automate the process.

The Bing Search API is free for use up to 5000 queries per month. All you need is a Microsoft account to get a key from the Microsoft Azure Marketplace.

Once you have the key, you can set up two files: the first is called bing_basic.html, and the second is bing_basic.php. These files are a mashup from the Bing Search API guide and a function created for sending pings to Ping-O-Matic.

bing_basic.html:

<html>
	<head>
		<title>Bing Search Tester (Basic)</title>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
	</head>
	<body>
		<h1>Bing Search Tester (Basic)</h1>
		<form method="POST" action="bing_basic.php">
			<label for="service_op">Service Operation</label>
			<input name="service_op" type="radio" value="Web" CHECKED /> Web <input name="service_op" type="radio" value="Image" /> Image 
			<label for="query">Query</label>
			<input name="query" type="text" size="60" maxlength="60" value="" />
			
			<input name="bt_search" type="submit" value="Search" />
		</form>
		<h2>Results</h2> {RESULTS}
	</body>
</html>

bing_basic.php:

<?php

/*
--------------------------------------------
 $title contains the title of the page you're sending
 $url is the url of the page
 $debug true print out the debug and show xml call and answer
--------------------------------------------
 the output is an array with two elements:
 status: ok / ko
 msg: the text response from pingomatic
--------------------------------------------
*/
function pingomatic($title,$url,$debug=false) {
    $content='<?xml version="1.0"?>'.
        '<methodCall>'.
        ' <methodName>weblogUpdates.ping</methodName>'.
        '  <params>'.
        '   <param>'.
        '    <value>'.htmlspecialchars($title).'</value>'.
        '   </param>'.
        '  <param>'.
        '   <value>'.htmlspecialchars($url).'</value>'.
        '  </param>'.
        ' </params>'.
        '</methodCall>';

    $headers="POST / HTTP/1.0\r\n".
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 (.NET CLR 3.5.30729)\r\n".
    "Host: rpc.pingomatic.com\r\n".
    "Content-Type: text/xml\r\n".
    "Content-length: ".strlen($content);

    if ($debug) echo nl2br($headers);

    $request=$headers."\r\n\r\n".$content;
    $response = "";
    $fs=fsockopen('rpc.pingomatic.com',80, $errno, $errstr);
    if ($fs) {
        fwrite ($fs, $request);
        while (!feof($fs)) $response .= fgets($fs);
        if ($debug) echo "<xmp>".$response."</xmp>";
        fclose ($fs);
        preg_match_all("/<(name|value|boolean|string)>(.*)<\/(name|value|boolean|string)>/U",$response,$ar, PREG_PATTERN_ORDER);
        for($i=0;$i<count($ar[2]);$i++) $ar[2][$i]= strip_tags($ar[2][$i]);
        return array('status'=> ( $ar[2][1]==1 ? 'ko' : 'ok' ), 'msg'=>$ar[2][3] );
    } else {
        if ($debug) echo "<xmp>".$errstr." (".$errno.")</xmp>";
        return array('status'=>'ko', 'msg'=>$errstr." (".$errno.")");
    }
}

/****
* Simple PHP application for using the Bing Search API
*/
$acctKey = 'YourAccountKey';
$rootUri = 'https://api.datamarket.azure.com/Bing/Search';

// Read the contents of the .html file into a string.
$contents = file_get_contents('bing_basic.html');
if (!empty($_POST['query']))
{
// Here is where you'll process the query.
// The rest of the code samples in this tutorial are inside this conditional block.
// Encode the query and the single quotes that must surround it.
$query = urlencode("'{$_POST['query']}'");
// Get the selected service operation (Web or Image).
$serviceOp = $_POST['service_op'];
// Construct the full URI for the query.
$requestUri = "$rootUri/$serviceOp?\$format=json&Query=$query";
// Encode the credentials and create the stream context.
$auth = base64_encode("$acctKey:$acctKey");
$data = array(
'http' => array(
'request_fulluri' => true,
// ignore_errors can help debug – remove for production. This option added in PHP 5.2.10
'ignore_errors' => true,
'header' => "Authorization: Basic $auth")
);
$context = stream_context_create($data);
// Get the response from Bing.
$response = file_get_contents($requestUri, 0, $context);
// Decode the response.
$jsonObj = json_decode($response);
$resultStr = '';
// Parse each result according to its metadata type.
foreach($jsonObj->d->results as $value) {
	switch ($value->__metadata->type) {
		case 'WebResult':
			// Ping Ping-O-Matic with the page title and URL, in that order.
			$pingresults = pingomatic($value->Title, $value->Url);
			$resultStr .= $pingresults['status'] . "<br />\n";
			$resultStr .= $pingresults['msg'] . "<br />\n";
			$resultStr .=
				"<a href=\"{$value->Url}\">{$value->Title}</a>" .
				"<p>{$value->Description}</p>";
			break;
		case 'ImageResult':
			// Image results have no page URL to ping, so they are only displayed.
			$resultStr .=
				"<h4>{$value->Title} ({$value->Width}x{$value->Height}, " .
				"{$value->FileSize} bytes)</h4>" .
				"<a href=\"{$value->MediaUrl}\">" .
				"<img src=\"{$value->Thumbnail->MediaUrl}\"></a><br />";
			break;
	}
}
// Substitute the results placeholder. Ready to go.
$contents = str_replace('{RESULTS}', $resultStr, $contents);
}
echo $contents;
?>

Put both files into the same folder on your website. When you open bing_basic.php in your browser, it should look something like this:

[Screenshot: Bing basic search page before searching]

In my case, I wanted to get all the pages that Bing had indexed. I entered “site:deepinthecode.com” into the Query box and clicked Search.
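To see what that query looks like once it reaches the API, here is a minimal sketch that repeats the encoding steps from bing_basic.php. The helper name buildBingUri is hypothetical and exists only for illustration; it is not part of the Bing guide.

```php
<?php
// Hypothetical helper (not in the original script): builds the request URI
// the same way bing_basic.php does, so the encoding can be inspected.
function buildBingUri($rootUri, $serviceOp, $query) {
    // The API expects the query wrapped in single quotes, then URL-encoded.
    $encoded = urlencode("'" . $query . "'");
    // Single-quoted PHP string, so $format is sent literally and not interpolated.
    return $rootUri . '/' . $serviceOp . '?$format=json&Query=' . $encoded;
}

echo buildBingUri(
    'https://api.datamarket.azure.com/Bing/Search',
    'Web',
    'site:deepinthecode.com'
);
```

The quotes become %27 and the colon becomes %3A, so the query part of the URI ends in Query=%27site%3Adeepinthecode.com%27.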

After a moment, the page refreshed with the following results:

[Screenshot: Bing basic search page after searching]

The “ok” status shows that the ping was received properly and the line below shows the status message. Following those lines are the hyperlinked post titles and the beginning of each post.

Had the ping not been received, the status would be “ko”, and the message would (hopefully) be descriptive of why the ping did not take.
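The pingomatic() function pulls these two values out of the raw response with a regular expression and fixed array positions. As an alternative, here is a rough sketch that reads them by name with SimpleXML. It assumes the response body is the usual XML-RPC struct with "flerror" and "message" members, and that the HTTP headers have already been stripped off. The function name parsePingResponse is hypothetical.

```php
<?php
// Sketch: turn the XML body of a weblogUpdates.ping response into the same
// status/msg array that pingomatic() returns. Assumes the SimpleXML extension
// is available and that $xmlBody contains no HTTP headers.
function parsePingResponse($xmlBody) {
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $xml = simplexml_load_string($xmlBody);
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return array('status' => 'ko', 'msg' => 'Response was not valid XML');
    }

    // Collect each <member> as name => value text.
    $fields = array();
    $members = $xml->xpath('//member');
    foreach ($members ? $members : array() as $member) {
        // The value is normally wrapped in a type element (boolean, string, ...).
        $typed = $member->xpath('value/*');
        $fields[(string) $member->name] =
            $typed ? (string) $typed[0] : trim((string) $member->value);
    }

    // flerror is "1" on failure; a missing flerror is treated as a failure too.
    $failed = !isset($fields['flerror']) || $fields['flerror'] === '1';
    return array(
        'status' => $failed ? 'ko' : 'ok',
        'msg'    => isset($fields['message']) ? $fields['message'] : '',
    );
}
```

Because pingomatic() reads the whole HTTP response, the body would first have to be separated from the headers, for example by splitting on the first blank line ("\r\n\r\n").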

The function in the PHP file can be modified for use with APIs other than the one for Ping-O-Matic, though I haven’t had time to make any changes there yet to see how well it will work with other sites.
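One way to start on that would be to separate building the payload from sending it, so the endpoint becomes a parameter. The sketch below is untested against any service other than the idea described here. The names buildPingPayload and sendPing are hypothetical, and $endpoint stands for whatever XML-RPC URL the target service documents.

```php
<?php
// Sketch under stated assumptions: both function names are hypothetical, and
// the target service is assumed to accept the weblogUpdates.ping method.
function buildPingPayload($title, $url) {
    // Escape both values so characters such as & and < cannot break the XML.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $safeUrl   = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return '<?xml version="1.0"?>' .
        '<methodCall>' .
        '<methodName>weblogUpdates.ping</methodName>' .
        '<params>' .
        '<param><value><string>' . $safeTitle . '</string></value></param>' .
        '<param><value><string>' . $safeUrl . '</string></value></param>' .
        '</params>' .
        '</methodCall>';
}

function sendPing($endpoint, $title, $url) {
    // Let PHP's HTTP stream wrapper do the POST instead of a raw socket.
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: text/xml\r\n",
            'content' => buildPingPayload($title, $url),
            'timeout' => 10,
        ),
    ));
    // Returns the response body as a string, or false if the request failed.
    return @file_get_contents($endpoint, false, $context);
}
```

With this split, pinging Ping-O-Matic would be sendPing('http://rpc.pingomatic.com/', $title, $url), and another service would only need a different first argument, provided it speaks the same protocol.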

Using PHP to Search for Text in Website Source Code


As I’ve mentioned before, I use the Thesis premium theme on my WordPress site, and I generally have no problems at all. However, the newest Thesis update came out, and I am getting an HTML5 validation error.

Usually when this sort of thing happens, whether it be with a theme or with a plugin, I’ll try to fix what is causing the error and then report the fix to the author. The validation error I am getting is below.

HTML5 validation error on Thesis 2.2.1
The validation error and the rendered HTML.

The Thesis codebase is fairly complicated and is not easy to decipher if you’ve never seen it before. Even though I’d hacked on it a few times before, I’d never come across the code that generated this bit of HTML.

<style type="text/css">
#thesis_launcher { position: fixed; bottom: 0; left: 0; font: bold 16px/1em "Helvetica Neue", Helvetica, Arial, sans-serif; padding: 12px; text-align: center; color: #fff; background: rgba(0,0,0,0.5); text-shadow: 0 1px 1px rgba(0,0,0,0.75); }
#thesis_launcher input { font-size: 16px; margin-top: 6px; -webkit-appearance: none; }
</style>

My blog runs on a shared server, and I don’t have SSH enabled currently, so there was no way I could use grep to search for the text, and the cPanel search utility only looks at filenames.

After some searching, I found an article that had code for finding filenames in all subfolders from a path on your site. This code would not search the text itself, but would allow for recursive folder searching.

function rsearch($folder, $pattern) {
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $pattern, RegexIterator::GET_MATCH);
    $fileList = array();
    foreach($files as $file) {
        $fileList = array_merge($fileList, $file);
    }
    return $fileList;
}

Also, I was able to find another post that explained text searching in a file.

$path_to_check = '';
$needle = 'match';

foreach(glob($path_to_check.'*.txt') as $filename)
{
  foreach(file($filename) as $fli=>$fl)
  {
    if(strpos($fl, $needle)!==false)
    {
      echo $filename.' on line '.($fli+1).': '.$fl;
    }
  }
}

By combining and modifying these, I was able to put together a relatively simple file that searches through all files matching a pattern (in this case, PHP files) and prints every line that contains the search term.

$path_to_check = "(your folder)";
// Anchor the pattern so that only paths ending in .php match.
$pattern = '/^.+\.php$/i';
$needle = isset($_GET['needle']) ? $_GET['needle'] : '';

// Recursively search every file under $folder whose path matches $pattern
// and print each line that contains $needle.
function rsearch($folder, $pattern, $needle) {
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    // GET_MATCH yields an array of regex matches; element 0 is the full path.
    $files = new RegexIterator($ite, $pattern, RegexIterator::GET_MATCH);
    foreach($files as $file) {
        foreach($file as $filename) {
            foreach (file($filename) as $fli=>$fl) {
                if(strpos($fl, $needle)!==false) {
                    // Escape the line so that markup in the source is shown, not rendered.
                    echo htmlspecialchars($filename.' on line '.($fli+1).': '.$fl)."<br /><br />\n\n";
                }
            }
        }
    }
    return 0;
}

if (strlen($needle) > 0) {
    rsearch($path_to_check,$pattern,$needle);
}
echo "Search complete.";

The search term is currently passed in the query string (for example, search.php?needle=yoursearchterm), and the path is hard-coded. The pattern is a regular expression. I found that the script can use up all of your allotted memory, so use it sparingly. Also, don’t leave it on your site as a PHP file: rename it with a .txt extension when it is not in use so that no one can run it without your knowledge. Otherwise it could be used to find database passwords and other sensitive information.
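One possible way to ease the memory pressure is to read each file a line at a time instead of loading it whole with file(). The following is only a sketch of that idea, and I have not measured how much it helps. The function name searchFileByLine is hypothetical.

```php
<?php
// Sketch: scan one file line by line so that only the current line is held
// in memory. Returns an array of line number => matching line.
function searchFileByLine($filename, $needle) {
    $matches = array();
    $handle = @fopen($filename, 'r');
    if ($handle === false) {
        return $matches; // unreadable file: report no matches
    }
    $lineNumber = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNumber++;
        if (strpos($line, $needle) !== false) {
            $matches[$lineNumber] = rtrim($line, "\r\n");
        }
    }
    fclose($handle);
    return $matches;
}
```

Inside rsearch(), the inner loop over file($filename) could then be swapped for a loop over searchFileByLine($filename, $needle). A file consisting of one very long line would still be read in a single piece.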

Incidentally, I did find the code that generates the CSS above; it’s in the wp-content/themes/thesis/lib/core/skin.php file:

echo
	"<style type=\"text/css\">\n",
	"#thesis_launcher { position: fixed; bottom: 0; $position: 0; font: bold 16px/1em \"Helvetica Neue\", Helvetica, Arial, sans-serif; padding: 12px; text-align: center; color: #fff; background: rgba(0,0,0,0.5); text-shadow: 0 1px 1px rgba(0,0,0,0.75); }\n",
	"#thesis_launcher input { font-size: 16px; margin-top: 6px; -webkit-appearance: none; }\n",
	"</style>\n";

Due to the amount of time it would take for me to suss out how to move this into the head without breaking the site, I’m just going to report this one. It should be fixed in the next minor release.

Which is More Important: Newest Data, or Fastest Load Time?


Happy New Year!

As we developers often do, I have recently looked at some code that had aged a bit – more like vinegar and less like wine – and thought, “Did I write that? What was I thinking?” Constantly becoming aware of and implementing best practices, avoiding bad practices, and recovering from failures are all part of growing in many careers, and most certainly must be a part of getting better at software development.

Many of my posts have dealt with Web scraping and using APIs to display profile data from various sites. One question that I’ve never answered is whether or not the data needs to be real-time, or if it can be delayed a bit. The benefits of real-time data are obvious, but the drawbacks can be considerable.

For example, on this blog: until recently, the profile data was retrieved from several other sites every time a page loaded. This made the page load slowly (aside from my other performance issues), and it’s entirely possible that these other sites would have stopped allowing me to query them, had my traffic been very heavy.

Since it’s far from critical that this profile data be less than a day old, and since I would like my blog to load as quickly as possible, I decided to refactor and then enhance the PHP code I had written to pull this profile data. The new code would store the profile data on my site, and reload the data only if it was older than a given number of days.

YSlow Chrome Extension
Use YSlow to diagnose performance issues on your website.

This data could be stored either as text in a file, or as a value in a database. In this case, I stored it in a text file.

/* initialize variables */
$filename = "whatever.txt";
$html = '';
$norefresh = FALSE;
$days = 1;

/* checks to see if file exists and is current */
if (file_exists($filename)) {
    /* filemtime() returns the time the file was last modified; there are 86400 seconds in one day */
    if (filemtime($filename) > (time() - (86400 * $days))) {
    	$norefresh = TRUE;
    }
}

/* if $norefresh is still FALSE, file will be created or updated; otherwise, it will be loaded */
if ($norefresh) {
    $html = file_get_contents($filename);
} else {
    /* do whatever needs to be done to build the $html variable */
    // ...
    // ...
    // ...

    /* put the $html value into the file */
    file_put_contents($filename, $html);
}

/* display the $html variable contents */
echo $html;

The above code will check to see if a file with the expected data exists, and if so, whether it is new enough – in this case, less than one day old. If not, the data is retrieved and stored in a file for future use. Lastly, the data – whether cached or newly retrieved – is displayed.
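If several pieces of profile data need the same treatment, the check could be wrapped in a function. This is a minimal sketch and not the code running on this blog. The names cachedHtml and $build are hypothetical, and $build stands for any callable that returns the HTML to cache.

```php
<?php
// Sketch: return the cached contents of $filename if the file is newer than
// $maxAgeSeconds; otherwise call $build, store its result, and return it.
function cachedHtml($filename, $maxAgeSeconds, $build) {
    if (file_exists($filename) && filemtime($filename) > time() - $maxAgeSeconds) {
        return file_get_contents($filename);
    }
    $html = call_user_func($build);
    // LOCK_EX keeps two simultaneous requests from interleaving their writes.
    file_put_contents($filename, $html, LOCK_EX);
    return $html;
}
```

A call such as echo cachedHtml('whatever.txt', 86400, 'buildProfileHtml'); would then replace the whole block above, where buildProfileHtml is a placeholder for the function that does the scraping.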