Another Code School Profile Scraper Update

Merry Christmas!

This will be a short post. Code School seems to make minor changes to its user profile markup around this time each year; it was just a year ago that I last had to update this code.

Code School logo

For the codeschool.php file, change this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("<a href","<div class",$full_content);
$full_content = str_replace("</a>","</div>",$full_content);

to this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("<a rel=\"tooltip\" ","<div ",$full_content);
$full_content = str_replace("href=\"/learn","data-href=\"http://codeschool.com/learn",$full_content);
$full_content = str_replace("</a>","</div>",$full_content);

This eliminates the links that were meant to point at the Code School paths but, because their href values are relative, ended up as broken links pointing at your own site.
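As a quick check, the three replacements can be run against a single badge link. The markup in this sketch is hypothetical: it is modeled on the strings the replacements search for, not copied from a real Code School profile.

```php
<?php
/* Hypothetical badge markup (an assumption, not real Code School output) */
$full_content = '<a rel="tooltip" href="/learn/javascript" title="JavaScript Path"><img src="badge.png"></a>';

/* the same three replacements as in the updated codeschool.php */
$full_content = str_replace("<a rel=\"tooltip\" ", "<div ", $full_content);
$full_content = str_replace("href=\"/learn", "data-href=\"http://codeschool.com/learn", $full_content);
$full_content = str_replace("</a>", "</div>", $full_content);

echo $full_content, "\n";
// <div data-href="http://codeschool.com/learn/javascript" title="JavaScript Path"><img src="badge.png"></div>
```

The anchor becomes a plain div, and the original target survives as an absolute URL in a data-href attribute instead of a clickable relative link.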

Have a Happy New Year!

Using PHP to Scrape the Report Card from a DataCamp Profile

UPDATE: This does not currently work, as DataCamp has changed the structure of profile pages. I will revise this as soon as is feasible. (7/26/17)

Just as I have written scripts for displaying report cards for Code School, CodeEval, and Duolingo on my blog, I have written the script below for displaying a DataCamp profile.

DataCamp logo

datacamp.php:

<style>
#datacamp {
   border: 1px solid blue;
   text-align: center;
   vertical-align: middle;
   width: 100%;
}

#datacamp li {
   list-style-type: none;
}

.image-centered {
   display:block;
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}

.image-rounded {
    border-radius: 50%;
}

.progress-bar {
    position: relative;
    border: 1px solid #33aacc;
    width: 100%;
    height: 18px;
    margin-bottom: 1rem;
}

.progress-bar .inner {
   position: absolute;
   left: 0;
   top: 0;
   bottom: 0;
   background-color: #33aacc;
   min-width: 5px;
}

.wrapper-scores .container {
   min-width: 150px;
   max-width: 260px;
   width: 100%;
}

.course-block__completed, .course-block__certificate-download, .btn-linkedin-share,
.course-block__description, .course-block__author {
   display: none;
}

.col-sm-4 img {
   display: inline;
   padding: 1px;
}
</style>
<?php
    date_default_timezone_set('America/Los_Angeles');
    
    function getClass($classname, $htmltext)
    {
        $dom = new DOMDocument;
        $dom->loadHTML($htmltext);
        $xpath = new DOMXPath($dom);
        $results = $xpath->query("//*[@class='" . $classname . "']");
        return $results;
    }
    
    
    function buildContent($results)
    {
        $content = "";
        foreach ($results as $node) {
            $partial_content = innerHTML($node);
            $content = $content . $partial_content;
        }
        return $content;
    }
    
    
    /* this function preserves the inner content of the scraped element. 
    ** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
    ** So be sure to go and give that post an uptick too 🙂
    **/
    function innerHTML(DOMNode $node)
    {
      $doc = new DOMDocument();
      foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
      }
      return $doc->saveHTML();
    }
    
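    $profilename = isset($_GET['nick']) ? $_GET['nick'] : '';
    /* the value ends up in a URL and a file name, so accept only simple usernames
    ** (assumes letters, digits, dots, underscores, and hyphens are sufficient) */
    if (!preg_match('/^[A-Za-z0-9._-]+$/', $profilename))
        exit(1);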
    $profile_url =  'https://www.datacamp.com/profile/' . $profilename . '/';
    $filename = "datacamp_" . $profilename . ".txt";
    $full_content = '';   
    $norefresh = FALSE;
    $days = 1;
    $updated = 'no date';
    
    /* checks to see if file exists and is current */
    if (file_exists($filename)) {
	    $stats = stat($filename);
	    /* 86400 seconds in one day */
	    if ($stats[9] > (time() - (86400 * $days))) {
	    	$norefresh = TRUE;
	    	$updated = date("Y-m-d H:i:s", $stats[9]);
	    }
    }
    
    /* if $norefresh is still FALSE, file will be created or updated; otherwise, it will be loaded */
    if ($norefresh) {
    	$full_content = file_get_contents($filename);
    } else {         
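	$previous_value = libxml_use_internal_errors(TRUE);
	/* context options for https:// URLs are still read from the 'http' key */
	$context = stream_context_create(array(
		'http' => array('ignore_errors' => true),
	));
	$html = file_get_contents($profile_url, false, $context);

	/* only parse when the request returned a body; loadHTML() rejects an empty string */
	if ($html !== false && $html !== '') {
		$class = 'profile-page';
		$resultsBucket = getClass($class,$html);
		$full_content = $full_content . buildContent($resultsBucket);
	}

	libxml_clear_errors();
	libxml_use_internal_errors($previous_value);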
	
	/* making sure correct path exists on images */
	$full_content = str_replace("src=\"/","src=\"http://datacamp.com/",$full_content);
	
	/* changing h2 tags to h1 tags and inserting line breaks */
	$full_content = str_replace("<h2","<br /><h1",$full_content);
	$full_content = str_replace("</h2>","</h1><br />",$full_content);
	
	/* disabling the anchor tags on each badge by changing to divs */
	$full_content = str_replace("<a href","<div class",$full_content);
	$full_content = str_replace("<a class","<div class",$full_content);
	$full_content = str_replace("</a>","</div>",$full_content);
	
	/* adding line breaks */
	$full_content = str_replace("<div class=\"stats\">","<br /><div class=\"stats\">",$full_content);
	$full_content = str_replace("Earned</span>","Earned</span><br />",$full_content);
	$full_content = str_replace("Completed</span>","Completed</span><br />",$full_content);
	$full_content = str_replace("Aced</span>","Aced</span><br /><br />",$full_content);
		
	file_put_contents($filename,$full_content);
	$updated = date("Y-m-d H:i:s");
    }
?>
<a href="<?php echo htmlspecialchars($profile_url); ?>" target="_blank">
	<div class="wrapper-scores">
	<!-- <?php echo "Last updated: $updated" ?> -->
	<?php
	    	/* return the html */
		echo $full_content;
	?>
	</div>
</a>
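
The once-a-day caching check in the script above can also be pulled out into a small helper. This is only a sketch: cacheIsFresh() is a hypothetical name, not part of the script, and it uses filemtime() in place of reading index 9 of the stat() array (both give the file's modification time).

```php
<?php
/* Hypothetical helper: TRUE when $filename exists and was modified
** within the last $days days. */
function cacheIsFresh($filename, $days = 1)
{
    if (!file_exists($filename)) {
        return false;
    }
    /* 86400 seconds in one day */
    return filemtime($filename) > (time() - (86400 * $days));
}

/* small demonstration using a temporary file */
$demo = tempnam(sys_get_temp_dir(), 'dc_');
file_put_contents($demo, 'cached profile');
var_dump(cacheIsFresh($demo));      /* written just now, so fresh */

touch($demo, time() - 2 * 86400);   /* pretend it was written two days ago */
clearstatcache();
var_dump(cacheIsFresh($demo));      /* older than one day, so stale */
unlink($demo);
```

With a helper like this, the main script would only need to call cacheIsFresh($filename, $days) before deciding whether to load the cached file or scrape the profile again.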

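If you wish to display this in a WordPress widget, create a Text widget and add this code, replacing “NICKNAME” with a DataCamp username. The path in the snippet assumes that datacamp.php sits in a /datacamp/ folder at the root of your site; adjust it if yours lives elsewhere.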

<div id="datacamp"></div>
<script>
(function($) {
$("#datacamp").load("/datacamp/datacamp.php?nick=NICKNAME");
})(jQuery);
</script>

Since my primary reason for writing these has been to populate widgets on this WordPress blog, at some point I’ll probably incorporate these into WP plugins.

End of Another Year! … and a Minor Update to the Code School Profile Scraper

It’s hard to believe 2016 is already drawing to a close. It seems like just yesterday that I was writing about using PHP to search through my source code!

Though I wouldn’t call this an intractable problem, I did notice something annoying when looking at my Code School profile on the sidebar of this site.

Code School Profile with LI Bullet

Between the badges in the “Master Status” section, white dots had appeared! Upon inspecting them, I saw that they were the bullets on the list items that held the badges.

One post on Stack Overflow suggested adding “list-style-type: none;” to the CSS for the ul tag that holds the list items, but that seemed to have no effect.

After playing with the CSS a bit, I discovered that setting that property on the li tag instead fixed the problem.

Here is the corrected CSS, which updates the code from a past post:

<style>
#codeschool {
   border: 1px solid blue;
   text-align: center;
   vertical-align: middle;
}

#codeschool li {
   list-style-type: none;
}

.badge-img {
   display:block !important;
   margin-left: auto;
   margin-right: auto;
}

.pr-avatar {
   display:block;
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}
</style>

With this minor change, the bullets disappear and the profile looks as it did originally.

Have a Merry Christmas and a Happy New Year!

Using the Bing API to Send Pings to Ping-O-Matic

Those who use WordPress may be familiar with Ping-O-Matic for keeping blog search engines updated whenever a new post is written.

I discovered that Bing had more of my posts indexed than any other search engine. I could manually submit the link for each post to Ping-O-Matic, but that would be no fun. Instead, I decided to automate the process.

The Bing Search API is free for use up to 5000 queries per month. All you need is a Microsoft account to get a key from the Microsoft Azure Marketplace.

Once you have the key, you can set up two files: the first is called bing_basic.html, and the second is bing_basic.php. These files are a mashup from the Bing Search API guide and a function created for sending pings to Ping-O-Matic.

bing_basic.html:

<html>
	<head>
		<title>Bing Search Tester (Basic)</title>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
	</head>
	<body>
		<h1>Bing Search Tester (Basic)</h1>
		<form method="POST" action="bing_basic.php">
			<label for="service_op">Service Operation</label><br/>
			<input name="service_op" type="radio" value="Web" CHECKED /> Web <input name="service_op" type="radio" value="Image" /> Image <br/>
			<label for="query">Query</label><br/>
			<input name="query" type="text" size="60" maxlength="60" value="" /><br />
			<br />
			<input name="bt_search" type="submit" value="Search" />
		</form>
		<h2>Results</h2> {RESULTS} 
	</body>
</html>

bing_basic.php:

<?php

/*
--------------------------------------------
 $title contains the title of the page you're sending
 $url is the url of the page
 $debug: when true, prints the debug output (the xml call and answer)
--------------------------------------------
 the output is an array with two elements:
 status: ok / ko
 msg: the text response from pingomatic
--------------------------------------------
*/
function pingomatic($title,$url,$debug=false) {
    /* escape both values so characters such as & or < cannot break the XML */
    $content='<?xml version="1.0"?>'.
        '<methodCall>'.
        ' <methodName>weblogUpdates.ping</methodName>'.
        '  <params>'.
        '   <param>'.
        '    <value>'.htmlspecialchars($title).'</value>'.
        '   </param>'.
        '  <param>'.
        '   <value>'.htmlspecialchars($url).'</value>'.
        '  </param>'.
        ' </params>'.
        '</methodCall>';
 
    $headers="POST / HTTP/1.0\r\n".
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5 (.NET CLR 3.5.30729)\r\n".
    "Host: rpc.pingomatic.com\r\n".
    "Content-Type: text/xml\r\n".
    "Content-length: ".strlen($content);
 
    /* nl2br() only returns the string, so it has to be echoed to be seen */
    if ($debug) echo nl2br($headers);
 
    $request=$headers."\r\n\r\n".$content;
    $response = "";
    $fs=fsockopen('rpc.pingomatic.com',80, $errno, $errstr);
    if ($fs) { 
        fwrite ($fs, $request); 
        while (!feof($fs)) $response .= fgets($fs); 
        if ($debug) echo "<xmp>".$response."</xmp>";
        fclose ($fs);
        preg_match_all("/<(name|value|boolean|string)>(.*)<\/(name|value|boolean|string)>/U",$response,$ar, PREG_PATTERN_ORDER);
        for($i=0;$i<count($ar[2]);$i++) $ar[2][$i]= strip_tags($ar[2][$i]);
        /* guard against a response that lacks the expected fields */
        if (!isset($ar[2][1], $ar[2][3])) return array('status'=>'ko', 'msg'=>'unexpected response');
        return array('status'=> ( $ar[2][1]==1 ? 'ko' : 'ok' ), 'msg'=>$ar[2][3] );
    } else { 
        if ($debug) echo "<xmp>".$errstr." (".$errno.")</xmp>"; 
        return array('status'=>'ko', 'msg'=>$errstr." (".$errno.")");
    } 
}

/****
* Simple PHP application for using the Bing Search API
*/
$acctKey = 'YourAccountKey';
$rootUri = 'https://api.datamarket.azure.com/Bing/Search';

// Read the contents of the .html file into a string.
$contents = file_get_contents('bing_basic.html');
if (!empty($_POST['query']))
{
    // Here is where you'll process the query.
    // The rest of the code samples in this tutorial are inside this conditional block.
    // Encode the query and the single quotes that must surround it.
    $query = urlencode("'{$_POST['query']}'");
    // Get the selected service operation (Web or Image); anything else falls back to Web.
    $serviceOp = (isset($_POST['service_op']) && $_POST['service_op'] == 'Image') ? 'Image' : 'Web';
    // Construct the full URI for the query.
    $requestUri = "$rootUri/$serviceOp?\$format=json&Query=$query";
    // Encode the credentials and create the stream context.
    $auth = base64_encode("$acctKey:$acctKey");
    $data = array(
        'http' => array(
            'request_fulluri' => true,
            // ignore_errors can help debug – remove for production. This option added in PHP 5.2.10
            'ignore_errors' => true,
            'header' => "Authorization: Basic $auth")
    );
    $context = stream_context_create($data);
    // Get the response from Bing.
    $response = file_get_contents($requestUri, false, $context);
    // Decode the response; fall back to an empty list if it is not the expected JSON.
    $jsonObj = json_decode($response);
    $results = isset($jsonObj->d->results) ? $jsonObj->d->results : array();
    $resultStr = '';
    // Parse each result according to its metadata type.
    foreach ($results as $value) {
        // pingomatic() expects the title first, then the URL.
        $pingresults = pingomatic($value->Title, $value->Url);
        $resultStr .= $pingresults['status'] . "<br />\n";
        $resultStr .= $pingresults['msg'] . "<br />\n";
        switch ($value->__metadata->type) {
            case 'WebResult':
                $resultStr .=
                    "<a href=\"{$value->Url}\">{$value->Title}</a><p>{$value->Description}</p>";
                break;
            case 'ImageResult':
                $resultStr .=
                    "<h4>{$value->Title} ({$value->Width}x{$value->Height}, {$value->FileSize} bytes)</h4>" .
                    "<a href=\"{$value->MediaUrl}\">" .
                    "<img src=\"{$value->Thumbnail->MediaUrl}\"></a><br />";
                break;
        }
    }
    // Substitute the results placeholder. Ready to go.
    $contents = str_replace('{RESULTS}', $resultStr, $contents);
}
echo $contents;
?>

Put both files into the same folder on your website. When you open bing_basic.php in your browser, it should look something like this:

bing basic search before

In my case, I wanted to get all the pages that Bing had indexed. I entered “site:deepinthecode.com” into the Query box and clicked Search.
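
To see what that query turns into, here is a small sketch of the encoding and URI-building steps from bing_basic.php, run on the same search text. The variable $input is only a stand-in for $_POST['query'].

```php
<?php
$rootUri   = 'https://api.datamarket.azure.com/Bing/Search';
$serviceOp = 'Web';
$input     = 'site:deepinthecode.com';   /* stand-in for $_POST['query'] */

/* encode the query together with the single quotes that must surround it */
$query      = urlencode("'{$input}'");
$requestUri = "$rootUri/$serviceOp?\$format=json&Query=$query";

echo $query, "\n";
// %27site%3Adeepinthecode.com%27
echo $requestUri, "\n";
// https://api.datamarket.azure.com/Bing/Search/Web?$format=json&Query=%27site%3Adeepinthecode.com%27
```

The quotes become %27 and the colon becomes %3A, while the backslash before $format keeps PHP from treating it as a variable inside the double-quoted string.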

After a moment, the page refreshed with the following results:

bing basic after searching

The “ok” status shows that the ping was received properly and the line below shows the status message. Following those lines are the hyperlinked post titles and the beginning of each post.

Had the ping not been received, the status would be “ko”, and the message would (hopefully) be descriptive of why the ping did not take.
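
For anyone who would rather not pick the status out with a regular expression, below is a rough sketch that reads the same two values with SimpleXML. It rests on assumptions: the reply is taken to be an XML-RPC struct with flerror and message members, the sample reply is made up for illustration, and parsePingResponse() is a hypothetical name. It also expects only the XML body, so the HTTP headers that fsockopen returns would have to be stripped off first.

```php
<?php
/* Hypothetical sample reply; a real Ping-O-Matic response may differ. */
$sample = <<<'XML'
<methodResponse>
  <params>
    <param>
      <value>
        <struct>
          <member><name>flerror</name><value><boolean>0</boolean></value></member>
          <member><name>message</name><value><string>Pings being forwarded to 9 services!</string></value></member>
        </struct>
      </value>
    </param>
  </params>
</methodResponse>
XML;

function parsePingResponse($xml)
{
    /* keep libxml quiet if the reply is not valid XML */
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($doc === false) {
        return array('status' => 'ko', 'msg' => 'response was not valid XML');
    }

    /* collect each struct member as name => text */
    $fields = array();
    foreach ($doc->xpath('//member') as $member) {
        $text = (string) $member->value;
        /* a value usually wraps one typed child such as <boolean> or <string> */
        foreach ($member->value->children() as $child) {
            $text = (string) $child;
            break;
        }
        $fields[(string) $member->name] = $text;
    }

    /* a missing flag or a flag of 1 both count as a failed ping */
    $failed = !isset($fields['flerror']) || $fields['flerror'] === '1';
    return array(
        'status' => $failed ? 'ko' : 'ok',
        'msg'    => isset($fields['message']) ? $fields['message'] : '',
    );
}

print_r(parsePingResponse($sample));
```

The returned array has the same status and msg keys as pingomatic(), so the rest of bing_basic.php would not need to change.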

The function in the PHP file can be modified for use with APIs other than the one for Ping-O-Matic, though I haven’t had time to make any changes there yet to see how well it will work with other sites.