Another Code School Profile Scraper Update

Merry Christmas!

This will be a short post. Code School seems to make minor changes to its user profile markup around this time each year; it was just a year ago that I had to update this code.

For the codeschool.php file, change this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("<a href","<div class",$full_content);
$full_content = str_replace("</a>","</div>",$full_content);

to this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("<a rel=\"tooltip\" ","<div ",$full_content);
$full_content = str_replace("href=\"/learn","data-href=\"http://codeschool.com/learn",$full_content);
$full_content = str_replace("</a>","</div>",$full_content);

This eliminates the links that were meant to point to Code School paths but, because they use relative URLs, end up as broken links to your own site. The original path is kept, as an absolute Code School URL, in a data-href attribute.
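To make the effect of the three replacements concrete, here is a minimal sketch that wraps them in a function and runs them on one badge. The function name disableBadgeLinks() and the sample anchor are hypothetical. The sample is only an assumption about what Code School's badge markup looks like, shaped to match the strings the replacements search for.

```php
<?php
/*
 * Hypothetical helper applying the three str_replace() calls from codeschool.php.
 * The sample markup below is an assumption, not copied from a real Code School page.
 */
function disableBadgeLinks(string $html): string
{
    // Turn the tooltip anchor into a div (its remaining attributes are kept).
    $html = str_replace("<a rel=\"tooltip\" ", "<div ", $html);
    // Move the relative /learn path into data-href as an absolute URL.
    $html = str_replace("href=\"/learn", "data-href=\"http://codeschool.com/learn", $html);
    // Close the div instead of the anchor.
    return str_replace("</a>", "</div>", $html);
}

$sample = '<a rel="tooltip" href="/learn/javascript" title="JavaScript"><img src="badge.png"></a>';
echo disableBadgeLinks($sample), "\n";
// prints <div data-href="http://codeschool.com/learn/javascript" title="JavaScript"><img src="badge.png"></div>
```

Because str_replace() matches literal text, this only works while the badge anchor starts with exactly `<a rel="tooltip" ` and its href begins with /learn. Any change to attribute order or paths means the search strings need updating again.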

Have a Happy New Year!

Using PHP to Scrape the Report Card from a DataCamp Profile

UPDATE: This script does not currently work, because DataCamp has changed the structure of its profile pages. I will revise it as soon as is feasible. (7/26/17)
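Since the breakage comes from a change in page structure, one fragile point is worth knowing before adapting the script below: getClass() selects elements with the XPath expression //*[@class='profile-page'], which only matches when the class attribute is exactly that string. The sketch below, under the hypothetical name getClassToken(), instead matches a class name as a whole word anywhere in the class list. I have not confirmed that this alone fixes the script against DataCamp's current pages, and the sample HTML is made up.

```php
<?php
/*
 * Sketch: find elements whose class list contains $classname as a whole word,
 * instead of requiring the entire class attribute to equal it.
 * Assumes $classname contains no quote characters (it is pasted into the XPath).
 */
function getClassToken(string $classname, string $html): DOMNodeList
{
    $dom = new DOMDocument();
    // Silence warnings from messy real-world HTML, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so " profile-page " matches only
    // a whole class name, not e.g. "profile-page-header".
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $classname . " ')]";
    return $xpath->query($query);
}

$html = '<div class="profile-page container"><h2>Stats</h2></div>'
      . '<div class="profile-page-header">not this one</div>';
echo getClassToken('profile-page', $html)->length, "\n"; // prints 1
```

On this sample, the original exact-match expression would find nothing, because the attribute value is "profile-page container"; the token match finds the first div and skips profile-page-header.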

Just as I have written scripts to display report cards for Code School, CodeEval, and Duolingo on my blog, I have written the script below to display a DataCamp profile.

datacamp.php:

<style>
#datacamp {
   border: 1px solid blue;
   text-align: center;
   vertical-align: middle;
   width: 100%;
}

#datacamp li {
   list-style-type: none;
}

.image-centered {
   display:block;
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}

.image-rounded {
    border-radius: 50%;
}

.progress-bar {
    position: relative;
    border: 1px solid #33aacc;
    width: 100%;
    height: 18px;
    margin-bottom: 1rem;
}

.progress-bar .inner {
   position: absolute;
   left: 0;
   top: 0;
   bottom: 0;
   background-color: #33aacc;
   min-width: 5px;
}

.wrapper-scores .container {
   min-width: 150px;
   max-width: 260px;
   width: 100%;
}

.course-block__completed, .course-block__certificate-download, .btn-linkedin-share,
.course-block__description, .course-block__author {
   display: none;
}

.col-sm-4 img {
   display: inline;
   padding: 1px;
}
</style>
<?php
    date_default_timezone_set('America/Los_Angeles');

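    function getClass($classname, $htmltext)
    {
        /* nothing to parse if the request failed or returned an empty page */
        if (empty($htmltext)) {
            return array();
        }
        $dom = new DOMDocument;
        $dom->loadHTML($htmltext);
        $xpath = new DOMXPath($dom);
        $results = $xpath->query("//*[@class='" . $classname . "']");
        return $results;
    }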

    function buildContent($results)
    {
        $content = "";
        foreach ($results as $node) {
            $partial_content = innerHTML($node);
            $content = $content . $partial_content;
        }
        return $content;
    }

    /* this function preserves the inner content of the scraped element.
    ** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
    ** So be sure to go and give that post an uptick too 🙂
    **/
    function innerHTML(DOMNode $node)
    {
      $doc = new DOMDocument();
      foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
      }
      return $doc->saveHTML();
    }

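    /* keep only letters, digits, '_' and '-' (assumed to cover DataCamp usernames),
    ** since the value ends up in both a URL and a local filename */
    $profilename = isset($_GET['nick']) ? preg_replace('/[^A-Za-z0-9_-]/', '', $_GET['nick']) : '';
    if (strlen($profilename) == 0) {
        exit(1);
    }
    $profile_url = 'https://www.datacamp.com/profile/' . $profilename . '/';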
    $filename = "datacamp_" . $profilename . ".txt";
    $full_content = '';
    $norefresh = FALSE;
    $days = 1;
    $updated = 'no date';

    /* checks to see if the cached file exists and is current */
    if (file_exists($filename)) {
        $modified = filemtime($filename);
        /* 86400 seconds in one day */
        if ($modified > (time() - (86400 * $days))) {
            $norefresh = TRUE;
            $updated = date("Y-m-d H:i:s", $modified);
        }
    }

    /* if $norefresh is still FALSE, file will be created or updated; otherwise, it will be loaded */
    if ($norefresh) {
    	$full_content = file_get_contents($filename);
    } else {
	$previous_value = libxml_use_internal_errors(TRUE);
	/* PHP reads options for both http:// and https:// URLs from the 'http' key */
	$context = stream_context_create(array(
	    'http' => array('ignore_errors' => true),
	));
	$html = file_get_contents($profile_url, false, $context);

	$class = 'profile-page';
	$resultsBucket = getClass($class,$html);

	libxml_clear_errors();
	libxml_use_internal_errors($previous_value);

	$full_content = $full_content . buildContent($resultsBucket);

	/* making sure correct path exists on images */
	$full_content = str_replace("src=\"/","src=\"http://datacamp.com/",$full_content);

	/* changing h2 tags to h1 tags and inserting line breaks */
	$full_content = str_replace("<h2","<br /><h1",$full_content);
	$full_content = str_replace("</h2>","</h1><br />",$full_content);

	/* disabling the anchor tags on each badge by changing to divs */
	$full_content = str_replace("<a href","<div class",$full_content);
	$full_content = str_replace("<a class","<div class",$full_content);
	$full_content = str_replace("</a>","</div>",$full_content);

	/* adding line breaks */
	$full_content = str_replace("<div class=\"stats\">","<br /><div class=\"stats\">",$full_content);
	$full_content = str_replace("Earned</span>","Earned</span><br />",$full_content);
	$full_content = str_replace("Completed</span>","Completed</span><br />",$full_content);
	$full_content = str_replace("Aced</span>","Aced</span><br /><br />",$full_content);

	file_put_contents($filename,$full_content);
	$updated = date("Y-m-d H:i:s");
    }
?>
<a href="<?php echo htmlspecialchars($profile_url); ?>" target="_blank">
	<div class="wrapper-scores">
	<!-- <?php echo "Last updated: $updated" ?> -->
	<?php
	    	/* return the html */
		echo $full_content;
	?>
	</div>
</a>
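The caching logic above (reuse the saved file if it is less than $days old, otherwise scrape and save) can be pulled out into a small reusable function, which would make it easy to share with the Code School, CodeEval, and Duolingo scripts. This is a sketch under my own assumptions: the name cachedContent() and its callback parameter are hypothetical, not part of the original script.

```php
<?php
/*
 * Hypothetical helper isolating the caching pattern from datacamp.php:
 * reuse $filename if it was modified within the last $maxAge seconds,
 * otherwise call $fetch to build fresh content and save it.
 */
function cachedContent(string $filename, int $maxAge, callable $fetch): string
{
    clearstatcache(true, $filename); // make sure filemtime() is not stale
    if (file_exists($filename) && filemtime($filename) > time() - $maxAge) {
        return file_get_contents($filename);
    }
    $content = $fetch();
    file_put_contents($filename, $content);
    return $content;
}

// Usage: with one day (86400 seconds) as the lifetime, the second call hits the cache.
$file = sys_get_temp_dir() . '/datacamp_example.txt';
@unlink($file);
$calls = 0;
$build = function () use (&$calls) { $calls++; return "<p>profile</p>"; };
cachedContent($file, 86400, $build);
cachedContent($file, 86400, $build);
echo $calls, "\n"; // prints 1
```

Passing the scrape as a callback keeps the cache independent of any particular site; only the function that downloads and cleans the HTML changes per profile.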

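If you wish to display this in a WordPress widget, create a Text widget and add this code, replacing “NICKNAME” with a DataCamp username. The URL assumes datacamp.php is in a /datacamp/ directory at the root of your site; adjust the path if you put it elsewhere.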

<div id="datacamp"></div>
<script>
(function($) {
$("#datacamp").load("/datacamp/datacamp.php?nick=NICKNAME");
})(jQuery);
</script>

Since my primary reason for writing these has been to populate widgets on this WordPress blog, at some point I’ll probably incorporate these into WP plugins.

End of Another Year! … and a Minor Update to the Code School Profile Scraper

It’s hard to believe 2016 is already drawing to a close. It seems like just yesterday that I was writing about using PHP to search through my source code!

Though I wouldn’t call this an intractable problem, I did notice something annoying while looking at my Code School profile in the sidebar of this site.

Code School Profile with LI Bullet

Between the badges in the “Master Status” section, white dots had appeared! On inspection, these turned out to be the bullets on the list items that hold the badges.

One Stack Overflow post suggested adding “list-style-type: none;” to the CSS for the unordered list that holds the list items, but that seemed to have no effect.

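After playing with the CSS a bit, I discovered that setting that property on the li tag instead fixed the problem. This is probably because list-style-type is inherited: a rule elsewhere (for example, in the site’s theme) that targets li elements directly takes precedence over any value they would inherit from the ul.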

Here is the corrected CSS, which updates the code from a past post:

<style>
#codeschool {
   border: 1px solid blue;
   text-align: center;
   vertical-align: middle;
}

#codeschool li {
   list-style-type: none;
}

.badge-img {
   display:block !important;
   margin-left: auto;
   margin-right: auto;
}

.pr-avatar {
   display:block;
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}
</style>

With this minor change, the bullets disappeared, and the profile looks as it did originally.

Have a Merry Christmas and a Happy New Year!