Another Code School Profile Scraper Update

Merry Christmas!

This will be a short post – I guess Code School must make minor changes to their user profile code about this time each year, as it was just a year ago that I had to update this code.

For the codeschool.php file, change this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("","

to this:

/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("","

This eliminates the links that were intended to point at the Code School paths but instead end up as broken links pointing at your own site.
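To illustrate the idea of turning badge anchors into plain divs, here is a minimal sketch. It is not the exact replacement from codeschool.php: the helper name `anchorsToDivs` and the sample markup (including the `/paths/javascript` link) are hypothetical, and it uses a regular expression instead of paired `str_replace` calls so that it does not depend on the exact attributes Code School emits.

```php
<?php
/* Hypothetical helper, not part of the original script:
   rewrites every <a ...> opening tag as <div> and every </a> as </div>,
   so relative links in scraped markup can no longer be followed. */
function anchorsToDivs(string $html): string
{
    /* \b keeps tags such as <abbr> from matching; fall back to the input if the regex fails */
    $html = preg_replace('/<a\b[^>]*>/i', '<div>', $html) ?? $html;
    return str_ireplace('</a>', '</div>', $html);
}

/* sample markup is an assumption made for illustration only */
$sample = '<a href="/paths/javascript"><img src="badge.png" alt="JavaScript"></a>';
echo anchorsToDivs($sample);
/* prints <div><img src="badge.png" alt="JavaScript"></div> */
```

Dropping the attributes altogether is a design choice here: a div has no use for an `href`, and it avoids leaving stray attributes behind.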

Have a Happy New Year!

Using PHP to Scrape the Report Card from a DataCamp Profile

UPDATE: This does not currently work, as DataCamp has changed the structure of profile pages. I will revise this as soon as is feasible. (7/26/17)

Just as I have written scripts for displaying report cards for Code School, CodeEval, and Duolingo on my blog, I have written the script below for displaying a DataCamp profile.

<style>
#datacamp {
   border: 1px solid blue;
   text-align: center;
   vertical-align: middle;
   width: 100%;
}

#datacamp li {
   list-style-type: none;
}

.image-centered {
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}

.image-rounded {
   border-radius: 50%;
}

.progress-bar {
   position: relative;
   border: 1px solid #33aacc;
   width: 100%;
   height: 18px;
   margin-bottom: 1rem;
}

.progress-bar .inner {
   position: absolute;
   left: 0;
   top: 0;
   bottom: 0;
   background-color: #33aacc;
   min-width: 5px;
}

.wrapper-scores .container {
   min-width: 150px;
   max-width: 260px;
   width: 100%;
}

.course-block__completed, .course-block__certificate-download, .btn-linkedin-share,
.course-block__description, .course-block__author {
   display: none;
}

.col-sm-4 img {
   display: inline;
   padding: 1px;
}
</style>
<?php
    /* returns every element whose class attribute is exactly $classname */
    function getClass($classname, $htmltext)
    {
        $dom = new DOMDocument;
        $dom->loadHTML($htmltext);
        $xpath = new DOMXPath($dom);
        $results = $xpath->query("//*[@class='" . $classname . "']");
        return $results;
    }

    /* concatenates the inner HTML of each matched element */
    function buildContent($results)
    {
        $content = "";
        foreach ($results as $node) {
            $partial_content = innerHTML($node);
            $content = $content . $partial_content;
        }
        return $content;
    }

    /* this function preserves the inner content of the scraped element.
    ** So be sure to go and give that post an uptick too :)
    */
    function innerHTML(DOMNode $node)
    {
        $doc = new DOMDocument();
        foreach ($node->childNodes as $child) {
            $doc->appendChild($doc->importNode($child, true));
        }
        return $doc->saveHTML();
    }

    $profilename = isset($_GET['nick']) ? $_GET['nick'] : '';
    if (strlen($profilename) == 0) {
        /* assumption: nothing can be scraped without a nickname, so stop here */
        exit('No DataCamp nickname supplied.');
    }
    $profile_url =  '' . $profilename . '/';
    $filename = "datacamp_" . $profilename . ".txt";
    $full_content = '';
    $norefresh = FALSE;
    $days = 1;
    $updated = 'no date';

    /* checks to see if file exists and is current */
    if (file_exists($filename)) {
        $stats = stat($filename);
        /* 86400 seconds in one day; $stats[9] is the last-modified time */
        if ($stats[9] > (time() - (86400 * $days))) {
            $norefresh = TRUE;
            $updated = date("Y-m-d H:i:s", $stats[9]);
        }
    }

    /* if $norefresh is still FALSE, file will be created or updated; otherwise, it will be loaded */
    if ($norefresh) {
        $full_content = file_get_contents($filename);
    } else {
        $previous_value = libxml_use_internal_errors(TRUE);
        /* the "http" context options also apply to https:// URLs */
        $context = stream_context_create(array(
            'http' => array('ignore_errors' => true),
        ));
        $html = file_get_contents($profile_url, false, $context);
        if ($html !== false && $html !== '') {
            $class = 'profile-page';
            $resultsBucket = getClass($class, $html);
            $full_content = $full_content . buildContent($resultsBucket);
        }
        libxml_use_internal_errors($previous_value);

        /* making sure correct path exists on images */
        $full_content = str_replace("src=\"/","src=\"",$full_content);
        /* changing h2 tags to h1 tags and inserting line breaks */
        $full_content = str_replace("<h2","<br /><h1",$full_content);
        $full_content = str_replace("</h2>","</h1><br />",$full_content);
        /* disabling the anchor tags on each badge by changing to divs */
        $full_content = str_replace("<a href","<div class",$full_content);
        $full_content = str_replace("<a class","<div class",$full_content);
        $full_content = str_replace("</a>","</div>",$full_content);
        /* adding line breaks */
        $full_content = str_replace("<div class=\"stats\">","<br /><div class=\"stats\">",$full_content);
        $full_content = str_replace("Earned</span>","Earned</span><br />",$full_content);
        $full_content = str_replace("Completed</span>","Completed</span><br />",$full_content);
        $full_content = str_replace("Aced</span>","Aced</span><br /><br />",$full_content);
        $updated = date("Y-m-d H:i:s");
        /* saving the result so it can be reused until it is more than $days old */
        file_put_contents($filename, $full_content);
    }
?>
<a href="<?php echo htmlspecialchars($profile_url); ?>" target="_blank">
    <div class="wrapper-scores">
    <!-- <?php echo "Last updated: $updated" ?> -->
    <?php
        /* return the html */
        echo $full_content;
    ?>
    </div>
</a>
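One fragile spot in a scraper like this is the class lookup: an XPath test such as `@class='profile-page'` only matches when the attribute holds exactly that one class, so an extra class on the element makes the result come back empty. Below is a minimal sketch of a more tolerant lookup, assuming the class name is a plain token with no quotes in it. The helper name `innerHtmlByClassToken` and the sample markup are hypothetical and not part of the script above.

```php
<?php
/* Hypothetical helper: returns the concatenated inner HTML of every element
   whose class attribute contains $classname as a whole, space-separated token. */
function innerHtmlByClassToken(string $classname, string $htmltext): string
{
    /* keep libxml warnings about real-world HTML out of the output */
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($htmltext);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    /* padding both sides with spaces makes "profile-page" match inside
       "wide profile-page" but not inside "profile-page-old" */
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $classname . " ')]";

    $xpath = new DOMXPath($dom);
    $content = '';
    foreach ($xpath->query($query) as $node) {
        foreach ($node->childNodes as $child) {
            /* saveHTML() with a node argument serializes just that node */
            $content .= $dom->saveHTML($child);
        }
    }
    return $content;
}

/* sample markup is an assumption made for illustration only */
$sample = '<div class="wide profile-page"><h2>Courses</h2></div><div class="footer">x</div>';
echo innerHtmlByClassToken('profile-page', $sample);
```

This will not help when a site restructures its pages completely, but it does survive the smaller change of a class being added to the wrapper element.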

If you wish to display this in a WordPress widget, create a Text widget and add this code, replacing “NICKNAME” with a DataCamp username.

<div id="datacamp"></div>
(function($) {

Since my primary reason for writing these has been to populate widgets on this WordPress blog, at some point I’ll probably incorporate these into WP plugins.
