Using PHP to Scrape the Report Card from a Code School Profile – Part 2

CodeSchool logo

Earlier this week, I described how to use PHP to scrape the report card from a Code School profile. Now, it must be displayed. I chose to display mine in the sidebar of my blog. To do this, jQuery and CSS will be your friends. It’s pretty simple, and this isn’t the only way to do it. However, in this implementation, it is important that the name of the querystring parameter used in the PHP script (in this case, “nick”) matches the one in the jQuery function call below. Likewise, the id attribute of the div element must also match the one in the jQuery statement.

(Update: the CSS code below should be updated in accordance with this change to prevent bullets from appearing between the badges in the “Master Status” section.)

<style>
#codeschool {
   text-align: center;
   vertical-align: middle;
}

.badge-img {
   display:block !important;
   margin-left: auto;
   margin-right: auto;
}

.pr-avatar {
   display:block;
   margin-left: auto;
   margin-right: auto;
   margin-bottom: 10px;
}
</style>
<div id="codeschool">
</div>
<br />
<script>
(function($) {
$("#codeschool").load("/codeschool/codeschool.php?nick=DeepInTheCode");
})(jQuery);
</script>

Well, that’s it. If you debug the client-side code on both your page and the Code School profile page, you’ll see that there are path elements in the Code School script that cause the partial opacity on uncompleted paths. This is presumably done with other JavaScripts and CSS on the Code School site. I haven’t tried bringing that functionality here as yet. Perhaps for another time…

Using PHP to Scrape the Report Card from a Code School Profile – Part 1

CodeSchool logo

I have been manually placing my Master badges from Code School onto my blog which, being a WordPress blog, runs on PHP. I don’t have the script running that shows the fraction completed on Paths that I haven’t yet mastered, but I can get the ones that I’ve completed.

The PHP script I’ve built so far is below. Due to some CSS I want to change, I haven’t implemented it yet. But I can say that it does indeed scrape the page. The jQuery required to display it in the sidebar I’ll share once I get the CSS issues worked out. This is very similar to the code I used to get the CodeEval profile, but it’s been refactored and modified for use with the Code School page.

<?php
    function getClass($classname, $htmltext)
    {
        $dom = new DOMDocument;
        $dom->loadHTML($htmltext);
        $xpath = new DOMXPath($dom);
        $results = $xpath->query("//*[@class='" . $classname . "']");
        return $results;
    }

    function buildContent($results)
    {
        $content = "";
        foreach ($results as $node) {
            $partial_content = innerHTML($node);
            $content = $content . $partial_content;
        }
        return $content;
    }

    /* this function preserves the inner content of the scraped element.
    ** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
    ** So be sure to go and give that post an uptick too 🙂
    **/
    function innerHTML(DOMNode $node)
    {
      $doc = new DOMDocument();
      foreach ($node->childNodes as $child) {
        $doc->appendChild($doc->importNode($child, true));
      }
      return $doc->saveHTML();
    }

    $previous_value = libxml_use_internal_errors(TRUE);
    $profilename = $_GET['nick'];
    $profile_url =  'https://www.codeschool.com/users/' . $profilename . '/';
    $context = stream_context_create(array(
        'https' => array('ignore_errors' => true),
    ));
    $html = file_get_contents($profile_url, false, $context);  

    $class = 'bucket';
    $resultsBucket = getClass($class,$html);

    $class = 'mbl tac';
    $resultsMaster = getClass($class,$html);

    $class = 'pr-pathStatus';
    $resultsPath = getClass($class,$html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous_value);
?>

<a href="<?php echo $profile_url; ?>" target="_blank">
    <div class="wrapper-scores">
        <?php
        $full_content = "";

        $full_content = $full_content . buildContent($resultsBucket);
        $full_content = $full_content . buildContent($resultsMaster);
        $full_content = $full_content . buildContent($resultsPath);

        /* changing h2 tags to h1 tags and inserting line breaks */
        $full_content = str_replace("<h2","
<h1",$full_content);
        $full_content = str_replace("</h2>","</h1>
",$full_content);

	/* disabling the anchor tags on each badge by changing to divs */
        $full_content = str_replace("<a rel=\"tooltip\" ","<div ",$full_content);
        $full_content = str_replace("href=\"/learn","data-href=\"http://codeschool.com/learn",$full_content);
	$full_content = str_replace("</a>","</div>",$full_content);

        /* changing text on heading of Path Status */
        $full_content = str_replace("Path Status","Paths In Progress",$full_content);

        /* return the html */
        echo $full_content;
        ?>
    </div>
</a>

PHP code updated on 2017.12.28.

Ignoring Errors Returned by PHP file_get_contents HTTP Wrapper

PHP logo

I discovered that CodeEval was down briefly for maintenance, which caused my badge (in the sidebar of this blog) to be populated with the error message associated with an HTTP 503 error. Since I didn’t want error messages to be returned by my PHP script, I added one line and changed the line containing the file_get_contents statement, as seen below.

$context = stream_context_create(array(
    'http' => array('ignore_errors' => true),
));
$html = file_get_contents($score_url, false, $context);

Now, if there’s an HTTP error, the badge will be blank, but no error will appear. I was fortunate to find this quick fix on StackOverflow.