I have been manually placing my Master badges from Code School onto my blog, which runs on PHP because it is a WordPress blog. The script doesn’t yet show the fraction completed on Paths that I haven’t mastered, but it can get the ones that I’ve completed.
The PHP script I’ve built so far is below. I haven’t deployed it yet because of some CSS I want to change, but it does scrape the page. I’ll share the jQuery required to display it in the sidebar once I get the CSS issues worked out. This is very similar to the code I used to get the CodeEval profile, refactored and modified for use with the Code School page.
<?php
function getClass($classname, $htmltext)
{
$dom = new DOMDocument;
$dom->loadHTML($htmltext);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
return $results;
}
function buildContent($results)
{
$content = "";
foreach ($results as $node) {
$partial_content = innerHTML($node);
$content = $content . $partial_content;
}
return $content;
}
/* this function preserves the inner content of the scraped element.
** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
** So be sure to go and give that post an uptick too 🙂
**/
function innerHTML(DOMNode $node)
{
$doc = new DOMDocument();
foreach ($node->childNodes as $child) {
$doc->appendChild($doc->importNode($child, true));
}
return $doc->saveHTML();
}
$previous_value = libxml_use_internal_errors(TRUE);
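/* accept the nick only if it is a plain string, and URL-encode it before use */
$profilename = (isset($_GET['nick']) && is_string($_GET['nick'])) ? $_GET['nick'] : '';
$profile_url = 'https://www.codeschool.com/users/' . rawurlencode($profilename) . '/';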
/* HTTP context options live under the 'http' key, even for https:// URLs */
$context = stream_context_create(array(
'http' => array('ignore_errors' => true),
));
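$html = file_get_contents($profile_url, false, $context);
/* stop early if the request failed, since loadHTML() cannot parse an empty string */
if ($html === false || $html === '') {
exit('Could not load the Code School profile.');
}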
$class = 'bucket';
$resultsBucket = getClass($class,$html);
$class = 'mbl tac';
$resultsMaster = getClass($class,$html);
$class = 'pr-pathStatus';
$resultsPath = getClass($class,$html);
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
?>
<a href="<?php echo htmlspecialchars($profile_url, ENT_QUOTES, 'UTF-8'); ?>" target="_blank">
<div class="wrapper-scores">
<?php
$full_content = "";
$full_content = $full_content . buildContent($resultsBucket);
$full_content = $full_content . buildContent($resultsMaster);
$full_content = $full_content . buildContent($resultsPath);
/* changing h2 tags to h1 tags and inserting line breaks */
$full_content = str_replace("<h2","\n<h1",$full_content);
$full_content = str_replace("</h2>","</h1>\n",$full_content);
/* disabling the anchor tags on each badge by changing to divs */
$full_content = str_replace("<a rel=\"tooltip\" ","<div ",$full_content);
$full_content = str_replace("href=\"/learn","data-href=\"http://codeschool.com/learn",$full_content);
$full_content = str_replace("</a>","</div>",$full_content);
/* changing text on heading of Path Status */
$full_content = str_replace("Path Status","Paths In Progress",$full_content);
/* return the html */
echo $full_content;
?>
</div>
</a>
PHP code updated on 2017.12.28.
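One thing to keep in mind with getClass(): the XPath test `@class='mbl tac'` only matches when the class attribute is exactly that string, so an element with the classes in a different order, or with an extra class, is skipped. Below is a minimal sketch of a token-based match instead. The helper name `getByClassToken` and the sample markup are made up for illustration and are not taken from the Code School page, so treat this as one possible approach rather than a drop-in replacement.

```php
<?php
// Hypothetical helper (not part of the script above): find elements that
// carry a given class token, whatever other classes they also have.
// Assumes $token contains no quote characters.
function getByClassToken($token, $htmltext)
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($htmltext);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so that " tac " can only
    // match a whole class name, never part of a longer one.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $token . " ')]";
    return $xpath->query($query);
}

// Made-up markup for illustration only.
$sample = '<div>'
    . '<p class="mbl tac">Master</p>'
    . '<p class="tac mbl extra">Also a match</p>'
    . '<p class="tactic">Not a match</p>'
    . '</div>';

$matches = getByClassToken('tac', $sample);
echo $matches->length, "\n"; // prints 2
```

The first two paragraphs match because both carry `tac` as a whole class name; `tactic` does not, because the padded comparison looks for the token with a space on each side.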