Using PHP to Scrape the Report Card from a Code School Profile – Part 1

I have been manually placing my Master badges from Code School onto my blog which, being a WordPress blog, runs on PHP. I don’t have the script running that shows the fraction completed on Paths that I haven’t yet mastered, but I can get the ones that I’ve completed.

The PHP script I’ve built so far is below. I haven’t deployed it yet because there is some CSS I want to change first, but it does scrape the page. I’ll share the jQuery required to display it in the sidebar once I get the CSS issues worked out. The script is very similar to the code I used to get the CodeEval profile, refactored and modified for use with the Code School page.

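<?php
    /* returns every element whose class attribute equals $classname */
    function getClass($classname, $htmltext)
    {
        $dom = new DOMDocument();
        $dom->loadHTML($htmltext);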
        $xpath = new DOMXPath($dom);
        $results = $xpath->query("//*[@class='" . $classname . "']");
        return $results;
    }
    
    
    function buildContent($results)
    {
        $content = "";
        foreach ($results as $node) {
            $partial_content = innerHTML($node);
            $content = $content . $partial_content;
        }
        return $content;
    }
    
    
    /* this function preserves the inner content of the scraped element. 
    ** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
    ** So be sure to go and give that post an uptick too :)
    **/
    function innerHTML(DOMNode $node)
    {
        $doc = new DOMDocument();
        foreach ($node->childNodes as $child) {
            $doc->appendChild($doc->importNode($child, true));
        }
        return $doc->saveHTML();
    }
    
    
    $previous_value = libxml_use_internal_errors(TRUE);
    $profilename = $_GET['nick'];
    $profile_url =  'https://www.codeschool.com/users/' . $profilename . '/';
    $context = stream_context_create(array(
        /* HTTP context options live under the 'http' key, even for https:// URLs */
        'http' => array('ignore_errors' => true),
    ));
    $html = file_get_contents($profile_url, false, $context);  

    $class = 'bucket';
    $resultsBucket = getClass($class,$html);
    
    $class = 'mbl tac';
    $resultsMaster = getClass($class,$html);
    
    $class = 'pr-pathStatus';
    $resultsPath = getClass($class,$html);
    
    libxml_clear_errors();
    libxml_use_internal_errors($previous_value);
?>
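
One caveat with the query in `getClass`: it compares the whole `class` attribute, so `'bucket'` only matches an element whose attribute is exactly `bucket`. If the page ever adds a second class to the same element, the match is lost. Below is a minimal sketch of a token-aware alternative. The helper name `getClassToken` and the sample markup are made up for illustration and are not part of the script above.

```php
<?php
// Hypothetical helper (not from the original script): matches a single class
// token even when the element carries additional classes.
function getClassToken($classname, $htmltext)
{
    // Silence libxml warnings about imperfect markup, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($htmltext);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so ' bucket ' matches a whole
    // token and not a substring such as 'bucketlist'.
    // Assumption: $classname contains no quote characters.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
           . $classname . " ')]";
    return $xpath->query($query);
}

// Made-up sample markup, only for demonstration.
$sample  = '<div class="bucket wide"><p>Ruby Path</p></div>'
         . '<div class="bucketlist">skip me</div>';
$matches = getClassToken('bucket', $sample);

echo $matches->length, "\n";
echo trim($matches->item(0)->textContent), "\n";
```

The exact-match query is still the right tool for a multi-class value such as `'mbl tac'`, as long as the attribute on the page is written exactly that way.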


    ",$full_content);
        
        /* changing text on heading of Path Status */
        $full_content = str_replace("Path Status","Paths In Progress",$full_content);
        
        /* return the html */
        echo $full_content;
        ?>
    

PHP code updated on 2017.12.28.
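
The script builds the profile URL directly from `$_GET['nick']`, so anything typed into the query string ends up in the request. Here is a rough sketch of checking the value first. The function name `buildProfileUrl` and the allowed character set are assumptions of mine; I don’t know what characters Code School actually permits in a user name.

```php
<?php
// Hypothetical helper: returns the profile URL, or null when the nickname
// does not pass the (assumed) whitelist.
function buildProfileUrl($nick)
{
    // Assumption: nicknames consist of letters, digits, underscores and hyphens.
    if (!is_string($nick) || !preg_match('/\A[A-Za-z0-9_-]{1,64}\z/', $nick)) {
        return null;
    }
    // rawurlencode is a second line of defence in case the whitelist is widened.
    return 'https://www.codeschool.com/users/' . rawurlencode($nick) . '/';
}

// Example usage with made-up values.
var_dump(buildProfileUrl('some_user'));
var_dump(buildProfileUrl('../admin'));
```

In the scraper itself, a `null` result would be the cue to skip the `file_get_contents` call and return an empty sidebar instead.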

One Year of Blogging at “Deep In The Code”!

I’ve been writing this blog for just over a year now; my first post was on May 23, 2012. I haven’t written as many articles as I would have liked, I think because I’ve been too selective about topics (for fear of taking the blog off-topic) and because some of the articles ran too long. I will endeavor to write shorter but more frequent articles in the blog’s second year!

I enjoy the feedback I get (when it’s not spam) as it tells me if I’m writing about something that people care about, so please continue writing back!

My goals from last year never fully materialized. I intended to learn OS X / iOS / Objective-C programming when I bought my MacBook Pro, but while I did dip into the pool of Xcode programming, it was only a shallow dip. Instead, I have focused on learning open source technologies, mostly Python and Ruby on Rails. Once I get a handle on these two, I intend to revisit iOS programming, though I may end up using RubyMotion instead of Objective-C. Only time will tell!

Thanks for reading my blog, and please feel free to make suggestions on what you would like to read about in the future!
