This is a very small amount of boilerplate around the golang.org/x/net/html package. If you need the huge feature set of goquery, use that. But I find this pretty suitable for my day-to-day problems.
rows := scrape.FindAll(table, scrape.ByTag(atom.Tr))
cols := []*html.Node{}
for _, row := range rows {
	// Find returns only the first matching node
	col, ok := scrape.Find(row, scrape.ByTag(atom.Td))
	if ok {
		cols = append(cols, col)
	}
}
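To make the snippet above concrete without pulling in x/net/html, here's a minimal, self-contained sketch of the depth-first traversal that Find/FindAll boil down to. The Node and Matcher types here are stand-ins invented for illustration, not the library's actual types:

```go
package main

import "fmt"

// Node is a stand-in for *html.Node (hypothetical, for illustration):
// a tag name plus children.
type Node struct {
	Tag      string
	Children []*Node
}

// Matcher mirrors the idea of scrape.Matcher: a predicate over nodes.
type Matcher func(*Node) bool

// Find does a depth-first search and returns the first match,
// with an ok flag, like scrape.Find.
func Find(n *Node, m Matcher) (*Node, bool) {
	if m(n) {
		return n, true
	}
	for _, c := range n.Children {
		if found, ok := Find(c, m); ok {
			return found, ok
		}
	}
	return nil, false
}

// FindAll collects every match in depth-first order, like scrape.FindAll.
func FindAll(n *Node, m Matcher) []*Node {
	var out []*Node
	if m(n) {
		out = append(out, n)
	}
	for _, c := range n.Children {
		out = append(out, FindAll(c, m)...)
	}
	return out
}

func main() {
	// A table > tr > td structure, mirroring the snippet above.
	table := &Node{Tag: "table", Children: []*Node{
		{Tag: "tr", Children: []*Node{{Tag: "td"}, {Tag: "td"}}},
		{Tag: "tr", Children: []*Node{{Tag: "td"}}},
	}}
	byTag := func(tag string) Matcher {
		return func(n *Node) bool { return n.Tag == tag }
	}
	rows := FindAll(table, byTag("tr"))
	fmt.Println(len(rows)) // 2
	if td, ok := Find(rows[0], byTag("td")); ok {
		fmt.Println(td.Tag) // td
	}
}
```

The whole trick is that a Matcher closure composes freely, so "by tag", "by class", and so on are all just predicates over the same recursive walk.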
> You can whip up a REST service very easily that wraps an sk-learn predictor, and I would bet it's actually much easier to do than writing PMML exporters.
So as it turns out, I spend my days building the very product you're describing (yhathq.com; a REST API-ifier for R and Python). The scikit-learn community alone is a wonderful group that does a hell of a job. It's kinda crazy that most products won't let you use that awesomeness, and instead choose to build out their own machine learning libraries to work within their systems.
This article got passed around the office this morning, and it seems to capture the general theme of most ML tools: they empower you to do cool things with machine learning and general data analysis, but at the expense of being able to use the libraries most people already rely on for that work. Don't know if I'd consider that poor design, but yeah, it's definitely a tradeoff.
Hmm, maybe I should be reaching out to airbnb's data science team?
That ML problem is more for example than for rigor. In fact, that particular problem would probably be better suited to other algorithms (e.g., random forest).
My background's in biomedical imaging, so I'm quite fond of problems with skewed class distributions. Though I didn't have time to explore this particular one further.
The code's all openly available if you want to give it a go though :)