Outsider Data Science

Putting what's in there, out there. With R.

What Do The Ramones Want?

Recently I saw a tweet that shared this hilarious poster of Ramones “wants”. Dan Gneiding (aka Grayhood) is the graphic designer who created this. You can buy it here. Very cool, but how accurate is it? I asked Dan and he says he took some artistic license, as he should! You may accuse me of being that pedantic “Comic Book Guy” from “The Simpsons” but, when I saw it, I immediately wondered how I could tally these Ramones lyrics myself or, rather, get R to do it for me.

State Taxes: It's not just about Income

Which States Impose the Most “Tax Pain?” Much of the discussion around tax burdens focuses on income taxes but, at the state level, that leaves out two other big sources of tax liability: sales and property taxes. Here we’ll quickly look at the interplay of all three taxes in a graphical way. This can inform our thinking about how attractive it is to live in each state and about public policy questions involving tax fairness.

Gender Diversity in R and Python Package Contributors

Introduction Over the last few years I have really enjoyed becoming part of the R community. One of the best things about the community is the welcoming, inclusive and supportive nature of it. I can’t speak for other communities in the computer or data science worlds but I am well aware of the “brogrammer” culture in some circles that can be off-putting at times. The rise of codes of conduct across the open source world is changing things for the better, I think.

Why I migrated from Excel to R

I’ve been a spreadsheet power user from the days of VisiCalc for the Apple ][. I migrated to Lotus 1-2-3, to Borland Quattro and finally to Excel. With Excel, I’ve bludgeoned Visual Basic to create some pretty complicated dashboards and analytics. When I started using R I used tools like RExcel that plug R in as an analytic server within Excel, or I would use Excel to download data from investment databases and export it for use in R.

Solving the Letterboxed Puzzle in the New York Times

What is the difference between “computer programming” and “data science?” To someone not involved in either they look much the same. Most data scientists are also coders, though they don’t need to be. Data scientists (especially amateurs like me) don’t need to be concerned with pointers, stacks, heaps, recursion, etc., but this is not a data science post. For this post, I go back to my roots in the 1980s as an amateur computer scientist to solve a new New York Times puzzle called “Letterboxed.”

Where Are The Libertarians?

…or… The Tough Road Ahead for Howard Schultz …or… My Preconceived Notions Are Shattered At the risk of losing half my readers in the first paragraph, I’ll share my political views. Generally, I believe in “free people and free markets.” That makes me a small-L libertarian. I stress that “generally” does not mean everywhere, all the time. I find it useful to think of, not a political spectrum from left to right, but a compass with four points.

Rick and Morty Palettes

This was just a fun morning exercise. Let’s mix multiple images to make a palette of their principal colors using k-means. We’ll also use the totally awesome list-columns concept to put each image’s jpeg data into a data frame of lists that we can map to a function that turns the jpeg data into a list of palette colors in a new data frame. This more-or-less copies http://www.milanor.net/blog/build-color-palette-from-image-with-paletter/ with the added twist of using multiple images before creating the palette.

Is Free Pre-K in NYC Favoring the Rich?

Introduction A hallmark of the administration of NYC Mayor Bill de Blasio has been free pre-K for all New York families. When the program was initially rolled out there were complaints in some quarters that upper-income neighborhoods were getting more slots. This is an exploration comparing income to pre-K seats by neighborhood. It was done mainly to help me practice with the whole workflow of data gathering, document parsing, and data tidying - plus making cool bivariate choropleth maps!

New Winter Sports for New Countries

Looking at Winter Olympic Medal Rankings by Vintage of Sport Introduction Norway is a tiny country that punches way above its weight in the Winter Olympic medal count. We are not surprised as those folks are practically born on skis. At the same time, tousle-haired surfer dudes and dudettes from the US seem to be all over the hill when snowboards are involved. Notably, the sports where the US is most visible are sports which arose fairly recently.

Live Fast, Die Young, Stay Pretty?

Analyzing deaths of rock musicians Live fast, die young, stay pretty? That’s the stereotype for rockers, or it was. We only need to look at Keith Richards, over 70 and going strong, to find a major counterexample. Do rockers die young? What do they die of? How does that compare to the broader population (in the U.S., anyway)? It turns out there are some surprising answers to those questions. Along the way we’ll learn something about web scraping, HTML parsing and some ggplot2 tricks.