using Gadfly
using DataFrames
birthdays = readtable("data-birthdays.csv")
typeof(birthdays)
First things first: we want to count how many birthday wishes each user has sent. My gut feeling is that BadgerCat will win by a landslide.
wishingUsers = birthdays[:, :user]
wishingUsersCounts = countmap(wishingUsers)
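For anyone wondering what the counting step boils down to, here is a minimal sketch in plain Julia with no packages. The helper name `count_occurrences` and the sample names are made up for illustration; the notebook itself just calls `countmap`.

```julia
# Tally how often each item appears, using nothing but a Dict.
function count_occurrences(items)
    counts = Dict{eltype(items), Int}()
    for item in items
        # get() returns 0 the first time we see an item
        counts[item] = get(counts, item, 0) + 1
    end
    return counts
end

# Hypothetical sample, not taken from the real data set.
sample_users = ["BadgerCat", "SomeGuy", "BadgerCat"]
sample_counts = count_occurrences(sample_users)
```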
You might not believe me, but passing keys(...) and values(...) straight to DataFrame didn't actually work, because they return their own wrapper types rather than plain arrays. If we map them out to a simple array, it's all good.
There might be a better way to convert them if I knew more about Julia's type system, but I'm fine with this.
df = DataFrame(User=map((x) -> x, keys(wishingUsersCounts)), Wishes=map((x) -> x, values(wishingUsersCounts)))
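A possibly tidier conversion, sketched here under the assumption that plain arrays are all the DataFrame constructor needs: `collect` turns the key and value wrappers into ordinary vectors. The dictionary below is a made-up stand-in for `wishingUsersCounts`.

```julia
# Hypothetical counts, standing in for the real countmap result.
counts = Dict("BadgerCat" => 3, "SomeGuy" => 1)

# collect() materialises the key/value wrappers as plain vectors.
# Both are iterated in the same order, so the columns stay aligned.
users  = collect(keys(counts))
wishes = collect(values(counts))
```

The two vectors could then be passed as the `User` and `Wishes` columns in place of the `map((x) -> x, ...)` calls.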
set_default_plot_size(800px, 400px);
Gadfly.with_theme(:default) do
    plot(df, x="User", y="Wishes", color="User",
         Geom.bar, Guide.xlabel(nothing), Guide.ylabel("Wish count"),
         Guide.colorkey(title=nothing, labels=[""], pos=[-100mm, -100mm]))
end
Okay, that was only somewhat more difficult than expected. First, the results: not too surprising. BadgerCat is the clear victor, with BenjaminGruenbaum eating dirt behind, flanked by SomeGuy. The rest are simply noise.
Next up we have how many users were wished a happy birthday. We'll try to extract that from the message body, and work on from there.
msg = birthdays[1, :]
match(r"happy birthday @(\S+)"i, msg[1, :body])
map((body) -> match(r"@(\S+)|happy birthday @?(\S+)"i, body), birthdays[:, :body])
That gets most of them. Let's see which ones it didn't get:
for row in eachrow(birthdays)
    if match(r"@(\S+)|happy birthday @?(\S+)"i, row[:body]) === nothing
        println(row)
    end
end
Well, that's a bit disheartening. The first one's fine, but we'll need to fix the regex for the third one and correct some anomalies (e.g. bae-dger and "me").
birthdays[46, :body] = "Happy Birthday... @monners. (ಥ﹏ಥ)"
birthdays[9, :body] = "HAPPY BIRTHDAY @BadgerCat... GET IT!?"
for row in eachrow(birthdays)
    if match(r"@(\S+)|happy birthday[\s\.,]+@?(\S+)"i, row[:body]) === nothing
        println(row)
    end
end
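To see what loosening the regex buys us, here is a small sketch comparing the first pattern with the `\w+` variant used for the actual counting. The sample message is invented for illustration and is not from the data set.

```julia
# Hypothetical message: no @mention, and a comma after "birthday".
sample = "Happy birthday, BadgerCat!"

old_re = r"@(\S+)|happy birthday @?(\S+)"i
new_re = r"@(\w+)|happy birthday[\s\.,]+@?(\w+)"i

# The old pattern insists on exactly one space after "birthday",
# so the comma makes it fail outright.
old_match = match(old_re, sample)

# The new pattern skips any run of whitespace, dots and commas,
# and \w+ stops before the trailing exclamation mark.
new_match = match(new_re, sample)
recipient = new_match.captures[2]
```

Because there is no @mention, the first capture group stays `nothing` and the name ends up in the second one, which is why the later code checks both.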
We're likely to discover more when we actually plot. We'll deal with them as they come. Let's create our data frame.
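wishesCount = countmap(sort(map(function (body)
    m = match(r"@(\w+)|happy birthday[\s\.,]+@?(\w+)"i, body)
    # No recognisable recipient: bucket it under the empty string
    if m === nothing
        return ""
    end
    # Prefer the @mention; fall back to the name after "happy birthday"
    if m.captures[1] === nothing
        lowercase(m.captures[2])
    else
        lowercase(m.captures[1])
    end
end, birthdays[:, :body]).data))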
wishesdf = DataFrame(User=map((x) -> x, keys(wishesCount)), Wishes=map((x) -> x, values(wishesCount)))
plot(sort(wishesdf, cols=[order(:User)]), x="User", y="Wishes", color="User",
     Geom.bar, Guide.xlabel(nothing), Guide.ylabel("Wishes received"),
     Guide.colorkey(title=nothing, labels=[""], pos=[-100mm, -100mm]))
If only Gadfly had support for pie charts. Alas, we'll have to make do.
Now that we have that out of the way, let's do something a bit more interesting and sum up the number of stars the birthday recipients received.
Because I totally thought this through, we'll have to re-use our user-matching function from before.
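Had I actually thought it through, the matching logic could live in one small named helper instead of being pasted twice. A minimal sketch, where `extract_recipient` is a hypothetical name that does not appear anywhere else in this notebook:

```julia
# Return the lowercased recipient of a birthday message,
# or "" when no recipient can be found.
function extract_recipient(body)
    m = match(r"@(\w+)|happy birthday[\s\.,]+@?(\w+)"i, body)
    m === nothing && return ""
    # Group 1 is the bare @mention; group 2 is the name after "happy birthday".
    name = m.captures[1] === nothing ? m.captures[2] : m.captures[1]
    return lowercase(name)
end

first_fix  = extract_recipient("HAPPY BIRTHDAY @BadgerCat... GET IT!?")
second_fix = extract_recipient("Happy Birthday... @monners. (ಥ﹏ಥ)")
no_match   = extract_recipient("hello there")
```

The first two inputs are the corrected message bodies from earlier; the third is a made-up non-match.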
# Build a [recipient, stars] pair for every message
birthdayStars = map(function (row)
    m = match(r"@(\w+)|happy birthday[\s\.,]+@?(\w+)"i, row[:body])
    if m === nothing
        return ["", 0]
    end
    [if m.captures[1] === nothing
        lowercase(m.captures[2])
    else
        lowercase(m.captures[1])
    end, row[:stars]]
end, eachrow(birthdays))
# Sum the stars per recipient into a Dict
starsCount = foldl(function reducer(accum, item)
    accum[item[1]] = get(accum, item[1], 0) + item[2]
    accum
end, Dict(), birthdayStars)
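The fold above is just a running per-user sum. Here is the same idea as a standalone sketch, assuming Julia 1.x, where the initial value goes in the `init` keyword rather than as a positional argument. The sample pairs are invented.

```julia
# Hypothetical (recipient, stars) pairs, not the real data.
sample_pairs = [("badgercat", 2), ("loktar", 5), ("badgercat", 1)]

# Start from an empty Dict and add each message's stars
# to the running total for its recipient.
totals = foldl(sample_pairs; init=Dict{String,Int}()) do accum, item
    user, stars = item
    accum[user] = get(accum, user, 0) + stars
    accum  # the accumulator must be returned for the next step
end
```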
starsdf = DataFrame(User=map((x) -> x, keys(starsCount)), Stars=map((x) -> x, values(starsCount)))
plot(sort(starsdf, cols=[order(:User)]), x="User", y="Stars", color="User",
     Geom.bar, Guide.xlabel(nothing), Guide.ylabel("Stars received"),
     Guide.colorkey(title=nothing, labels=[""], pos=[-100mm, -100mm]))
Now ordered by star count:
plot(sort(starsdf, cols=[order(:Stars)], rev=true), x="User", y="Stars", color="User",
     Geom.bar, Guide.xlabel(nothing), Guide.ylabel("Stars received"),
     Guide.colorkey(title=nothing, labels=[""], pos=[-100mm, -100mm]))
But that's not fair, as some users may have been wished a happy birthday less often! Let's do a half-assed job of accounting for that:
totalStars = sum(birthdays[:stars])
weightedStarsCount = map(pair -> Pair(pair[1], pair[2] / totalStars), starsCount)
Now we have each user's share of the total stars, as a fraction between 0 and 1.
This did not explain anything but I have the flu and what I'm doing is graphs so flip off.
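For the record, here is a sketch of the difference between what I computed and what would actually account for wish frequency. The numbers are invented, and the per-wish average is only one possible correction, not something this notebook does.

```julia
# Hypothetical totals, not taken from the real data.
stars_by_user  = Dict("loktar" => 6, "badgercat" => 9)
wishes_by_user = Dict("loktar" => 2, "badgercat" => 9)

total_stars = sum(values(stars_by_user))

# What the notebook computes: each user's share of all stars.
share_of_total = Dict(user => stars / total_stars
                      for (user, stars) in stars_by_user)

# An alternative that corrects for how often a user was wished:
# average stars per wish received.
stars_per_wish = Dict(user => stars / wishes_by_user[user]
                      for (user, stars) in stars_by_user)
```

With these made-up numbers, badgercat has the bigger share of the total, but loktar earns more stars per wish.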
weighteddf = DataFrame(User=map((x) -> x, keys(weightedStarsCount)), Weight=map((x) -> x, values(weightedStarsCount)));
plot(sort(weighteddf, cols=[order(:Weight)], rev=true), x="User", y="Weight", color="User",
     Geom.bar, Guide.xlabel(nothing), Guide.ylabel("Share of total stars"),
     Guide.colorkey(title=nothing, labels=[""], pos=[-100mm, -100mm]))
In conclusion, it's plain to see that loktar is the most loved room member. 2.9 loktar.