January 04, 2016

When Do I Send Emails?

January 04, 2016/ E.Z. Hart

Work and the holidays have been distracting me from blogging and fun side projects for a couple of months, so I'm easing back into it with a really quick and easy Gmail data-wrangling post.

When I first started playing around with my Gmail data, I mentioned that I wanted to get some of the stats that Xobni used to provide before they were swallowed by the Yahoo! black hole. A couple of the simpler stats to compile are "what days of the week do I send most of my email?" and "what time of day do I send the most emails?".

In order to do any time-based analysis on my emails, I'm going to need the dates and times they were sent. So I've taken the PowerShell script from a while back and made a slight modification; in the $props hash I'm adding a field called SentDate:

$props = @{
    Id = $mimeMessage.MessageId
    To = $mimeMessage.To.ToString()
    From = $mimeMessage.From.ToString()
    FromEmail = $fromEmail
    Subject = $mimeMessage.Subject
    Body = $bodyText
    Format = $actualFormat
    SentDate = $mimeMessage.Date.ToUniversalTime()
}

MimeKit provides the sent date of the email as a DateTimeOffset; to keep things consistent, I'm converting everything to UTC at this stage.

From there, I import the data into pandas as per usual and filter it down to just the emails sent by me:

import pandas as pd
import numpy as np
import humanfriendly
import matplotlib.pyplot as plt
plt.style.use('ggplot')

df = pd.read_csv('../times.csv', header = 0)

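# .copy() gives an independent DataFrame, so adding columns later doesn't trigger SettingWithCopyWarning
fromMe = df.query('FromEmail == "[my email]"').copy()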

It turns out that indexing your data by date/time in pandas is pretty easy; you just create a DatetimeIndex:

temp = pd.DatetimeIndex(fromMe['SentDate'], tz = 'UTC').tz_convert('America/Denver')

Here I'm telling pandas to build an index from the SentDate field, and that the field is already in UTC. Then I'm converting all of those dates and times to my local time zone so that the data makes sense from my local perspective. This mostly works, because I mostly live in the Mountain time zone. Some of my data will be a little skewed because of emails sent while traveling and a few months where I lived in the Eastern time zone, but not so much that I care. In a later post I might look at how this data changes over time, which is more interesting (I might even be able to identify when and where I was traveling based on that data).
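If you want to sanity-check the conversion on its own, here's a minimal sketch using two made-up timestamps (the sample values and variable names are mine, not from the real data set). It parses with pd.to_datetime(..., utc=True), which is one alternative to passing tz to the DatetimeIndex constructor, and shows that the Denver offset changes with daylight saving time:

```python
import pandas as pd

# Two made-up UTC timestamps: one in winter (MST, UTC-7), one in summer (MDT, UTC-6)
sent_utc = pd.Series(["2016-01-04 17:00:00+00:00", "2015-07-01 17:00:00+00:00"])

# utc=True makes pandas return timezone-aware UTC values
parsed = pd.to_datetime(sent_utc, utc=True)

# Convert to local wall-clock time in the Mountain time zone
local = pd.DatetimeIndex(parsed).tz_convert("America/Denver")

local_hours = list(local.hour)
print(local_hours)
```

The same 17:00 UTC send time lands on a different local hour depending on the season, which is exactly why converting before bucketing by hour matters.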

But for now, let's break down the data in temp and shove it back into the original dataset:

fromMe['DayOfWeek'] = temp.dayofweek
fromMe['Hour'] = temp.hour
fromMe['Year'] = temp.year

Now for each email from me, I've got a column that tells me what hour of the day and day of the week it was sent. From there, aggregating it and charting it are a snap:

# Number of emails sent by day of week
sentDayOfWeek = fromMe.groupby(['DayOfWeek']).agg({'Id' : 'count'})
sentDayOfWeek['Id'].plot(kind='bar', figsize=(6, 6), title='Emails Sent By Day Of Week')
plt.show()

# Number of emails sent by hour of day
sentHourOfDay = fromMe.groupby(['Hour']).agg({'Id' : 'count'})
sentHourOfDay['Id'].plot(kind='bar', figsize=(6, 6), title='Emails Sent By Hour Of Day')
plt.show()

The data is about what I'd expect: more emails on Monday than any other day (0 == Monday on this chart) and the majority of emails sent during the workday (with a dip around lunch).
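Since pandas numbers the days 0 through 6 starting with Monday, the chart is easier to read with names on the axis. Here's a rough sketch on a tiny made-up frame (the sample rows and the day_names list are just for illustration) that also fills in days with no emails so the axis always has seven bars:

```python
import pandas as pd

day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up sample standing in for the real fromMe frame
sample = pd.DataFrame({
    "Id": ["a", "b", "c", "d"],
    "DayOfWeek": [0, 0, 2, 6],
})

# Count emails per day, then make sure all seven days are present
counts = sample.groupby("DayOfWeek")["Id"].count().reindex(range(7), fill_value=0)

# Swap the 0-6 index for readable names (0 == Monday in pandas)
counts.index = day_names
print(counts)
```

Calling counts.plot(kind='bar') on the result should give the same chart as before, just with labeled bars.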

Aggregating by year provides a bit of mystery, though:

sentYear = fromMe.groupby(['Year']).agg({'Id' : 'count'})
sentYear['Id'].plot(kind='bar', figsize=(6, 6), title='Emails Sent By Year')
plt.show()

The numbers vary quite a bit more than I'd expect. 2004 makes sense; I only started using Gmail in July of that year. And the next couple of years show me using Gmail more and more over my old Lycos account. The spike in 2011 also seems reasonable, as that's when I stopped working at an office with an Exchange server, so my day-to-day email load shifted. But the dips in 2012 and 2015? No idea. I'll have to dig further into those.

December 28, 2015

Windows Phone 8 Emulator Issues After Visual Studio Update 1

December 28, 2015/ E.Z. Hart

A week or two ago I applied Visual Studio Update 1 on my main development machine and I ran into a problem with my Windows Phone emulator. Since it took me most of a morning to find the solution, I thought I'd post it here to give it a little more Google juice and hopefully save others some time.

Before the update, I could launch a Windows Phone 8 project and it would run in any of my Windows Phone 8.1 emulators (I have several installed). After the update, the device selector in the Visual Studio toolbar didn't list my 8.1 emulators anymore (the only option was "Start"), and every launch would fail with HRESULT 0x89721800.

I reinstalled the emulators and did a repair installation of Visual Studio to no avail. I also found some posts suggesting that having Apache Cordova installed might be causing this, so I uninstalled that - still no luck. Eventually I found this TechNet article about a similar problem with Windows 8.1 Preview suggesting that this might be an issue with a corrupted Visual Studio data store.

The solution ended up being similar to Workaround #1 in the article, except that instead of the "10.0" folder, I needed to delete "%LOCALAPPDATA%\Microsoft\Phone Tools\CoreCon\11.0". After deleting that folder and restarting Visual Studio, everything worked just as it had before the update.
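If you'd rather script the cleanup, here's a cautious sketch in Python. The function names and the dry-run default are my own invention, and I'm assuming Visual Studio is closed before the folder is removed. Nothing gets deleted unless you explicitly pass dry_run=False:

```python
import os
import shutil
from pathlib import Path


def corecon_cache_dir(version="11.0", local_app_data=None):
    """Build the CoreCon data store path under %LOCALAPPDATA%."""
    base = Path(local_app_data or os.environ["LOCALAPPDATA"])
    return base / "Microsoft" / "Phone Tools" / "CoreCon" / version


def remove_corecon_cache(version="11.0", local_app_data=None, dry_run=True):
    """Return True if the folder exists; delete it only when dry_run is False."""
    target = corecon_cache_dir(version, local_app_data)
    if not target.is_dir():
        return False
    if not dry_run:
        shutil.rmtree(target)
    return True
```

Running remove_corecon_cache() first confirms the folder is where you expect it; remove_corecon_cache(dry_run=False) then does the actual delete.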

November 16, 2015

PneumaticTube Update

November 16, 2015/ E.Z. Hart

Last year around this time I created a little command-line application to upload files to Dropbox from Windows. Since then I've added a few features, so I figured it was time to post an update.

The biggest change is support for chunked uploading. Dropbox requires chunked uploading for files larger than 150 MB, and even for smaller files it's a nice feature to have because you can get progress updates on your upload. This feature took a little work, because the original library PneumaticTube was based on, DropNet, didn't have any async support, and its successor, DropNetRT, didn't have any chunked uploading support. But after a little porting work, PneumaticTube can handle uploading large files just fine and now has a couple of different progress indicator options.
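To show why chunking makes progress reporting easy, here's a small Python sketch of the general idea. This is not PneumaticTube's code (which is C#) and it doesn't call the Dropbox API; send_chunk is a hypothetical placeholder for whatever actually transmits each piece:

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield successive blocks of up to chunk_size bytes from a stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def upload_in_chunks(stream, total_size, chunk_size, send_chunk, on_progress):
    """Send a stream piece by piece, reporting progress after each piece."""
    sent = 0
    for chunk in iter_chunks(stream, chunk_size):
        send_chunk(chunk, sent)      # hypothetical: upload this chunk at this offset
        sent += len(chunk)
        on_progress(sent, total_size)
    return sent


# Demo with an in-memory "file" and callbacks that just record what happened
received = []
progress = []
data = io.BytesIO(b"0123456789")
total = upload_in_chunks(
    data, 10, 4,
    send_chunk=lambda chunk, offset: received.append((offset, chunk)),
    on_progress=lambda sent, size: progress.append(sent),
)
print(progress)
```

Because each chunk is a separate step, the caller gets a natural point to update a progress display, which a single all-at-once upload doesn't offer.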

The other big feature change is the ability to upload a whole folder to Dropbox. This makes PT a much more convenient tool for backup operations. Right now it only supports the top level of a folder (no recursion), but that may change in the future.

There've also been several bug fixes. One of these bugs was somewhat surprising: the progress indicators during chunked uploading caused crashes under PowerShell ISE. I'm posting a little more about this bug because info about it was scarce on Google, and maybe this will help someone else who has a similar problem.

In order to show upload progress indicators in-place (rather than writing "X % Complete" on a new line over and over again), I use Console.SetCursorPosition to move the cursor back to the beginning of the line and overwrite the old value. It turns out that the PowerShell console in PS ISE isn't a real console; among the differences between the standard terminal and ISE is the fact that ISE doesn't support SetCursorPosition or other cursor operations. Calls to those methods will throw IOExceptions.

PneumaticTube now handles this case by trying to access Console.CursorTop in the constructor of each progress display implementation. If access to CursorTop throws an exception, the progress display classes stop trying to report progress using those methods. This keeps scripts which call PneumaticTube under ISE (or another console which doesn't support those cursor operations) from failing. I also added command line switches for disabling progress reporting entirely as a workaround for other possible console support issues.
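The same "probe once, then fall back" idea can be sketched in Python. To be clear, this is only an analogue: PneumaticTube probes Console.CursorTop in C#, while this sketch uses isatty() as its capability check, and the class name is made up for illustration:

```python
import io
import sys


class ProgressDisplay:
    """Overwrite one line in place when possible, else print a line per update."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        # Probe the output once, up front; if the probe fails, never try
        # in-place updates again.
        try:
            self.in_place = bool(self.stream.isatty())
        except (AttributeError, ValueError, OSError):
            self.in_place = False

    def report(self, percent):
        if self.in_place:
            # Carriage return moves back to the start of the current line
            self.stream.write(f"\r{percent:3d}% complete")
        else:
            self.stream.write(f"{percent}% complete\n")
        self.stream.flush()


# Demo: an in-memory stream is not a terminal, so the fallback path is used
buffer = io.StringIO()
display = ProgressDisplay(buffer)
display.report(50)
```

Doing the check in the constructor means the decision is made once, and the reporting code never has to handle a failure in the middle of an upload.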

If you want to give PneumaticTube a try, you can download it directly from the releases page or install it using Chocolatey. Happy uploading!

November 09, 2015

Creating A Fake Me From My Emails

November 09, 2015/ E.Z. Hart

Some Twitter robot or another got me thinking about Markov chains the other day (in the text generator sense), and it occurred to me that it shouldn't be too hard to create one which (poorly) simulates me.

Markov chains are basically a set of states and probabilities of moving from one state to another. If you build one out of a body of text, you can map the likelihood of a given set of words following another set of words. The upshot of this is that if you start from a random set of words and follow the map (choosing your next state at each node randomly in proportion to its likelihood from the initial text), you can end up with something that sounds (sort of) like it came from the original body of text. It's a popular way to create Twitter bots. Markov chains have other, much more practical uses, but I'm not concerned about them today.
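To make that concrete, here's a bare-bones sketch of a word-level Markov text generator. It is not the generator used later in this post; the function names and the order parameter are my own. Each state is a tuple of words, and followers are stored in a list with repeats, so picking one at random is automatically weighted by how often it appeared:

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state never had a follower in the source text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


chain = build_chain("a b c a b d", order=2)
sample_text = generate(chain, length=5, seed=1)
print(sample_text)
```

In the toy input, the state ("a", "b") is followed once by "c" and once by "d", so each is chosen about half the time. A bigger order gives output closer to the source; a smaller one gives more nonsense.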

I've got 10 years of my emails already sitting in a .csv file; step one was loading them up and discarding all the ones I didn't send. After that, most of the work was cleaning up the data - most of the stuff in the bodies of my emails is actually pretty useless for this purpose. I had to remove all the quoted parts of other people's messages, all the HTML messages (even when cleaned up, they polluted the Markov chain too much), 'Forwarded Message' sections, URLs, and my own signatures.

After passing all the emails through the removeJunk function below, I globbed all the texts together into one giant string and fed it into this Markov generator from Amanda Pickering. With that done, I could just call generate_words() over and over to see what kind of nonsense fake me would spew out.

So here's my final code for taking my emails and creating a fake me, Black Mirror-style:

import pandas as pd
import numpy as np
import re
from markovgenerator import MarkovGenerator

# Read in our email data file
df = pd.read_csv('../bodytext.csv', header = 0)

# Only use mail I sent 
emails = df.query('FromEmail == "[my email]"').copy()

# Blank out any missing body text
# (assigning the result back is more reliable than calling fillna in place on a column)
emails['Body'] = emails['Body'].fillna(' ')

# Regexes for truncating messages
# If any of these are found, the rest of the message is stuff I didn't write
# (raw strings keep backslashes like \s and \- intact and avoid invalid-escape warnings)
quoteHeaderRegex = re.compile(r'On.*?wrote:', re.DOTALL)
originalMessageRegex = re.compile(r'^\s?\-.*?(Original|Forwarded) Message.*?\-\s?$', re.MULTILINE | re.IGNORECASE)
htmlRegex = re.compile(r'^\<html\>', re.MULTILINE)
googleReaderRegex = re.compile(r'^E\.Z\. Hart - Google Reader', re.MULTILINE)

# Other things in emails that aren't relevant
# If these are found, replace them with empty string
fromAndToRegex = re.compile(r'^(from:|to:|sent:).*?$', re.MULTILINE | re.IGNORECASE)
sigRegex = re.compile(r'^\-[\-\s]{1,4}E\.Z\.', re.MULTILINE)
dividerRegex = re.compile(r'\-{3,}')
urlRegex = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')


def markovText(row):
    text = row['Body']

    if(row['Format'] == 'Html'):
        return ''

    return removeJunk(text)

def removeJunk(text):
    text = stripAfter(text, quoteHeaderRegex)
    text = stripAfter(text, originalMessageRegex)
    text = stripAfter(text, googleReaderRegex)
    text = stripAfter(text, htmlRegex)

    text = re.sub(fromAndToRegex, '', text)
    text = re.sub(sigRegex, '', text)
    text = re.sub(dividerRegex, '', text)
    text = re.sub(urlRegex, '', text)

    return text

def stripAfter(text, regex):
    target = regex.search(text)
    if(target):
        return text[:target.start()]
    return text

# Run all the emails through the cleanup function
emails['Markov'] = emails.apply(markovText, axis=1)

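# Concatenate all the emails into one giant input string
# (named corpus to avoid shadowing the built-in input(); the space separator
# keeps the last word of one email from fusing with the first word of the next)
corpus = emails['Markov'].str.cat(sep=' ')

markov_gen = MarkovGenerator(corpus, 200, 3)
markov_gen.generate_words()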
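One thing worth knowing about the truncation step: the quote-header pattern matches "On" anywhere, including at the start of a word like "Once", so it can cut off more than intended. Here's a small self-contained sketch with made-up messages showing the effect, plus a stricter variant (my own suggestion, not what the script above uses) that only matches "On" as a whole word at the start of a line:

```python
import re

# Same pattern as the script above
quote_header = re.compile(r'On.*?wrote:', re.DOTALL)

# Hypothetical stricter variant: "On" must start a line and be a whole word
strict_quote_header = re.compile(r'^On\b.*?wrote:', re.MULTILINE | re.DOTALL)


def strip_after(text, regex):
    """Drop everything from the first match of regex onward."""
    match = regex.search(text)
    return text[:match.start()] if match else text


simple = "Sounds good, see you then.\n\nOn Mon, Jan 4, 2016, Pat wrote:\n> Lunch at noon?"
tricky = "Once I finish, I'll call.\n\nOn Mon, Pat wrote:\n> ok"

simple_result = strip_after(simple, quote_header)
loose_result = strip_after(tricky, quote_header)
strict_result = strip_after(tricky, strict_quote_header)

print(repr(simple_result))
print(repr(loose_result))
print(repr(strict_result))
```

For Markov-chain fodder, losing a few extra sentences probably doesn't matter much, but the stricter pattern keeps more of the text you actually wrote.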

And here are a few of my favorite phrases from the results:

"Use cheap rum. Cheap rum is going to get the crab wontons- otherwise I can't guarantee your safety:)"

"And more important than anything else, has been what has kept me employed and made me successful. Anyway, I'm glad you took your flashlight."

"We will begin working on the changes Tory has asked for, and I'll eventually start going full troll without her around:) Okay."

"You're receiving this email because you're rewriting 10,000 lines of code that solved the first two weeks of August while in between leases."

"At this point I'll need jumping and, ideally, that signs be installed? How would I go about making that request? Again, we completely understand if you don't have any SharePoint development experience, just experience as a user in each role just to make sure it was knitting and not crocheting- I don't know football that well, but any of them, they might actually turn into assets , though."

Have fun creating your own email doppelgängers, but remember - cheap rum is going to get the crab wontons. I can't guarantee your safety.

The CodeWise Blog