Viewing CSV Files Without Excel

An installation issue with Office 2010 on the computer I built last weekend (I've hit my activation limit for the product key) got me to evaluate whether an Office 365 purchase would be worthwhile. Thinking back, I realized that over the past 9 months the only time I'd opened any Office program was to view CSV files in Excel. That's not enough to justify $99 a year right now, so I'm going to hold off on a new Office purchase for a bit (2010 is still on my laptop if I really, really need it).

That bit of backstory is just in the service of pointing out that if all you need is a nice grid for viewing, sorting, and filtering CSV files, you can use Out-GridView in PowerShell. Just use Import-Csv to read the CSV file and pipe it into Out-GridView:

Import-Csv .\Batting.csv | Out-GridView

Example Out-GridView Output

Backing Up Bitbucket Code to a Synology

In my last post, I talked about getting my Synology to do a nightly backup of all my GitHub repos. Since all of them are public, it's a pretty simple process to clone them all locally. This time, I want to talk about Bitbucket, which is where I keep all my private repos. Because they're private, and because some of them are owned by teams, the process is a bit more involved.

I started with an excellent Bash script from the BibSonomy team which covers most of the bases; it can retrieve private repos along with all of their issues and wiki entries, and it works with both git and mercurial. My Synology runs BusyBox, which means that the default shell is Almquist, not Bash. Rather than install Bash, I just rewrote the script a bit so that it would work in ash - mostly this was just a matter of changing how the functions were defined.
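To illustrate the kind of change involved, here's a minimal sketch (clone_repo is a made-up name, not one of the script's actual functions):

```shell
# Bash allows the 'function' keyword when defining functions:
#   function clone_repo {
#       echo "cloning $1"
#   }
# BusyBox ash doesn't, so the rewritten script uses the POSIX form instead:
clone_repo() {
    echo "cloning $1"
}

clone_repo myrepo   # prints "cloning myrepo"
```

The POSIX form works in both shells, so defining functions this way keeps the script portable.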

The other challenge was that the original script assumed that the box it was running on already had the correct SSL CA certificates installed for curl. This wasn't the case for my Synology, so I grabbed cacert.pem from the curl site and dropped it into the folder with the script. Then I modified the script to call curl with the --cacert cacert.pem option. It's quick and dirty, but it worked for getting this process up and running.

The script basically has four dependencies: curl, jq, git, and hg (mercurial). curl and jq were already installed on my Synology. Setting up git and mercurial is easy - just use the packages provided by SynoCommunity. The mercurial package added hg to my path; for git you'll need to add /volume1/@appstore/git/bin to your path or set it before you call the script (export PATH="/volume1/@appstore/git/bin:$PATH").
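The PATH trick generalizes to any binary that isn't on the default path. As a self-contained illustration (using a throwaway stub instead of the real Synology git path):

```shell
# Create a throwaway directory containing a stub 'git' executable
stubdir=$(mktemp -d)
printf '#!/bin/sh\necho stub-git\n' > "$stubdir/git"
chmod +x "$stubdir/git"

# Prepending the directory to PATH makes the shell find the stub first,
# just as prepending /volume1/@appstore/git/bin exposes the real git
export PATH="$stubdir:$PATH"
git   # prints "stub-git"
```

Because the export happens inside the backup script's shell, the change doesn't leak into your login environment.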

The full version of the modified bitbucket-backup.sh script can be found here.

To retrieve your personal repos, you can set USER_NAME to your Bitbucket user name and the API_KEY to your password before calling the script:

export USER_NAME=hartez
export API_KEY=[password]
./bitbucket-backup.sh

For teams, you'll need to generate an API key by going to "Manage Team" and selecting "API key" under "Access Management". Then you can retrieve everything for your team:

export USER_NAME=[team name]
export API_KEY=[api key]
./bitbucket-backup.sh

I do this in a script which handles all of my teams (4 at the moment), zips everything up, and cleans up older archives:

#!/bin/ash

cd /volume1/homes/ez/documents/bitbucket-backup

# Add git to the path temporarily
export PATH="/volume1/@appstore/git/bin:$PATH"

# Use the current date and time to name the working dir and archive
DEST=$(date "+%Y%m%d%H%M%S")
export BACKUP_LOCATION=$DEST

# My Stuff
export USER_NAME=hartez
export API_KEY=[password]
./bitbucket-backup.sh 

# Teams
export USER_NAME=[team 1]
export API_KEY=[api key]
./bitbucket-backup.sh 

#... 

# Archive everything
BACKUP_FILE="${DEST}.tar.xz"
tar -Jcf "$BACKUP_FILE" "$DEST"

# Clean up the working directory
rm -rf "$DEST"

# Clean up backups over 30 days old
find /volume1/homes/ez/documents/bitbucket-backup/*.xz -type f -mtime +30 -delete

That's all there is to it.

It's actually quite a relief to know that I've got all this code backed up locally and off-site. I don't really expect anything to go tragically wrong with either Bitbucket or GitHub, but until I got these backups running I hadn't realized how much psychic weight there was in that tiny, nagging worry that just maybe all my work would get lost. It's definitely worth a little time to set this sort of thing up, if just for the peace of mind. Hopefully these articles will make it a bit easier for someone else.

Update: I was checking up on my backup the other day and noticed that my personal projects (the ones which weren't under a group) weren't backing up anymore. It turns out that the above solution isn't compatible with personal accounts if you've got 2-factor authentication enabled. Since there's no option to set up an API key for personal accounts at this point, getting this working with 2FA in place would require jumping through a bunch of OAuth hoops. Hopefully application-specific passwords will be implemented soon and I can move to using that feature; if not, eventually I'm going to be spending some painful hours dealing with OAuth. Either way, I'll put up another post detailing the process.

Backing Up GitHub Code to a Synology

One of the items that's been lurking on my todo list for a long, long time is setting up a nightly backup of all the code I have hosted on GitHub and Bitbucket. I'm getting more and more concerned with the stability, security, and longevity of cloud services every day. While I think both these organizations are doing an excellent job, it's always possible that something catastrophic will happen, especially if attackers become sufficiently determined.

The goal of this little project was to get my Synology (a DS414slim) to retrieve all my source code each day and include it in my nightly offsite backup (using CrashPlan). That way, there are three copies of my work in three different locations at any given time. An event big enough to make me lose all my source code at once is probably an event big enough that my source code no longer matters.

I'll start with my GitHub backup - it's the simpler of the two, since I only host publicly available code there (no private repositories). Because my Bitbucket repos are private, the process of retrieving them is a bit more involved.

I started with Petr Trofimov's GitHub backup script and made some modifications to get it working on my Synology. I'm using the SynoCommunity git package, so step one was to add the full path to git when calling xargs (I was too lazy to muck around with getting the PATH set correctly). Next was adding the per_page=100 parameter to the initial retrieval of the repo list. This ensures that I get all my repos in one go, and don't have to deal with multiple pages. It'll break when I hit the 101st repo, but that's likely a long way off.
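To show how the repo URLs get pulled out of the API response, here's the extraction pipeline from the script run against a made-up JSON fragment (the real input is the full GitHub API response):

```shell
# A made-up fragment of the GitHub API's JSON response
json='{ "name": "myproject", "git_url": "git://github.com/hartez/myproject.git" }'

# Pull out the quoted URL, take the second field (still quoted), and let
# xargs strip the quotes - in the real script, xargs hands it to git clone
echo "$json" | grep -Eo '"git_url": "[^"]+"' | awk '{print $2}' | xargs -n 1 echo
```

The quote stripping is a side effect of how xargs tokenizes its input, which is why the awk output can keep the quotes around the URL.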

I hard-coded my username and the destination folders into the script to make things simpler, and (after a few tries) modified the tar command to use xz compression to save some space. I also added a clean-up step to remove archives older than 30 days at the end of the script.

Another big change is the string of grep -v commands which the list of repositories gets piped through. A few of the repos I've forked are pretty large, and I'm pretty sure that Stripe, for example, takes care of their own backups. So I've filtered out the bigger repos that I feel are probably safe from a major GitHub hiccup in order to reduce my bandwidth and storage a bit.
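As a sketch of how that filtering works (the repo URLs below are invented stand-ins, but the grep -v patterns match the ones in my script):

```shell
# A hypothetical list of clone URLs as they come out of the extraction step
repos='"git://github.com/hartez/myproject.git"
"git://github.com/hartez/FubuMVC.git"
"git://github.com/hartez/stripe.net.git"'

# Drop the big forks before anything gets cloned
echo "$repos" | grep -v "[Ff]ubu" | grep -v "stripe\.net"
```

Only the first URL survives; each grep -v removes any line matching its pattern, so chaining them acts as a blocklist.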

With this script running nightly in the task scheduler, I'm now comfortable that my hard work won't be lost if something goes terribly, terribly wrong with GitHub. And it was pretty easy to set up; I'm mostly annoyed that I spent so long with this sitting on my todo list.

Here's my full modified script:

#!/bin/sh

set -ex

USER="hartez"
API_URL="https://api.github.com/users/${USER}/repos?type=owner&per_page=100"
DATE=$(date +"%Y%m%d")
TEMP_FOLDER="backup"
TEMP_PATH="/volume1/homes/ez/documents/github-backup"
TEMP_FULL_PATH="${TEMP_PATH}/${TEMP_FOLDER}"
BACKUP_FILE="github_${USER}_${DATE}.tar.xz"

# Clean up previous backup stuff in case something went wrong (e.g. power outage)
rm -rf "$TEMP_FULL_PATH"

mkdir "$TEMP_FULL_PATH" && cd "$TEMP_FULL_PATH"
curl -s "$API_URL" | grep -Eo '"git_url": "[^"]+"' | grep -v "[Ff]ubu" | grep -v "bottles" | grep -v "stripe\.net" | awk '{print $2}' | xargs -n 1 /volume1/@appstore/git/bin/git clone
cd "$TEMP_PATH"
tar -Jcf "$BACKUP_FILE" --directory="$TEMP_PATH" "$TEMP_FOLDER"
rm -rf "$TEMP_FULL_PATH"

# Clean up backups over 30 days old
find /volume1/homes/ez/documents/github-backup/*.xz -type f -mtime +30 -delete

Next time I'll talk about the Bitbucket process, which took considerably more work.

Dumping SQL Server Tables to CSV Files (Part 2)

"There's more than one way to do it" isn't just a Perl thing; just about every automation task you can imagine can be done in more ways than you can imagine. I just posted a script the other day to dump tables from a SQL Server database into CSV files using PowerShell and the venerable sqlcmd utility. One of the challenges is getting the various parameters for sqlcmd passed in properly from PowerShell, and I did it using an array of arguments and some awkward string escaping.

Today I learned that PowerShell already has the Invoke-Sqlcmd cmdlet, which is designed to (mostly) replace sqlcmd. By using Invoke-Sqlcmd, we can bypass all the argument array ugliness and write a simpler script:

$tableQuery = @"
SET NOCOUNT ON
SELECT sobjects.name 
FROM sysobjects sobjects 
WHERE sobjects.xtype = 'U' 
"@

$loc = Get-Location

$server = ".\SQLEXPRESS"
$db = "[dbname]"

$tables = Invoke-Sqlcmd -ServerInstance $server -Database $db -Query $tableQuery

$tables |
    select -ExpandProperty name |
    select @{Name="filename"; Expression={Join-Path -Path $loc -ChildPath "$_.csv"}},
           @{Name="query"; Expression={"SET NOCOUNT ON`nSelect * From $_"}} |
    % {
        Invoke-Sqlcmd -ServerInstance $server -Database $db -Query $_.query | Export-Csv $_.filename -NoTypeInformation
    }

Set-Location $loc

This is much more readable. The only part that might not be immediately obvious is the use of the $loc variable for storing the current location when we start the script.

The reason for this is that Invoke-Sqlcmd takes advantage of PowerShell's ability to treat everything kind of like a file system. PowerShell can change context to work within things like the registry as easily as it can change to different drive volumes (e.g., open PowerShell and try cd HKCU:, then try dir, then be very, very careful). In the same way, it can also work within SQL Server. Using Invoke-Sqlcmd will change the context to be the SQL Server instance, which is super convenient if you're doing a lot of SQL Server stuff.

But here we want to dump all the tables to files in the folder where we ran the script. So we need to keep track of that location (using Get-Location) and make sure that we have an absolute path for the output location of Export-Csv (using Join-Path). And, at the end, we want to make sure to return to our original location instead of hanging around in SQL Server space.

Even with that little bit of overhead, Invoke-Sqlcmd is a much cleaner and easier way of performing this kind of task.