As an administrator of a 37signals account, I’ve wanted to get my hands on all the data from their services numerous times. Well, now we are moving from 37signals to Google Apps and I NEED to get my data out. Why are we moving? Numerous reasons.
- We use FogBugz, Trello, and Google Calendar. These three tools essentially render Basecamp useless.
- Campfire is a distant second to HipChat for team collaboration.
- Highrise isn’t something we need for our business.
- Backpack in the hands of end users has been by far one of the most confusing and frustrating services we’ve ever used. A standard MediaWiki install was our first attempt at this kind of service. Next was Confluence (gah). Finally, Backpack. Now we’re using Google Sites and Google Docs. None of these are perfect, but at least Google works.
Regardless of our reasoning, let’s start by getting data out of Campfire. For whatever mysterious reason, 37signals doesn’t have a feature to export any data from Campfire. Contacting their customer support is usually a dead end; they won’t help. It’s surprising that they would prefer you hammer their servers, as in my example below, instead of simply providing a “zip all my Campfire history” link and building the zip offline.
Do not fear, there is a simple solution. It involves three things.
Create a blank folder somewhere on your computer. Put a copy of wget.exe into this folder.
Log into your 37signals Campfire account. Once logged in, click on the Cookie.txt export button in Chrome.
Copy the contents of the window that pops up from the cookies.txt plugin into Notepad. Save this file as (surprise!) cookies.txt in the same folder you put wget.exe into.
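Before running wget, it can save time to confirm that the saved file really contains a Campfire cookie. The sketch below is not part of the original procedure. It assumes a POSIX shell with awk (for example Git Bash or Cygwin on Windows), and the function name has_cookie_for is made up for illustration. It relies on the Netscape cookies.txt layout of seven tab-separated fields per cookie, with the domain in the first field.

```shell
# Succeeds if the cookie file ($1) has at least one entry for the domain ($2).
# Assumes the Netscape cookies.txt layout: seven tab-separated fields per
# cookie, with the domain first. has_cookie_for is a hypothetical helper.
has_cookie_for() {
    awk -F '\t' -v domain="$2" '
        { sub(/\r$/, "") }                        # tolerate CRLF line endings
        NF == 7 && index($1, domain) { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example (run from the folder that holds cookies.txt):
#   has_cookie_for cookies.txt campfirenow.com && echo "cookie found"
```

If the check fails, the export most likely happened while logged out, or the file was saved from the wrong browser tab.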
- While logged into your 37signals Campfire account, click on the “Files, Transcripts & Search” tab.
- Select a single room from the “Which Room?” dropdown. (NOTE: This will be time-consuming if you have LOTS of rooms. We only had about 7-10 rooms.)
- Click on the topmost “Read the transcript” link. This will open a separate page showing the most recent activity in your room. Copy this ROOM URL somewhere; you’ll need it later. It will look something like https://[your company].campfirenow.com/room/######/transcript/2012/05/16.
Open up a command prompt. (Windows+R, cmd, enter.)
Change into the folder that you created. (“cd\” enter, then “cd [your folder]”)
Type or paste the following wget command (it’s case sensitive!):
wget -e robots=off -x -m -L -E -p -k --no-check-certificate --load-cookies cookies.txt [Your ROOM URL]
Your final wget command line should look like this:
wget -e robots=off -x -m -L -E -p -k --no-check-certificate --load-cookies cookies.txt https://[your company].campfirenow.com/room/######/transcript/2012/05/16
wget will run for a while, depending on how many files and days you have in your individual transcripts.
Once done, you’ll be able to navigate to the folder you created and drill down into /folder/[your company].campfirenow.com/room/######/transcript/year/month/##.html to view an individual day in your transcript. All your images, stylesheets, and links should work just fine.
Repeat the room selection and wget steps (Steps 4-5) for each room you want to download.
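With more than a couple of rooms, retyping the command gets tedious. One way to automate the repetition is sketched below. It assumes a POSIX shell (Git Bash or Cygwin on Windows, or any Unix machine) and a hypothetical file rooms.txt holding one ROOM URL per line. The function names and the WGET_CMD override are illustrative and not part of wget.

```shell
# Mirror one Campfire room, starting from its newest transcript URL ($1).
# WGET_CMD is a hypothetical override for previewing: with
# WGET_CMD="echo wget" the command is printed instead of executed.
mirror_room() {
    ${WGET_CMD:-wget} -e robots=off -x -m -L -E -p -k \
        --no-check-certificate --load-cookies cookies.txt "$1" </dev/null
}

# Mirror every room listed in the file $1 (one ROOM URL per line).
# Blank lines and lines starting with '#' are skipped.
mirror_rooms_from_file() {
    while IFS= read -r url || [ -n "$url" ]; do
        url=$(printf '%s' "$url" | tr -d '\r')   # drop CR from CRLF endings
        case "$url" in
            ''|'#'*) continue ;;
        esac
        mirror_room "$url"
    done < "$1"
}

# Example (run from the folder that holds cookies.txt):
#   mirror_rooms_from_file rooms.txt
```

Setting WGET_CMD="echo wget" first gives a dry run that only prints each command, which is a cheap way to check the URL list before hitting the server.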
Here is a summary of the wget command line options we are using:
- -e robots=off: This option works around the Robots Exclusion Standard, which would otherwise cause wget to halt after getting a single file.
- -x: Forces directory creation to match the 37signals site layout. Without it, all the files you download would land in a single folder. If you don’t use this option, you’re going to have a bad time.
- -m: This option instructs wget to mirror the entire site. This is why we drill down to the individual ROOM transcript URLs instead of just going to campfirenow.com and wgetting from there. Otherwise, you’ll end up downloading the entire Campfire site to your machine. Not so useful.
- -L: Forces wget to follow only relative links. This way it won’t jump to that YouTube video somebody linked in your transcript and download all of youtube.com.
- -E: This option adds .html to files that are detected as HTML. Campfire’s actual transcript files don’t have an .html extension on their server, so this option is necessary for ease of use later.
- -p: This option forces wget to download all the files necessary to display a given HTML page: CSS files, images, etc.
- -k: Converts all the links in the downloaded files to make sure everything works locally.
- --no-check-certificate: Since we are using HTTPS, wget isn’t set up to verify 37signals’ security certificate. This option skips the certificate check.
- --load-cookies cookies.txt: The transcripts are only visible to a logged-in user, so wget needs valid credentials in order to access the site. This option takes the cookies your browser uses when logged into Campfirenow.com and passes them to wget.
- [ROOM url]: This tells wget where to start pulling files. Always start from an individual ROOM transcript, not the main “Files, Transcripts & Search” tab.
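For anyone who finds the single-letter flags hard to remember, the same command can be written mostly with wget’s long option names. This is only a restatement of the command above, sketched for a POSIX shell. mirror_room_verbose and the WGET_CMD preview override are hypothetical names. -E is left in its short form because its long name differs between wget versions (--html-extension in older releases, --adjust-extension from 1.12 on).

```shell
# Long-option spelling of the mirror command; $1 is the ROOM URL.
#   --execute robots=off   same as -e robots=off
#   --force-directories    same as -x
#   --mirror               same as -m
#   --relative             same as -L
#   --page-requisites      same as -p
#   --convert-links        same as -k
# WGET_CMD is a hypothetical override: WGET_CMD="echo wget" prints the
# command instead of running it.
mirror_room_verbose() {
    ${WGET_CMD:-wget} \
        --execute robots=off \
        --force-directories \
        --mirror \
        --relative \
        -E \
        --page-requisites \
        --convert-links \
        --no-check-certificate \
        --load-cookies cookies.txt \
        "$1"
}

# Example:
#   mirror_room_verbose "https://[your company].campfirenow.com/room/######/transcript/2012/05/16"
```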
If something goes wrong, check these common problems:
- Wget downloads a login.html page and nothing else: Check your cookie credentials; 37signals might have logged you out. You might need to log in to your Campfire account again and save cookies.txt again.
- Wget downloads a single file and stops: Ensure that -e robots=off is included on the command line. If it isn’t, wget will find a robots.txt file that halts further recursion. Some versions of wget apparently do not have this feature, so be sure you locate one that has the -e option available.
- Campfirenow stops responding and blocks you temporarily: If 37signals detects large content slurps, they can potentially block your activity. If this happens, I suggest adding the --wait and --random-wait command line options (for example --wait=2 --random-wait). --random-wait only varies the delay set by --wait, so it has no effect on its own. This should keep their site from exploding and stop you from getting blocked.
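The first and last points can be sketched as two small helpers. They assume a POSIX shell. mirror_room_politely, has_transcripts, WAIT_SECONDS, and the WGET_CMD preview override are hypothetical names, and the two-second default is an arbitrary starting value rather than a limit published by 37signals. --wait sets a base delay between requests, and --random-wait varies that delay from request to request.

```shell
# Throttled variant of the mirror command; $1 is the ROOM URL.
# WAIT_SECONDS (hypothetical) sets the base delay, defaulting to 2.
# WGET_CMD="echo wget" (hypothetical) prints the command instead of running it.
mirror_room_politely() {
    ${WGET_CMD:-wget} -e robots=off -x -m -L -E -p -k \
        --wait="${WAIT_SECONDS:-2}" --random-wait \
        --no-check-certificate --load-cookies cookies.txt "$1" </dev/null
}

# Succeeds if the mirrored tree under $1 holds at least one transcript page.
# A failure after a run usually means only the login page came back.
has_transcripts() {
    find "$1" -type f -path '*/transcript/*' -name '*.html' | grep -q .
}

# Example:
#   mirror_room_politely "https://[your company].campfirenow.com/room/######/transcript/2012/05/16"
#   has_transcripts . || echo "no transcripts found; re-export cookies.txt"
```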
Next up, getting data from Backpack and Highrise. A bigger challenge indeed.