Archiving Your Ficlets.com Stories

(How to "Save Your Babies")

By Robotech_Master (aka Chris Meadows)

Ficlets.com goes dark on the 15th. You don't have much time left to save your "babies".

If you're like me, you may have groaned at the daunting task before you. How are you going to pull down all of your ficlets, including those marked "Mature" that can't be read unless you're logged in?

What about the ficlets you wrote in cooperation with someone else? Or the stories you had no part in but liked enough to want to keep around?

Well, I've found a solution, and that solution's name is HTTrack.

HTTrack is a free, open-source web spidering application, available for most desktop platforms, with very detailed filter rules that let you limit your spidering to just what you want to retrieve. Through its proxy capture system, it can "borrow" your Ficlets login information and pretend to be you, logged in, while surfing the site. That means it can suck down all the "mature" ficlets you want and archive them on your hard drive.

Through extensive trial and error, I've managed to come up with a set of rules that will fetch all the stories I want and not too many that I don't want. And as the doom of Ficlets draws nigh, I figure it would be best to get this slightly imperfect set out there now, so people can save their stuff right away, and perhaps worry about refining it later. If anyone who knows HTTrack better than I do can send me tips or corrections, I'd be thrilled to update this post with them.

This is going to be a bare-bones sort of guide; if I don't mention some option that's on the screen, just leave it set where it is. Also, I'm assuming a certain level of technical know-how in the reader. Unfortunately I don't have the time to help everybody who needs it—this tutorial is the best I can do. If it's too complicated for you, hopefully you can find someone else who will understand it.

Setting Up

First of all, download and install HTTrack. There are versions for most OSes, so get the one that's right for you.

Create a new project. Name it whatever you want. Click "next".

The next thing you need to do is capture your login URL, so that HTTrack can pretend to be you. Someone else has already written a tutorial on this part, which you should refer to. I'll summarize it here.

Note: If you originally posted your stuff from an OpenID-based account, you may need to create an AIM account for this. I have not been able to log in successfully with either of my OpenID-based accounts for several months now. If you create a new AIM-based account, be sure to go to your profile and turn the "View mature posts" option on. You will still be able to capture all "Mature" content you posted, though you will not be able to snag private things like your notes.

  1. Go to your web browser and navigate to ficlets.com. If you are signed in already, click "Sign Out".
  2. Navigate to your "My Ficlets" page. (It won't be "My Ficlets" for you when you are not logged in, but it will still be a page with your ficlets in it.) The URL will be something like http://ficlets.com/authors/robotech.
  3. Next, go to the Ficlets login page, click "Sign In Now" under the "AIM" section, and fill in your user ID and password, but don't click "Sign In" yet. Leave the page alone for now.
  4. Go back to HTTrack and click "Add URL".
  5. On the form that appears, click "Capture." Ignore everything else.
  6. Go into your web browser's proxy configuration panel and set your proxy to the address HTTrack gives you. (It will probably be the IP of your own computer, port 8080.) If you're not sure how to do this, open a new browser tab or window and search for "[your browser name] proxy settings". Save the setting.
  7. Once that setting has been saved, go back to your web browser session with the Ficlets login up, and click "Sign In". You should be directed to a page that tells you your information has been captured. However, you may see the somewhat ominous "Sorry, at this time, the system is currently unavailable. Please try again later" message appear in the AOL login box instead. This is because of the way that page is set up. Either way, if a long URL appears in the "Insert URL" form in HTTrack, it probably worked.
  8. If the URL has been captured, you can now go back and return your browser's proxy setting to whatever it was.
  9. Go back to HTTrack and click "OK" on the "Insert URL" form. Your captured URL will show up as a big long URL in the URL list window.

Next, you will want to insert the URLs from which to start spidering.

I had hoped to be able to just enter the author page, e.g. http://ficlets.com/authors/robotech, and have it spider from there. For some reason, though, I've not been able to refine my rule set enough to make it follow the 1, 2, 3, 4… page links at the bottom of the page. (Maybe someone else can help me with that.) So for now, you need to include the URL for each page of your ficlets list. For example:

http://ficlets.com/authors/robotech?page=1
http://ficlets.com/authors/robotech?page=2
http://ficlets.com/authors/robotech?page=3
http://ficlets.com/authors/robotech?page=4
http://ficlets.com/authors/robotech?page=5
http://ficlets.com/authors/robotech?page=6
http://ficlets.com/authors/robotech?page=7
http://ficlets.com/authors/robotech?page=8
http://ficlets.com/authors/robotech?page=9
http://ficlets.com/authors/robotech?page=10
http://ficlets.com/authors/robotech?page=11
http://ficlets.com/authors/robotech?page=12

Just replace "robotech" with your author identity (or the identity of the author whose posts you want to grab) and put the URLs for as many pages of that author as there are.
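
If you have a lot of pages, typing each URL by hand gets tedious. The short Python sketch below prints one URL per page so you can paste the lot into HTTrack's URL list. The function name author_page_urls and the page count are placeholders for illustration, not anything HTTrack or Ficlets provides; check your own author page to see how many pages you actually have.

```python
def author_page_urls(author, page_count):
    """Build the paginated author-list URLs, one per page, starting at page 1."""
    base = "http://ficlets.com/authors/{}?page={}"
    return [base.format(author, page) for page in range(1, page_count + 1)]


if __name__ == "__main__":
    # "robotech" and 12 are the example values from above; substitute your own.
    for url in author_page_urls("robotech", 12):
        print(url)
```

Run it once per author you want to archive and paste the output into the URL list.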

If you want to grab some story in which you had no part, add the URL of a single post from that story. It doesn't matter if it's the beginning, middle, or end; HTTrack will spider it in both directions. For instance, http://ficlets.com/stories/36096.

Setting the Rules

Once you've added all the URLs you want, click the "Set Options" button. Click on the "Scan Rules" tab (second from the left, after "Proxy"), make sure all the boxes are un-checked, and paste in the following rules.

Note that some of these rules may be redundant, unnecessary, or even counterproductive—but again, this is what I came up with in a day or so of trial and error, and it fetched all my stuff without fetching too much stuff that wasn't mine. I'd be happy if someone could refine it further for me, but for the emergency situation of the moment, I'll live with it.

-*ficlets.com*
-peopleconnectionblog.com/*
+*.css -ad.doubleclick.net/* -mime:application/foobar
-http://ficlets.com/stories/
-http://ficlets.com/stories
+*stories*
+*notes*
+ficlets.com/stylesheets*
+ficlets.com/authors/[your name]*
-*signin*
-*calendar*
-*page*
-*favorite*
-*authors*
-*drafts*
-*clippings*
-*inspiration*
-*compose*
-*tags*
-*report*
-*edit*
-*blog*
-*reply*
-*feedback*
-*feed*
-*help*
-*delete*
-*ratings*
-*contacts*
-*sitemap*
-*sunset-util*
-http://ficlets.com/author/edit
-*aol*
-*aim*
-*circavie*
-http://flickr.com/
+*.gif +*.jpg +*.png +*.tif +*.bmp
-*flickr.com*
+*static.flickr.com*

Replace [your name] above with your author identity as it appears in your author page URL (or that of the author whose works you are archiving, if not you). Include this listing once for each separate author you are archiving.

Apart from these rules, you will also want to add rules to cut out any links in your profile. For instance, my profile lists my homepage as http://terrania.us/talkshoe so I added a -*terrania* rule too. Technically, this shouldn't be necessary if you instruct HTTrack not to follow external links—but for some reason it follows them even when I tell it not to, so better safe than sorry.
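
If your profile has several links, a few lines of Python can turn each one into an exclusion rule. This is only a sketch: the helper name exclusion_rule is made up for illustration, and it builds the rule from the full host name, so it produces a slightly narrower pattern than the -*terrania* rule above.

```python
from urllib.parse import urlparse


def exclusion_rule(profile_url):
    """Turn a profile link into a scan rule that rejects its whole host."""
    host = urlparse(profile_url).hostname
    if host is None:
        raise ValueError("not an absolute URL: {!r}".format(profile_url))
    return "-*{}*".format(host)


if __name__ == "__main__":
    # The example link from the paragraph above; substitute your own links.
    print(exclusion_rule("http://terrania.us/talkshoe"))
```

Paste each resulting line at the end of the rule list.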

What these rules are supposed to do is exclude every part of the Ficlets site from spidering except for your personal content, your ficlets, and any ficlets that are linked to yours, at any distance. For instance, if you wrote the last ficlet of a hundred-ficlet chain, it will trawl that chain all the way back to the beginning, but it won't branch out into other ficlets that the authors of other ficlets in the chain wrote, or other ficlets that were written that day, or whatever.
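
One way to reason about a rule list like this is to try a few URLs against it before starting a long spidering run. The sketch below is a rough Python approximation, not HTTrack's actual matcher. It assumes that when several rules match a URL, the last matching rule decides (which is how HTTrack's filter documentation appears to describe rule order), and it uses Python's fnmatch wildcards as a stand-in for HTTrack's own pattern syntax. The names decide and strip_scheme and the shortened rule list are illustrative only.

```python
from fnmatch import fnmatchcase


def strip_scheme(text):
    """Drop a leading http:// so rules and URLs are compared in the same form."""
    prefix = "http://"
    return text[len(prefix):] if text.startswith(prefix) else text


def decide(url, rules):
    """Return True (fetch), False (skip), or None if no rule matches.

    Assumption: the last matching rule wins.
    """
    verdict = None
    target = strip_scheme(url)
    for rule in rules:
        sign, pattern = rule[0], strip_scheme(rule[1:])
        if fnmatchcase(target, pattern):
            verdict = (sign == "+")
    return verdict


# A shortened version of the rule list, with the author name filled in.
RULES = [
    "-*ficlets.com*",
    "+*stories*",
    "+ficlets.com/authors/robotech*",
    "-*page*",
    "-*authors*",
]

if __name__ == "__main__":
    for url in [
        "http://ficlets.com/stories/36096",
        "http://ficlets.com/authors/robotech?page=2",
        "http://ficlets.com/tags/robots",
    ]:
        print(url, "->", decide(url, RULES))
```

Under these assumptions the story URL is accepted, while the ?page=2 link is rejected by the later -*page* and -*authors* rules. That might be related to the pagination links not being followed, but treat it as a hypothesis to test rather than a diagnosis.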

At least, that's how they're supposed to work in theory. And to be fair, they do work pretty well. The one problem is that, despite the rules I've put in, I've not been able to prevent HTTrack from jumping to the "home" page (at http://ficlets.com/), and spidering the featured, today's, and your contacts' recent ficlet chains there. On the one hand, a little extra stuff is a small price to pay for getting all of your stuff, even the "mature" bits, safely on your hard drive. On the other hand, if someone could figure out how to eliminate that without losing any desired stuff (or, for that matter, figure out how to make it so I don't have to spider from each of my contents pages), I'd really appreciate it.

The above rules will fetch your notes and comments pages, but are designed to exclude "Clippings" and "Favorites" because when I looked at mine, they seemed to be full of a lot of junk I had no memory of putting there. If your clippings and favorites are precious to you, remove the -*clippings* and -*favorite* rules.

Other Options

Here are some other options to set:

In the "Limits" tab, set "Maximum External Depth" to 0. (Technically it should be set to that anyway, but it keeps spidering external content along with the internal, so I don't know.)

In "Links," put a check mark next to "Get non-HTML files related to a link."

In "Build," check "No external pages."

Click "OK". Then click "Next" and "Finish" and the spidering should begin.

Spidering

The spidering process may take a few hours. You might want to plan on doing this overnight so it will be done by the time you get up in the morning. After the spidering is complete, your webpages will be stored in a hierarchical structure in the "My Web Sites" folder (or wherever you told HTTrack to store them when you installed it) and you'll be able to browse them in your browser just as if you were viewing them on ficlets.com.
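
Before the site goes dark, it's worth a quick sanity check that the mirror holds roughly as many story pages as you expect. The Python sketch below counts saved pages; it assumes story pages end up as .html files somewhere under a folder named "stories" inside the project folder, which you should confirm by looking at your own mirror. The function name count_story_pages and the example path are hypothetical.

```python
import os


def count_story_pages(mirror_root):
    """Count .html files that sit under any directory named 'stories'."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        parts = os.path.normpath(dirpath).split(os.sep)
        if "stories" in parts:
            total += sum(1 for name in filenames if name.lower().endswith(".html"))
    return total


if __name__ == "__main__":
    # Hypothetical path; point this at your own project folder.
    root = os.path.join("My Web Sites", "ficlets")
    print(count_story_pages(root), "story pages saved")
```

If the number is far below your ficlet count, re-check the URL list and rules and run the spider again while you still can.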

Good luck! And don't panic too much even if you can't get the above to work…I've passed on the information about HTTrack to Kevin Lawver, and he will hopefully now be able to snag everything, even the "Mature" ficlets, for his "ficlets graveyard."

"Requiem for Ficlets.com" on TeleRead.
