LJ 2011-04-03 17:27:00

Apr 3rd, 2011 | Filed under LiveJournal Import

Being the master of data that I am, I have decided to embark on a project.

My project is to programmatically transfer the entire contents (including comments) of my journal to WordPress.

I’ll be honest – the weekly e-mails from LJ chronicling the escapades of “Frank and Meme” genuinely persuade me that LJ and I are no longer compatible. I can’t explain it in any more detail than to say that I find the whole vibe weird and alienating, and I object to having my content stored in a place that I no longer feel comfortable with.

So the mission is to grab all 7,000 entries, and 20,000 or so comments, and forcibly inject them into a WordPress blog.

So the wp_posts table is easy enough: I need to give it an ID (which auto-increments anyway), some date bits (which I can easily grab from each LJ entry), content, title, an appropriate status, a name (which I can randomly generate), modification timestamp, a permalink URL (which can be derived from the autonumber ID), and a type. Plus a few other bits and bobs.
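
A rough sketch of that mapping in Python, for illustration only. The column names are the standard wp_posts ones; the shape of the `entry` dict, the `friends_only` flag, the choice of "private" for friends-only entries, and the `site_url` default are all assumptions rather than anything decided above. Local and GMT timestamps are treated as identical to keep it short.

```python
import uuid
from datetime import datetime


def build_post_row(entry, post_id, site_url="http://example.com"):
    """Map one LJ entry (hypothetical dict shape) onto wp_posts columns."""
    stamp = entry["posted_utc"].strftime("%Y-%m-%d %H:%M:%S")
    return {
        "ID": post_id,
        "post_date": stamp,
        "post_date_gmt": stamp,          # assumption: source time is already UTC
        "post_content": entry["body"],
        "post_title": entry["subject"] or "",
        # Assumption: friends-only entries become private posts.
        "post_status": "private" if entry["friends_only"] else "publish",
        "post_name": uuid.uuid4().hex,   # randomly generated name
        "post_modified": stamp,
        "post_modified_gmt": stamp,
        "guid": f"{site_url}/?p={post_id}",  # permalink derived from the ID
        "post_type": "post",
    }


example = build_post_row(
    {
        "subject": "LJ 2011-04-03 17:27:00",
        "body": "Being the master of data that I am...",
        "posted_utc": datetime(2011, 4, 3, 17, 27, 0),
        "friends_only": False,
    },
    post_id=42,
)
```

Because the guid depends on the ID, the ID has to be known (or reserved) before the row is written.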

Then I’ll chuck a load of stuff into a reference data table that maps from the autonumber ID to the LJ ID, which then makes it easier to track between everything. Simple…
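
One way such a reference table might look, sketched with Python's built-in sqlite3 as a stand-in for the real WordPress database. The table name `lj_post_map` and its column names are made up for this example.

```python
import sqlite3


def make_map_table(conn):
    """Create the (hypothetical) WordPress-ID to LJ-ID reference table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lj_post_map ("
        " wp_post_id INTEGER PRIMARY KEY,"
        " lj_entry_id INTEGER NOT NULL UNIQUE)"
    )


def record_mapping(conn, wp_post_id, lj_entry_id):
    """Remember which WordPress post a given LJ entry became."""
    conn.execute(
        "INSERT INTO lj_post_map (wp_post_id, lj_entry_id) VALUES (?, ?)",
        (wp_post_id, lj_entry_id),
    )


def wp_id_for(conn, lj_entry_id):
    """Return the WordPress post ID for an LJ entry, or None if not imported."""
    row = conn.execute(
        "SELECT wp_post_id FROM lj_post_map WHERE lj_entry_id = ?",
        (lj_entry_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
make_map_table(conn)
record_mapping(conn, wp_post_id=1, lj_entry_id=555001)
```

The UNIQUE constraint on the LJ side means a re-run cannot silently import the same entry twice.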

Tags can be brought across using the wp_terms table and associated stuff, as far as I can tell…
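
For reference, the "associated stuff" in a stock WordPress schema is wp_term_taxonomy and wp_term_relationships. Below is a cut-down sketch of attaching a tag to a post, again using sqlite3 as a stand-in; the tables carry only the columns this example needs, and the slug rule in `attach_tag` is a simplification, not WordPress's own.

```python
import sqlite3

# Trimmed versions of the three WordPress term tables (assumption: only the
# columns needed here; the real tables have more).
SCHEMA = """
CREATE TABLE wp_terms (
    term_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, slug TEXT UNIQUE);
CREATE TABLE wp_term_taxonomy (
    term_taxonomy_id INTEGER PRIMARY KEY AUTOINCREMENT,
    term_id INTEGER, taxonomy TEXT, count INTEGER DEFAULT 0);
CREATE TABLE wp_term_relationships (
    object_id INTEGER, term_taxonomy_id INTEGER,
    PRIMARY KEY (object_id, term_taxonomy_id));
"""


def attach_tag(conn, post_id, tag_name):
    """Link a post to a tag, creating the term rows the first time it is seen."""
    slug = tag_name.strip().lower().replace(" ", "-")  # simplified slug rule
    row = conn.execute(
        "SELECT tt.term_taxonomy_id FROM wp_terms t"
        " JOIN wp_term_taxonomy tt ON tt.term_id = t.term_id"
        " WHERE t.slug = ? AND tt.taxonomy = 'post_tag'",
        (slug,),
    ).fetchone()
    if row is None:
        term_id = conn.execute(
            "INSERT INTO wp_terms (name, slug) VALUES (?, ?)", (tag_name, slug)
        ).lastrowid
        tt_id = conn.execute(
            "INSERT INTO wp_term_taxonomy (term_id, taxonomy) VALUES (?, 'post_tag')",
            (term_id,),
        ).lastrowid
    else:
        tt_id = row[0]
    conn.execute(
        "INSERT OR IGNORE INTO wp_term_relationships (object_id, term_taxonomy_id)"
        " VALUES (?, ?)",
        (post_id, tt_id),
    )
    # Keep the per-tag usage count in step with the relationships table.
    conn.execute(
        "UPDATE wp_term_taxonomy SET count ="
        " (SELECT COUNT(*) FROM wp_term_relationships WHERE term_taxonomy_id = ?)"
        " WHERE term_taxonomy_id = ?",
        (tt_id, tt_id),
    )
    return tt_id


tag_db = sqlite3.connect(":memory:")
tag_db.executescript(SCHEMA)
first = attach_tag(tag_db, 1, "Data Migration")
second = attach_tag(tag_db, 2, "data migration")
```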

Then to comments, where I can map names, URLs back to that person’s LJ, their IP, timestamp, content, parent IDs, and so on. Again, it’s going to take reference data tables to map everything in properly, but it’s simple enough once those are in place..
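
The fiddly part is the parent IDs: a reply has to point at its parent's new WordPress ID, not the old LJ one. A small sketch of that remapping follows. The target keys are standard wp_comments columns, but the input dict shape, the function name, and the plain `http://<user>.livejournal.com/` URL pattern are assumptions for illustration.

```python
def build_comment_rows(lj_comments, wp_post_id, first_wp_id=1):
    """Turn LJ comments (hypothetical dict shape) into wp_comments-style rows.

    Two passes: assign every comment a new ID first, then resolve parents,
    so the order of the input list does not matter.
    """
    id_map = {c["lj_id"]: first_wp_id + i for i, c in enumerate(lj_comments)}
    rows = []
    for c in lj_comments:
        rows.append({
            "comment_ID": id_map[c["lj_id"]],
            "comment_post_ID": wp_post_id,
            "comment_author": c["user"],
            # Assumption: simple subdomain-style journal URL.
            "comment_author_url": f"http://{c['user']}.livejournal.com/",
            "comment_author_IP": c.get("ip", ""),
            "comment_date": c["posted"],
            "comment_content": c["body"],
            # 0 means "top-level comment" in WordPress.
            "comment_parent": id_map.get(c.get("parent_lj_id"), 0),
        })
    return rows, id_map


thread = [
    {"lj_id": 901, "parent_lj_id": 900, "user": "james",
     "posted": "2011-04-03 20:18:00", "body": "Yeah, it's quite inflexible."},
    {"lj_id": 900, "parent_lj_id": None, "user": "duane",
     "posted": "2011-04-03 16:45:00", "body": "I always thought that was a pain."},
]
rows, id_map = build_comment_rows(thread, wp_post_id=7, first_wp_id=10)
```

The returned `id_map` is exactly the kind of reference data that would be kept in a lookup table for later runs.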

I understand that something sort of like this exists already, taking LJ export XML files and doing this, but when last I checked (half a decade ago probably), that involved exporting stuff month by month. And I’ve been here for 128 months or thereabouts, so that’s impractical. It also wouldn’t cover comments, so why bother with that? More fun to build something myself :o)

  1. Duane
    Apr 3rd, 2011 at 16:45
    #1

    It also wouldn’t cover comments, so why bother with that?

    I always thought that was a pain in the ass. LJ, being the social experience that it is, is largely useless without comments. It strikes me as odd that a satisfactory way to export the entire contents of one’s journal (comments included) hasn’t been made available by LJ itself. I’ve long wanted to do so with each of my accounts (excluding sock puppets…) in order to merge them into a single account. The fact that you can probably merge them into a single blog account elsewhere, but not on LJ itself, is baffling.

    Besides the bit where all of the data gets cleanly and accurately posted to WordPress, didn’t you already have everything you needed to grab the contents of your journal in the way you’re describing here? I seem to recall each entry in AMA, along with comments, being available even after someone deleted it.

    What I’m asking is, are you building upon existing work, or did you start from scratch? If you started from scratch, was there a technical reason for doing so?

    Also! Please tag entries in which you discuss projects like these? I’m fascinated and don’t want to miss them!

    • James
      Apr 3rd, 2011 at 20:18
      #2

      Yeah, it’s quite inflexible.. The advantage of WordPress is that if you run it yourself, you control the entire database structure that puts it together, so you can really do whatever you want. You can also edit the text of comments people have left… ;o)

      I’m building all this from scratch. The stuff in AMA was designed in a pretty different way.. The goal in AMA was that as soon as possible after a comment was posted, that fact would be recorded and a load of data would be downloaded. Emphasis was very much on working out how best to get hold of that data ASAP.

      So in that case, my database knew an entry had been posted because AMA was structured to notify it. It was all very reactive. It was crucial to the system that if I went back and posted a comment on a two-year-old post, that should be recorded.

      In the case of what I’m doing here, I have to start from scratch and work out how best to get hold of all the data in one go, without worrying too much about capturing live data. So it’s going in steps:

      1. Interface with the Month view to get hold of a list of all entries ever posted
      2. Interface with the Entry view for each entry to get hold of its content and metadata
      3. Further interface with the Entry view to get a list of every comment on every post
      4. Further interface with the Entry view to download the content of each comment
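
The four steps above can be sketched as a simple loop. This is only a skeleton under stated assumptions: the five callables are hypothetical placeholders for whatever actually talks to the Month view and Entry view, and nothing here makes a network request.

```python
def crawl_journal(list_months, list_entries_in_month, fetch_entry,
                  list_comment_ids, fetch_comment):
    """Walk the journal month by month and collect entries with their comments.

    All five arguments are caller-supplied functions (hypothetical names).
    """
    archive = {}
    for month in list_months():
        for entry_id in list_entries_in_month(month):    # step 1: list entries
            entry = fetch_entry(entry_id)                # step 2: content + metadata
            comment_ids = list_comment_ids(entry_id)     # step 3: list comments
            comments = [fetch_comment(entry_id, cid)     # step 4: comment content
                        for cid in comment_ids]
            archive[entry_id] = {"entry": entry, "comments": comments}
    return archive


# Stand-in data sources so the skeleton can be exercised without LJ.
fake_months = {"2011-04": [101, 102]}
fake_comments = {101: [1, 2], 102: []}
archive = crawl_journal(
    list_months=lambda: list(fake_months),
    list_entries_in_month=lambda month: fake_months[month],
    fetch_entry=lambda entry_id: {"id": entry_id, "body": f"entry {entry_id}"},
    list_comment_ids=lambda entry_id: fake_comments[entry_id],
    fetch_comment=lambda entry_id, cid: {"id": cid, "body": f"comment {cid}"},
)
```

Passing the fetchers in as arguments keeps the crawl order separate from the details of logging in and reading friends-only content.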

      Another reason why it needs building from scratch is that it was only a couple of weeks before I left AMA that I even found a way to access friends-only content, so that has to be integrated into everything from the start.

      So I’m now at a point where I have the IDs of every entry I’ve ever posted in my journal. The process that grabs the data will also post it to the backend WordPress tables entry by entry, so in the next couple of days, it should sort itself out..

      As for tagging – if it all goes well, there won’t be too much content here for much longer, so you won’t have much to read :o)

      • Duane
        Apr 3rd, 2011 at 20:27
        #3

        it was only a couple of weeks before I left AMA that I even found a way to access friends-only content, so that has to be integrated into everything from the start.

        I wondered about that. In order to be sure you get all of your entries, Month View is going to need to show private and friends-only entries, or the entire thing is bunk. It was something that would have been really handy in AMA, and its absence stood out at the time.

        The process that grabs the data will also post it to the backend WordPress tables entry by entry, so in the next couple of days, it should sort itself out..

        Do you expect any weird formatting issues to come of this? Or are your entries generally clean enough to make an elegant transfer?

        As for tagging – if it all goes well, there won’t be too much content here for much longer, so you won’t have much to read :o)

        =( Well, I can subscribe to your blog via RSS, yeah?

        After you’ve worked all of this out, are you going to try to find a way to monetize your clever work?

        • James
          Apr 4th, 2011 at 07:20
          #4

          It was something that would have been really handy in AMA, and its absence stood out at the time.

          Sadly, the way to do it involves using a very poorly documented feature, which made it kind of tricky. ’Twas a shame; knowing what I do now, I could have made the whole thing much better..

          Do you expect any weird formatting issues to come of this? Or are your entries generally clean enough to make an elegant transfer?

          I don’t expect any formatting issues whatsoever. The great thing about LJ is that the use of friends pages etc. means that only utter cretins put fancy formatting into their posts, and I’m not one of those. So I don’t have any entries where I hardcoded a pink font colour into the entry itself, on account of how anybody whose friends page has a pink background would therefore be unable to see it. The format promotes clean formatting, so it should all be fine. Probably.

          Well, I can subscribe to your blog via RSS, yeah?

          I’ll probably have something that crossposts entries or similar – I wouldn’t want to abandon my audience after all ;o)

          As for monetising the whole thing, nah.. That would require me to build it in a way that others could use, and I haven’t the patience for that… :o)

  2. Mr Flagg
    Apr 3rd, 2011 at 16:49
    #5

    it would be easier to move the journal to Dreamwidth and then get emails about their busted ass coding efforts