The Studio of Eric Valosin

Tuesday, February 24, 2015

(Web) Crawling Out From Between a QR and a Hard Place

So you want to make a QR code that points to a randomized website, eh?

So did I, and when I was in grad school for my MFA I hacked together a number of somewhat unstable solutions. I recently overhauled my approach to this challenge, when bits of my artwork started falling apart around me and I found I needed to take matters into my own hands. Disclaimer: I'm not a programmer by trade AT ALL, but just a quick learner who gets obsessed easily and can share what knowledge I've accumulated. Here's a bit of a narrative tutorial that can help you learn from my trial and error.

----

Now then,
If you're looking to make a random QR code there are 3 things to consider:
  1. The destination url: how to get a single url to redirect to another randomly selected url (the hard part)
  2. The short link to that url: needed to make the destination url short enough to conveniently fit within a QR code's character limits
  3. The QR code itself: translating that url into the pattern of black and white squares (the easy part, sort of)
There are a few ways to do this, depending how hacked and how proprietary you want the solution to be. 

THE EASY WAY... (aka learn from my mistakes)


...is simply to find someone else's pre-existing website that has a link to a random website, and copy their link address into a QR code generator, and there you go. When I first created Meditation 1.1 that's what I did. It works excellently in the short term but it ended up giving me a lot of issues in the long run because I didn't have any control over the back end of things. 

Problem 1: Last summer, during an exhibition, it came to my attention that my piece was leading people to error 404 messages. As it turned out, the QR code generator I had used (sparqcode.com) was bought out by another company and then migrated and dissolved. The code itself was still scannable but because the QR code generator automatically shortened my destination url to its own proprietary short link (to get it to fit within the dimensions of their QR code), when the company went down it took all its short links with it. Even though my destination url (the randomizing link) was still functioning, the QR code no longer pointed people to it.

Solution: I ended up having to choose a different QR code generator (qrstuff.com) and remake all my QR codes (a huge headache when those codes are hand drawn into artwork and published in print...)

Problem 2: Once I reconnected my codes to the destination url, the person who ran the website whose randomizing link I used (the now defunct randomwebsite.com) decided to shut it down, so that link went dead (in the middle of another exhibition of mine, I might add!). My QR code still functioned, and the short link encoded into it still worked, but this time when viewers landed on the destination url that was supposed to redirect them to a random website, they'd get another error 404 message.

Solution: I found a new randomizing destination link (like uroulette.com, for example) and started all over: new QR code, new short link, etc.

Problem 3: This wouldn't be so bad if you could simply adjust the destination url while keeping the short link the same (that way the link encoded into the QR code doesn't change and you don't have to redraw the code, it just points somewhere different). However, most free QR code generators online will automatically shorten the link you enter to make it fit into the QR code, and they don't let you adjust the destination url that the short link points to. When my randomizing destination url went dead, I had no way of changing it, so I had to entirely remake the QR code every time.

Solution: I switched from relying on a QR code generator's proprietary link shortener, to using the goo.gl link shortener, which lets you adjust the destination url each short link points to. That way if randomwebsite.com goes down, I can switch the link to uroulette.com without changing the QR code. I also figured Google's going to stick around longer than, say, bit.ly or tinyurl.


THE HARD WAY (aka, the real way to do it)


That's ok if you don't mind some instability, but with all these things falling apart, I really wanted to take all of this into my own hands to get rid of the proprietary uncertainty. So I decided to code my own randomizing url, link shortener, and QR code generator from scratch and host them on my own website, so nobody can mess me up but me. This is the solution I now use:

QR Code Generator: Coding this is the easy one, because you don't need to do it! In my research I've learned that a QR code is a QR code is a QR code, and it doesn't matter who makes it. Once it's made it will always be readable even if the generator goes out of business (like a poem will always be readable even if the author dies). Besides, to actually translate the url into the black and white pixels yourself involves a ridiculous amount of math (the error correction alone is Reed-Solomon coding over a finite field). Seriously. I tried, and it's not worth it. (If you're curious, you can learn about it here.) So just use someone else's QR code generator, but don't let them shorten your link (do that part yourself).

Link Shortener: This one's not too bad either. There are open source programs you can download, like the PHP/MySQL based YOURLS. This works great for most people and puts all the control in your hands, letting you adjust anything you want, host it on your own website, and even make it public for other people to use (creating your own bit.ly, etc.) if you want. For some reason it didn't work for me, though (some unidentifiable configuration discrepancy, I don't know).

My workaround: The maximum character limit for a QR code varies by the dimensions of the code and the error correction level, but it just so happened that the domain I already had was short enough that I could use it with an extension of up to 3 other characters and still have it fit within a 25 x 25 QR code grid (i.e. http://mydomain.com/123 does not need to be shortened further).
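If you want to check whether your own url will fit before you start drawing, here's a rough Python sketch. The capacity numbers are the byte-mode limits from the standard QR capacity tables for the three smallest code sizes (a 25 x 25 grid is "version 2"). The function names are just made up for this example, and http://mydomain.com/123 is the same placeholder url as above.

```python
# Byte-mode capacity (max characters) for the smallest QR versions,
# keyed by version number and error correction level (L, M, Q, H).
# Values are taken from the standard QR capacity tables.
BYTE_CAPACITY = {
    1: {"L": 17, "M": 14, "Q": 11, "H": 7},   # 21 x 21 grid
    2: {"L": 32, "M": 26, "Q": 20, "H": 14},  # 25 x 25 grid
    3: {"L": 53, "M": 42, "Q": 32, "H": 24},  # 29 x 29 grid
}


def grid_size(version):
    """Side length in squares: 21 for version 1, plus 4 per version after."""
    return 17 + 4 * version


def fits(url, version, level):
    """True if the url's bytes fit in the given version and correction level."""
    return len(url.encode("utf-8")) <= BYTE_CAPACITY[version][level]


def smallest_version(url, level):
    """Smallest version in the table above that can hold the url, or None."""
    for version in sorted(BYTE_CAPACITY):
        if fits(url, version, level):
            return version
    return None
```

With these numbers, fits("http://mydomain.com/123", 2, "M") is True (23 characters against a limit of 26), while the same url with ".php" on the end is 27 characters and only fits a 25 x 25 grid at the lowest error correction level.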

So I created a subdomain on my website that would be used specifically for this hacked link shortener, and filled it with pages titled with no more than 3 letters, each holding a php header redirect script that redirects to whatever destination url I want. (I also had to add an .htaccess file on my site that gets rid of the ".php" in the url to make sure it didn't get too long: "http://mydomain.com/123" instead of "http://mydomain.com/123.php".)
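To show the idea without the PHP, here's a minimal sketch of the same kind of short-link redirect in Python. This is not the script I actually run. The REDIRECTS table, the example.com destinations, and the names resolve and ShortLinkHandler are all hypothetical, and stripping ".php" in code stands in for what the .htaccess file does on my site.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical table of short codes and destinations. Editing a value here
# re-points the short link without changing the url drawn into the QR code.
REDIRECTS = {
    "123": "http://example.com/random-site-picker",
    "abc": "http://example.com/portfolio",
}


def resolve(path, table=REDIRECTS):
    """Map a request path like '/123' or '/123.php' to a destination, or None."""
    code = path.split("?", 1)[0].strip("/")
    if code.endswith(".php"):
        code = code[:-4]  # treat /123.php the same as /123
    return table.get(code)


class ShortLinkHandler(BaseHTTPRequestHandler):
    """Answers every GET with a redirect to the matching destination."""

    def do_GET(self):
        destination = resolve(self.path)
        if destination is None:
            self.send_error(404, "Unknown short link")
            return
        # 302 is a temporary redirect, so browsers should not cache a
        # destination that may change later.
        self.send_response(302)
        self.send_header("Location", destination)
        self.end_headers()
```

Handing ShortLinkHandler to Python's built-in http.server.HTTPServer would serve these redirects for local testing. The important design choice is the temporary (302) redirect, since the whole point is being able to change the destination later.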

If I want to change where the short link points, all I have to do is open the file and adjust the redirect code. I designated one of these "short links" on my site to where I had stored the following web scraper that does the heavy lifting...

Random Website: This was the hard part. One possibility is to create a program that essentially compiles random letters and numbers until it stumbles onto a valid url (the monkey-typing-Shakespeare approach), but that hardly seemed practical and exceeded my knowledge of coding. So, unless you want to hand compile an exhaustive list of all possible extant websites and enter them into your array of possible sites to redirect to randomly, you have to rely on preexisting lists. There's no such thing as an exhaustive list of valid urls, but I've found a few useful, legit websites that are trying their darnedest to index the internet. The DMOZ project has been around since the late 1990s and seems pretty stable, and it catalogs thousands and thousands of sites and organizes them by category. In fact, I discovered that uroulette.com actually uses their database to point viewers to the random site. There's also the Internet Archive (which is like a time capsule of the internet) and The Alphabetical Web Directory (which is a little easier to search but seemed less extensive).

So I decided to go straight to the source and build a php web crawler that scrapes all the data off the DMOZ home page and looks for links. It picks one of them at random and uses a regular expression to decide whether that link is external (to a site outside of the DMOZ directory) or internal (another subcategory within the DMOZ directory). If it's external, it redirects the browser to that site (ta-daa!!). If it's internal, it follows that link to the next page and scrapes it all over again looking for the next randomly selected link, repeating the process further down the rabbit hole until it eventually lands on some random external link and redirects there (ta-daaaa!!!). I still need to tweak the code a bit, but it works pretty well for the most part.
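Here's roughly what that loop looks like, sketched in Python instead of the PHP I actually used. Treat it as an illustration under assumptions, not my real code: the starting url, the regular expression for "internal", the max_hops limit, and all the function names are made up for this example.

```python
import random
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

START_URL = "http://www.dmoz.org/"  # assumed starting page
# Assumed rule: any url on dmoz.org (or a subdomain of it) is internal.
INTERNAL = re.compile(r"^https?://([^/]+\.)?dmoz\.org(/|$)", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute http(s) links found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    links = [urljoin(base_url, href) for href in collector.links]
    return [link for link in links if link.startswith(("http://", "https://"))]


def fetch(url):
    """Download a page and return its text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def random_external_link(start=START_URL, fetch_page=fetch, rng=random, max_hops=20):
    """Follow random internal links until a random external one turns up."""
    url = start
    for _ in range(max_hops):
        links = extract_links(fetch_page(url), url)
        if not links:
            url = start  # dead end, so go back to the top
            continue
        choice = rng.choice(links)
        if not INTERNAL.match(choice):
            return choice  # external: this is where the browser gets sent
        url = choice  # internal: go one level further down the rabbit hole
    return None  # gave up after max_hops pages
```

The hop limit is there so a page that only links back into the directory can't keep the visitor waiting forever. In a real redirect script, whatever random_external_link returns would go into the Location header.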

FOR DETAILS ON HOW I BUILT THIS WEB CRAWLER, CLICK HERE

And there you have it! If any part of any of this breaks on me now - if DMOZ shuts down, or if my short link goes dead, etc. - I can adjust any part of this process inside the code on my website without affecting any of the QR codes pointing to them!

SO:


If you want less of a headache, go with the easy route (good luck finding one though). But if you want something more stable or are a more competent programmer than me, I advise the hard way.

GOOD LUCK!
