Wikipedia Offline

Seems like Wikipedia's having a problem. Just when I was gearing up to do some scraping... bummer.


In lighter news, the classic search engine AltaVista was recently hacked. I visited the night before last and their portal page had been blanked and replaced with a single heading: 'do you yahoo?'.

[read on...]

Sexbot v2.0

This is genius. Unmitigated freaking genius. Mad props to whoever this is.



[read on...]

Google based thesaurus

I was thinking today about language and grammar recognition by machines, as used for auto-translating, document rewriting, etc. I need a thesaurus of phrases for rewriting documents. I'm trying to work out how a bot could compile one, and it occurs to me that Google has a huge database of English text from which to derive rules.

Suppose you were to search for "A banana is a" (with the double quotes). Taking only the sentences which begin with that phrase, Google returns results containing:

a banana is a banana
a banana is a fruit
a banana is a tropical herbaceous plant
a banana is a good source of water
a banana is a tropical fruit
a banana is a phallic symbol
a banana is a monoecious plant
a banana is a healthy snack

If a bot trims out the rest of the sentences then this can be used to create relationships between nouns.

banana --> fruit
banana --> phallic symbol

If this is done for other nouns we might get:

apple --> fruit
apple --> computer
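
The trimming step could be sketched roughly like this. It's only a sketch: the function name `extract_is_a` and the sample snippets are made up for illustration, it assumes the result text has already been fetched somehow, and it keeps everything up to the next punctuation mark rather than doing any real parsing.

```python
import re

def extract_is_a(noun, snippets):
    """Return the phrases that follow 'a <noun> is a' at the start of each snippet."""
    # Anchor on the lead phrase, then capture up to the next punctuation
    # mark or the end of the snippet.
    pattern = re.compile(
        r"^an?\s+" + re.escape(noun) + r"\s+is\s+an?\s+([^.,;:!?]+)",
        re.IGNORECASE,
    )
    relations = []
    for snippet in snippets:
        match = pattern.match(snippet.strip())
        if match:
            target = match.group(1).strip().lower()
            # Skip tautologies such as "a banana is a banana" and repeats.
            if target != noun.lower() and target not in relations:
                relations.append(target)
    return relations

# Hypothetical snippets standing in for search results.
snippets = [
    "A banana is a banana",
    "A banana is a fruit, and it grows in bunches.",
    "a banana is a phallic symbol",
    "Some say a banana is a berry",  # does not begin with the phrase
]
for target in extract_is_a("banana", snippets):
    print("banana -->", target)
```

Running the same function over 'apple' and other nouns would build up the noun-to-category list one word at a time.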

So having done this for a bunch of words we can make a list of things 'which are' fruit. So far we've got 'apple' and 'banana', and we can do some text substitutions. 'She was eating a banana' can be substituted with 'she was eating a fruit'. If we substitute 'she was eating a phallic symbol' it's still grammatically correct (and sounds kinda sexy) but we've lost the meaning of the original phrase, which is no good if we're rewriting documents that humans will read. So how's a computer going to know which is the better substitution?

It's a tough one. My best answer at the moment is to have the bot see what humans use more often, i.e. Google each candidate phrase and see which comes up more often.

'she was eating a fruit' => 95
'she was eating a phallic symbol' => 0
'she was eating a snack' => 164
'she was eating a monoecious plant' => 0
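
Picking a winner from those counts is the easy part. Here's a minimal sketch, assuming the hit counts have already been collected into a dictionary (the name `rank_substitutions` is just for illustration, and a real bot would fetch the numbers rather than hard-code them):

```python
def rank_substitutions(hit_counts):
    """Sort candidate phrases by hit count, dropping any with no hits."""
    seen = {phrase: hits for phrase, hits in hit_counts.items() if hits > 0}
    return sorted(seen.items(), key=lambda item: item[1], reverse=True)

# Counts copied from the searches above.
hit_counts = {
    "she was eating a fruit": 95,
    "she was eating a phallic symbol": 0,
    "she was eating a snack": 164,
    "she was eating a monoecious plant": 0,
}
ranked = rank_substitutions(hit_counts)
best_phrase, best_hits = ranked[0]
print(best_phrase, best_hits)
```

Phrases nobody has ever written drop out entirely, which is the cheap filter for meaning-destroying swaps.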

Now we have a score for each substitution. To make a general case for each word (so as not to have to search each time, and because many phrases will not exist at all) we could search nouns against verbs for proximity, and the number of Google matches will be the score for how appropriate they are to each other.

"eat * banana" OR "banana * eat" => 142,000
"eat * snack" OR "snack * eat" => 269,000

We can also test verb substitutions from a regular thesaurus this way. For example Roget's lists nosh, chow and masticate as alternatives to 'eat'.

'eat * banana' => 124,000
'nosh * banana' => 13
'chow * banana' => 303
'masticate * banana' => 4

So 'chow' is the most likely substitute for 'eat' out of these three (personally I prefer masticate) but it's not a very common switch. If chow and eat had similar scores (say within 66% of each other) then that would likely be a better substitution.
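
The comparison could be sketched as below. One loud assumption: I'm reading 'within 66% of each other' as 'the smaller count is at least 66% of the larger one'; other readings are possible. The names `closeness` and `THRESHOLD` are invented for the sketch.

```python
def closeness(hits_a, hits_b):
    """Ratio of the smaller hit count to the larger, from 0.0 to 1.0."""
    if hits_a == 0 or hits_b == 0:
        return 0.0
    return min(hits_a, hits_b) / max(hits_a, hits_b)

# Assumed reading of "within 66% of each other".
THRESHOLD = 0.66

# Hit counts for '<verb> * banana', copied from the searches above.
eat_hits = 124_000
candidate_hits = {"nosh": 13, "chow": 303, "masticate": 4}

scores = {verb: closeness(eat_hits, hits) for verb, hits in candidate_hits.items()}
best = max(scores, key=scores.get)
accepted = [verb for verb, score in scores.items() if score >= THRESHOLD]

print("closest candidate:", best)
print("good enough to swap in:", accepted)
```

With these numbers 'chow' comes out closest but nothing clears the threshold, which matches the gut feeling that none of the three is a safe swap for 'eat' next to 'banana'.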

Ultimately I'd like to be able to make a bot rewrite text into infinite permutations, retaining the original English (human) meaning as well as some of its nuance.

I'm sure it's possible, I'm not sure how. Try 'He attended MIT to study' (remove object from sentence). Googling for 'He * MIT to study' gives:

he was at MIT to study
he was accepted at MIT to study
he came to MIT to study
he entered MIT to study

There are also a bunch of bad substitutions, most of which can be filtered out by context. 'He returned to MIT to study' would be harder for a machine to spot as a bad substitution because it changes the meaning.
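
Pulling out whatever filled the '*' slot could look something like this. It's a rough sketch that assumes a single wildcard per query and already-fetched result text; `wildcard_fills` is a made-up name, and it does nothing about the hard part (spotting substitutions that change the meaning).

```python
import re

def wildcard_fills(query, snippets):
    """Collect the text that stands in for '*' in query across the snippets."""
    before, after = (part.strip() for part in query.split("*"))
    # Lazily capture the words between the fixed parts of the query.
    pattern = re.compile(
        r"\b" + re.escape(before) + r"\s+(.+?)\s+" + re.escape(after),
        re.IGNORECASE,
    )
    fills = []
    for snippet in snippets:
        match = pattern.search(snippet)
        if match:
            fill = match.group(1).lower()
            if fill not in fills:
                fills.append(fill)
    return fills

snippets = [
    "he was at MIT to study",
    "he was accepted at MIT to study",
    "he came to MIT to study",
    "he entered MIT to study",
]
fills = wildcard_fills("He * MIT to study", snippets)
print(fills)
```

Each fill is then a candidate replacement for 'attended', to be scored the same way as the verb swaps earlier.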

Thinking.... thinking.... thinking....

Any thoughts or ideas, email me!

[read on...]

National Gorilla Suit Day

National Gorilla Suit Day, which mysteriously falls on January 31 of each year, is perhaps the most important holiday of the year. Every National Gorilla Suit Day, people of all shapes and colors around the world get their gorilla suits out of the closet, put them on and go door-to-door.
That's really all there is to it. You don't have to buy gifts. You don't have to fast, although some Orthodox Gorilla Suiters do. If you want to have a parade, fine. Just make sure all the marchers are wearing gorilla suits and that all the balloons are giant, inflatable gorillas.
– Mark Evanier

Now that's a holiday worth celebrating!

from the Daily Monkey

[read on...]

The Warlike Percys

I just read something wonderful in the Wikipedia article for the town of Alnwick:

The history of Alnwick is the history of the castle and its lords, from the days of Gilbert Tyson, variously known as Tison, Tisson, and De Tesson, one of the Conqueror's standardbearers, upon whom this northern estate was bestowed, until the present time. After being held by the family of De Vesci (of which the modern rendering is Vasey — a name found all over south-east Northumberland) for over two hundred years, it passed into the hands of the house of Percy in 1309.

At various points in the town are memorials of the constant wars between Percys and Scots in which so many Percys spent the greater part of their lives.

Wars between the Percys and the Scots. I don't know about y'all, but when I imagine someone called Percy the picture is of a small, effete man in a tuxedo. Someone with unfashionable glasses, a high nasal voice and a penchant for stamp collecting. Maybe I'm being terribly unfair.

Now imagine them at war. A green field under an overcast Northumberland sky. Stretched out across it is a single line of thousands of Percys, all in their tuxedos and clutching their fencing rapiers or whatever weapons they own. They're nervously singing the Percy family hymn and stuttering.

Charging down the hillside on the other end of the field are two thousand red-haired, kilt-wearing, half-drunk Scots. They're shouting and swearing in Gaelic over the noise of fifty clan pipers on the hill behind them. A muddy, sweaty wave of wildmen sweeping onto the field, armed with rocks and sharpened poles.

It may have been a short battle. AaaarrrrrGH!

[read on...]

ballot eating

Turns out that it's illegal to eat your ballot card in Canada:

Q: Is someone allowed to eat a ballot?
A: Eating a ballot, not returning it or otherwise destroying or defacing it constitutes a serious breach of the Canada Elections Act.


[read on...]

one xbox 360


silic0nsilence: So it's black friday at CompUSA.
Slider: Yea
silic0nsilence: We were to open up at 12am. It's 11:58pm and there is a HUGE line of blood-thirsty, hard drive-wanting, maniacs. So my friend dares me to scream we have one xbox360.
Slider: Holy shit.
silic0nsilence: So he gives me $20. I go up to the gate and scream, 'LADIES AND GENTLEMEN, WE HAVE JUST RECIEVED ONE XBOX360!!' Immediatly people are storming the gate, passing me money through the cage to get it. They were screaming and knocked over this old lady. My boss just looks at me with these red eyes. In them, I saw fear and rage.
Slider: Omg you dumb shit!
Slider: Wait a second, it's 12:46A, and it's black Friday. What did this happen minutes ago? Shouldn't you be at work?
silic0nsilence: Yeah..
silic0nsilence: Pretty sure I don't work at CompUSA any more..

[read on...]

make your own banner

Hey all, after some tinkering I've put the webcams back online. For those who haven't used them before, these are random Panasonic webcams from around the world that can often be controlled (pan/tilt/zoom) from an interface on this site.

I've had a lot of fun with them and you can see some amazing things if you're patient. Figure out where they are in the world, work out the time difference and wait to see sunrises, gridlock, wildlife, etc.

There is now also an option to set this site's banner to a still from any cam, have fun :-)


[read on...]

Golfland Terrorist Threat

The US Department of Homeland Security was put on alert earlier this week as reports surfaced that Al Qaeda, Islamic Jihad and other terrorist organisations were planning a major attack on, er, Golfland, in San Jose.

The three acre miniature golf course, described as San Jose's equivalent of the White House or Sears Tower, is an obvious target for those who wish to do America lasting, devastating harm.


The Department of Homeland Security's crack squad of anti-terrorist intelligence analysts have been vigilantly guarding a miniature golf course near San Jose, California, having identified it as a prime target for an attack on America. Imagine the symbolism of a miniature windmill in flames -- truly such would be a spiritual blow from which America could never recover.


[read on...]

Happy Gray Tuesday

Today is Dean Gray Tuesday, a day for angst and the downloading of free music. So for the next couple hours, I'll be getting into the spirit and sharing with you some illegal MP3s here or you can go to
to find more mirrors of this year's album.

Fuck Warner! Up the Revolution!

[read on...]

Retro Cell Handset

I am so getting one of these! Apparently it's a Japanese fashion trend: get a big old handset and have it slung on a belt hook for your cell phone. Wonder if it can be modded into one of those styling 1960s handsets, like with a metal rim. hmmm...


This and other fun peacocking stuff at:

[read on...]


This was on

Primus521: hey dude the funniest thing happened to me today
Primus521: im at walmart and this chick is buying a box of tampons and they are missing the upc and wont ring up
Primus521: so the cashier tells his buddy to get a price check on tampax
Primus521: the dude looks at him and says, ''the kind u push in, or the kind you hammer in?''
Primus521: lol
Primus521: turns out he misheard him
Primus521: he thought he said thumbtacs
Primus521: you should have seen the look on the chicks face

[read on...]

Customisable Google Logo is a site which allows users to customise the Google logo, like so:


It's freaking genius. The ability to change the logo is so-so as far as cool things go: technically easy, not amazing. As a business idea it's totally awesome. Here we have a way to take advantage of a neat little meme to harvest money from Google, simply by putting ads in the same place Google does, and charging them for their own AdSense :-) or Yahoo ads! And it's hella good viral marketing, because it takes advantage of people's exhibitionism and hubris: they're going to work hard to drive people to your site to see 'their' logo. I absolutely love this site, for sheer chutzpah and business sense. It's technically boring but psychologically brilliant as a get-rich-quick scheme. I need to come up with some ideas like this.


[read on...]

BBC Documentaries as Torrents has a huge listing of fresh documentaries. This is like finding bittorrent gold :-) Not that any of you good people would download an illegal BBC documentary of course... You'd buy a television, and then buy an aerial and a Freeview box, and then pay a licence fee, which is used for god knows what. You wouldn't use the internet to get hold of something your tax money paid for...



UPDATE: May 2007, newnova is no more, however there is now a huge array of downloadable BBC and Channel4 documentaries available on Google Video.

If that doesn't suit you list over 600 files for the phrase 'BBC Documentary'.

[read on...]

Chinese Print Flip-Flops

This company called Unica Home makes flip flops that you press out of a printed sheet of foam. Nothing new about the product, but the designs are beautiful. Check out this print made from Chinese newspaper ads... Link


[read on...]

Dean Gray Tuesday - 13 December 2005 will be participating in Dean Gray Tuesday, an event to protest the censoring of mash-up albums (an excellent and worthy form of art) by music labels, and to flip the bird at all those persons and organisations who would use the law to restrict creativity, art or science. So, to artists, musicians and my fellow man: you kick ass! Great to share the world with you. To the RIAA, MPAA, Warner Bros, Sony, et al, suck on this:


Tracks will be available for download for one day only. It truly blows my mind that original and authentic art can be illegal due to quirks in copyright law. There are whole genres of music created by the skillful re-editing and mixing together of sounds from various sources, including mainstream tracks. People make this stuff for the love of doing it and release the albums for free on the 'net, and it's illegal just because the big studios don't like it and start harrumphing about licences and uncleared samples and garbage. I call bullshit on that.

link to

[read on...]


Something that's always bugged me a little is the censorship functions on search engines, those righteous little options with names like 'Family Filter' or 'Safe Search'. Little munchkin hats for search engines to tip at puritans. I don't think it's good to blinker people so that they can maintain a fantasy of living in some 1950s white picket fence world. Better to expose people to the real world so they can love life for what it really is.

Anyhoo, so I wrote a script to search both with and without the family filtering, and then return only those results that are 'potentially offensive'. I was amazed. Google's SafeSearch(tm) is one wacky piece of kit. Offensive, inappropriate pages it's protecting people from include The Children's Hospital of Philadelphia, The UN International Court of Justice and this Wikipedia article about kittens. There were also a few conspiracy kook sites about secret government censorship (happy crazy irony there) and, slightly more profound, Feminists Against Censorship was taken out of the 'safe' results. Hmmm...
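
The core of the comparison is just a set difference. This is a minimal sketch of that one step and not the script itself: it assumes the two result lists have already been fetched (the fetching isn't shown), and the function name and example.org URLs are placeholders.

```python
def filtered_out(unfiltered_results, safe_results):
    """Return the unfiltered results missing from the safe results, in order."""
    safe = set(safe_results)
    return [url for url in unfiltered_results if url not in safe]

# Placeholder URLs standing in for two result pages of the same query.
unfiltered = [
    "https://example.org/kittens",
    "https://example.org/hospital",
    "https://example.org/news",
]
safe = [
    "https://example.org/news",
]
for url in filtered_out(unfiltered, safe):
    print("hidden by the filter:", url)
```

Keeping the unfiltered ordering means the hidden pages come back ranked the way the search engine ranked them.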

Of course, the script has practical uses as well: it produces near perfect adult searches, uncluttered by irrelevant vanilla stuff. Just the thing when you're googling for... um... treasure. Porntastic!

script here. go play...

[read on...]

Going Camping

Summer is here! It's wet and rainy and awesome. Surfing, beach parties, topless sunbathing. Of course, none of these things are happening at my house so I'm moving down to the camp site until further notice. So call me on my cell phone, not my land line. Bye.

[read on...]


Yeah! Whooo! There is this most kickass electrical storm going on outside. It's out at sea to the southwest: a clear, dark night with big cumulus clouds rolling in towards the island, kicking off sheet lightning up and down the front, disappearing in and out of the thunderheads, lighting them up. It was a perfect moment, standing out in a field at the top of the cliffs, grass up to my knees whipped up by this warm dry wind the storm was pulling into it, drinking a warm can of Bav NA (a Dutch non-alcoholic beer, I'm about the only person I know who likes it), dark sky overhead, lighthouses behind me and to my left, and these amazing shapes lit up in the clouds. Awesome.

[read on...]

Bots Broken

Looks like the ad remover's broken again. This is about the fourth time in a month, Yahoo must be up to something to keep rearranging their systems like this. The first couple lockdowns shut out my stream ripper :-(, now they've got another new layout for their ASX files. Oh well. For those of you who don't care, here's some pictures of a squirrel on water-skis.

Everyone else: The original ASX were pretty cool, standard mms: streams together with comments that could be used to figure out what the streams were, then they went and replaced them with HTTP streams that have these big long cryptic querystrings, presumably to keep track of users and resources and to tip off the FBI that you're spanking it to JoJo, who's only 15.

In either case the ads were served as separate streams and were easily taken out. Now it looks like they might be combining the ads into the same stream as the content, but I'm not sure yet. I've tried to take the new setup apart a little, but I grow bored with fixing this over and over.
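
For the old setup, where ads were separate entries in the playlist, the removal step could be sketched as follows. Plenty of assumptions here: the playlist is treated as well-formed XML (real ASX files often aren't), the ENTRY/REF/HREF layout is the generic ASX shape and not necessarily what Yahoo served, and the `is_ad` test and example.com hosts are invented for the example.

```python
import xml.etree.ElementTree as ET

def strip_ad_entries(asx_text, is_ad):
    """Remove ENTRY elements whose stream URL is flagged by is_ad."""
    root = ET.fromstring(asx_text)
    for entry in list(root):
        # ASX tag and attribute names are case-insensitive, so compare upper-cased.
        if entry.tag.upper() != "ENTRY":
            continue
        hrefs = [
            value
            for child in entry
            if child.tag.upper() == "REF"
            for name, value in child.attrib.items()
            if name.upper() == "HREF"
        ]
        if any(is_ad(href) for href in hrefs):
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")

# Hypothetical playlist with one ad entry and one content entry.
playlist = """<ASX version="3.0">
  <ENTRY><TITLE>Advert</TITLE><REF HREF="mms://ads.example.com/spot1"/></ENTRY>
  <ENTRY><TITLE>Song</TITLE><REF HREF="mms://media.example.com/track1"/></ENTRY>
</ASX>"""

cleaned = strip_ad_entries(playlist, lambda href: "//ads." in href)
print(cleaned)
```

None of this helps if the ads get spliced into the same stream as the content, since there's no separate entry left to drop.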

I think the answer may be to quit using their own playlists, which are full of ads and Jennifer Lopez, and make my own playlist generator that feeds off the 'similar artists' feature. It'd be good practice for the new generation of spiders and scrapers I'm designing, which interact an absolute minimum with other sites, for less disruption and a smaller footprint in the logs. But not today. It's a beautiful afternoon and I'm going outside for a while. Bye.

[read on...]

