scraping websites...

many times, i find myself having to scrape a website for one reason or another. nowadays, if i need to do it, i’d probably do it with some version of mechanize (www::mechanize in perl, hpricot in ruby, etc). when i was looking for a bug in one of the scrapers i’d written a long time ago, what took me by surprise was that i’d written a lexer to do it.

i guess this was shortly after i’d taken “languages and interpreters” in college, in which we used [f]lex and yacc/(bison). i just find it interesting that, after working with a technology, we try to put it to use. it’s not necessarily a bad way to do things, i just would do things differently now…

quran facebook application

i figured i’d play around with the facebook api today, so i wrote a little facebook quran app for displaying verses from the quran on your profile page. not very polished, if i do say so myself, but… it works (at least for me). if you try it, please let me know if you find any bugs or have any feature suggestions.

you can test it here: http://facebook.cafesalam.net/quranapp.

zomg lolcats ftw!

this is off the hook!

HAI
CAN HAS STDIO?
PLZ OPEN FILE "LOLCATS.TXT"?
  AWSUM THX
      VISIBLE FILE
  O NOES
      INVISIBLE "ERROR!"
KTHXBYE

they actually have a few test interpreters for it and more examples here.

scratching my head over a c problem...

today, i wanted to try out my test arabic gtk program to see if behdad’s new changes to pango magically fixed the well-known arabic shaping issue [in short, it had nothing to do with it]. anyway, i discovered that i needed to install libquran, and to make a long story short, my test program, which used to work before, segfaulted. i ran gdb and valgrind only to find the segfault happening within libquran at the closing of the configuration file (noting that this libquran code hasn’t been changed in 3 years now).

i looked at the source, and discovered that the file pointer was becoming null after a call to getline. i tried to see if i could reproduce this in a smaller test program, and i discovered that i indeed could -

#include <stdio.h>

int main(void){
   int n = 0;
   char* tmp;
   FILE* fp = fopen("./testfile", "r");
   getline(&tmp, &n, fp);
   printf("got a str of: %s\n", tmp);
   printf("now fp is: %s\n", (fp==NULL)? "null" : "not null");
   fclose(fp);
   free(tmp);
   return 0;
} 

the program displayed the first line from testfile, but unexpectedly displayed that fp is null and segfaulted at the fclose. checking the return from getline, i see that it returns successfully (the number of characters it read).

while i got around this problem by modifying the library to do a malloc followed by an fgets, i am just confused - this library code hasn’t been touched in 3 years, it used to work before, and i had just re-pulled it from cvs when i discovered this. so why is it broken now? the only difference i can think of is that my box now runs a 64 bit version of linux, but would that break it?

any ideas?
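
one plausible explanation (a sketch, not a confirmed diagnosis of libquran): getline is declared as ssize_t getline(char **lineptr, size_t *n, FILE *stream). on a 64 bit linux box size_t is 8 bytes while int is 4, so passing the address of an int lets getline write 8 bytes into a 4 byte variable and spill into whatever sits next to it on the stack, which may well be fp (the exact layout depends on the compiler). on a 32 bit box both types are 4 bytes, which would explain why it used to work. separately, tmp is never initialized, and getline treats a non-NULL *lineptr as an existing malloc’d buffer. the version below uses the matching types; read_first_line is a hypothetical helper name, not something from libquran.

```c
#define _GNU_SOURCE      /* older glibc only declares getline() with this set */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* hypothetical helper: returns the first line of `path` without its
 * trailing newline (caller must free it), or NULL on any failure. */
char *read_first_line(const char *path)
{
    char *line = NULL;   /* must start as NULL (or a malloc'd buffer) */
    size_t cap = 0;      /* size_t, not int: getline writes sizeof(size_t) bytes here */
    FILE *fp;
    ssize_t len;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    len = getline(&line, &cap, fp);
    fclose(fp);          /* fp is intact: nothing was written past cap */

    if (len < 0) {
        free(line);      /* getline may have allocated even though it failed */
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline, if any */
    return line;
}
```

calling read_first_line("./testfile") and printing the result should behave like the test program above, minus the segfault. compiling the original with -Wall should also flag the int* versus size_t* mismatch.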

arabic answers rip off

one of my coworkers sent me this today. pretty funny that they blatantly ripped off the images and such. on a similar note, it seems as though there’s way too much red tape to go through in order to get something like this officially done.

taxes on softdrinks... what next?

today, i went to the supermarket to buy some stuff, and i picked up a 12 pack of pepsi on the way out. when i looked at my receipt, i noticed that there was an additional tax of $0.48 for the pepsi, separate from the overall tax at the bottom (not sure if it is included in the overall x% tax that you pay at the end or not, but regardless, it is more).

safeway receipt

oddly enough, at the vending machine at work ($0.25 a can), i could buy 1.92 cans of pepsi for the amount of the tax. actually, for the total of $3.98, i could have bought 15.92 cans there… (although ideally, i should stop drinking pepsi altogether).

ruby and rcairo

i’ve recently been playing more and more with ruby and i really like it. at the same time, i’ve been loving launchy, an open source application launcher for windows (currently using it on my work laptop).

since i love launchy so much, i started to wonder, “why not write something similar for linux?” - now i know one will say that deskbar does the job, but it’s not quite the same. so anyway, because i wanted transparency and so on, i decided to look into cairo, and the rcairo ruby bindings.

here’s a screenshot of what i’ve been playing with so far:

rcairo demo

note that the far right side of the screenshot shows part of my open gvim window. while i have yet to organize the code and start writing it for real, the ruby file i have is a proof of concept of all the required pieces i can think of (cairo wise anyway) working together.

some neat videos

this video from steve jobs’ stanford commencement speech (2005) was pretty good:

in addition, the following videos are worth watching (in my opinion):

  • killing us softly - an excellent talk about how today’s advertising schemes are harmful to women and to society in general (pictures of ads included, so muslim guys may just want to listen rather than watch).

  • don’t buy stuff you can’t afford - a short, yet hilarious, snl skit.

  • pirates of silicon valley - this movie is good for cs majors who want to know how apple/microsoft started.

while most of those were posted in my google reader shared items, no one checks those, do they? :p

a qualm...

i have a qualm with today’s traffic lights… sometimes, they are absolutely ludicrous and preposterous. you would think that with today’s sensor technology and stuff, you’d be able to drive and not wait at lights at all, especially really early in the morning or really late at night… but unfortunately, this is not the case.

one of the traffic lights very close to my house is totally insane. it’s at a major intersection across a railroad track. during normal hours, if you miss it, you may wait 2 and a half minutes to turn at it. if you are unlucky and a train happens to pass, you miss your next turn most of the time, which means waiting about 5 minutes, if not more, to make a left turn at that light. consequently, people just drive to the next light and make a left and/or u-turn from there.

this morning, i waited almost 4 minutes to make a left turn, despite the fact that the road from the other 3 directions was literally empty. ugh… maybe that light either doesn’t have a sensor or has a really bad sensor installed on it?

on email spam...

in most cases, i find that i am much more effective than a spam detector in figuring out what’s spam and what’s not just by looking at the subject line (and, in extreme cases, at the sender as well). in very few cases does one have to open the email to be sure.

recently, i got this email (twice) on my y! mail account (which, by the way, will soon offer unlimited email storage). i wasn’t totally sure by just looking at the subject and sender (although the last name was pretty fake), so i opened it - i was pretty surprised at a few things in the email (a spin-off of the classic “millionaire bank account” email).

My dearest Ahmed,

May Allah bless and guide you and family for me!

I am highly compelled upon strict recommendation, to write you this very urgent and confidential letter. I do hope my letter will not embarrass you since I had no previous correspondence with you. It is because of my deepest faith in Alla and also in you as a true Islamic sibling made me stood firm to write you this email. I have been tormented, tortured, maltreated and humiliated along side with my kids for no just reason.

wow… a “true” greeting, and a prayer too… could be convincing if one didn’t know better. secondly, the leaving out of an h in “Allah” in the first paragraph… third, the playing on emotions (because you’re my “Islamic sibling” - we as muslims do consider each other brothers and sisters). interesting.

I want you to assist me acquiring and safe keeping my only remaining inheritance from a security company in London. I will send you detail of my investment plans in my next correspondence to you. I solely want to invest this money in your country with all legal backs. We are suffering under the leadership of the bad government and this is why I want to run away from my country to settle in your country for high security reasons.

so… what country is “your country”? it’s never named, but the typical reader may not notice.

I am sending this proposal with a broken heart and I believe that you are going to give me a sympathetic attention. I regret the inconvenience it might cause you based on the condition that we have not met before. But I so much believe that Allah is in total control.

hmm, making people sympathize and mentioning, “Allah is in total control.”

Intro:

I am Hajia . Munirat Fatima Abacha, the wife of Mohammed Abacha, son of the late head of state of Federal Republic of Nigeria - General Sani Abacha.I am contacting you in view of the fact that we will be of great assistance to each other like wise developing a cordial relationship as true Islamic families.

My husband along with his late father and top officials of their past administration has been accused of looting several Billion United States Dollars from the Nigeria Government. The current attitude of the present government towards my family has indeed made life quite unbearable and sad to us trying to leave us on empty handed. Make reference to this BBC report: http://news.bbc.co.uk/2/hi/africa/2282366.stm

hey, now i have proof that her husband really did have this case, including a bbc report…

for us. Fortunately, Mohammed my Husband has Eighteen million and Six hundred thousand and United States Dollars (US$18.6 million) cash, which he intended to use for investment purposes in an Islamic nation abroad for hotel and recreation reservation centre, while part will be diverted into charity and destitute homes to assist the life of the needy and helpless. This money is kept in a private security company in London now. This money was deposited for safe keeping in the security vault of a freighting agency here by my husband preparatory to being air lifted abroad for investment purposes before his arrest leaving a clause that it could only be claimed by an expatriate partner. It is only my husband and myself that know where the money is kept with full information on documents with our trusted family lawyer.

what’s this “for us” that starts the paragraph? - there’s nothing missing above it or below it… it doesn’t make any sense to start that sentence with “for us,” but uh… okay. centre? confidential information like this sent in an email? using stolen money for charity? yeah…

Due to the current situation in the country concerning government’s vendettas towards my family, we seek your assistance to transfer this money out of London for the purpose of investment as intended by my husband. Note that my family is currently being probed by this present government for alleged involvement in misappropriation of public funds during my father-in-law’s regime. Towards this effect, an embargo restricting my family members from traveling or carrying out financial transactions without their express permission is in force.

Right now, my husband (Mohammed) is under arrest and is being detained in connection to the above and other activities of his late father. However, I have an arrangement on how to freight this money to you after receiving some assurances from you of the safety our own share and that you will only take the commission that we will offer you. This money personally belongs to my husband and he intends that it still be used for investment in Islamic nations only.

investing in islamic nations? don’t they realize that this money, which they admit is stolen and provide an article from the bbc proving it, is 7aram (unlawful) and unusable for investment in islamic countries and islamic projects?

The freighting company to be used has now been introduced to me and as soon as we receive your readiness to assist us receives this fund we shall formalize the deposit documents in your name and reach an agreement with them to air lift the consignment for your pick up in your address. Bearing in mind that your assistance is needed to transfer this fund, wepropose a commission of 15% (Fifteen Percent) of the total sum to you for the expected services and assistance. While extra 5% is mapped out for miscellaneous expenses.

heyyy! 15% cut of stolen money… no thank you!

On your positive consent, I shall expect you to contact me urgently to enable us discuss about this. Your urgent response is highly needed. I must use this opportunity to implore you to exercise utmost indulgence to keep this matter extraordinarily confidential, while I await your prompt response. I have to remind you that this information should be on a top confidential level. Best regards,

if it has to be kept confidential, it’s probably illegal… but do people realize this?

MRS. MUNIRAT FATIMA ABACHA

PROFILE: <http://profiles.yahoo.com/hajiafatimaabacha>

so a few interesting things… check out her profile on yahoo (link above). not very realistic that someone in royalty would set up a yahoo profile and set their homepage to be a link to a bbc article accusing their husband of stealing money from the government…

the problem is that many people (especially the uncle/auntie generation and those who aren’t computer savvy) may believe such “realistic” and “targeted” messages, especially when they’re addressed as personally as these are, when they appeal for help in the name of a “humanitarian cause,” “the sake of religion,” and “doing what’s right,” and when they present a partially accurate (albeit disconnected) view that reflects the recipient’s beliefs.

update (4/13/2007) - this morning, i got two new versions of this email. one, to y! mail, is very similar to the original except that it includes an email address. the other, on gmail, is a different version that includes 1. a new url for the news article (a link to cnn this time), 2. a phone number, 3. a “please email me here for security and privacy” with a link to a yahoo.fr email account (except with horrible spelling), 4. a different last name, 5. a different percentage (50% rather than 15%), and 6. addressing me as “my mr. ahmed” and “dear”…

this is totally ludicrous…