Archive for January, 2010

iMacros addon for Firefox

iMacros for Firefox is a helpful little tool for scraping data of off web pages.  In order to get data to geocode for Google Maps, I needed some way to get the data.  I looked into some other solutions, and many were quite costly.  The learning curve for iMacros is not too steep.  Here is the code I wrote to scrape off of the page



Avatar in 3D

I went to see Avatar today.  I had been warned previously that there is a new definition of IMAX that doesn’t include a really big screen.  I tried to get tickets for the one really big IMAX screen in Manhattan, but it was sold out two days in advance.  So, a word of warning before buying tickets for the IMAX version and getting your hopes up, you may want to call the theater and ask if their IMAX screens are any bigger than their regular screens.

Anyway, my roommate had seen it, and told me he loved the effects, but the story left something to be desired.  He had never seen Titanic, though, because he dislikes Leonard DiCaprio, while I thought Titanic was a great love story.  My expectations for the film were pretty low, being that I had never even seen a preview, and based on my roommate’s review.  I was pleasantly surprised.  I loved the story.  I have done a few American Indian sweat lodges, and am very attracted to their spirituality and the way it ties in with nature.  I found the culture of the aliens to be very appealing in this sense.  I think the film made some great political commentary as well about modern imperialism and terrorism as seen from the other side.  I also fell in love with the female lead, which is a big plus, and loved the male lead’s story of “going native.”  All that combined with the best effects ever and 3d, which I really like in a film, and I highly recommend this movie.

My first professional job in the IT field was for a law firm in Valley Stream, NY in 1995. I inherited a half done Access database and had to learn VBA fast. Since then I found my VBA skills to be very useful when working with small to medium sized companies, as anyone with Microsoft Office has a full development kit along with a very good IDE already installed. Due to my familiarity with the language, while working on the Google Maps project, I decided that for me the easiest way to parse the data into a CSV format, Excel VBA was the way to go.

There are a few things to note in the code below. The first is the use of constants to promote readability. I definitely could have used some more, specifically to map columns in the spreadsheet, but I guess I was going for quick and dirty. Perhaps in the next pass I will clean it up some more.

The next thing I want to discuss is the declaration of variables. I didn’t include an Option Explicit command, which would then require that all variables be
declared in advance, but I could have, since all variables have been declared in this case. Also, I have used camel case as a naming convention for variables, with the variable type being used as a prefix withinu the name. For example, lngRawRow is shown to be of the type Long, and when using the variable in the code that is apparent by use of this naming convention. Speaking of naming conventions, notice that the constants are all capitals, seperated by underscores. This is not necessary for the code to run, but is what I consider to fall under the realm of best practices.

The With statement which begins the actual code can also be considered a best practice, but I just use it do I don’t have to write the same code over and over again.

The Do While…Loop is used to iterate through each row on the spreadsheet. There are other ways to go about iterating through cells on a spreadsheet, such as assigning a Range object to a variable and using a For Each…Next loop. This method can be useful for exposing the methods of the range object, but in this case since the only property of the cell I am interested in is the Value property (which is the default), the Do While…Loop is sufficient.

Next we come to the parsing of the text within the cells using various string functions. First the InStr function is called to find the start of the first tag. This function looks through an entire string and returns the number of the first occurance of the string you are looking for. So, if I would like to find the position in the string “coffee” of the letter “f”, I could use the function in this way:

InStr("coffee", "f")

This function would return the number 3, as the first occurance of the letter “f” in “coffee” is the third letter. Another example is as follows:

InStr("concatenate", "cat")

This would return the number 4, as that is the beginning of the string “cat” in the string “concatenate”.

There is one other optional parameter in the InStr function, and that is the Start parameter. So, for example, if I wanted to find the string “sea” within the string “She sells seashells at the seashore”, and used the function:

InStr("She sells seashells at the seashore", "sea")

the function would return 11. If I wanted to find the same string, but start searching after the 15th character in the string, I could use the function:

InStr(15, "She sells seashells at the seashore", "sea")

returning the number 27.

In the code we are writing, we are looking for the end of the tag so that we can use the Mid function to strip the tag out. So in the line:

lngTitleStart = InStr(.Cells(lngRawRow, 1), "") + BOLD_TAG_LENGTH

we add the length of the tag, which in this case is 3, and we get a number representing the place in the string at the end of the tag. In the following line we are looking for the end tag , and we get it’s position in the string. Now we are ready to parse out the text between the tags with the Mid function, and we do this by passing the string to the function, along with the starting position, and the length of the string we would like to return. Here is an example of this function in action:

Mid("New York City", 5, 4)

would return the string “York”, which starts on the 5th character of the string and is 4 characters long. If we didn’t know the length if the string, however, we might calculate it by taking the position of the end of the string and subtracting it from the position of the beginning of the string. In the code below, we have the position of the end of the tag and of the beginning of the tag, and we want the string in between those tags, so we find the length by subtracting those values, as shown:

strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)

For the next bit of code, where we are trying to parse out the street address, we had to do a bit of gymnastics because the data is not uniform and the street can end with a comma, an open parethesis, or a
tag. Once it discovers the end of the string, the Mid function is called again to parse the street address.

Here is the code:

Sub Parse()

   Const BOLD_TAG_LENGTH = 3

   Dim lngRawRow As Long Dim lngImportRow As Long Dim lngRawColumn As Long
   Dim lngTitleStart As Long
   Dim lngTitleEnd As Long
   Dim strTitle As String
   Dim lngStreetStart As Long
   Dim lngStreetEnd As Long
   Dim strStreet As String
   Dim lngZipStart As Long
   Dim strZip As String
   Dim strDescription As String
   Dim strGroup As String
   Dim lngTimeStart As Long
   Dim lngTimeEnd As Long
   Dim strTime As String

   With ThisWorkbook.Worksheets("NyMetroIntergroup")
      lngRawRow = 1
      lngImportRow = 2
      Do While .Cells(lngRawRow, 1) <> ""
         ' Get Title
         lngTitleStart = InStr(.Cells(lngRawRow, 1), "<b>") + BOLD_TAG_LENGTH
         lngTitleEnd = InStr(.Cells(lngRawRow, 1), "</b>")
         strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)
         ' Get Street
         lngStreetStart = InStr(.Cells(lngRawRow, 1), "<br>") + BREAK_TAG_LENGTH
         lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), ",")
         If (InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") <> 0 And InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") < lngStreetEnd) Or lngStreetEnd = 0 Then
            lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "(")
            If lngStreetEnd = 0 Then
               lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "<br>")
            End If
         End If
         strStreet = Mid(.Cells(lngRawRow, 1), lngStreetStart, lngStreetEnd - lngStreetStart)
         If InStr(strStreet, "<br>") <> 0 Then
            lngStreetStart = InStr(strStreet, "<br>") + BREAK_TAG_LENGTH
            strStreet = Right(strStreet, Len(strStreet) - lngStreetStart)
         End If
         strStreet = Trim(strStreet)

         lngZipStart = InStrPat(1, .Cells(lngRawRow, 1), "#####")
         strZip = Mid(.Cells(lngRawRow, 1), lngZipStart, 5)

         For lngRawColumn = 2 To 8
            If InStr(.Cells(lngRawRow, lngRawColumn), "<br>") Then
               Select Case lngRawColumn
               Case 2
                  strGroup = "Sunday"
               Case 3
                  strGroup = "Monday"
               Case 4
                  strGroup = "Tuesday"
               Case 5
                  strGroup = "Wednesday"
               Case 6
                  strGroup = "Thursday"
               Case 7
                  strGroup = "Friday"
               Case 8
                  strGroup = "Saturday"
               End Select

               lngTimeStart = InStr(.Cells(lngRawRow, lngRawColumn), ">") + 1
               lngTimeEnd = InStr(.Cells(lngRawRow, lngRawColumn), "</td>")

               strTime = Mid(.Cells(lngRawRow, lngRawColumn), lngTimeStart, lngTimeEnd - lngTimeStart)
               strTime = Trim(strTime)

               strDescription = Right(.Cells(lngRawRow, 1), _
                  Len(.Cells(lngRawRow, 1)) - InStr(.Cells(lngRawRow, 1), _
                  "<br>") - 3) & "<br>" & strTime

               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 1) = strTitle
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 2) = strDescription
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 3) = strStreet
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 4) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 5) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 6) = strZip
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 7) = "US"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 8 ) = strGroup

               lngImportRow = lngImportRow + 1

             End If

          Next lngRawColumn

          lngRawRow = lngRawRow + 1
    End With

End Sub

Public Function InStrPat(Start As Variant, String1 As Variant, Optional String2 As Variant) As Variant

   Dim lngStart As Long
   Dim strText As String
   Dim strPat As String
   Dim Lg As Long, K As Long

   InStrPat = Null

   If IsMissing(String2) Then

      If IsNull(Start) Or IsNull(String1) Then Exit Function
      lngStart = 1
      strText = Start
      strPat = String1
     If IsNull(Start) Or IsNull(String1) Or IsNull(String2) Then Exit Function
     lngStart = Start
     strText = String1
     strPat = String2
   End If

   Lg = Len(strPat)
   InStrPat = 0
   For K = lngStart To Len(strText) - Lg
      If Mid(strText, K, Lg) Like strPat Then
         InStrPat = K
         Exit For
      End If
   Next K
End Function

An example of the data in the column where the raw data needing to be parsed:

St. Monica's Church<br>   413 East 79th Street,  Basement (Betw 1st &amp; York Avenues) 10021 <br>       *11th S, **1st S, !!As Bill Sees It, +Men, ++Women, @Round Robin<br>    %Promises, ~Children Welcome, +++S 1-3, #Daily Reflections<br>               <font color="#666666">Last Update: 12/07/09</font></p><br>@C-<br>    6:00a<br>    C-<br>    7:15a<br>    BB-<br>    8:30a<br>    T-<br>    10:00a<br>    C-<br>    12:30<br>    C-<br>    4:00<br>

Stupidity and the New York Post

I was walking on the Upper East Side this morning and caught a glimpse of the New York Post on a newstand.  The sight of it brought to mind a documentary I watched on Netflix a few days ago called “Stupidity.”  The film attempts to define and explain human stupidity.

One of it’s premises is that the media is responsible for much of the stupidity to be found in the world today.  That the idiot box truly turns people into idiots, and that newspapers are written with an eighth grade audience in mind so as not to alienate any of it’s readers.  A journalist is interviewed in the documentary and states that for every publication he has written for, he has been instructed to write for such an audience.

Some years ago, I read somewhere that the New Yok Post was subjected to a test to discover what the Flesch-Kincaid readability score is, and is was discovered to be written for fourth graders.  The National Center for Education did a study to find that the average reading level for people in the US was 7th grade.  It was shown in this study that when readers come across text that exceeds their reading level, they stop reading.  I would assume that in order to improve one’s reading level, you must read above your level.

Is it the job of the media, or the New York Post, in this case to imporve the reading level of it’s readership?  Hardly.  I only know that when I see a copy of the New York Post with it’s stupid headlines, I feel a little sick to my stomach.  Here are some examples:

Regarding the last headline, I remember how this “news” story about a woman who was driving drunk showed up as headlines for about a week, while in Iran people were protesting in the street about their corrupt elections.  Granted, the New York Post is a tabloid, and as such, I guess has no responsibility to report real news.  As part of the News Corporation, run by Rupert Murdoch, their responsibility is to their shareholders, and to get those shareholders as much profit as they can.

So, whose responsibility is it to inform the public?  Is it an individuals’ responsibility?  The information is out there, if you search for it.  The age of information has as it’s downfall that the loudest are usually the one that gets heard.  These headlines are loud.

If I am screaming in your ear, am I at fault, or are you for not moving out of range of my voice?  What if I have people on every corner screaming?  How does one get away?  If, in order to maximize profits for my shareholders, I pollute the air, am I not responsible for the pollution?  The New York Post is polluting the information stream, so where is the Environmental Protection Agency to shut them down?  I say that the New York Post, and other media outlets, by this analogy, are indeed responsible for making people stupid.

I have been watching a great deal of Netflix movies lately, without loading a single DVD.  How, you might ask?  The answer can be found in the Watch Instantly feature of Netflix.  My queue currently has 262 titles in it, of which the top 20 can be seen in the RSS feed on this blog’s homepage.  Two hundred and sixty two movies or television series to watch, with no waiting.  I have had an XBOX360 for quite some time, and when I first found out that Netflix was coming to the XBOX, I got very excited.  Now I mostly watch movies on my computer.

I have been waiting for this killer app for over 10 years, and although it has been out for some time now, I am still excited by it.  Not every Netflix title is available to Watch Instantly, but with 262 current titles that I want to see, I can’t really complain.  Prior to using this, I would regularly download movies I wanted to watch via torrents, but I haven’t done that in months.  It goes to show you that piracy is only a problem when there are no other services available.  Please the customer, make money.  Not a concept many of the media companies in this country seem to get.  Now, I only hope that Netflix doesn’t sell out to CBS, NBC, or the like, because they would probably ruin the whole thing by trying to milk money from it without concern for the consumer.  Killing the goose that lays the golden egg, so to speak.  Anyway, if you are a film buff, I highly recommend Netflix.  And I’m not even getting paid for the recommendation, it’s that good.

In further posts, I will offer some thoughts on movies I am watching on Netflix.

This evening I had to wait for the N train at 34th Street for about 20 minutes.  It is, by far, my favorite station to wait for a train, thanks to an interactive art installation on the track.  The interface is quite simple, there are 8 holes placed about 4 feet apart approximately 7 feet above the platform.  Coming out of each hole is a red beam of light, which, when broken by, say, a person’s hand, creates a series of sounds.  These sound range from marimba to pipes to birdsounds.  The notes change with each wave of the hand, so you can create interesting melodies by varying the rhythm of your hand swipes.  There is another device on the opposite track, and the sounds from the speakers play their interactions along with yours.  I have experimented with playing two holes at once, but it is no different from just playing the one.

With this setup is that you can have up to 16 strangers interacting and playing music together on two seperate platforms.  Tonight there was a moment where a guy joined me on my platform while a couple was playng the other.  I had already set a rhythm, and they were able to work within it, so at some point I stared counterpointing their rhythms, creating the experience of playing music with others.  Which is one of my favorite things to do on this planet, make music with others.  The non verbal communication, the experience of the sum being greater than the parts, the intense attention to the present moment, and the instant gratification of hearing beautiful sounds being made are among the things I love about the experience.  The installation at Herald’s Square is called Reach, aptly named in that you reach up to play it and that the experience of playing music is a reach for something greater than oneself.