Category: Uncategorized


iMacros addon for Firefox

iMacros for Firefox is a helpful little tool for scraping data of off web pages.  In order to get data to geocode for Google Maps, I needed some way to get the data.  I looked into some other solutions, and many were quite costly.  The learning curve for iMacros is not too steep.  Here is the code I wrote to scrape off of the page http://www.nyintergroup.org/meetinglist/zone.cfm?zone=01&boro=M


VERSION BUILD=6251204 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAB T=1
TAG POS=1 TYPE=B ATTR=TXT:SAT
TAG POS=R{{!LOOP}} TYPE=P ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
TAG POS=R1 TYPE=TD ATTR=TXT:* EXTRACT=HTM
'
SAVEAS TYPE=EXTRACT FOLDER=* FILE=NyMetroIntergroup.csv

Advertisements

Avatar in 3D

I went to see Avatar today.  I had been warned previously that there is a new definition of IMAX that doesn’t include a really big screen.  I tried to get tickets for the one really big IMAX screen in Manhattan, but it was sold out two days in advance.  So, a word of warning before buying tickets for the IMAX version and getting your hopes up, you may want to call the theater and ask if their IMAX screens are any bigger than their regular screens.

Anyway, my roommate had seen it, and told me he loved the effects, but the story left something to be desired.  He had never seen Titanic, though, because he dislikes Leonard DiCaprio, while I thought Titanic was a great love story.  My expectations for the film were pretty low, being that I had never even seen a preview, and based on my roommate’s review.  I was pleasantly surprised.  I loved the story.  I have done a few American Indian sweat lodges, and am very attracted to their spirituality and the way it ties in with nature.  I found the culture of the aliens to be very appealing in this sense.  I think the film made some great political commentary as well about modern imperialism and terrorism as seen from the other side.  I also fell in love with the female lead, which is a big plus, and loved the male lead’s story of “going native.”  All that combined with the best effects ever and 3d, which I really like in a film, and I highly recommend this movie.

My first professional job in the IT field was for a law firm in Valley Stream, NY in 1995. I inherited a half done Access database and had to learn VBA fast. Since then I found my VBA skills to be very useful when working with small to medium sized companies, as anyone with Microsoft Office has a full development kit along with a very good IDE already installed. Due to my familiarity with the language, while working on the Google Maps project, I decided that for me the easiest way to parse the data into a CSV format, Excel VBA was the way to go.

There are a few things to note in the code below. The first is the use of constants to promote readability. I definitely could have used some more, specifically to map columns in the spreadsheet, but I guess I was going for quick and dirty. Perhaps in the next pass I will clean it up some more.

The next thing I want to discuss is the declaration of variables. I didn’t include an Option Explicit command, which would then require that all variables be
declared in advance, but I could have, since all variables have been declared in this case. Also, I have used camel case as a naming convention for variables, with the variable type being used as a prefix withinu the name. For example, lngRawRow is shown to be of the type Long, and when using the variable in the code that is apparent by use of this naming convention. Speaking of naming conventions, notice that the constants are all capitals, seperated by underscores. This is not necessary for the code to run, but is what I consider to fall under the realm of best practices.

The With statement which begins the actual code can also be considered a best practice, but I just use it do I don’t have to write the same code over and over again.

The Do While…Loop is used to iterate through each row on the spreadsheet. There are other ways to go about iterating through cells on a spreadsheet, such as assigning a Range object to a variable and using a For Each…Next loop. This method can be useful for exposing the methods of the range object, but in this case since the only property of the cell I am interested in is the Value property (which is the default), the Do While…Loop is sufficient.

Next we come to the parsing of the text within the cells using various string functions. First the InStr function is called to find the start of the first tag. This function looks through an entire string and returns the number of the first occurance of the string you are looking for. So, if I would like to find the position in the string “coffee” of the letter “f”, I could use the function in this way:

InStr("coffee", "f")

This function would return the number 3, as the first occurance of the letter “f” in “coffee” is the third letter. Another example is as follows:

InStr("concatenate", "cat")

This would return the number 4, as that is the beginning of the string “cat” in the string “concatenate”.

There is one other optional parameter in the InStr function, and that is the Start parameter. So, for example, if I wanted to find the string “sea” within the string “She sells seashells at the seashore”, and used the function:

InStr("She sells seashells at the seashore", "sea")

the function would return 11. If I wanted to find the same string, but start searching after the 15th character in the string, I could use the function:

InStr(15, "She sells seashells at the seashore", "sea")

returning the number 27.

In the code we are writing, we are looking for the end of the tag so that we can use the Mid function to strip the tag out. So in the line:

lngTitleStart = InStr(.Cells(lngRawRow, 1), "") + BOLD_TAG_LENGTH

we add the length of the tag, which in this case is 3, and we get a number representing the place in the string at the end of the tag. In the following line we are looking for the end tag , and we get it’s position in the string. Now we are ready to parse out the text between the tags with the Mid function, and we do this by passing the string to the function, along with the starting position, and the length of the string we would like to return. Here is an example of this function in action:

Mid("New York City", 5, 4)

would return the string “York”, which starts on the 5th character of the string and is 4 characters long. If we didn’t know the length if the string, however, we might calculate it by taking the position of the end of the string and subtracting it from the position of the beginning of the string. In the code below, we have the position of the end of the tag and of the beginning of the tag, and we want the string in between those tags, so we find the length by subtracting those values, as shown:

strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)

For the next bit of code, where we are trying to parse out the street address, we had to do a bit of gymnastics because the data is not uniform and the street can end with a comma, an open parethesis, or a
tag. Once it discovers the end of the string, the Mid function is called again to parse the street address.

Here is the code:

Sub Parse()

   Const BOLD_TAG_LENGTH = 3
   Const BREAK_TAG_LENGTH = 4

   Dim lngRawRow As Long Dim lngImportRow As Long Dim lngRawColumn As Long
   Dim lngTitleStart As Long
   Dim lngTitleEnd As Long
   Dim strTitle As String
   Dim lngStreetStart As Long
   Dim lngStreetEnd As Long
   Dim strStreet As String
   Dim lngZipStart As Long
   Dim strZip As String
   Dim strDescription As String
   Dim strGroup As String
   Dim lngTimeStart As Long
   Dim lngTimeEnd As Long
   Dim strTime As String

   With ThisWorkbook.Worksheets("NyMetroIntergroup")
      lngRawRow = 1
      lngImportRow = 2
      Do While .Cells(lngRawRow, 1) <> ""
         ' Get Title
         lngTitleStart = InStr(.Cells(lngRawRow, 1), "<b>") + BOLD_TAG_LENGTH
         lngTitleEnd = InStr(.Cells(lngRawRow, 1), "</b>")
         strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)
         ' Get Street
         lngStreetStart = InStr(.Cells(lngRawRow, 1), "<br>") + BREAK_TAG_LENGTH
         lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), ",")
         If (InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") <> 0 And InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") < lngStreetEnd) Or lngStreetEnd = 0 Then
            lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "(")
            If lngStreetEnd = 0 Then
               lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "<br>")
            End If
         End If
         strStreet = Mid(.Cells(lngRawRow, 1), lngStreetStart, lngStreetEnd - lngStreetStart)
         If InStr(strStreet, "<br>") <> 0 Then
            lngStreetStart = InStr(strStreet, "<br>") + BREAK_TAG_LENGTH
            strStreet = Right(strStreet, Len(strStreet) - lngStreetStart)
         End If
         strStreet = Trim(strStreet)

         lngZipStart = InStrPat(1, .Cells(lngRawRow, 1), "#####")
         strZip = Mid(.Cells(lngRawRow, 1), lngZipStart, 5)

         For lngRawColumn = 2 To 8
            If InStr(.Cells(lngRawRow, lngRawColumn), "<br>") Then
               Select Case lngRawColumn
               Case 2
                  strGroup = "Sunday"
               Case 3
                  strGroup = "Monday"
               Case 4
                  strGroup = "Tuesday"
               Case 5
                  strGroup = "Wednesday"
               Case 6
                  strGroup = "Thursday"
               Case 7
                  strGroup = "Friday"
               Case 8
                  strGroup = "Saturday"
               End Select

               lngTimeStart = InStr(.Cells(lngRawRow, lngRawColumn), ">") + 1
               lngTimeEnd = InStr(.Cells(lngRawRow, lngRawColumn), "</td>")

               strTime = Mid(.Cells(lngRawRow, lngRawColumn), lngTimeStart, lngTimeEnd - lngTimeStart)
               strTime = Trim(strTime)

               strDescription = Right(.Cells(lngRawRow, 1), _
                  Len(.Cells(lngRawRow, 1)) - InStr(.Cells(lngRawRow, 1), _
                  "<br>") - 3) & "<br>" & strTime

               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 1) = strTitle
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 2) = strDescription
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 3) = strStreet
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 4) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 5) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 6) = strZip
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 7) = "US"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 8 ) = strGroup

               lngImportRow = lngImportRow + 1

             End If

          Next lngRawColumn

          lngRawRow = lngRawRow + 1
       Loop
    End With

End Sub

Public Function InStrPat(Start As Variant, String1 As Variant, Optional String2 As Variant) As Variant

   Dim lngStart As Long
   Dim strText As String
   Dim strPat As String
   Dim Lg As Long, K As Long

   InStrPat = Null

   If IsMissing(String2) Then

      If IsNull(Start) Or IsNull(String1) Then Exit Function
      lngStart = 1
      strText = Start
      strPat = String1
   Else
     If IsNull(Start) Or IsNull(String1) Or IsNull(String2) Then Exit Function
     lngStart = Start
     strText = String1
     strPat = String2
   End If

   Lg = Len(strPat)
   InStrPat = 0
   For K = lngStart To Len(strText) - Lg
      If Mid(strText, K, Lg) Like strPat Then
         InStrPat = K
         Exit For
      End If
   Next K
End Function

An example of the data in the column where the raw data needing to be parsed:

St. Monica's Church<br>   413 East 79th Street,  Basement (Betw 1st &amp; York Avenues) 10021 <br>       *11th S, **1st S, !!As Bill Sees It, +Men, ++Women, @Round Robin<br>    %Promises, ~Children Welcome, +++S 1-3, #Daily Reflections<br>               <font color="#666666">Last Update: 12/07/09</font></p><br>@C-<br>    6:00a<br>    C-<br>    7:15a<br>    BB-<br>    8:30a<br>    T-<br>    10:00a<br>    C-<br>    12:30<br>    C-<br>    4:00<br>

Hello world!

Well, I just joined and this is my first post.  2009 is almost done, and it has been a year of many changes.  I named the blog crisisoffaith2009 because I like the term, and because some of these changes have tested my faith.  What faith, you may ask?  Faith that everything is going to work out after a loss of some kind, I guess is what I mean.  Anyway, in this blog I will discuss various interests of mine.  I have a new love for writing, and I have always been very curious and looking to expand my knowledge, and by writing I can formalize the thought processes, journal my process, and share with others.  The interests I will be writing about will include computers, programming, music, literature, movies, technology, philosophy and history.  This is not an inclusive list, because, as you see, my interests are broad.  Well, here we go!