My first professional job in the IT field was for a law firm in Valley Stream, NY in 1995. I inherited a half done Access database and had to learn VBA fast. Since then I found my VBA skills to be very useful when working with small to medium sized companies, as anyone with Microsoft Office has a full development kit along with a very good IDE already installed. Due to my familiarity with the language, while working on the Google Maps project, I decided that for me the easiest way to parse the data into a CSV format, Excel VBA was the way to go.

There are a few things to note in the code below. The first is the use of constants to promote readability. I definitely could have used some more, specifically to map columns in the spreadsheet, but I guess I was going for quick and dirty. Perhaps in the next pass I will clean it up some more.

The next thing I want to discuss is the declaration of variables. I didn’t include an Option Explicit command, which would then require that all variables be
declared in advance, but I could have, since all variables have been declared in this case. Also, I have used camel case as a naming convention for variables, with the variable type being used as a prefix withinu the name. For example, lngRawRow is shown to be of the type Long, and when using the variable in the code that is apparent by use of this naming convention. Speaking of naming conventions, notice that the constants are all capitals, seperated by underscores. This is not necessary for the code to run, but is what I consider to fall under the realm of best practices.

The With statement which begins the actual code can also be considered a best practice, but I just use it do I don’t have to write the same code over and over again.

The Do While…Loop is used to iterate through each row on the spreadsheet. There are other ways to go about iterating through cells on a spreadsheet, such as assigning a Range object to a variable and using a For Each…Next loop. This method can be useful for exposing the methods of the range object, but in this case since the only property of the cell I am interested in is the Value property (which is the default), the Do While…Loop is sufficient.

Next we come to the parsing of the text within the cells using various string functions. First the InStr function is called to find the start of the first tag. This function looks through an entire string and returns the number of the first occurance of the string you are looking for. So, if I would like to find the position in the string “coffee” of the letter “f”, I could use the function in this way:

InStr("coffee", "f")

This function would return the number 3, as the first occurance of the letter “f” in “coffee” is the third letter. Another example is as follows:

InStr("concatenate", "cat")

This would return the number 4, as that is the beginning of the string “cat” in the string “concatenate”.

There is one other optional parameter in the InStr function, and that is the Start parameter. So, for example, if I wanted to find the string “sea” within the string “She sells seashells at the seashore”, and used the function:

InStr("She sells seashells at the seashore", "sea")

the function would return 11. If I wanted to find the same string, but start searching after the 15th character in the string, I could use the function:

InStr(15, "She sells seashells at the seashore", "sea")

returning the number 27.

In the code we are writing, we are looking for the end of the tag so that we can use the Mid function to strip the tag out. So in the line:

lngTitleStart = InStr(.Cells(lngRawRow, 1), "") + BOLD_TAG_LENGTH

we add the length of the tag, which in this case is 3, and we get a number representing the place in the string at the end of the tag. In the following line we are looking for the end tag , and we get it’s position in the string. Now we are ready to parse out the text between the tags with the Mid function, and we do this by passing the string to the function, along with the starting position, and the length of the string we would like to return. Here is an example of this function in action:

Mid("New York City", 5, 4)

would return the string “York”, which starts on the 5th character of the string and is 4 characters long. If we didn’t know the length if the string, however, we might calculate it by taking the position of the end of the string and subtracting it from the position of the beginning of the string. In the code below, we have the position of the end of the tag and of the beginning of the tag, and we want the string in between those tags, so we find the length by subtracting those values, as shown:

strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)

For the next bit of code, where we are trying to parse out the street address, we had to do a bit of gymnastics because the data is not uniform and the street can end with a comma, an open parethesis, or a
tag. Once it discovers the end of the string, the Mid function is called again to parse the street address.

Here is the code:

Sub Parse()

   Const BOLD_TAG_LENGTH = 3
   Const BREAK_TAG_LENGTH = 4

   Dim lngRawRow As Long Dim lngImportRow As Long Dim lngRawColumn As Long
   Dim lngTitleStart As Long
   Dim lngTitleEnd As Long
   Dim strTitle As String
   Dim lngStreetStart As Long
   Dim lngStreetEnd As Long
   Dim strStreet As String
   Dim lngZipStart As Long
   Dim strZip As String
   Dim strDescription As String
   Dim strGroup As String
   Dim lngTimeStart As Long
   Dim lngTimeEnd As Long
   Dim strTime As String

   With ThisWorkbook.Worksheets("NyMetroIntergroup")
      lngRawRow = 1
      lngImportRow = 2
      Do While .Cells(lngRawRow, 1) <> ""
         ' Get Title
         lngTitleStart = InStr(.Cells(lngRawRow, 1), "<b>") + BOLD_TAG_LENGTH
         lngTitleEnd = InStr(.Cells(lngRawRow, 1), "</b>")
         strTitle = Mid(.Cells(lngRawRow, 1), lngTitleStart, lngTitleEnd - lngTitleStart)
         ' Get Street
         lngStreetStart = InStr(.Cells(lngRawRow, 1), "<br>") + BREAK_TAG_LENGTH
         lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), ",")
         If (InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") <> 0 And InStr(lngStreetStart, .Cells(lngRawRow, 1), "(") < lngStreetEnd) Or lngStreetEnd = 0 Then
            lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "(")
            If lngStreetEnd = 0 Then
               lngStreetEnd = InStr(lngStreetStart, .Cells(lngRawRow, 1), "<br>")
            End If
         End If
         strStreet = Mid(.Cells(lngRawRow, 1), lngStreetStart, lngStreetEnd - lngStreetStart)
         If InStr(strStreet, "<br>") <> 0 Then
            lngStreetStart = InStr(strStreet, "<br>") + BREAK_TAG_LENGTH
            strStreet = Right(strStreet, Len(strStreet) - lngStreetStart)
         End If
         strStreet = Trim(strStreet)

         lngZipStart = InStrPat(1, .Cells(lngRawRow, 1), "#####")
         strZip = Mid(.Cells(lngRawRow, 1), lngZipStart, 5)

         For lngRawColumn = 2 To 8
            If InStr(.Cells(lngRawRow, lngRawColumn), "<br>") Then
               Select Case lngRawColumn
               Case 2
                  strGroup = "Sunday"
               Case 3
                  strGroup = "Monday"
               Case 4
                  strGroup = "Tuesday"
               Case 5
                  strGroup = "Wednesday"
               Case 6
                  strGroup = "Thursday"
               Case 7
                  strGroup = "Friday"
               Case 8
                  strGroup = "Saturday"
               End Select

               lngTimeStart = InStr(.Cells(lngRawRow, lngRawColumn), ">") + 1
               lngTimeEnd = InStr(.Cells(lngRawRow, lngRawColumn), "</td>")

               strTime = Mid(.Cells(lngRawRow, lngRawColumn), lngTimeStart, lngTimeEnd - lngTimeStart)
               strTime = Trim(strTime)

               strDescription = Right(.Cells(lngRawRow, 1), _
                  Len(.Cells(lngRawRow, 1)) - InStr(.Cells(lngRawRow, 1), _
                  "<br>") - 3) & "<br>" & strTime

               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 1) = strTitle
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 2) = strDescription
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 3) = strStreet
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 4) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 5) = "New York"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 6) = strZip
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 7) = "US"
               ThisWorkbook.Worksheets("Import").Cells(lngImportRow, 8 ) = strGroup

               lngImportRow = lngImportRow + 1

             End If

          Next lngRawColumn

          lngRawRow = lngRawRow + 1
       Loop
    End With

End Sub

Public Function InStrPat(Start As Variant, String1 As Variant, Optional String2 As Variant) As Variant

   Dim lngStart As Long
   Dim strText As String
   Dim strPat As String
   Dim Lg As Long, K As Long

   InStrPat = Null

   If IsMissing(String2) Then

      If IsNull(Start) Or IsNull(String1) Then Exit Function
      lngStart = 1
      strText = Start
      strPat = String1
   Else
     If IsNull(Start) Or IsNull(String1) Or IsNull(String2) Then Exit Function
     lngStart = Start
     strText = String1
     strPat = String2
   End If

   Lg = Len(strPat)
   InStrPat = 0
   For K = lngStart To Len(strText) - Lg
      If Mid(strText, K, Lg) Like strPat Then
         InStrPat = K
         Exit For
      End If
   Next K
End Function

An example of the data in the column where the raw data needing to be parsed:

St. Monica's Church<br>   413 East 79th Street,  Basement (Betw 1st &amp; York Avenues) 10021 <br>       *11th S, **1st S, !!As Bill Sees It, +Men, ++Women, @Round Robin<br>    %Promises, ~Children Welcome, +++S 1-3, #Daily Reflections<br>               <font color="#666666">Last Update: 12/07/09</font></p><br>@C-<br>    6:00a<br>    C-<br>    7:15a<br>    BB-<br>    8:30a<br>    T-<br>    10:00a<br>    C-<br>    12:30<br>    C-<br>    4:00<br>

Advertisements