JetDbConnect - a class providing a simplified interface to JET OleDb compatible datasources using ADO.Net

26 03 2007

I needed a way to run SQL queries on structured text files & Excel files.

Download links available at end of post

For the last few months, I have been toiling away in the highly arcane world of fixed width text files. Very large ones at that (250-500mb). These come from text file data dumps of appraisal district databases. I need only a subset of these records. From there they go into Excel where they are cleaned up on a nearly line by line basis that can’t be handled consistently by RegEx since data is recorded so differently by each county. From there I want to summarize that data on a single household basis. I could go in Excel & do:

  1. Advanced filter, Unique Values (based on a unique household identifier)
  2. Use vlookup & sumif functions to copy and/or sum the data for each column

However, this is rather time consuming since different data is provided by in different order and level of detail for each county. Also, for rolls with more than 25k records and over 100 fields these Excel formulas can take ~30min to complete. You then need to copy & paste values or you’ll have to wait a long time anytime a recalc is performed. Needed a programmatic way to deal with this data and to minimize the human time to just records which required actual human eyes to modify.

Admittedly, creating an OleDb connection, command, reader & even datatable using ADO.Net methods is a pretty simple thing.

Dim connect As New OleDb.OleDbConnection
Dim strConnect As String = “Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties=”"Excel 8.0;HDR=Yes;IMEX=1″”"
connect.ConnectionString = strConnect
Dim cmd As New OleDb.OleDbCommand(”SELECT * FROM [Sheet1$] WHERE City=’Austin’ AND LandValue > 30000″, connect)
Dim reader As OleDb.OleDbDataReader = cmd.ExecuteReader()
Dim dt As New DataTable
dt.Load(reader)

But I wanted something even simpler & significantly more flexible than that. I wanted an object that could handle any potential Jet datasource, field list, target table, and SQL Select statement. I wanted something that could be reused in a variety of different applications and not require any hard coding of these parameters or make me implement different ways to do this in each application. I needed a good object oriented design with high reusability. To this end I created the JetDbConnect.dll

Structure

Properties

  • Closed - OleDbConnection closed or open
  • ConnectionString (ReadOnly)
  • DataSource - the file path to your Jet compatible datasource. The ONLY property you must set either by assignment or through instantiation.
  • DbConnection (ReadOnly)
  • ExtendedProperties - if you need to use a Jet datasource that is not directly supported, here is where you would give the connection string parameters that would allow you to use it
  • StorageMethod - one of the 4 enums of the file types this handles directly (Text, Excel, Access, Dbase). Should never need to be set unless you have a really messed up filename.
  • TablesList - List of Strings of available tables in the datasource
  • TargetTable - The table from which to query data, only needed in Excel & Access. If not given for those then it picks 1st table on table list
  • TextFormat - enum: FixedWidth or Delimited. If not set assumes delimited with default delimiter set in your registry settings

Methods

  • New
    • () - Default
    • (DataSource as string, TextFormat Enum)
      • This is used ONLY with text-based datasources
      • DataSource = path to datasource
      • TextFormat Enum is one of the 2 members of this Enum
    • (DataSource as string, [TargetTable] as string)
      • This can be used with any datasource type.
      • If you use it with a text-based datasource, and it is not delimited with your default delimiter, then you will also have to set the TextFormat enum elsewhere BEFORE using one of the Command,Reader, or Records methods
  • Close() - manually close the connection. Not required but can be useful
  • CommandGetAll, ReaderGetAll, RecordsGetAll ([DataSource] as string, [TargetTable] as string)
    • All 3 methods take the same parameters, an optional datasource path and an optional target table name
    • These methods allow you to easily retrieve an OleDbCommand, OleDbReader, or DataTable containing all the records in your target table of your datasource
    • If you don’t set DataSource here and it has not been set elsewhere, this method will fail
    • If you don’t set TargetTable here and it is an Access or Excel datasource and TargetTable is not set elsewhere, then “Sheet1″ or the first member of TablesList will be used for Excel or Access datasources respectively
  • CommandGetSql, ReaderGetSql, RecordsGetSql (sqlWhere as string, [fields] as string, [DataSource] as string, [TargetTable] as string)
    • All 3 methods take the same parameters:
      • Required: sqlWhere string, ex. “zipcode=’78258′ AND land_value>10000″
      • Optional: fields. A comma delimited list of fields to return with the query. ex. “owner_name, zipcode, land_value”
      • Optional: DataSource & TargetTable - see above for how the default values are handled for these 2 parameters
    • These methods allow you to easily retrieve an OleDbCommand, OleDbReader, or DataTable containing just the records in your target table of your datasource which meet your SQL criteria and optionally to return only the specified fields
    • If you want ALL the fields, pass Nothing or Void for fields. An empty string will cause an error.
  • SchemaTable
    • Returns a Schema DataTable for the DataSource not a TargetTable.
    • If you want a Schema DataTable for a table then get it from an OleDbReader on that table
  • FromXml, ToXml
    • Provides methods for serializing & de-serializing the object.
    • Since only DataSource is required for text & Dbase files and DataSource & TargetTable for Access and Excel, I don’t see these methods as being particularly useful. However, I always like to implement them just in case I need to persist an object or compare an object to a previous state.

Usage

Examples:

Get an OleDbReader for all records in an Excel file from a named sheet

dim jetdb as new JetDbConnect(”C:\myfiles\ExcelFile.xls”, “Hays County”)

dim reader as OleDbReader = jetdb.ReaderGetAll()

-or-

dim jetdb as new JetDbConnect

dim reader as OleDbReader = jetdb.ReaderGetAll(”C:\myfiles\ExcelFile.xls”, “Hays County”)

note: the class automatically adds the required decorations so that Hays County becomes [Hays County$] as required for SQL statements on an Excel datasource

Get a DataTable with select records & fields from delimited text file

dim jetdb as new JetDbConnect(”C:\myfiles\some text file.txt”)

dim dt as DataTable = jetdb.RecordsGetSql(”zipcode=’78258′ AND land_value>10000″,”owner_name, zipcode, land_value”)

note: I did not need to ever specify the TargetTable attribute since it is automatically set to “some text file.txt” and the Data Source is actually set to “C:\myfiles”

Get a list of available Sheets and Ranges from an Excel file without the need for any Office libaries or applications

dim jetdb as new JetDbConnect(”C:\myfiles\ExcelFile.xls”)

dim tables as List(of string) = jetdb.TablesList

Downloads

Just the DLL

The Visual Studio 2005 project