JetDbConnect – a class providing a simplified interface to JET OleDb compatible datasources using ADO.Net

26 03 2007

I needed a way to run SQL queries on structured text files & Excel files.

Download links available at end of post

For the last few months, I have been toiling away in the highly arcane world of fixed width text files. Very large ones at that (250-500mb). These come from text file data dumps of appraisal district databases. I need only a subset of these records. From there they go into Excel where they are cleaned up on a nearly line by line basis that can’t be handled consistently by RegEx since data is recorded so differently by each county. From there I want to summarize that data on a single household basis. I could go in Excel & do:

  1. Advanced filter, Unique Values (based on a unique household identifier)
  2. Use vlookup & sumif functions to copy and/or sum the data for each column

However, this is rather time consuming since different data is provided by in different order and level of detail for each county. Also, for rolls with more than 25k records and over 100 fields these Excel formulas can take ~30min to complete. You then need to copy & paste values or you’ll have to wait a long time anytime a recalc is performed. Needed a programmatic way to deal with this data and to minimize the human time to just records which required actual human eyes to modify.

Admittedly, creating an OleDb connection, command, reader & even datatable using ADO.Net methods is a pretty simple thing.

Dim connect As New OleDb.OleDbConnection
Dim strConnect As String = “Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties=”"Excel 8.0;HDR=Yes;IMEX=1″”"
connect.ConnectionString = strConnect
Dim cmd As New OleDb.OleDbCommand(“SELECT * FROM [Sheet1$] WHERE City=’Austin’ AND LandValue > 30000″, connect)
Dim reader As OleDb.OleDbDataReader = cmd.ExecuteReader()
Dim dt As New DataTable
dt.Load(reader)

But I wanted something even simpler & significantly more flexible than that. I wanted an object that could handle any potential Jet datasource, field list, target table, and SQL Select statement. I wanted something that could be reused in a variety of different applications and not require any hard coding of these parameters or make me implement different ways to do this in each application. I needed a good object oriented design with high reusability. To this end I created the JetDbConnect.dll

Structure

Properties

  • Closed – OleDbConnection closed or open
  • ConnectionString (ReadOnly)
  • DataSource – the file path to your Jet compatible datasource. The ONLY property you must set either by assignment or through instantiation.
  • DbConnection (ReadOnly)
  • ExtendedProperties – if you need to use a Jet datasource that is not directly supported, here is where you would give the connection string parameters that would allow you to use it
  • StorageMethod – one of the 4 enums of the file types this handles directly (Text, Excel, Access, Dbase). Should never need to be set unless you have a really messed up filename.
  • TablesList – List of Strings of available tables in the datasource
  • TargetTable – The table from which to query data, only needed in Excel & Access. If not given for those then it picks 1st table on table list
  • TextFormat – enum: FixedWidth or Delimited. If not set assumes delimited with default delimiter set in your registry settings

Methods

  • New
    • () – Default
    • (DataSource as string, TextFormat Enum)
      • This is used ONLY with text-based datasources
      • DataSource = path to datasource
      • TextFormat Enum is one of the 2 members of this Enum
    • (DataSource as string, [TargetTable] as string)
      • This can be used with any datasource type.
      • If you use it with a text-based datasource, and it is not delimited with your default delimiter, then you will also have to set the TextFormat enum elsewhere BEFORE using one of the Command,Reader, or Records methods
  • Close() – manually close the connection. Not required but can be useful
  • CommandGetAll, ReaderGetAll, RecordsGetAll ([DataSource] as string, [TargetTable] as string)
    • All 3 methods take the same parameters, an optional datasource path and an optional target table name
    • These methods allow you to easily retrieve an OleDbCommand, OleDbReader, or DataTable containing all the records in your target table of your datasource
    • If you don’t set DataSource here and it has not been set elsewhere, this method will fail
    • If you don’t set TargetTable here and it is an Access or Excel datasource and TargetTable is not set elsewhere, then “Sheet1″ or the first member of TablesList will be used for Excel or Access datasources respectively
  • CommandGetSql, ReaderGetSql, RecordsGetSql (sqlWhere as string, [fields] as string, [DataSource] as string, [TargetTable] as string)
    • All 3 methods take the same parameters:
      • Required: sqlWhere string, ex. “zipcode=’78258′ AND land_value>10000″
      • Optional: fields. A comma delimited list of fields to return with the query. ex. “owner_name, zipcode, land_value”
      • Optional: DataSource & TargetTable – see above for how the default values are handled for these 2 parameters
    • These methods allow you to easily retrieve an OleDbCommand, OleDbReader, or DataTable containing just the records in your target table of your datasource which meet your SQL criteria and optionally to return only the specified fields
    • If you want ALL the fields, pass Nothing or Void for fields. An empty string will cause an error.
  • SchemaTable
    • Returns a Schema DataTable for the DataSource not a TargetTable.
    • If you want a Schema DataTable for a table then get it from an OleDbReader on that table
  • FromXml, ToXml
    • Provides methods for serializing & de-serializing the object.
    • Since only DataSource is required for text & Dbase files and DataSource & TargetTable for Access and Excel, I don’t see these methods as being particularly useful. However, I always like to implement them just in case I need to persist an object or compare an object to a previous state.

Usage

Examples:

Get an OleDbReader for all records in an Excel file from a named sheet

dim jetdb as new JetDbConnect(“C:\myfiles\ExcelFile.xls”, “Hays County”)

dim reader as OleDbReader = jetdb.ReaderGetAll()

-or-

dim jetdb as new JetDbConnect

dim reader as OleDbReader = jetdb.ReaderGetAll(“C:\myfiles\ExcelFile.xls”, “Hays County”)

note: the class automatically adds the required decorations so that Hays County becomes [Hays County$] as required for SQL statements on an Excel datasource

Get a DataTable with select records & fields from delimited text file

dim jetdb as new JetDbConnect(“C:\myfiles\some text file.txt”)

dim dt as DataTable = jetdb.RecordsGetSql(“zipcode=’78258′ AND land_value>10000″,”owner_name, zipcode, land_value”)

note: I did not need to ever specify the TargetTable attribute since it is automatically set to “some text file.txt” and the Data Source is actually set to “C:\myfiles”

Get a list of available Sheets and Ranges from an Excel file without the need for any Office libaries or applications

dim jetdb as new JetDbConnect(“C:\myfiles\ExcelFile.xls”)

dim tables as List(of string) = jetdb.TablesList

Downloads

Just the DLL

The Visual Studio 2005 project





Dev Summit Continues

22 03 2007

Day 2 , Wed Mar 21

I was really hanging this morning, for someone who has a beer or glass of wine only occasionally, going beer for beer with professional buisness travelers is never a good idea. But it was free and pretty decent beer so I can’t complain too much.

General Thoughts

I got a whole lot more out of today’s sessions and meetings than the 1st day. The talks I attended went into more depth, the .Net SIG was good, and I took Brian & James’s advice and spent more time talking with devs in the Community center.

A few interesting things I learned in these informal conversations

  • ArcGIS Explorer (AGX) will NOT have a custom skining option. You can mess around with some config files and the AGX document to change color themes, but that is about it. So a custom branded AGX that looked completely like your product but was actually AGX under the art, won’t be easily done.
  • AGX has no inherent editing capabilities, no area measurement tool and no way to discover any attributes about data sources other than local vector sources. You have to capture mouse events and do that work yourself.
  • AGX has no way of directly interperting ArcGIS geometeries, you must provide a map/globe service that is spitting out rasterized data or convert that geometery to E2 geometery yourself.
  • ArcWeb Services can add an ArcIMS service & soon an ArcGIS Server service to maps, but GeoRSS feeds, PostGIS, WMS, OGC, local/server stored vectors (shapefiles,PGDB, GDB, SDE, etc) all have to be progamatically adding in by transforming and adding the geometery to the map yourself. The only other way to get those alternate sources in is to intergrate them with ArcIMS or ArcGIS Server. If you do have a WMS you could but it through the OGC/ArcIMS translator servlet and leverage it that way.
  • A large number of people have compalined about the highly fragmented documentation, poor indexing, incomplete docs, and lack of community comments on EDN & other ESRI help sites. It may just be conference talk but it looks like at least a few people at ESRI are begining to listen. Time will tell.

Sessions Attended

  • Programing Custom Tasks for ArcGIS Explorer
    • The task framework is not direct & intuitive. RTFM on that or you will really be lost. Not directly supported formats such as GeoRSS are not terribly difficult to transform into E2 geometeries and create content from a vast array of data sources. Just don’t expect AGX to do that for you. Deploying custom tasks is SUPER easy. Deploying a malicous task is also possible so watch what you download. They are looking for ways to patch that which won’t create Admin required installs. AGX2 is in the works and the more feedback you give them on the forums and directly, the better it will be.
  • Special Interest Group: .Net
    • A brief marketing talk about MS Visual Studio Team Foundation Server was followed up by Dave Bouwman’s great talk about wrapping feature classes, SDE, & GDB in .Net abstractions using well known code gen techinques from non-spatial DB. It pulled together some of his posts on the subject and made it all look like such a better way to code. I’ve actually used the same techinques on shapefiles using RapTier and either the Jet OleDb or the ESRI OleDb drivers. Shapefiles are not as ammenable to being edited in this manner as PGDB or other real database storage formats are. I can read the data very well and databind on it but updating the gives rather unpredictable results.
    • I would strongly recommend visiting Dave’s blog and checking out his talk and an resources it is an exicting concept. It also makes you wonder, why they don’t just offer Dave a position or short-term contract at ESRI to incorporate all they modern coding techniques in at least ArcGIS 10 if not 9.3. When you see simple object property assignments rather than the AO way of doing it and the self policing code it creates, you wonder why in the world would you want to do it the old way.
    • James Fee hit on the craziness that is the 9.2 server licensing & the unreasonable expectation to pay 1/2 of the 1st license cost (of ~20k at retail) to put the Web ADF on another machine. He was told, “We know people don’t like it, but that is the business model for today. We may be changing it in future releases but don’t hold your breath” (paraphrasing, not a direct quote).
  • Building & Deploying Enterprise Solutions with ArcGIS Server
    • Not a lot of good tech meat here, just an eye opening talk about all the hoops & roadblocks that deploying ESRI server products in lerge organizations that don’t conform to ESRI’s way of doing things can cause.
  • Developing Custom Web Tasks using the .NET Web ADF
    • A lot of technical meat in this session. It went a long way to demonstrate that doing interesting things using this framework is not going to be so easy. You really have to jump through a lot of hoops to make cool things happen, and you have to jump through too many hoops to make super simple things happen. They covered a lot of techniques, tips, & tricks for working with this and I highly recomend veiwing this session on the post-conference web site if available. Especially the code that way demonstrated at the last 3rd of the talk.
  • ArcGIS Mobile SDK
    • A super energetic talk was presented by Jeff & Mike. They have really worked hard to abstract all the underlying native C++ code into a 1st class .NET SDK. They have built a really nice toolkit for VisStudio & using it and the Mobile 5 libaries, you can get some nice applictions working with a very minimal amount of code. It relies heavily on MapCaches produced by ArcGIS Server and they didn’t go into creating those or pushing them out but I’m hoping to get that info in a later session. This was my favorite talk by ESRI thus far.




GIS Data Distribution & Visualization Methods

14 08 2006

I am attempting to put together a list of the various methods by which one can distribute GIS data AND the ability to visualize it to non-GIS users. Below is my 1st pass at making such a list. My main focus is the ability to distribute private data in a controlled manner. The end user for this system is typically someone with limited computer skills and no previous GIS use. They would need to search for & identify features, turn layers on/off, and print or export maps. Some users may also want to create their own custom layer by selecting a sub-set of an existing layer and applying different symbolization to it. They do not need a fully featured GIS, nor do they want to need much if any training to use the system.

I will be using this list in a presentation & proposal to help us decide the best method for distributing our multi-county project to outside of the 4 related companies that I developed it for.

I am further expanding on this list to include pros, cons, & costs related to each method along with links to the software & examples of implementations. The expanded list can be found at www.kestrelcomputer.com/GIS-Distribution.htm. I will post the fully expanded list on Friday Aug 18.

Please let me know if you think I have left any important method, software, or technology out.

GIS Data Visualization & Distribution Methods

Physical
Data & Viewer Reside on Client Computer or Removable Storage

  • DVD – Tatuk Viewer, ArcView Project, and Data & Imagery Files
  • DVD – ArcExplorer, ArcPress Document, and Data & Imagery Files
  • CD – Either of Above Options

Web
Data & Viewer Reside on Internet or Intranet

  • ArcIMS Viewer
  • ArcWeb Services Viewer
  • ArcWeb Explorer (ASP.Net or PHP Custom Version)
  • Google Maps (ASP.Net or PHP Custom Version)
  • SharpMap (ASP.Net Custom Implementation)
  • MapServer
  • MapBuilder
  • MapXtreme 2005 .NET
  • GeoServer
  • DEMIS WebMap Server

Hybrid
Viewer Resides on Client Computer & Data Is Served From Internet/Intranet

  • ArcGIS Explorer
  • Google Earth
  • WorldWind (with customizations)
  • uDig (with customizations)
  • SharpMap (Windows App Version)
  • WorldWind (with customizations)
  • MapDotNet Server 2007
  • MapWindow GIS
  • Dapple (based on WorldWind)




WriteTo function in .NET MemoryStream NOT properly documented

2 06 2006

What the WriteTo function should say 

I've spent 12 hrs trying to upload address as a formatted XML file to MelissaData for address verification, correction, and geocoding. I just figured out the problem! The WriteTo function writes from the CURRENT cursor position to the end of the stream. Well, thats not what the documentation says. It says it writes the WHOLE stream. I've spent hours writing & rewriting code and porting code back and forth between C# & VB.NET. The sample code was in C# but I already had some data handling functions for my data in VB so I stayed with that. But it didn't work, so I thought maybe I'm missing a small step and I combined my intial data handling and thier WebRequest/Response code together in C#. I couldn't use thier code exactly but it was very close, only instead of writing the XML to a binary array, I was either writing it to a file or directly to the memory stream. I even thought, well maybe I'm just not properly tranforming the tags to the EXACT right stuff. So I tried ALL of the XML serialization methods presented here. Still no results from the web service. I had been writing the XML to both a file & the memory stream & then checking the file to make sure all tags were closed & formated correctly, etc.. I finnally decided to use the WriteTo function to write to a MessageBox rather that the RequestStream. What comes up in the box? NOTHING, because it is at EOS. Seek the cursor back to the begining and viola a valid response is finnaly recieved!! That was about 10hrs in front of a computer and several more thinking about the problem I could have avoided, if MS would have put those few more words in the documentation for the WriteTo function. Oh well, I now am very good at creating XML documents, querying them, and modifying them in a vareity of methods and I'm much more confindent in my C# writing abilities, so I'll chalk it all up to a learning experience.