This project has moved and is read-only. For the latest updates, please go here.

CSV format

Nov 7, 2012 at 2:31 PM

I've been trying to use the CsvLoader and have come a bit unstuck.

Where the underlying field is a string, the loader appears the entyr in the CSV file to start with a "'" (single quotation mark).  It then processes the remainder of the string up to the next comma.

If you add a terminating quote this is included in the value in the transient data base.  So

ID, Name, Description

1,Iain,Writer

2,'Iain','Writer'

3,'Iain,'Writer

 

ID 1 causes an exception as a ' is needed to prefix a string.

ID 2  returns Iain' and Writer' as the values in the database

ID 2 puts what you expect in the fields.

This was a bit confusing.  FIrstly, you can have CSV files without using quotes to escape fields.  You have to use quotes if there is a comma in the field, otherwise you don't.  If you do use quotes they should appear at the start and end of the field contents.

Also, my experience is that the double quote " is the more common delimiter rather than single quote.

I'm using code pulled by Git sometime last week as I'm working with DbContext.

Is this a bug or have I got the wrong end of the stick? :)

 

Iain

 

Nov 7, 2012 at 2:56 PM
Edited Nov 7, 2012 at 4:49 PM

Hello Iain,

This new format/behavior is intentional, but there is no release build that is currently using it. Once it is published, the corresponding tutorial will be updated too. The single quote mark is a prefix character that indicates that the content is a string. This is required because empty string and null are different values and without this indicator the empty value in the CSV file could be interpreted as empty string and null too. But with the addition of ', these values can be represented by "'" and "", respectively.

If you checked the Effort.CsvTool export tool (compile the source), you would see it exports a CSV file with this format:

"ID","Name","Description"
"1","'Iain","'Writer"

The single quote mark does not serve as a delimiter, but as an indicator as I mentioned before. The delimiter is the double quite mark. I have chose the single quote mark for this purpose, because it is used for solving a similar problem is Excel too. However, it is indeed could be confusing, I kindly accept better suggestions.

Thank you for your feedback!

Regards,
Tamas

Nov 7, 2012 at 3:09 PM

I see your point. 

I would suggest you make an empty delimited string the empty string and a string with no delimiters null.

So

ID, Name, Description

1,,"Description"

2,"",Description

ID would would have a null name and ID 2 a non null empty string.

If you do continue with the current scheme could I suggest you update the tutorial to make this distinction clear.

However - well done for Effort (that sounds wrong doesn't it?).  It's a great tool and is making my unit tests a good deal easier than they would other wise be.

Iain

Nov 7, 2012 at 3:35 PM

 I definitely like your idea, but I do not know currently how should I achieve it.

The CSV parser is a black box for me currently, and I was also planning to not include its source code in the Effort library anymore, but referencing it as an external DLL (http://nuget.org/packages/LumenWorks.Framework.IO.dll). This move is motivated by the intention of decoupling the data loader implementations from the core Library completely. Perhaps creating an external Effort.DataLoaders.Csv or Effort.Csv assembly with a customized CSV parser (hope the Lumenworks is allowed to be derived) is the way to go. I have to think about this.

In my opinion "Thank you for your Effort" sounds better :)

Nov 7, 2012 at 8:48 PM

I had a quick look at the Lumenworks code.  There is a MissingFieldAction setting, one of the options of which is ReplaceByNull.

I've not tried to do anything with it, but that might be worth a try.

On another question, one of the things that confused me is that I seem to have to name the csv files by the contents of the DbSet.  If I have

DbSet<Discipline> Disciplines {get; set;}

which produces a 'Disciplines' table, I appear to have to name it 'Discipline.csv' for Effort to pick it up.  I can see that this because of some metadata which I suppose comes from EF itself, but I'm not sure I understand this.

This is of particular relevance as one task tomorrow is to work out why I can't load a many to many table from a csv file matching the table name.  I guess the easiest thing is to step through the code and see what name the table has.  However, it has no 'real' existence in the code first layer, so I'm not sure what the outcome will be.

And, indeed, 'Thank you for your Effort'!

Iain

Nov 9, 2012 at 12:20 AM

I did it a try, but unfortunately MissingFieldAction is utilized when the current line has less fields than the number of header fields defined. However, I found out that the parser is published under the MIT lincense, so I can easily create a derived work.

For your other question: you can examine the storage schema of your Code First model by extracting the MetadataWorkspace from the underlying ObjectContext class.

DbContext context = new MyDbContext();
IObjectContextAdapter adapter = context as IObjectContextAdapter;
ItemCollection ssdl = adapter.ObjectContext.MetadataWorkspace.GetItemCollection(DataSpace.SSpace);
EntityContainer container = ssdl.GetItems<EntityContainer>().First();
EntitySet[] tables = container.BaseEntitySets.OfType<EntitySet>().ToArray();

Each EntitySet object in the tables array represents a database table that contain the name that should the name of the corresponding CSV file too.

 

Nov 10, 2012 at 10:11 AM

I kind of understand this – I read some of the code. My point is that these names don’t match the names which EF gives the tables when it creates them.

For example I have a many to many relationship between Competency and Attainments. The default EF link table would be named CompetencyAttainment, but I’ve used Fluent thingys to name it xCompetencyAttainment to match some internal naming convention.

It’s called xCompetencyAttainment in the database, but I have to name to csv file CompetencyAttainment. Much the same for the main entities where I’ve given the DbSet name a plural (Competencies), but the CSV file has to be named Competency.csv.

This is workable, but it would be easier if it looked for files with the external name of the table rather than the internal or default name.

Iain

From: tamasflamich [email removed]
Sent: 09 November 2012 00:21
To: iain@idcl.co.uk
Subject: Re: CSV format [effort:402262]

From: tamasflamich

I did it a try, but unfortunately MissingFieldAction is utilized when the current line has less fields than the number of header fields defined. However, I found out that the parser is published under the MIT lincense, so I can easily create a derived work.

For your other question: you can examine the storage schema of your Code First model by extracting the MetadataWorkspace from the underlying ObjectContext class.

DbContext context = new MyDbContext();
IObjectContextAdapter adapter = context as IObjectContextAdapter;
ItemCollection ssdl = adapter.ObjectContext.MetadataWorkspace.GetItemCollection(DataSpace.SSpace);
EntityContainer container = ssdl.GetItems<EntityContainer>().First();
EntitySet[] tables = container.BaseEntitySets.OfType<EntitySet>().ToArray();

Each EntitySet object in the tables array represents a database table that contain the name that should the name of the corresponding CSV file too.

Nov 12, 2012 at 12:31 PM

This looks weird, because this code should return the names that EF named the logical tables as. Can you post an example code (DbContext implementation) please?

Nov 14, 2012 at 10:40 AM
Edited Nov 15, 2012 at 1:04 AM

I found out that the appropriate database table name can be read by observing the MetadataProperties collection. It's interesting that there is no such thing for the table columns.

private static string GetTableName(EntitySet entitySet)
{
    MetadataProperty property = entitySet
        .MetadataProperties
        .FirstOrDefault(p => p.Name == "Table");

    if (property == null)
    {
        return entitySet.Name;
    }

    return property.Value as string ?? entitySet.Name;
}

 

Nov 15, 2012 at 1:09 AM

The table naming issue is solved in the most recent source code.

Nov 15, 2012 at 11:54 PM

I also tuned the CSV parser to work as you suggested. You can download the source code and try it.

Nov 16, 2012 at 12:26 PM

Sounds great!

I've got my unit tests working with the old style.  What I'll do is wait till the other issue is resolved and then update across the board.

Thanks again! (trying very hard to avoid puns mentioning certain product names)