exceldatareader/exceldatareader

Lightweight and fast library written in C# for reading Microsoft Excel files (2

Please feel free to fork and submit pull requests to the develop branch

ExcelDataReader

Lightweight and fast library written in C# for reading Microsoft Excel files (2.0-2007).

Please feel free to fork and submit pull requests to the develop branch.

If you are reporting an issue it is really useful if you can supply an example Excel file as this makes debugging much easier and without it we may not be able to resolve any problems.

Continuous integration

Branch Build status
develop
master

Supported file formats and versions

File Type Container Format File Format Excel Version(s)
.xlsx ZIP, CFB+ZIP OpenXml 2007 and newer
.xlsb ZIP, CFB OpenXml 2007 and newer
.xls CFB BIFF8 97, 2000, XP, 2003
98, 2001, v.X, 2004 (Mac)
.xls CFB BIFF5 5.0, 95
.xls - BIFF4 4.0
.xls - BIFF3 3.0
.xls - BIFF2 2.0, 2.2
.csv - CSV (All)

Finding the binaries

It is recommended to use NuGet through the VS Package Manager Console Install-Package <package> or using the VS "Manage NuGet Packages..." extension.

As of ExcelDataReader version 3.0, the project was split into multiple packages:

Install the ExcelDataReader base package to use the "low level" reader interface. Compatible with net20, net45, netstandard1.3 and netstandard2.0.

Install the ExcelDataReader.DataSet extension package to use the AsDataSet() method to populate a System.Data.DataSet. This will also pull in the base package. Compatible with net20, net45 and netstandard2.0.

How to use

Reading .CSV files

Use ExcelReaderFactory.CreateCsvReader instead of CreateReader to parse a stream of plain text with comma separated values.

See also the configuration options FallbackEncoding and AutodetectSeparators.

The input CSV is always parsed once completely to set FieldCount, RowCount, Encoding, Separator (or twice if the CSV lacks BOM and is not UTF8), and then parsed once again while iterating the row records. Throws System.Text.DecoderFallbackException if the input cannot be parsed with the specified encoding.

The reader returns all CSV field values as strings and makes no attempts to convert the data to numbers or dates. This caller is responsible for interpreting the CSV data.

Using the reader methods

The AsDataSet() extension method is a convenient helper for quickly getting the data, but is not always available or desirable to use. IExcelDataReader extends the System.Data.IDataReader and IDataRecord interfaces to navigate and retrieve data at a lower level. The most important reader methods and properties:

  • Read() reads a row from the current sheet.
  • NextResult() advances the cursor to the next sheet.
  • ResultsCount returns the number of sheets in the current workbook.
  • Name returns the name of the current sheet.
  • CodeName returns the VBA code name identifier of the current sheet.
  • FieldCount returns the number of columns in the current sheet.
  • RowCount returns the number of rows in the current sheet. This includes terminal empty rows which are otherwise excluded by AsDataSet(). Throws InvalidOperationException on CSV files when used with AnalyzeInitialCsvRows.
  • HeaderFooter returns an object with information about the headers and footers, or null if there are none.
  • MergeCells returns an array of merged cell ranges in the current sheet.
  • RowHeight returns the visual height of the current row in points. May be 0 if the row is hidden.
  • GetColumnWidth() returns the width of a column in character units. May be 0 if the column is hidden.
  • GetFieldType() returns the type of a value in the current row. Always one of the types supported by Excel: double, int, bool, DateTime, TimeSpan, string, or null if there is no value.
  • IsDBNull() checks if a value in the current row is null.
  • GetValue() returns a value from the current row as an object, or null if there is no value.
  • GetDouble(), GetInt32(), GetBoolean(), GetDateTime(), GetString() return a value from the current row cast to their respective type.
  • GetNumberFormatString() returns a string containing the formatting codes for a value in the current row, or null if there is no value. See also the Formatting section below.
  • GetNumberFormatIndex() returns the number format index for a value in the current row. Index values below 164 refer to built-in number formats, otherwise indicate a custom number format.
  • GetCellStyle() returns an object containing style information for a cell in the current row: indent, horizontal alignment, hidden, locked.
  • The typed Get*() methods throw InvalidCastException unless the types match exactly.

CreateReader() configuration options

The ExcelReaderFactory.CreateReader(), CreateBinaryReader(), CreateOpenXmlReader(), CreateCsvReader() methods accept an optional configuration object to modify the behavior of the reader:

AsDataSet() configuration options

The AsDataSet() method accepts an optional configuration object to modify the behavior of the DataSet conversion:

Setting up AsDataSet() configuration, use the FilterRow callback to implement a "progress indicator" while loading, e.g.:

Formatting

ExcelDataReader does not support formatting directly. Users may retreive the number format string for a cell through IExcelDataReader.GetNumberFormatString(i) and use the third party ExcelNumberFormat library for formatting purposes.

Example helper method using ExcelDataReader and ExcelNumberFormat to format a value:

See also:

  • https://github.com/andersnm/ExcelNumberFormat
  • #404

Important note when upgrading from ExcelDataReader 2.x

ExcelDataReader 3 had some breaking changes, and older code may produce error messages similar to:

To fix:

  1. Make sure to rename any Excel namespace references in the code to the new namespace ExcelDataReader

  2. Make sure the project has a reference to the ExcelDataReader.DataSet package to use AsDataSet()

  3. Remove the line of code with IsFirstRowAsColumnNames and change the call to AsDataSet() to something like this:

Important note on .NET Core

By default, ExcelDataReader throws a NotSupportedException "No data is available for encoding 1252." on .NET Core.

To fix, add a dependency to the package System.Text.Encoding.CodePages and then add code to register the code page provider during application initialization (f.ex in Startup.cs):

This is required to parse strings in binary BIFF2-5 Excel documents encoded with DOS-era code pages. These encodings are registered by default in the full .NET Framework, but not on .NET Core.

Issues

Quick list of the latest Issues we found

rjlyon

rjlyon

Icon For Comments0

I cannot share the document in question, but I have a .xlsb file where it reads around 500 shared strings instead of the 3917 in the sharedStrings file. I cannot replicate this issue outside of the file that isn't working for me.

It appears that in BiffReader.cs, when reading the stream it returns null if the bytes read are not the same as the bytes requested. However, this can be expected behaviour according to the docs.

If public override Record? Read() has the stream reading modified it fixes this issue: Change

To

I will look to create a PR for this issue

superware

superware

Icon For Comments0

Especially core properties such as Created, Last Modified, Title etc. (e.g. docProps\core.xml, docProps\app.xml), probably as IDictionary<string, object>.

P.s. great work!

gchling

gchling

Icon For Comments1

For some merge cells, The read data do not match the displayed data. The Data like:

The A8:A11 are merged,so the excel only display A8, But A9 has some string, the read function shoud ignore the string in A9.

andersnm

andersnm

Icon For Comments5

This is a kind of announcement, lacking any other suitable forums for communication.

I just created an organization for ExcelDataReader on Nuget and added @Ian1971 and @wizardgsz as members. Please check your email for continued access. This is not a takeover :)

Ownership of the ExcelDataReader and ExcelDataReader.DataSet packages have been transferred to the new organization and all individual users were removed as package owners. I have not added inactive users back to the organization.

Background: A NuGet PM at Microsoft reached out and asked if we were interested in reserving an ID prefix. They suggested to create an organization before reserving the prefix, and that part is done now. The packages should get a nice checkmark real soon. The "ExcelDataReader" prefix will be public, allowing other contributors to upload packages like "ExcelDataReader.Mapping" still - although without the checkmark.

@appel1 I can add you to the organization too if you want, please let me know your username on Nuget.

ComeOutToPlay

ComeOutToPlay

Icon For Comments1

I have a file tab separated and one of the cell values is a double quote ("). When I read the row using the reader, it understand it is a longer value. My test code is as follows:

The "quote.csv" file I used is the following:

I found in the code that the QuoteChar is a private variable with value '"'. Do you know a way of parsing correctly the row containing a double quote?

Best regards.

heretic13

heretic13

Icon For Comments1
  1. An example of reading data from a table is poorly designed. The method of data reading is not clear.

  2. Substitution of concepts.

"GetDouble(), GetInt32(), GetBoolean(), GetDateTime(), GetString() return a value from the current row cast to their respective type."

These methods get the value from the cell, not from row.

Documentation requires verification

toddplayer

toddplayer

Icon For Comments2

Hello. I am trying to read an xlsx file from an Azure File Share using the AzureShareFileClient and I am not able to get the code below to execute. It was working before on an older Azure Storage library but switching to the newer Azure.Storage.Files.Shares seems to have broken it. The thing is I get no error or other output - this is a .Net console app and it just stops running when it reaches ExcelReaderFactory.CreateReader in the snippet below. I have checked all the obvious things I can think, including if the file exists using if (await fileToLoad.ExistsAsync()) and that returns true so I can see the file in question. Do you have any suggestions of how I might be able to get this to work?

using (var stream = await fileToLoad.OpenReadAsync()) { Console.WriteLine("success 1"); using (var reader = ExcelReaderFactory.CreateReader(stream)) { Console.WriteLine("success 2"); ...do more stuff } }

miranemec

miranemec

Icon For Comments6

Hi, your work is really great and save me lots of time. Thank you very much!

Is there any chance to get only filtered rows from excel? I've been using it in other solutions and It will be very nice to do it with your nugget, to make code cleaner and nicer.

Thank you!

marowar13

marowar13

Icon For Comments1

Hi,

I have etl ssis and I get the following error when executing the package where I have added the libraries ExcelDataReader and ExcelDataReader.Dataset

Error: could not load type ExcelDataReader.ExcelDataReaderExtensions from aseembly ExcelDataReader, Version=3.7.0.0, Culture=neutral, PublicKeyToken=93517dbe6a4012fa.

in development and testing environment it works fine but in production it doesn't

my code:

WhatsApp Image 2021-09-13 at 11 28 59 AM

Please help me!

Crazyloon

Crazyloon

Icon For Comments0

I can't tell if this is intentional, but I am using ExcelDataFactory.CreateCsvReader(fileStream) and when I read a .csv file that contains a space:

It is automatically trimmed, so that when I store the value in a string, instead of the value being a literal space, it is an empty string. The workaround is to ensure the value is set as a literal space in the .csv file like so:

If you take a look an alternative tool, like the Microsoft.VisualBasic.IO.TextFieldParser, it allows for the TrimWhiteSpace bool to be set to true or false. Setting it to false will cause a value like , , to be read as a literal space.

PriyadarshaniPatil

PriyadarshaniPatil

Icon For Comments3

I am running Specflow tests on the pipeline with the Specflow External Data Package. It is working fine on the local machine but it is giving the following error on the Azure Pipeline. "Could not load file or assembly 'ExcelDataReader, Version=3.6.0.0, Culture=neutral, PublicKeyToken=93517dbe6a4012fa'. The system cannot find the file specified." The assembly is restored and present but still build is getting failed. I have raised the same issue in the Soecflow community but not sure what's wrong with the pipeline.
Note- the specified file is also present on the mentioned path.

Any lead is appreciated. Thanks!

iamlozano

iamlozano

Icon For Comments2

Cycle detected Download the ExcelDataReadeR.DataSet package first and then ExcelDataReader, which as it says in the package document, these were divided into two. The error it throws at me in VS is this error: Severity Code Description Project File Line Status suppressed Error NU1108 Cycle detected. ExcelDataReader -> ExcelDataReader (> = 2.1.0). ExcelDataReader C: \ Users \ Admin \ source \ repos \ ExcelDataReader \ ExcelDataReader \ ExcelDataReader.csproj 1

Is it a problem with the package or is it that I am installing it the wrong way? (I am using .NET 3.1)

archiero

archiero

Icon For Comments0

Is there any way that you can add support to get the separator chosen by ExcelCsvReader? I know that there's a property named "Separator" in CsvWorksheet that seems to be the information that I want. I would like access to it, if it's possible.

jbrockerville

jbrockerville

Icon For Comments15

Hi,

Incredibly useful library! However, like some other users, I have a date-time issue. The ones I looked at, and mine, seem to stem from the fact that the worksheet implementations detect and convert OA dates to DateTime or TimeSpan objects. For example, the XlsxWorksheet. From there, the problems seem like issues with system region formatting and Excel formatting and their interplay. This was my issue too until I figured out that the library was doing a conversion before the system region formatting came into play. I worked through it by understanding it. Cool. But I have found instances where I simply do not want any conversion whatsoever. I want to take Excel's and the system's formatting completely out of the equation and work with the raw double. Unfortunately, there does not seem to be an option to simply not do this and return the raw OA date double value, or at least none that I could find. Can you do that? Just return the OA date double? If not, could you please add the option to do so?

Thanks! -jb

emshihab008

emshihab008

Icon For Comments1

Hello Everyone, I have an enquiry regarding recovery of old password which is used to protect excel sheet. Can anyone provides any solution? (for my cases, it is highly important to know the old password).

Regards, Ekram

SpiritBob

SpiritBob

Icon For Comments4

The library does an excellent job parsing dates to DateTime objects, but in doing so makes it impossible to distinguish if the cell originally contained the value dd/mm/YYYY instead of dd/mm/YYYY hh:mm:ss tt.

I feel like this is important for cases where you need to verify a date against a specific format, as the above would ruin the validation if you expect a format without time, as the date you receive will always be with a time. (DateTime objects without time set, default to the time of 12:00:00 AM)

Are there any possible solution to this, other than wrapping the DateTime object the library produces in a custom type that could help alleviate these differences, as changing the returned object would undoubtedly be a breaking change for many current users of the API?

appel1

appel1

Icon For Comments7

Looks like we're missing some built-in formats. 0-163 are reserved for built-in formats but we only have up to 49. 55 looks to be a date when saving with Excel on a computer set to Japanese.

Need to find out if these are documented anywhere.

gojimmypi

gojimmypi

Icon For Comments7

When using SqlBulkCopy to insert stream reader data, at times it is desirable to have additional fields in the target table that is being populated, not found in the source XLS file. For example, the bulk insert SQL target table may support uploads from multiple users, but a specific data update at a given time would be interested only a single user and respective upload rows. One solution is to "inject" static field values into the stream to identify the user and perhaps a session id.

Before doing any of this, https://github.com/ExcelDataReader/ExcelDataReader/issues/536 needs to be address due to the runtime error that otherwise occurs during bulk insert due to the not-yet-implemented GetName and GetOrdinal as described in https://github.com/ExcelDataReader/ExcelDataReader/issues/535

See PR https://github.com/ExcelDataReader/ExcelDataReader/pull/539

Versions

Quick list of the latest released versions

v3.7.0-develop00310 - Aug 09, 2020

Highlights in ExcelDataReader 3.7.0-develop00310 since ExcelDataReader 3.7.0-develop00306:

  • New GetCellError() method returns error enum for a cell
  • Bugfix: Fixed reading XLS on Unity in some timezones
  • Bugfix: Fixed "Bad SST format" exception in certain XLS (partial regression since 3.2.0)

v3.7.0-develop00306 - Aug 02, 2020

Highlights in ExcelDataReader 3.7.0-develop00306 since 3.6.0:

  • Support for XLSB format
  • New GetCellStyle method returns indent, horizontal alignment, hidden, locked for a cell
  • Bugfix: Fixed reading certain password protected XLS using 40bit keys
  • Bugfix: Fixed reading XLS with duplicate FORMAT records
  • Bugfix: Fixed reading XLS with multiple workbook streams
  • Bugfix: Fixed reading XLS strings with font encoding
  • Performance improvements

v3.6.0 - May 02, 2019

Highlights in ExcelDataReader 3.6.0 since 3.5.0:

  • New AnalyzeInitialCsvRows configuration option can speed up reading large CSV files
  • Bugfix: Fixed reading some XLSX returned no rows. Regression since 3.5.0

v3.5.0 - Mar 01, 2019

Highlights in ExcelDataReader 3.5.0 since 3.4.2:

  • New GetNumberFormatIndex() method returns the index of the built-in or custom number format in a cell
  • New GetColumnWidth() method returns the width of a column, or 0 if collapsed or hidden
  • New FilterSheet callback for AsDataSet
  • New LeaveOpen option leaves the stream open after the IExcelDataReader object is disposed
  • Optimized XLS row offset map internals
  • Bugfix: Improved support for number formats in BIFF2 XLS
  • Bugfix: Support XLS with non-standard headers, and/or mixed version BIFF records typically generated by third parties
  • Bugfix: XLS parser throws exceptions derived from ExcelReaderException upon errors
  • Bugfix: Handle XLSX zip files with backslash
  • Enabled SourceLink

v3.4.2 - Oct 14, 2018

Highlights in ExcelDataReader 3.4.2 since 3.4.1:

  • Bugfix: Fixed reading XLSX where packaging metadata uses nondefault namespace prefix
  • Bugfix: Fixed reading CSV where last line ends with a separator

v3.4.1 - Aug 29, 2018

Highlights in ExcelDataReader 3.4.1 since 3.4.0:

  • Optimized CSV parser
  • Bugfix: Fixed reading CSV with single quotes
  • Bugfix: Fixed reading cells with error as null in XLSX
  • Bugfix: Fixed crash when closing XLS and CSV readers more than once
  • Bugfix: Handle inline formatted text in XLS
  • Bugfix: Handle XLS with missing EOF record before BOF record
  • Bugfix: Handle XLSX files with one row element per cell
  • Bugfix: Fixed compile error in VS2010 in net35 and net40 apps using AsDataSet

v3.4.0 - Feb 04, 2018

Highlights in ExcelDataReader 3.4.0 since 3.3.0:

  • New MergeCells property returns merged cell ranges in the current sheet
  • New RowCount property returns number of rows in the current sheet
  • New FilterColumn callback option for AsDataSet()
  • Initial support for reading CSV
  • Bugfix: Fixed hang when reading certain broken XLS. Regression since 3.3.0
  • Bugfix: Fixed case insensitive lookup of XLSX ZIP entries
  • Bugfix: Fixed exceptions reading Compound Document directory entries with bad data

v3.3.0 - Nov 24, 2017

Highlights in ExcelDataReader 3.3.0 since 3.2.0:

  • Support for TimeSpan values. GetValue() returns values of type TimeSpan when the number format contains duration formatting tokens [hh], [mm], or [ss]. This is a breaking change for users who attempted to work around the broken DateTime-based durations before
  • New IExcelDataReader.GetNumberFormatString() method returns the number format string for a value
  • Improved performance of Reset() and AsDataSet() methods to compensate for slowness introduced by an XLSX compatibility fix in 3.2.0
  • Bugfix: Fixed problems reading certain XLS where unfortunately aligned SST contents caused an exception. Regression since 3.2.0
  • Bugfix: Missing password throws InvalidPasswordException instead of NullReferenceException
  • Bugfix: Fixed opening XLSX encrypted with the default Excel password

v3.2.0 - Sep 27, 2017

Highlights in ExcelDataReader 3.2.0 since 3.1.0:

  • Support for password protected workbooks
  • Support for netstandard2.0
  • New RowHeight property returns the height of the current row
  • New RowFilter configuration callback option for AsDataSet()
  • AsDataSet() trims trailing blank rows

v3.1.0 - Jul 23, 2017

Highlights in ExcelDataReader 3.1.0 since 3.0.0:

  • New HeaderFooter property contains information about headers and footers
  • New CodeName property contains the current sheet's VBA code name
  • Bugfix: Could not read XLS larger than ~15 megabytes
  • Bugfix: Could not read XLSX formatted text
  • Includes IntelliSense XML documentation

v3.0.0 - Jul 16, 2017

Breaking changes in ExcelDataReader 3.0.0 since 3.0.0-develop00086:

  • The IDataReader.Get* reader methods no longer attempt to convert to the requested type. Throws InvalidCastException if the field value is not of the expected type
  • Removed the ConvertOaDate option: always returns DateTimes if possible, otherwise the original value

v3.0.0-develop00086 - Jul 01, 2017

Highlights in ExcelDataReader 3.0.0-develop00086 since 3.0.0-develop00019:

  • Support for older XLS files back to Excel 2.0
  • Improved support for XLS generated by third parties; removed ReadOption.Loose/Strict
  • Support dates correctly in XLS/XLSX created on Mac
  • Support text dates, cached formula values, empty sheet paths in XLSX
  • Support fallback encoding in XLS
  • More flexible column name handling in DataSets; removed IsFirstRowAsColumnHeaders
  • CreateReader() factory methods take an ExcelReaderConfiguration object to control the reader's behavior
  • AsDataSet() method takes an ExcelDataSetConfiguration object to control how the DataSet is populated

v3.0.0-develop00019 - Jun 05, 2017

This is a major release with some breaking changes.

Highlights in ExcelDataReader 3.0.0-develop00019 since 2.1.2.3:

  • No longer using temp files for OpenXml
  • Packages for .NET Core / netstandard1.3. net20 and net45 still supported
  • Package no longer pulls in SharpZipLib on net45 and netstandard targets (but still does on net20)
  • Made AsDataSet() an extension method in separate ExcelDataReader.DataSet package
  • Added IExcelDataReader.Reset() method to restart the reader at the first result
  • No longer skips empty tables
  • No longer assumes 256 columns for XLS without dimensions
  • Always scans sheet data to determine maximum extents
  • Fixed various XLS string encoding issues
  • Renamed main namespace and DLL to ExcelDataReader (was Excel)

See #139 for the main PR.

Library Stats (Aug 29, 2022)

Subscribers: 189
Stars: 3.2K
Forks: 860
Issues: 110

NET Core API Endpoints

A project for supporting API Endpoints in ASP

NET Core API Endpoints

This project contains samples for

NET Core logging using various logger frameworks and interface

This project contains samples for

This project contains networking primitives for use with HTTP requests

It can be obtained from NuGet as-well and in our github repo

This project contains networking primitives for use with HTTP requests

A WebRTC based networking plugin for Unity3D

AirPeer allows Unity applications to communicate in a peer to peer manner using Christoph Kutza's WebRTC Network project with added features such as

A WebRTC based networking plugin for Unity3D

NET reverse proxy website that anyone can use

This project aims to create a very simple mechanism for serving

NET reverse proxy website that anyone can use

NET tool to execute other dotnet tools without installing them globally or in a project...

NET tool to execute other dotnet tools without installing them globally or in a project (a similar approach to npm)

NET tool to execute other dotnet tools without installing them globally or in a project...

OnDeviceCustomVision

NET Standard project and iOS, Android and UWP client projects

OnDeviceCustomVision

NET Project template

to adapt this template to your project

NET Project template

Power Distribution Networks

The project provides an introduction to power distribution networks

Power Distribution Networks

Unity-reinforcement-learning

A genetic algorithm applied to a neural network in a Unity project

Unity-reinforcement-learning

This project is an in progress

NET implementation of the TestDriver

This project is an in progress