Reading Excel with Open XML skips empty cells

I'm using the accepted solution here to convert an Excel sheet into a DataTable. It works fine if I have "perfect" data, but if there is a blank cell in the middle of my data, it seems to put the wrong data in each column.

I think this is because, in the code below,

row.Descendants<Cell>().Count()

is the number of populated cells (not all the columns), AND

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily the one at that index), so if the first column is empty and I call ElementAt(0), it returns the value of the second column.

Here is the full parsing code:

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

Please see this answer in the same thread you mentioned. It has the fix for blank cells.

Author: leora | 2010-10-01

This makes sense, since Excel does not store a value for a cell that is empty. If you open your file with the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level, you will see that only the cells that contain data are in the file.

Your options are to insert blank data into the range of cells you are going to traverse, or to figure out programmatically that a cell was skipped and adjust your index accordingly.

I made a sample Excel document with a string in cells A1 and C1. I then opened the document in the Open XML Productivity Tool, and here is the XML that was stored:

<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

Here you can see that the data corresponds to the first row and that only two cells of data are stored for that row. The stored data corresponds to A1 and C1; none of the cells with null values are stored.
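To see the skipped-cell detection in isolation, here is a minimal sketch that parses the sample row above with System.Xml.Linq (no Open XML SDK involved) and pads the missing columns. RowPadding, ColumnIndex and ReadRow are hypothetical names; the sketch returns the raw v text, so shared-string indexes are not resolved, and it assumes that a cell without an r attribute directly follows the previous one.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of the Open XML SDK): turns one SpreadsheetML <row>
// into a list with one entry per column, using "" for columns Excel did not store.
public static class RowPadding
{
    static readonly XNamespace X = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // "C1" -> 2, "AH7" -> 33 (zero-based); letters form a base-26 number with A = 1 ... Z = 26.
    public static int ColumnIndex(string cellReference)
    {
        int number = 0;
        foreach (char c in cellReference)
        {
            if (c < 'A' || c > 'Z') break;      // the letters end where the row number starts
            number = number * 26 + (c - 'A' + 1);
        }
        return number - 1;
    }

    // Returns the raw <v> text of each cell (shared-string indexes are NOT resolved).
    public static List<string> ReadRow(string rowXml)
    {
        var values = new List<string>();
        foreach (XElement cell in XElement.Parse(rowXml).Elements(X + "c"))
        {
            string reference = (string)cell.Attribute("r");
            // Assumption: a cell without an r attribute sits right after the previous one.
            int column = reference == null ? values.Count : ColumnIndex(reference);

            while (values.Count < column)
                values.Add("");                  // a skipped (blank) column
            values.Add((string)cell.Element(X + "v") ?? "");
        }
        return values;
    }
}
```

Applied to the sample row, ReadRow yields three entries: "0" for A1, an empty string for the unstored B1, and "1" for C1.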

To get the behaviour you need, you can iterate over the Cells as you do above, but you will need to check which column each Cell references and determine whether any Cells were skipped. For that you need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:

    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    ///<summary>
    ///Given a cell name, parses the specified cell to get the column name.
    ///</summary>
    ///<param name="cellReference">Address of the cell (ie. B2)</param>
    ///<returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        //Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    ///<summary>
    ///Given just the column name (no row index), it will return the zero based column index.
    ///Handles column names of any length (A to Z, AA to ZZ, AAA and beyond).
    ///</summary>
    ///<param name="columnName">Column Name (ie. A or AB)</param>
    ///<returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            return null;

        //Accumulate a one-based column number (A = 1, Z = 26, AA = 27) from left to right
        int columnNumber = 0;
        foreach (char c in columnName.ToUpperInvariant())
        {
            int letterIndex = Letters.IndexOf(c);

            //Letters also contains ' ', which is not a valid column letter
            if (letterIndex < 0 || letterIndex > 25)
                return null;

            columnNumber = columnNumber * 26 + (letterIndex + 1);
        }

        return columnNumber - 1;
    }

Then you can iterate over the Cells and compare each cell's column index with columnIndex. While columnIndex is smaller, add blank data to your tempRow; otherwise just read in the value contained in the cell. (Note: I haven't tested the code below, but the general idea should help):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
    //Gets the column index of the cell with data
    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

    //Fill every skipped (blank) column before this cell
    while (columnIndex < cellColumnIndex)
    {
        tempRow[columnIndex] = string.Empty;
        columnIndex++;
    }

    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[columnIndex].ToString());
    }
    columnIndex++;
}

Do you know a way to detect that there was an empty cell, though? That's my problem. I want a solution that reads in exactly what is on the sheet (including blanks).
The only way to detect it is that the cell reference does not exist in the row's list of child cells.
See @amurra's answer here for a definition of the "Letters" List.
What if I have columns up to AH?
Also, sometimes cell.CellReference is null.

Author: amurra

Here is an IEnumerable implementation that should do what you want; it is compiled and unit tested.

    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public IEnumerator<Cell> GetEnumerator()
    {
        int currentCount = 0;

        //row is a class level variable representing the current
        //DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

Here are the functions it relies on:

    ///<summary>
    ///Given a cell name, parses the specified cell to get the column name.
    ///</summary>
    ///<param name="cellReference">Address of the cell (ie. B2)</param>
    ///<returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        //Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    ///<summary>
    ///Given just the column name (no row index),
    ///it will return the zero based column index.
    ///</summary>
    ///<param name="columnName">Column Name (ie. A or AB)</param>
    ///<returns>Zero based index if the conversion was successful</returns>
    ///<exception cref="ArgumentException">thrown if the given string
    ///contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; //ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

Drop it into a class and give it a try.

Can someone show me how to implement this explicitly? Thanks!
Some context would make the enumerable example much better.

Author: Waylon Flinn

Here is a slightly modified version of Waylon's answer, which also draws on other answers. It wraps his method in a class.

I changed

IEnumerator<Cell> GetEnumerator()

to

IEnumerable<Cell> GetRowCells(Row row)

Here is the class. You don't need to instantiate it; it just serves as a utility class:

public class SpreedsheetHelper
{
    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public static IEnumerable<Cell> GetRowCells(Row row)
    {
        int currentCount = 0;

        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for (; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

    ///<summary>
    ///Given a cell name, parses the specified cell to get the column name.
    ///</summary>
    ///<param name="cellReference">Address of the cell (ie. B2)</param>
    ///<returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        //Match the column name portion of the cell name.
        var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
        var match = regex.Match(cellReference);

        return match.Value;
    }

    ///<summary>
    ///Given just the column name (no row index),
    ///it will return the zero based column index.
    ///</summary>
    ///<param name="columnName">Column Name (ie. A or AB)</param>
    ///<returns>Zero based index if the conversion was successful</returns>
    ///<exception cref="ArgumentException">thrown if the given string
    ///contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; //ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }
}

You can now get all the cells of each row like this:

//skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
    IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
    foreach (Cell cell in cells)
    {
         //skip part that reads the text according to the cell-type
    }
}

This returns every cell up to the last stored one in each row, including the empty ones in between; empty cells after the last stored cell are not produced.
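If every row also needs the same width (trailing blanks included), for example to fill a DataTable, the padding can be extended to a fixed column count. The sketch below works on plain (zero-based column, value) pairs instead of Open XML Cell objects so that it runs on its own; RowPadder, PadRow and columnCount are hypothetical names, and it assumes the stored cells arrive in ascending column order, as Excel writes them.

```csharp
using System.Collections.Generic;

public static class RowPadder
{
    // Yields exactly columnCount values for one row. storedCells holds the zero-based column
    // and value of each cell actually present in the XML, in ascending column order.
    // Gaps, including those after the last stored cell, become empty strings;
    // cells at or beyond columnCount are ignored.
    public static IEnumerable<string> PadRow(
        IEnumerable<(int Column, string Value)> storedCells, int columnCount)
    {
        int next = 0;                               // next column still to be emitted
        foreach (var (column, value) in storedCells)
        {
            if (column < next) continue;            // out of order or duplicate: skip rather than misalign
            for (; next < column && next < columnCount; next++)
                yield return "";                    // blank cells before this stored cell
            if (column >= columnCount)
                yield break;
            yield return value;
            next = column + 1;
        }
        for (; next < columnCount; next++)
            yield return "";                        // trailing blank cells
    }
}
```

For stored cells in columns 1 and 3 and a width of 6, this yields "", "x", "", "y", "", "": the two trailing blanks are what the enumerator above leaves out.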

Author: Tim Schmelter

Here is my implementation:

  Row[] rows = worksheet.GetFirstChild<SheetData>()
                .Elements<Row>()
                .ToArray();

  string[] columnNames = rows.First()
                .Elements<Cell>()
                .Select(cell => GetCellValue(cell, document))
                .ToArray();

  HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());

  if (columnNames.Count() != HeaderLetters.Count())
  {
       throw new ArgumentException("HeaderLetters");
  }

  IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);

//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.

HeaderLetters is built with this class:

    private static class ExcelHeaderHelper
    {
        public static string[] GetHeaderLetters(uint max)
        {
            var result = new List<string>();
            int i = 0;
            var columnPrefix = new Queue<string>();
            string prefix = null;
            int prevRoundNo = 0;
            uint maxPrefix = max /26;

            while (i < max)
            {
                int roundNo = i /26;
                if (prevRoundNo < roundNo)
                {
                    prefix = columnPrefix.Dequeue();
                    prevRoundNo = roundNo;
                }
                string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                if (i <= maxPrefix)
                {
                    columnPrefix.Enqueue(item);
                }
                result.Add(item);
                i++;
            }
            return result.ToArray();
        }
    }

And the helper methods are:

    private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
    {
        var result = new List<List<string>>();
        foreach (var row in rows)
        {
            List<string> cellValues = new List<string>();
            var actualCells = row.Elements<Cell>().ToArray();

            int j = 0;
            for (int i = 0; i < columnCount; i++)
            {
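                // Compare the full column part of the reference ("AB" in "AB7"): StartsWith("A") would also match "AB7"
                string cellColumn = j < actualCells.Length
                    ? new string(actualCells[j].CellReference.Value.TakeWhile(char.IsLetter).ToArray())
                    : null;
                if (cellColumn != HeaderLetters[i])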
                {
                    cellValues.Add(null);
                }
                else
                {
                    cellValues.Add(GetCellValue(actualCells[j], document));
                    j++;
                }
            }
            result.Add(cellValues);
        }
        return result;
    }


private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
    bool sstIndexedcell = GetCellType(cell);
    return sstIndexedcell
        ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
        : cell.InnerText;
}

private static bool GetCellType(Cell cell)
{
    return cell.DataType != null && cell.DataType == CellValues.SharedString;
}

private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}

This solution also handles shared string items (SST-indexed cells).
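For a cell with t="s", the v text is a zero-based index into the si items of the workbook's shared string table, which is what GetSharedStringItemById looks up. A minimal sketch of that lookup using System.Xml.Linq on small hand-written XML fragments (standing in for the real sharedStrings.xml part; SharedStrings, Load and CellText are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper working on raw XML, not the Open XML SDK's SharedStringTablePart.
public static class SharedStrings
{
    static readonly XNamespace X = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // One entry per <si>: plain items hold a single <t>, rich-text items split the text into <r><t> runs.
    public static List<string> Load(string sstXml) =>
        XElement.Parse(sstXml)
            .Elements(X + "si")
            .Select(si => si.Element(X + "t")?.Value
                          ?? string.Concat(si.Elements(X + "r").Elements(X + "t").Select(t => t.Value)))
            .ToList();

    // Display text of a <c> element: t="s" means <v> is a zero-based index into the table.
    public static string CellText(XElement cell, IReadOnlyList<string> table)
    {
        string raw = (string)cell.Element(X + "v") ?? "";
        bool isShared = (string)cell.Attribute("t") == "s";
        return isShared && raw.Length > 0
            ? table[int.Parse(raw, NumberStyles.None, CultureInfo.InvariantCulture)]
            : raw;
    }
}
```

Cells without t="s" (numbers, for example) keep their v text unchanged.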

Author: jaccso

All good examples. Here is the one I'm using, since I need to keep track of all rows, cells, values and headers for correlation and analysis.

The ReadSpreadsheet method opens an xlsx file and walks through each worksheet, row and column. Since text values are stored in the shared string table, I look them up explicitly in that table. Two other classes are used: DSFunction and StaticVariables. The latter holds frequently used parameter values, such as the referenced 'quotdouble' (quotdouble = "\u0022";) and 'crlf' (crlf = "\u000D" + "\u000A";).

The relevant DSFunction method, GetIntColIndexForLetter, is included below. It returns the one-based integer column index for a column letter name such as A, B, AA or ADE. It is used together with the variable "ncellcolref" to determine whether columns were skipped, and to add an empty string value for each missing one.
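The skip detection itself is simple arithmetic on one-based column numbers: if the previous stored cell was in column ncellcolref and the current one is in column n1, then n1 - ncellcolref - 1 columns were skipped. A tiny sketch of that bookkeeping, with the column numbers already computed (SkipCounter and BlanksBefore are hypothetical names):

```csharp
using System.Collections.Generic;

public static class SkipCounter
{
    // Given the one-based column numbers of the cells actually stored in a row (ascending),
    // returns how many blank columns precede each stored cell; this mirrors the
    // n1 - ncellcolref - 1 loop bound in ReadSpreadsheet.
    public static List<int> BlanksBefore(IEnumerable<int> storedColumns)
    {
        var result = new List<int>();
        int previous = 0;                        // plays the role of ncellcolref (0 = before column A)
        foreach (int current in storedColumns)   // plays the role of n1
        {
            result.Add(current - previous - 1);
            previous = current;
        }
        return result;
    }
}
```

For stored columns 1, 3 and 6 (A, C and F), it returns 0, 1 and 2.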

I also do a bit of cleanup on the values (using the Replace method) before storing them temporarily in a List object.

Afterwards, I use a hash table (Dictionary) of column names to extract the values from the different worksheets, correlate them, create normalized values, and then build an object used in our product, which is stored in an XML file. None of that is shown here; it only explains why this approach is used.

    public static class DSFunction {

    ///<summary>
    ///Creates an integer value for a column letter name starting at 1 for 'a'
    ///</summary>
    ///<param name="lettstr">Column name as letters</param>
    ///<returns>int value</returns>
    public static int GetIntColIndexForLetter(string lettstr) {
        string txt = "", txt1="";
        int n1, result = 0, nbeg=-1, nitem=0;
        try {
            nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
            txt = lettstr;
            if (txt != "") txt = txt.ToLower().Trim();
            while (txt != "") {
                if (txt.Length > 1) {
                    txt1 = txt.Substring(0, 1);
                    txt = txt.Substring(1);
                }
                else {
                    txt1 = txt;
                    txt = "";
                }
                if (!DSFunction.IsNumberString(txt1, "real")) {
                    nitem++;
                    n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
                    //Base-26 accumulation (a=1 ... z=26): "aa" -> 27, "ah" -> 34
                    result = result * 26 + n1;
                }
                else {
                    break;
                }
            }
        }
        catch (Exception ex) {
            txt = ex.Message;
        }
        return result;
    }


}


    public static class Extractor {

    public static string ReadSpreadsheet(string fileUri) {
        string msg = "", txt = "", txt1 = "";
        int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
        Boolean haveheader = true;
        Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
        List<string> colvalues = new List<string>();
        try {
            if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
            using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
                var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
                    nrow = 0;
                    foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
                        ncell = 0;
                        ncellcolref = 0;
                        nrow++;
                        colvalues.Clear();
                        foreach (Cell sscell in ssrow.Elements<Cell>()) {
                            ncell++;
                            n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
                            for (i = 0; i < (n1 - ncellcolref - 1); i++) {
                                if (nrow == 1 && haveheader) {
                                    txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
                                    if (!hashcolnames.TryGetValue(txt1, out n2)) {
                                        hashcolnames.Add(txt1, ncell - 1);
                                    }
                                }
                                else {
                                    colvalues.Add("");
                                }
                            }
                            ncellcolref = n1;
                            if (sscell.DataType != null) {
                                if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
                                    txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
                                }
                                else if (sscell.DataType.Value == CellValues.String) {
                                    txt = sscell.InnerText;
                                }
                                else txt = sscell.InnerText.ToString();
                            }
                            else txt = sscell.InnerText;
                            if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
                            if (nrow == 1 && haveheader) {
                                txt1 = txt1.Replace(" ", "");
                                if (txt1 == "table/viewname") txt1 = "tablename";
                                else if (txt1 == "schemaownername") txt1 = "schemaowner";
                                else if (txt1 == "subjectareaname") txt1 = "subjectarea";
                                else if (txt1.StartsWith("column")) {
                                    txt1 = txt1.Substring("column".Length);
                                }
                                if (!hashcolnames.TryGetValue(txt1, out n1)) {
                                    hashcolnames.Add(txt1, ncell - 1);
                                }
                            }
                            else {
                                txt = txt.Replace(((char)8220).ToString(), "'");  //special "
                                txt = txt.Replace(((char)8221).ToString(), "'"); //special "
                                txt = txt.Replace(StaticVariables.quotdouble, "'");
                                txt = txt.Replace(StaticVariables.crlf, " ");
                                txt = txt.Replace("  ", " ");
                                txt = txt.Replace("<", "");
                                txt = txt.Replace(">", "");
                                colvalues.Add(txt);
                            }
                        }
                    }
                }
            }
        }
        catch (Exception ex) {
            msg = "notok:" + ex.Message;
        }
        return msg;
    }





}

Author: Geoffrey Malafsky

The letter code is a base-26 encoding (strictly a bijective one: A = 1 ... Z = 26, with no zero digit), so this should work to convert it into a zero-based offset:

//Converts letter code (i.e. AA) to an offset
public int offset( string code)
{
    var offset = 0;
    var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
    for( var i = 0; i < byte_array.Length; i++ )
    {
        offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
    }
    return offset - 1;
}
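As a worked example, "AH" (the case asked about in a comment above) is 1 × 26 + 8 = 34 in one-based terms, i.e. offset 33. The function above trusts its input; a sketch of the same conversion that validates it first might look like this. It assumes uppercase A-Z input, and ColumnOffset.FromCode is a hypothetical name.

```csharp
using System;

public static class ColumnOffset
{
    // Converts a column code ("A", "Z", "AA", "AH", "XFD") to a zero-based offset.
    // Each letter is a digit from 1 (A) to 26 (Z); there is no zero digit.
    public static int FromCode(string code)
    {
        if (string.IsNullOrEmpty(code))
            throw new ArgumentException("Column code must not be empty", nameof(code));

        int value = 0;
        foreach (char c in code)
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException($"Invalid column letter '{c}'", nameof(code));
            value = checked(value * 26 + (c - 'A' + 1));   // most significant letter first
        }
        return value - 1;
    }
}
```

XFD, the last column in current Excel versions, maps to 16383.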

Author: howardlo

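You can use this function to get a cell from a row by passing the header (column) index. The index is one-based, since GetNthColumnName(1) returns "A". The function returns null when the cell is not stored (i.e. it is blank):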

public static Cell GetCellFromRow(Row r, int headerIdx)
{
    // headerIdx is 1-based (1 -> "A"), because GetNthColumnName is 1-based
    string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
    // Returns null when Excel did not store this cell (i.e. it is blank)
    return r.Elements<Cell>().FirstOrDefault(x => x.CellReference != null && x.CellReference.Value == cellname);
}
public static string GetNthColumnName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }
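Because GetNthColumnName is one-based, a quick way to convince yourself that it is correct is a round trip against an inverse. The sketch below copies GetNthColumnName unchanged and pairs it with a hypothetical inverse, ColumnNumber; it does not touch the Open XML SDK.

```csharp
public static class ColumnNames
{
    // Same algorithm as GetNthColumnName above: 1 -> "A", 26 -> "Z", 27 -> "AA".
    public static string GetNthColumnName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;                                  // shift to 0-based for this digit
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }

    // Hypothetical inverse: "A" -> 1, "AA" -> 27 (one-based, matching GetNthColumnName).
    public static int ColumnNumber(string name)
    {
        int n = 0;
        foreach (char c in name)
            n = n * 26 + (c - 'A' + 1);
        return n;
    }
}
```

For example, GetNthColumnName(3) + "5" builds the reference "C5" that GetCellFromRow compares against.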

Author: Renzo Ciot

OK, I'm no expert on this, but the other answers seemed like overkill to me, so here is my solution:

//Loop through each row in the spreadsheet, skipping the header row
foreach (var row in sheetData.Elements<Row>().Skip(1))
{
    var i = 0;
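    //Note: only handles single-letter columns A to O; extend this array (or use a column-index helper) for wider sheets
    string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };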

    List<String> cellsList = new List<string>();
    foreach (var cell in row.Elements<Cell>().ToArray())
    {
        while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
        {//accounts for multiple consecutive blank cells
            cellsList.Add("");
            i++;
        }
        //CellValue is null for cells that exist in the XML but hold no value
        cellsList.Add(cell.CellValue?.Text ?? "");
        i++;
    }

    string[] cells = cellsList.ToArray();

    foreach(var cell in cellsList)
    {
        //display contents of cell, depending on the datatype you may need to call each of the cells manually
    }
}

Hope someone finds this handy!

Author: Owain Reed

Apologies for posting yet another answer to this question; here is the code I used.

I had problems with OpenXML not working properly when a worksheet had a blank row at the top. It would sometimes just return a DataTable with 0 rows and 0 columns. The code below copes with this, as well as with all the other worksheets.

Here's how you call my code. Just pass a filename and the name of the worksheet to read:

DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

And here is the code itself:

    public class OpenXMLHelper
    {
        // A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
        // of the worksheets.
        //
        // We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
        // OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more 
        // stable method of reading in the data.
        //
        public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
        {
            DataTable dt = new DataTable(worksheetName);

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
            {
                //Find the sheet with the supplied name, and then use that 
                //Sheet object to retrieve a reference to the first worksheet.
                Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
                if (theSheet == null)
                    throw new Exception("Couldn't find the worksheet: " + worksheetName);

                //Retrieve a reference to the worksheet part.
                WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
                Worksheet workSheet = wsPart.Worksheet;

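                // Get the dimensions of this worksheet, eg "B2:F4". The <dimension> element is optional, so fail clearly if it's missing.
                if (workSheet.SheetDimension == null || workSheet.SheetDimension.Reference == null)
                    throw new Exception("The worksheet \"" + worksheetName + "\" has no dimension element");
                string dimensions = workSheet.SheetDimension.Reference.InnerText;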

                int numOfColumns = 0;
                int numOfRows = 0;
                CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));

                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                string[,] cellValues = new string[numOfColumns, numOfRows];

                int colInx = 0;
                int rowInx = 0;
                string value = "";
                SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;

                // Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
                foreach (Row row in rows)
                {
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        // *DON'T* assume there's going to be one XML element for each column in each row...
                        if (cell.CellValue == null || cell.CellReference == null)
                            continue;                       // eg when an Excel cell contains a blank string

                        // Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                        colInx = GetColumnIndexByName(cell.CellReference);             // eg "C" -> 2  (0-based)
                        rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1;     // Needs to be 0-based

                        // Fetch the value in this cell
                        value = cell.CellValue.InnerXml;
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        {
                            value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                        }

                        cellValues[colInx, rowInx] = value;
                    }
                }

                // Copy the array of strings into a DataTable.
                // We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
                for (int col = 0; col < numOfColumns; col++)
                    dt.Columns.Add("Column_" + col.ToString());

                for (int row = 0; row < numOfRows; row++)
                {
                    DataRow dataRow = dt.NewRow();
                    for (int col = 0; col < numOfColumns; col++)
                    {
                        dataRow.SetField(col, cellValues[col, row]);
                    }
                    dt.Rows.Add(dataRow);
                }

#if DEBUG
                // Write out the contents of our DataTable to the Output window (for debugging)
                string str = "";
                for (rowInx = 0; rowInx < numOfRows; rowInx++)
                {
                    for (colInx = 0; colInx < numOfColumns; colInx++)
                    {
                        object val = dt.Rows[rowInx].ItemArray[colInx];
                        str += (val == null) ? "" : val.ToString();
                        str += "\t";
                    }
                    str += "\n";
                }
                System.Diagnostics.Trace.WriteLine(str);
#endif
                return dt;
            }
        }

        private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
        {
            // How many columns & rows of data does this Worksheet contain ?  
            // We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
            //     eg "B1:F4" -> we'll need 6 columns and 4 rows.
            //
            // (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
            try
            {
                string[] parts = dimensions.Split(':');     //eg "B1:F4", or a single cell such as "A1"
                if (parts.Length > 2)
                    throw new Exception("Couldn't find one or two CellAddresses in the dimension");

                string bottomRight = parts[parts.Length - 1];
                numOfColumns = 1 + GetColumnIndexByName(bottomRight);     // GetColumnIndexByName is 0-based, so F4 -> 5 + 1 = 6 columns
                numOfRows = GetRowIndexFromCellAddress(bottomRight);
            }
            catch
            {
                throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
            }
        }

        public static int GetRowIndexFromCellAddress(string cellAddress)
        {
            // Convert an Excel CellReference column into a 1-based row index
            // eg "D42"  ->  42
            //    "F123" ->  123
            string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
            return int.Parse(rowNumber);
        }

        public static int GetColumnIndexByName(string cellAddress)
        {
            // Convert an Excel CellReference column into a 0-based column index
            // eg "D42" ->  3
            //    "F123" -> 5
            var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
            int number = 0, pow = 1;
            for (int i = columnName.Length - 1; i >= 0; i--)
            {
                number += (columnName[i] - 'A' + 1) * pow;
                pow *= 26;
            }
            return number - 1;
        }
    }

Author: Mike Gledhill

I couldn't resist optimizing the subroutines from Amurra's answer to remove the need for Regex.

The first function isn't actually needed, since the second one accepts either a cell reference (C3) or a column name (C), but it is still a nice helper function. The indexes are also one-based (only because our implementation used one-based rows to match Excel visually).

    ///<summary>
    ///Given a cell name, return the cell column name.
    ///</summary>
    ///<param name="cellReference">Address of the cell (e.g. B2)</param>
    ///<returns>Column name (e.g. B)</returns>
    ///<exception cref="ArgumentOutOfRangeException">cellReference</exception>
    public static string GetColumnName(string cellReference)
    {
        //Advance from left to right until a digit, then return everything before it.
        //Column names have at most 3 letters (XFD), so the digit must appear by position 3;
        //the length check avoids an IndexOutOfRangeException on short input such as "AB".
        for (int lastCharPos = 0; lastCharPos <= 3 && lastCharPos < cellReference.Length; lastCharPos++)
            if (Char.IsNumber(cellReference[lastCharPos]))
                return cellReference.Substring(0, lastCharPos);

        throw new ArgumentOutOfRangeException(nameof(cellReference));
    }

    ///<summary>
    ///Return one-based column index given a cell name or column name
    ///</summary>
    ///<param name="columnNameOrCellReference">Column name or cell reference (e.g. A, AB3, or AB44)</param>
    ///<returns>One-based column index (A = 1, AB = 28), or 0 if the input contains no letters</returns>
    public static int GetColumnIndexFromName(string columnNameOrCellReference)
    {
        int columnIndex = 0;
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   //R to L
        {
            if (Char.IsLetter(columnNameOrCellReference[pos]))  //only letters count; row digits are skipped
            {
                columnIndex += factor * ((Char.ToUpperInvariant(columnNameOrCellReference[pos]) - 'A') + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }
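For comparison, here is a minimal sketch in the same Regex-free spirit that splits a reference into its one-based column and row in a single pass; for example "AB44" gives column 1 × 26 + 2 = 28 and row 44. The class and method names (`CellReferenceParser.Parse`) are hypothetical, not part of either answer or of the Open XML SDK, and the sketch assumes plain ASCII references like the ones Excel writes, throwing on anything else.

```csharp
using System;

// Hypothetical helper, not from any answer above: one Regex-free pass over a
// reference such as "AB44", returning its one-based column and row.
public static class CellReferenceParser
{
    public static (int Column, int Row) Parse(string cellReference)
    {
        if (string.IsNullOrEmpty(cellReference))
            throw new ArgumentException("Cell reference is empty.", nameof(cellReference));

        int column = 0, row = 0, pos = 0;

        // Leading letters: bijective base 26, so A = 1, Z = 26, AA = 27, AB = 28.
        while (pos < cellReference.Length)
        {
            char c = char.ToUpperInvariant(cellReference[pos]);
            if (c < 'A' || c > 'Z') break;
            column = column * 26 + (c - 'A' + 1);
            pos++;
        }

        // Remaining characters: the decimal row number.
        while (pos < cellReference.Length && cellReference[pos] >= '0' && cellReference[pos] <= '9')
        {
            row = row * 10 + (cellReference[pos] - '0');
            pos++;
        }

        // Reject input with no letters, a missing or zero row, or trailing junk.
        if (column == 0 || row == 0 || pos != cellReference.Length)
            throw new FormatException($"Not a valid cell reference: '{cellReference}'.");

        return (column, row);
    }
}
```

For instance, `CellReferenceParser.Parse("XFD1048576")` returns `(16384, 1048576)`, the last cell of a worksheet.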

Author: crokusek

Added yet another implementation, this time for when the number of columns is known in advance:

        ///<summary>
        ///Gets a list of cells, padded with empty cells where necessary.
        ///</summary>
        ///<param name="numberOfColumns">The number of columns expected.</param>
        ///<param name="cells">The cells.</param>
        ///<returns>List of padded cells</returns>
        private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
        {
            //Only perform the padding operation if the existing cell count is less than required
            if (cells.Count < numberOfColumns)
            {
                IList<Cell> padded = new List<Cell>();
                int cellIndex = 0;

                for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
                {
                    if (cellIndex < cells.Count)
                    {
                        //Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
                        string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());

                        //Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
                        int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;

                        //Add padding cells where current cell index is less than required
                        while (indexOfReference > paddedIndex)
                        {
                            padded.Add(new Cell());
                            paddedIndex++;
                        }

                        padded.Add(cells[cellIndex++]);
                    }
                    else
                    {
                        //Add padding cells when passed existing cells
                        padded.Add(new Cell());
                    }
                }

                return padded;
            }
            else
            {
                return cells;
            }
        }

Calling it:

IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList());

Where 38 is the number of columns.
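The padding idea can be checked without an Excel file. The sketch below is a hypothetical plain-data version of `GetPaddedCells`: it takes `(Reference, Value)` string pairs instead of Open XML `Cell` objects, so it runs without the SDK, and it returns the values rather than the cells. Like the answer, it assumes the cells arrive in column order.

```csharp
using System.Collections.Generic;

// Plain-data sketch of GetPaddedCells: (Reference, Value) pairs stand in for
// Open XML Cell objects, so this runs without the SDK.
public static class RowPadding
{
    // Zero-based column index from the letters of a reference: "A1" -> 0, "C7" -> 2, "AA3" -> 26.
    private static int ColumnIndex(string reference)
    {
        int index = 0;
        foreach (char ch in reference)
        {
            char c = char.ToUpperInvariant(ch);
            if (c < 'A' || c > 'Z') break;            // the row digits start here
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Returns exactly numberOfColumns values: skipped columns become "",
    // and cells beyond the expected width are ignored.
    // Assumes the cells are ordered left to right, as in the original answer.
    public static List<string> PadCells(int numberOfColumns, IEnumerable<(string Reference, string Value)> cells)
    {
        var padded = new List<string>(numberOfColumns);
        foreach (var (reference, value) in cells)
        {
            int column = ColumnIndex(reference);
            if (column >= numberOfColumns) break;      // past the last expected column
            while (padded.Count < column)
                padded.Add(string.Empty);              // fill the gap left by blank cells
            padded.Add(value);
        }
        while (padded.Count < numberOfColumns)
            padded.Add(string.Empty);                  // trailing blank cells
        return padded;
    }
}
```

With four expected columns and cells A1 = "x" and C1 = "z", the result is `x`, `""`, `z`, `""`. A row missing only its first cell (B1, C1 with three columns) comes back as `""`, `y`, `z`. Dropping cells past the expected width, instead of letting the row grow, is a deliberate choice in this sketch.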

Author: teatime

To handle empty cells, I use a variable named "CN", declared before the cell-reading loop of each row. Inside the loop, CN is incremented after each cell is read, and I check whether the current cell's column index is ahead of CN. If it is, the reader has skipped blank cells, so I fill those columns with the value I want (an empty string). That is the trick I used to catch up on empty cells so each value lands in its proper column. Here is the code:

public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
{
/*Creating a table with 20 columns*/
var dt = CreateProviderRvenueSharingTable();
try
{
/*using stream so that if excel file is in another process then it can read without error*/
using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
{
var workbookPart = spreadsheetDocument.WorkbookPart;
var workbook = workbookPart.Workbook;
/*get only visible (non-hidden) sheets*/
var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
/*skip sheets without data: require more than one non-empty row*/
List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
.Where(r => r.InnerText != string.Empty).ToList();
if (rows.Count > 1)
{
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
int i = 0;
int BTR = 0;/*number of empty rows found; used to stop the reader*/
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
/*skip the first two rows (headers) before reading data*/
if (i < 2)
{
i++;
continue;
}
reader.ReadFirstChild();
DataRow row = dt.NewRow();
int CN = 0;
if (reader.ElementType == typeof(Cell))
{
do
{
Cell c = (Cell)reader.LoadCurrentElement();
/*the reader skips blank cells, so pad the skipped columns; otherwise the data lands in the wrong DataTable columns relative to the header*/
if (CN != 0)
{
int cellColumnIndex =
ExcelHelper.GetColumnIndexFromName(
ExcelHelper.GetColumnName(c.CellReference));
if (cellColumnIndex <= 20 && CN < cellColumnIndex - 1)
{
do
{
row[CN] = string.Empty;
CN++;
} while (CN < cellColumnIndex - 1);
}
}
/*stop processing this row if its first cell has no value, which is treated as an empty row*/
if (CN == 0 && c.DataType == null && c.CellValue == null)
{
BTR++;
break;
}
string cellValue = GetCellValue(c, workbookPart);
row[CN] = cellValue;
CN++;
/*stop once all 20 columns are filled; anything further right is ignored*/
if (CN == 20)
{
break;
}
} while (reader.ReadNextSibling());
}
/*the reader also skips trailing blank cells, so fill the remaining columns up to index 19*/
while (CN != 0 && CN < 20)
{
row[CN] = string.Empty;
CN++;
}
if (CN == 20)
{
dt.Rows.Add(row);
}
}
/*stop reading once more than 5 empty rows have been found*/
if (BTR > 5)
break;
}
reader.Close();
}                            
}
}
}
}
catch (Exception)
{
/*"throw;" preserves the original stack trace, unlike "throw ex;"*/
throw;
}
return dt;
}
private static string GetCellValue(Cell c, WorkbookPart workbookPart)
{
string cellValue = string.Empty;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi =
workbookPart.SharedStringTablePart.SharedStringTable
.Elements<SharedStringItem>()
.ElementAt(int.Parse(c.CellValue.InnerText));
if (ssi.Text != null)
{
cellValue = ssi.Text.Text;
}
}
else
{
if (c.CellValue != null)
{
cellValue = c.CellValue.InnerText;
}
}
return cellValue;
}
public static int GetColumnIndexFromName(string columnNameOrCellReference)
{
int columnIndex = 0;
int factor = 1;
for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   //R to L
{
if (Char.IsLetter(columnNameOrCellReference[pos]))  //for letters (columnName)
{
columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
factor *= 26;
}
}
return columnIndex;
}
public static string GetColumnName(string cellReference)
{
/* Advance from L to R until a digit, then return everything before it (column names have at most 3 letters)*/
for (int lastCharPos = 0; lastCharPos <= 3 && lastCharPos < cellReference.Length; lastCharPos++)
if (Char.IsNumber(cellReference[lastCharPos]))
return cellReference.Substring(0, lastCharPos);
throw new ArgumentOutOfRangeException(nameof(cellReference));
}

What the code handles:

It reads empty cells.
It ignores the empty rows that follow the data.
It reads the sheets in order, starting from the first one.
If the Excel file is open in another process, Open XML can still read it.
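The empty-row handling can be isolated as well. In the answer, `BTR` is never reset, so the reader stops after more than five empty rows in total, wherever they occur. The hypothetical `BlankRowFilter.TakeDataRows` below is a variant, not the author's code: it treats a row as blank when all of its values are empty (rather than checking only the first cell), skips it, and stops only after more than `maxConsecutiveBlanks` blank rows in a row, so isolated blank rows inside the data do not end the read.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the BTR idea: skip blank rows, but stop reading only
// after a run of more than maxConsecutiveBlanks blank rows.
public static class BlankRowFilter
{
    public static IEnumerable<string[]> TakeDataRows(IEnumerable<string[]> rows, int maxConsecutiveBlanks)
    {
        int blankRun = 0;                              // length of the current run of blank rows
        foreach (string[] row in rows)
        {
            // A row counts as blank when every value is null or "" (the answer checks only the first cell).
            if (row.All(string.IsNullOrEmpty))
            {
                blankRun++;
                if (blankRun > maxConsecutiveBlanks)
                    yield break;                       // assume the data has ended
                continue;                              // an isolated blank row: skip it
            }

            blankRun = 0;                              // data again: reset the run
            yield return row;
        }
    }
}
```

With rows `a`, blank, `b`, blank, blank, `c` and a limit of 1, only `a` and `b` are returned; with a limit of 2, `c` is returned too.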

Author: Jasmin Akther Suma

Here is my solution. I found that the solutions above didn't seem to work well when the missing fields were at the end of a row.

Assuming the first row of the Excel sheet has ALL the columns (via the headers), grab the expected number of columns per row (row == 1). Then loop through the data rows (row > 1). The key to handling the missing cells is the getRowCells method, which is passed the known number of columns along with the row currently being processed.

int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();
IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);
List<List<string>> docData = new List<List<string>>();
foreach (Row row in rows)
{
List<Cell> cells = getRowCells(columnCount, row);
List<string> rowData = new List<string>();
foreach (Cell cell in cells)
{
rowData.Add(getCellValue(workbookPart, cell));
}
docData.Add(rowData);
}
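`Descendants<Cell>().Count()` on the header row equals the number of columns only if no header cell is blank, since, as the question notes, Excel does not store empty cells. A hypothetical variant is to take the right-most column letter found in the header row instead. The sketch below works on plain reference strings; in the real code they would come from each header cell's `CellReference.Value`, assuming every cell carries a reference, as Excel writes it.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderWidth
{
    // One-based column number from the leading letters of a reference: "A1" -> 1, "D1" -> 4.
    private static int ColumnNumber(string reference)
    {
        int number = 0;
        foreach (char ch in reference)
        {
            char c = char.ToUpperInvariant(ch);
            if (c < 'A' || c > 'Z') break;             // the row digits start here
            number = number * 26 + (c - 'A' + 1);
        }
        return number;
    }

    // Expected row width = the right-most column used in the header row,
    // which stays correct even when some header cells are blank (and therefore not stored).
    public static int ExpectedColumnCount(IEnumerable<string> headerCellReferences)
    {
        int width = 0;
        foreach (string reference in headerCellReferences)
            width = Math.Max(width, ColumnNumber(reference));
        return width;
    }
}
```

For a header stored as A1, B1 and D1 (C1 left blank), `Count()` gives 3 while this gives 4.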

The getRowCells method currently has a limitation: it only supports a sheet (row) with at most 26 columns (A to Z), because each cell reference is checked only by its first letter against COLUMN_LETTERS. A loop based on the known number of columns is used to find missing columns (cells). When one is found, a new Cell is inserted into the cells collection, with a default value of "" instead of null. The modified cells collection is then returned.

private static List<Cell> getRowCells(int columnCount, Row row)
{
const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (columnCount > COLUMN_LETTERS.Length)
{
throw new ArgumentException(string.Format("Invalid columnCount ({0}).  Cannot be greater than {1}",
columnCount, COLUMN_LETTERS.Length));
}
List<Cell> cells = row.Descendants<Cell>().ToList();
for (int i = 0; i < columnCount; i++)
{
if (i < cells.Count)
{
string cellColumnReference = cells.ElementAt(i).CellReference.ToString();
if (cellColumnReference[0] != COLUMN_LETTERS[i])
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
else
{
cells.Insert(i, new Cell() { CellValue = new CellValue("") });
}
}
return cells;
}
private static string getCellValue(WorkbookPart workbookPart, Cell cell)
{
SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
string value = (cell.CellValue != null) ? cell.CellValue.InnerText : string.Empty;
if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
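The 26-column limit in `getRowCells` comes from comparing a single character, `cellColumnReference[0]`, with `COLUMN_LETTERS[i]`. A minimal sketch of one way around it, assuming you are free to change that comparison, is to build the expected column name for any zero-based index (bijective base 26: 0 → A, 25 → Z, 26 → AA) and compare it with all the leading letters of the reference. `ColumnNames` and its methods are hypothetical helpers, not part of the Open XML SDK.

```csharp
using System;
using System.Text;

public static class ColumnNames
{
    // Zero-based column index -> Excel column letters: 0 -> "A", 25 -> "Z", 26 -> "AA".
    public static string FromIndex(int zeroBasedIndex)
    {
        if (zeroBasedIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(zeroBasedIndex));

        var name = new StringBuilder();
        int n = zeroBasedIndex + 1;                 // work in one-based bijective base 26
        while (n > 0)
        {
            int remainder = (n - 1) % 26;           // 0..25 maps to 'A'..'Z'
            name.Insert(0, (char)('A' + remainder));
            n = (n - 1) / 26;
        }
        return name.ToString();
    }

    // Leading letters of a reference: "AB12" -> "AB".
    public static string LettersOf(string cellReference)
    {
        int end = 0;
        while (end < cellReference.Length && char.IsLetter(cellReference[end]))
            end++;
        return cellReference.Substring(0, end);
    }
}
```

Inside the loop, the check would become `ColumnNames.LettersOf(cellColumnReference) != ColumnNames.FromIndex(i)`, after which the `columnCount > COLUMN_LETTERS.Length` guard is no longer needed; treat this as a sketch to adapt rather than a drop-in replacement.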

Author: programmerj
