
Converting an Excel Worksheet into a JSON document with C# and .NET Core and ExcelDataReader

程序员文章站 2022-05-09 15:58:39

I've been working on a little idea where I'd have an app (maybe a mobile app with Xamarin or maybe a SPA, I haven't decided yet) for easily accessing and searching across the 500+ videos from https://azure.microsoft.com/en-us/resources/videos/azure-friday/

HOWEVER. I don't have access to the database that hosts the metadata and while I'm trying to get at least read-only access to it (long story) the best I can do is a giant Excel spreadsheet dump that I was given that has all the video details.

This, of course, is sub-optimal, but regardless of how you feel about it, it's a database. Or, a data source at the very least! Additionally, since it was always going to end up as JSON in a cached in-memory database regardless, it doesn't matter much to me.

In real-world business scenarios, sometimes the authoritative source is an Excel sheet, sometimes it's a SQL database, and sometimes it's a flat file. Who knows?

What's most important (after clean data) is that the process one builds around that authoritative source is reliable and repeatable. For example, if I want to build a little app or one-page website, yes, ideally I'd have a direct connection to the SQL back end. Other alternative sources could be a JSON file sitting on a simple storage endpoint accessible with a single HTTP GET. If the Excel sheet is on OneDrive/SharePoint/DropBox/whatever, I could have a small serverless function run when the file changes (or on a daily schedule) that would convert the Excel sheet into a JSON file and drop that file onto storage. Hopefully you get the idea. The goal here is clean, reliable pragmatism. I'll deal with the larger business process issue and/or system architecture and/or permissions issue later. For now the "interface" for my app is JSON.

So I need some JSON and I have this Excel sheet.

Turns out there's a lovely open source project and NuGet package called ExcelDataReader. There's been ways to get data out of Excel for decades. Literally decades. One of my first jobs was automating Microsoft Excel with Visual Basic 3.0 with COM Automation. I even blogged about getting data out of Excel into ASP.NET 16 years ago!

Today I'll use ExcelDataReader. It's really nice and it took less than an hour to get exactly what I wanted. I haven't gone and made it super clean and generic, or refactored out a bunch of helper functions, so I'm interested in your thoughts. After I get this tight and reliable I'll drop it into an Azure Function and then focus on getting the JSON directly from the source.

A few gotchas surprised me. I got a "System.NotSupportedException: No data is available for encoding 1252." Windows-1252 or CP-1252 (code page) is an old school text encoding (it's nearly ISO 8859-1, except that it puts printable characters in the 0x80-0x9F range). Turns out newer .NETs like .NET Core need the System.Text.Encoding.CodePages package as well as a call to System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance) to set it up for success. Also, that extra call to reader.Read at the start to skip over the title row had me pause a moment.
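
To make the encoding gotcha concrete, here is a minimal sketch, separate from the converter itself. The class and method names (EncodingDemo, DecodeCp1252, DecodeLatin1) are made up for illustration; the only real APIs used are Encoding.RegisterProvider, CodePagesEncodingProvider.Instance, and Encoding.GetEncoding. It decodes the single byte 0x80 both ways, which is one of the spots where Windows-1252 and ISO 8859-1 disagree.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original program.
static class EncodingDemo
{
    // Decodes one byte as Windows-1252. The provider is registered first
    // because code page 1252 is not available by default on .NET Core.
    public static string DecodeCp1252(byte value)
    {
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(new[] { value });
    }

    // Decodes one byte as ISO 8859-1, which is built in and needs no provider.
    public static string DecodeLatin1(byte value) =>
        Encoding.GetEncoding("iso-8859-1").GetString(new[] { value });
}
```

Calling EncodingDemo.DecodeCp1252(0x80) gives the Euro sign (U+20AC), while EncodingDemo.DecodeLatin1(0x80) gives the control character U+0080, so the two encodings are close but not interchangeable.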

using System;
using System.IO;
using ExcelDataReader;
using System.Text;
using Newtonsoft.Json;

namespace AzureFridayToJson
{
    class Program
    {
        static void Main(string[] args)
        {
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            var inFilePath = args[0];
            var outFilePath = args[1];

            using (var inFile = File.Open(inFilePath, FileMode.Open, FileAccess.Read))
            using (var outFile = File.CreateText(outFilePath))
            {
                using (var reader = ExcelReaderFactory.CreateReader(inFile, new ExcelReaderConfiguration()
                    { FallbackEncoding = Encoding.GetEncoding(1252) }))
                using (var writer = new JsonTextWriter(outFile))
                {
                    writer.Formatting = Formatting.Indented; //I likes it tidy
                    writer.WriteStartArray();
                    reader.Read(); //SKIP FIRST ROW, it's TITLES.
                    do
                    {
                        while (reader.Read())
                        {
                            //peek ahead? Bail before we start anything so we don't get an empty object
                            var status = reader.GetString(0);
                            if (string.IsNullOrEmpty(status)) break;

                            writer.WriteStartObject();
                            writer.WritePropertyName("Status");
                            writer.WriteValue(status);

                            writer.WritePropertyName("Title");
                            writer.WriteValue(reader.GetString(1));

                            writer.WritePropertyName("Host");
                            writer.WriteValue(reader.GetString(6));

                            writer.WritePropertyName("Guest");
                            writer.WriteValue(reader.GetString(7));

                            writer.WritePropertyName("Episode");
                            writer.WriteValue(Convert.ToInt32(reader.GetDouble(2)));

                            writer.WritePropertyName("Live");
                            writer.WriteValue(reader.GetDateTime(5));

                            writer.WritePropertyName("Url");
                            writer.WriteValue(reader.GetString(11));

                            writer.WritePropertyName("EmbedUrl");
                            writer.WriteValue($"{reader.GetString(11)}player");
                            /*
                            <iframe src="https://channel9.msdn.com/Shows/Azure-Friday/Erich-Gamma-introduces-us-to-Visual-Studio-Online-integrated-with-the-Windows-Azure-Portal-Part-1/player" width="960" height="540" allowFullScreen frameBorder="0"></iframe>
                            */

                            writer.WriteEndObject();
                        }
                    } while (reader.NextResult());
                    writer.WriteEndArray();
                }
            }
        }
    }
}
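
One thing worth a second look in the listing: the iframe comment shows an embed URL ending in /player, while the interpolated string appends player with no separator. Assuming /player is the intended suffix (that is a guess based on the comment, not something confirmed here), a small helper could make the join explicit. EmbedUrlHelper and ToEmbedUrl are hypothetical names, not part of the original program.

```csharp
using System;

// Hypothetical helper, not part of the original program.
static class EmbedUrlHelper
{
    // Appends a "player" path segment, making sure exactly one '/'
    // separates it from the video URL.
    public static string ToEmbedUrl(string videoUrl)
    {
        if (string.IsNullOrWhiteSpace(videoUrl))
            throw new ArgumentException("A video URL is required.", nameof(videoUrl));

        return videoUrl.TrimEnd('/') + "/player";
    }
}
```

With something like this, the EmbedUrl line could become writer.WriteValue(EmbedUrlHelper.ToEmbedUrl(reader.GetString(11))), and a trailing slash in the spreadsheet cell would not produce a double slash.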

The first pass is on GitHub at https://github.com/shanselman/AzureFridayToJson and the resulting JSON looks like this:

[
  {
    "Status": "Live",
    "Title": "Introduction to Azure Integration Service Environment for Logic Apps",
    "Host": "Scott Hanselman",
    "Guest": "Kevin Lam",
    "Episode": 528,
    "Live": "2019-02-26T00:00:00",
    "Url": "https://azure.microsoft.com/en-us/resources/videos/azure-friday-introduction-to-azure-integration-service-environment-for-logic-apps",
    "embedUrl": "https://azure.microsoft.com/en-us/resources/videos/azure-friday-introduction-to-azure-integration-service-environment-for-logic-appsplayer"
  },
  {
    "Status": "Live",
    "Title": "An overview of Azure Integration Services",
    "Host": "Lara Rubbelke",
    "Guest": "Matthew Farmer",
    "Episode": 527,
    "Live": "2019-02-22T00:00:00",
    "Url": "https://azure.microsoft.com/en-us/resources/videos/azure-friday-an-overview-of-azure-integration-services",
    "embedUrl": "https://azure.microsoft.com/en-us/resources/videos/azure-friday-an-overview-of-azure-integration-servicesplayer"
  },
...SNIP...
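
Since JSON is the "interface" for the app, the consuming side might look roughly like the sketch below. It assumes Newtonsoft.Json on the client as well, and the Video class and VideoSearch helper are hypothetical names invented for illustration. Json.NET matches property names case-insensitively by default, so the embedUrl key binds to EmbedUrl.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical shape for one array element; property names follow the JSON keys.
public class Video
{
    public string Status { get; set; }
    public string Title { get; set; }
    public string Host { get; set; }
    public string Guest { get; set; }
    public int Episode { get; set; }
    public DateTime Live { get; set; }
    public string Url { get; set; }
    public string EmbedUrl { get; set; }
}

// Hypothetical helper for loading and searching the video list.
public static class VideoSearch
{
    // Parses the JSON array; a JSON "null" document yields an empty list.
    public static List<Video> Parse(string json) =>
        JsonConvert.DeserializeObject<List<Video>>(json) ?? new List<Video>();

    // Case-insensitive substring match on the title.
    public static IEnumerable<Video> ByTitle(IEnumerable<Video> videos, string term) =>
        videos.Where(v => v.Title != null &&
                          v.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
}
```

Usage would be along the lines of VideoSearch.ByTitle(VideoSearch.Parse(json), "logic apps"), where json is the file contents fetched however the app ends up fetching it.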

Thoughts? There's a dozen ways to have done this. How would you do this? Dump it into a DataSet and serialize objects to JSON, make an array and do the same, automate Excel itself (please don't do this), and on and on.
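
For comparison, the "dump it into a DataSet" option could be sketched like this. It assumes the separate ExcelDataReader.DataSet NuGet package (which provides the AsDataSet extension method) alongside Newtonsoft.Json, and it assumes the first worksheet is the one that matters. DataSetToJson and its method names are hypothetical. Note that this emits every column, keyed by whatever the header cells say, rather than the hand-picked properties in the listing above.

```csharp
using System.Data;
using System.IO;
using System.Text;
using ExcelDataReader;
using Newtonsoft.Json;

// Hypothetical alternative to the streaming writer, not the original program.
static class DataSetToJson
{
    // Loads the first worksheet into a DataTable, treating the first row
    // as column names, and returns the rows as an indented JSON array.
    public static string ConvertFirstSheet(string excelPath)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        using (var stream = File.Open(excelPath, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            // AsDataSet comes from the ExcelDataReader.DataSet package.
            DataSet dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
            {
                ConfigureDataTable = _ => new ExcelDataTableConfiguration { UseHeaderRow = true }
            });
            return TableToJson(dataSet.Tables[0]);
        }
    }

    // Json.NET serializes a DataTable as an array of objects, one per row.
    public static string TableToJson(DataTable table) =>
        JsonConvert.SerializeObject(table, Formatting.Indented);
}
```

The trade-off is less code and no column indexes to maintain, at the cost of loading the whole sheet into memory and giving up control over property names and types.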

Certainly this would be easier if I could get a CSV file or something from the business person, but the issue is that I'm regularly getting new drops of this same sheet with new records added. Getting the suit to Save As | CSV reliably and regularly isn't sustainable.



About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

Translated from: https://www.hanselman.com/blog/converting-an-excel-worksheet-into-a-json-document-with-c-and-net-core-and-exceldatareader