dotnet/machinelearning

Machine Learning for .NET

ML.NET is a cross-platform open-source machine learning (ML) framework for .NET.

ML.NET allows developers to easily build, train, deploy, and consume custom models in their .NET applications without requiring prior expertise in developing machine learning models or experience with other programming languages like Python or R. The framework provides data loading from files and databases, enables data transformations, and includes many ML algorithms.

With ML.NET, you can train models for a variety of scenarios, like classification, forecasting, and anomaly detection.

You can also consume both TensorFlow and ONNX models within ML.NET which makes the framework more extensible and expands the number of supported scenarios.
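For instance, a minimal sketch of scoring an ONNX model inside an ML.NET pipeline (assuming the Microsoft.ML.OnnxTransformer package and a local model.onnx file; input/output column names are omitted for brevity):

    using Microsoft.ML;

    var mlContext = new MLContext();
    // Wrap a pre-trained ONNX model as an ML.NET transform.
    var onnxPipeline = mlContext.Transforms.ApplyOnnxModel(modelFile: "model.onnx");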

Getting started with machine learning and ML.NET

  • Learn more about the basics of ML.NET.
  • Build your first ML.NET model by following our ML.NET Getting Started tutorial.
  • Check out our documentation and tutorials.
  • See the API Reference documentation.
  • Clone our ML.NET Samples GitHub repo and run some sample apps.
  • Take a look at some ML.NET Community Samples.
  • Watch some videos on the ML.NET videos YouTube playlist.

Roadmap

Take a look at ML.NET's Roadmap to see what the team plans to work on in the next year.

Operating systems and processor architectures supported by ML.NET

ML.NET runs on Windows, Linux, and macOS using .NET Core, or Windows using .NET Framework.

ML.NET also runs on ARM64, Apple M1, and Blazor WebAssembly, though with some limitations.

64-bit is supported on all platforms. 32-bit is supported on Windows, except for TensorFlow and LightGBM related functionality.

ML.NET NuGet packages status

Release notes

Check out the release notes to see what's new. You can also read the blog posts for more details about each release.

Using ML.NET packages

First, ensure you have installed .NET Core 2.1 or later. ML.NET also works on the .NET Framework 4.6.1 or later, but 4.7.2 or later is recommended.

Once you have an app, you can install the ML.NET NuGet package from the .NET Core CLI using:
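For example, to add the main Microsoft.ML package:

    dotnet add package Microsoft.ML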

or from the NuGet Package Manager:
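For example, in the Package Manager Console:

    Install-Package Microsoft.ML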

Alternatively, you can add the Microsoft.ML package from within Visual Studio's NuGet package manager or via Paket.

Daily NuGet builds of the project are also available in our Azure DevOps feed:

#404_packaging/MachineLearning/nuget/v3/index.json

Building ML.NET (For contributors building ML.NET open source code)

To build ML.NET from source, please visit our developer guide.

Build status (Debug and Release) is reported for CentOS, Ubuntu, macOS, Windows x64, Windows FullFramework, Windows x86, and Windows NetCore3.1.

Release process and versioning

Major releases of ML.NET are shipped once a year with the major .NET releases, starting with ML.NET 1.7 in November 2021 with .NET 6, then ML.NET 2.0 with .NET 7, etc. We will maintain release branches to optionally service ML.NET with bug fixes and/or minor features on the same cadence as .NET servicing.

Check out the Release Notes to see all of the past ML.NET releases.

Contributing

We welcome contributions! Please review our contribution guide.

Community

  • Join our community on Discord.
  • Tune into the .NET Machine Learning Community Standup every other Wednesday at 10AM Pacific Time.

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community. For more information, see the .NET Foundation Code of Conduct.

Code examples

Here is a code snippet for training a model to predict sentiment from text samples. You can find complete samples in the samples repo.
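For illustration, a minimal sketch along those lines (the SentimentData class, the sentiment.tsv file, and the choice of the SDCA trainer are assumptions for this example, not the exact sample from the repo):

    using Microsoft.ML;
    using Microsoft.ML.Data;

    var mlContext = new MLContext();

    // Load training data from a tab-separated file with a header row.
    IDataView trainingData = mlContext.Data.LoadFromTextFile<SentimentData>(
        "sentiment.tsv", hasHeader: true);

    // Featurize the text column, then train a binary classifier.
    var pipeline = mlContext.Transforms.Text
        .FeaturizeText("Features", nameof(SentimentData.SentimentText))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

    ITransformer model = pipeline.Fit(trainingData);

    public class SentimentData
    {
        [LoadColumn(0)] public bool Label { get; set; }
        [LoadColumn(1)] public string SentimentText { get; set; }
    }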

Now from the model we can make inferences (predictions):
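Continuing the sketch above (SentimentPrediction is a hypothetical output class; note that PredictionEngine is not thread safe):

    var engine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);
    var result = engine.Predict(new SentimentData { SentimentText = "This is a wonderful product!" });
    Console.WriteLine($"Positive: {result.Prediction} (probability {result.Probability:P0})");

    // Maps the scorer's output columns onto a prediction class.
    public class SentimentPrediction
    {
        [ColumnName("PredictedLabel")] public bool Prediction { get; set; }
        public float Probability { get; set; }
    }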

License

ML.NET is licensed under the MIT license, and it is free to use commercially.

.NET Foundation

ML.NET is a part of the .NET Foundation.

Issues

A quick list of the latest issues

torronen · enhancement · 0 comments

Is your feature request related to a problem? Please describe. I am running model.Transform.

Then, I would like to get the original field "Quote" with type string. I would expect the following to work, but it gives an exception about the wrong type: IDataView predictions = model.Transform(myDataFrame); var quotes = predictions.GetColumn<string>("Quotes").ToList();

The problem seems to be that the transforms (Microsoft.ML.AutoML in 1.6.0) have created columns with different types.

'Cannot map column (name: Quote, type: Vector<Single, 2>) in data to the user-defined type, System.String. Arg_ParamName_Name'

Describe the solution you'd like: GetColumn should try to find the column with the specified type if multiple columns with the same name exist.

Describe alternatives you've considered I have not yet solved this issue.

I suppose my solution will be:

  1. Get the values before Transform(), or
  2. Loop through all columns, checking both type and name (a rough sketch of this follows).
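A rough sketch of the second approach, assuming predictions is the IDataView returned by model.Transform and the original text column is named "Quote":

    using System.Linq;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Find the column whose name is "Quote" and whose type is text, even if a
    // later transform added another "Quote" column with a different type.
    DataViewSchema.Column? textColumn = null;
    foreach (var column in predictions.Schema)
    {
        if (column.Name == "Quote" && column.Type is TextDataViewType)
        {
            textColumn = column;
            break;
        }
    }

    if (textColumn.HasValue)
    {
        var quotes = predictions.GetColumn<string>(textColumn.Value).ToList();
    }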

Additional data: I would like to get the column here to be sure the row order is the same.

I assume Transform() should preserve order(?). If so, then I suppose this is not a very important issue.

luisquintanilla · enhancement · 0 comments

Problem

When I want to apply a custom transform to a single data column in my dataset, I have to provide the input and output types. If I've used TextLoader to load my data without defining schema classes, I now have to go and create new classes for my input and output.
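For context, a minimal sketch of what CustomMapping requires today (InputRow and OutputRow are hypothetical classes invented just for this example):

    using Microsoft.ML;

    var mlContext = new MLContext();

    // Today, CustomMapping needs explicit input and output classes, even for a
    // single column; a column-name-based overload would remove that step.
    var mapping = mlContext.Transforms.CustomMapping<InputRow, OutputRow>(
        (input, output) => output.Scaled = input.Value * 2f,
        contractName: null); // null contract => the resulting model cannot be saved

    public class InputRow  { public float Value  { get; set; } }
    public class OutputRow { public float Scaled { get; set; } }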

Proposed Solution

Create an overload of CustomMapping that takes InputColumnName and OutputColumnName parameters, which perform the lookup and apply the transform to the specified columns.

luisquintanilla · enhancement · 0 comments

Problem

Today, when an IDataView is created using TextLoader, there's no need to create classes that define the schema.

However, if I want to export the IDataView as an IEnumerable, I can't because I need to explicitly provide the type of IEnumerable<T>. If no classes have been created, I now need to go and create a new class just to export to an IEnumerable.
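A small sketch of the current requirement (SentimentRow is a hypothetical class created only to satisfy the generic parameter; mlContext and dataView are assumed to exist):

    using System.Collections.Generic;
    using Microsoft.ML;

    // CreateEnumerable<T> needs a concrete T today, even if the IDataView came
    // from TextLoader without any user-defined schema class.
    IEnumerable<SentimentRow> rows =
        mlContext.Data.CreateEnumerable<SentimentRow>(dataView, reuseRowObject: false);

    public class SentimentRow
    {
        public string Text { get; set; }
    }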

Proposed solution

Just like LoadFromEnumerable is able to infer and create an IDataView from IEnumerable<T>, CreateEnumerable should be able to infer T based on the DataView schema or bind to an object at runtime using dynamic.


tarekgh · untriaged · 2 comments

The following sample points at https://aka.ms/mlnet-resources/datasets/cifar10.zip, but it looks like this resource is not valid.

https://github.com/dotnet/machinelearning/blob/bca5736c23e093006bb6a5a3f85a789c4a1cdcf2/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/ImageClassification/LearningRateSchedulingCifarResnetTransferLearning.cs

It looks like we have code in different places trying to get the same resource from other sources: https://github.com/dotnet/machinelearning/blob/bca5736c23e093006bb6a5a3f85a789c4a1cdcf2/docs/samples/Microsoft.ML.AutoML.Samples/Cifar10.cs#L13

There are also other samples pointing at https://github.com/onnx/models/tree/master/vision/classification/squeezenet, which exists, but the zipped files there have a different structure than what the sample expects: https://github.com/dotnet/machinelearning/blob/04dda55ab0902982b16309c8e151f13a53e9366d/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/ApplyONNXModelWithInMemoryImages.cs#L16. Even the comment suggests there is a Microsoft.ML.Onnx.TestModels NuGet package, which does not exist either. A correctly structured file can be found at https://s3.amazonaws.com/download.onnx/models/opset_8/squeezenet.tar.gz

wil70 · enhancement · 0 comments

  1. Can I retrain LightGBM?
     1.a. I tried via AutoML and it seems to start from scratch when I do a Fit or call the auto-generated MLModel1.Training.cs code "public static ITransformer RetrainPipeline(MLContext mlContext, IDataView trainData)".
     1.b. In the documentation "Re-train a model", LightGBM is not listed as re-trainable?
     1.b.1. Is that still true (asking as the documentation is 9 months old)? TY!
     1.b.2. I can see that from Python, or even XGBM C/C++, you can retrain using the init_model option of lightgbm.train.
  2. Can I use the model generated by the LightGBM CLI/MPI in ML.NET? Thanks!

christopherfowers · untriaged · 1 comment

System Information (please complete the following information):

  • OS & Version: MacOS 12.6
  • ML.NET Version: mlnet-osx-x64 - 16.13.9
  • .NET Version: .NET 6.0.302

Describe the bug: Using the CLI from the terminal to train a model with data from a CSV file, and specifying columns to ignore using --ignore-cols 1, 2, 3, results in an output model and sample project that does in fact use the columns intended to be ignored as inputs for classification predictions.

To Reproduce Steps to reproduce the behavior:

  1. Create a multi-column CSV for text classification and call it data.csv. Include columns you don't wish to be used in the predictions at all. (Fill it with some meaningful data to train classification models.)
  2. open a terminal and navigate to the folder containing the csv.
  3. mlnet classification --dataset "data.csv" --has-header true --train-time 10 --label-col 8 --ignore-cols 1, 2, 3, 4, 5, 6, 7, 9 (obviously this step should include the appropriate label column (0 indexed) and ignore columns (also 0 indexed))

Expected behavior: The generated model and sample project should not use columns listed in the --ignore-cols flag arguments.

Actual behavior: Each of the ignored columns is still used.

luisquintanilla · enhancement · 0 comments

Create a visualization method that, when given a pipeline, displays a visual of the pipeline. This can work for both interactive and standard .NET projects.

API

Define pipeline

Interactive

.NET project

Samples

Proposed Implementation

  1. Take an ML.NET EstimatorChain and dynamically generate a Mermaid diagram.
  2. Process the Mermaid diagram as Markdown using Markdig (a rough sketch of steps 2 and 3 follows this list).
  3. Convert the Mermaid diagram to HTML.
  4. Display the HTML.
    1. If in an interactive environment, register a custom formatter.
    2. If in a standard .NET application, save as an image.
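Under stated assumptions (the Mermaid text below is hand-written; generating it from an EstimatorChain would be new code), steps 2 and 3 might look roughly like this with Markdig:

    using Markdig;

    // Hand-written Mermaid text standing in for a diagram generated from an
    // EstimatorChain (that generator does not exist yet).
    string mermaid = "flowchart LR\n  A[FeaturizeText] --> B[SdcaLogisticRegression]";
    string markdown = "```mermaid\n" + mermaid + "\n```";

    // Markdig's UseDiagrams extension emits mermaid code blocks as
    // <div class="mermaid">, which mermaid.js can render in the final HTML.
    var markdigPipeline = new MarkdownPipelineBuilder().UseDiagrams().Build();
    string html = Markdown.ToHtml(markdown, markdigPipeline);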

crazyoutlook · untriaged · 3 comments

System information

  • OS version/distro: Windows 10/11
  • .NET Version (e.g., dotnet --info): .NET 2.1/5/6 (attachment: ML.Net_Onnx.zip)

Issue

  • What did you do? I created an object detection model in Azure Custom Vision, then exported it as ONNX and tried consuming it with ML.NET. I am using the following ML package versions: Microsoft.ML (1.4.0), Microsoft.ML.ImageAnalytics (1.4.0), Microsoft.ML.OnnxTransformer (1.4.0).

  • What happened? ML.NET with the ONNX model is giving me incorrect prediction results, no results, or very poor confidence levels.

  • What did you expect? I am expecting correct results when working with the ONNX model and ML.NET. To re-confirm the issue, I consumed the same ONNX model with Python code; it worked correctly and the prediction results were as expected. So there does not seem to be any issue with the ONNX model file. The issue seems to be with the ML.NET code in the way predictions are done.

Source code / logs

I am attaching the ML.NET source code which I am using to consume the ONNX model.


luisquintanilla · untriaged · 0 comments

Every year we conduct a survey to gather feedback on pain points and feature requests that help shape the direction of Machine Learning in .NET.

This past year we have made major improvements to ML.NET tooling and APIs, and now we're investigating new areas to improve and grow, including documentation, data prep, notebooks, deep learning, MLOps, model explainability, and more.

Please take this ~10 minute survey to give your input on what you want to see next in ML.NET, and optionally leave your contact information at the end if you'd like to talk with the ML.NET team about your feedback.

Take the Survey

deiruch · enhancement · 0 comments

Is your feature request related to a problem? Please describe. I have generated a ModelInput class using AutoML. It contains several columns, all annotated with [ColumnName], like this:

Then I try to load CSV data for this model:

This fails, because the fields in ModelInput do not contain a LoadColumn attribute.
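For illustration, a sketch of the mismatch (ModelInput here is a hypothetical stand-in for the AutoML-generated class; mlContext is assumed to exist):

    public class ModelInput
    {
        [ColumnName("col0")]   // what the generated class carries
        [LoadColumn(0)]        // what LoadFromTextFile currently requires
        public float Col0 { get; set; }
    }

    // Without the LoadColumn attributes, this call cannot map CSV columns to fields.
    IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(
        "data.csv", separatorChar: ',', hasHeader: true);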

Describe the solution you'd like: LoadFromTextFile should try to use the ColumnName attribute if the LoadColumn attribute is not specified.

Describe alternatives you've considered: AutoML could generate LoadColumn attributes.

jonathanpeppers · untriaged · 0 comments

System Information (please complete the following information):

  • OS & Version: Windows 10
  • .NET Version: .NET 6

Describe the bug

As seen here: https://github.com/jonathanpeppers/inclusive-code-reviews-ml/pull/29#discussion_r944879120

A pipeline such as:

Hits an exception such as:

To Reproduce

Steps to reproduce the behavior:

  1. Run this project:

https://github.com/jonathanpeppers/inclusive-code-reviews-ml/tree/main/ml.net/InclusiveCodeReviews.Convert

  2. Uncomment these two lines:

https://github.com/jonathanpeppers/inclusive-code-reviews-ml/blob/486f7737174702233825ceddf28adb5cc7912f43/ml.net/InclusiveCodeReviews.ConsoleApp/ModelBuilder.cs#L59-L61

Expected behavior

In particular, we want to use KeepPunctuations=false and export to ONNX.

Screenshots, Code, Sample Projects

See above.

PerfectEngineer · untriaged · 0 comments

System information

  • OS version/distro: Win 10
  • .NET Version (eg., dotnet --info): 6.0
  • ML.NET Version: ML.NET v1.7.1
  • SciSharp.TensorFlow.Redist: 2.3.1

Issue: OOM when allocating tensor

Code

BitmapImage imageAsBitmap = new BitmapImage(new Uri("D:/ImageToGetPrediction.jpeg"));
var encoder = new JpegBitmapEncoder();
encoder.Frames.Add(BitmapFrame.Create(imageAsBitmap));
var input = new Input();

using (MemoryStream ms = new MemoryStream())
{
    encoder.Save(ms);
    input.Image = ms.ToArray();
}
var prediction = _predictor.Predict(input);

Exception

OOM when allocating tensor with shape[1,75,75,256] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu [[{{node resnet_v2_101/block1/unit_1/bottleneck_v2/conv3/BiasAdd}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. INNERMESSAGE STACKTRACE: at Microsoft.ML.TensorFlow.TensorFlowUtils.Runner.Run() at Microsoft.ML.Vision.ImageClassificationModelParameters.Classifier.Score(VBuffer1& image, Span1 classProbabilities) at Microsoft.ML.Vision.ImageClassificationModelParameters.<>c__DisplayClass22_02.<Microsoft.ML.Data.IValueMapper.GetMapper>b__0(VBuffer1& src, VBuffer1& dst) at Microsoft.ML.Data.SchemaBindablePredictorWrapperBase.<>c__DisplayClass19_02.b__0(TDst& dst) at Microsoft.ML.Data.PredictedLabelScorerBase.EnsureCachedPosition[TScore](Int64& cachedPosition, TScore& score, DataViewRow boundRow, ValueGetter1 scoreGetter) at Microsoft.ML.Data.MulticlassClassificationScorer.<>c__DisplayClass16_0.<GetPredictedLabelGetter>b__1(VBuffer1& dst) at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.<>c__DisplayClass8_01.b__0(TRow row) at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.FillValues(TRow row) at Microsoft.ML.Data.TypedCursorable1.RowImplementation.FillValues(TRow row) at Microsoft.ML.PredictionEngineBase2.FillValues(TDst prediction) at Microsoft.ML.PredictionEngine2.Predict(TSrc example, TDst& prediction) at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)

PerfectEngineer · untriaged · 0 comments

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: ML.NET v1.7.1
  • .NET Version: .NET 6.0
  • SciSharp.TensorFlow.Redist: 2.3.1

Expected behavior: Should be able to get a prediction.

CODE

BitmapImage imageAsBitmap = new BitmapImage(new Uri("D:/ImageToGetPrediction.jpeg"));
var encoder = new JpegBitmapEncoder();
encoder.Frames.Add(BitmapFrame.Create(imageAsBitmap));
var input = new Input();

using (MemoryStream ms = new MemoryStream())
{
    encoder.Save(ms);
    input.Image = ms.ToArray();
}
var prediction = _predictor.Predict(input);

STACKTRACE

Error: External component has thrown an exception

STACKTRACE: at Tensorflow.c_api.TF_SessionRun(IntPtr session, TF_Buffer* run_options, TF_Output[] inputs, IntPtr[] input_values, Int32 ninputs, TF_Output[] outputs, IntPtr[] output_values, Int32 noutputs, IntPtr[] target_opers, Int32 ntargets, IntPtr run_metadata, SafeStatusHandle status) at Microsoft.ML.TensorFlow.TensorFlowUtils.Runner.Run() at Microsoft.ML.Vision.ImageClassificationModelParameters.Classifier.Score(VBuffer1& image, Span1 classProbabilities) at Microsoft.ML.Vision.ImageClassificationModelParameters.<>c__DisplayClass22_02.<Microsoft.ML.Data.IValueMapper.GetMapper>b__0(VBuffer1& src, VBuffer1& dst) at Microsoft.ML.Data.SchemaBindablePredictorWrapperBase.<>c__DisplayClass19_02.b__0(TDst& dst) at Microsoft.ML.Data.PredictedLabelScorerBase.EnsureCachedPosition[TScore](Int64& cachedPosition, TScore& score, DataViewRow boundRow, ValueGetter1 scoreGetter) at Microsoft.ML.Data.MulticlassClassificationScorer.<>c__DisplayClass16_0.<GetPredictedLabelGetter>b__1(VBuffer1& dst) at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.<>c__DisplayClass8_01.b__0(TRow row) at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.FillValues(TRow row) at Microsoft.ML.Data.TypedCursorable1.RowImplementation.FillValues(TRow row) at Microsoft.ML.PredictionEngineBase2.FillValues(TDst prediction) at Microsoft.ML.PredictionEngine2.Predict(TSrc example, TDst& prediction) at Microsoft.ML.PredictionEngineBase`2.Predict(TSrc example)

PerfectEngineer · needs-further-triage · 3 comments

System information

  • OS & Version: Win 10
  • .NET Version: .NET 6.0

Issue

Exception and System Hang

  • What did you do? I am getting a prediction from the model generated for image classification.
  • What happened? The system hangs.
  • What did you expect? It should predict the result.

Source code / logs

Code

var context = new MLContext(seed: 0);
var model = context.Model.Load(_modelPath, out DataViewSchema schema);
PredictionEngine<Input, Output> _predicto = context.Model.CreatePredictionEngine<Input, Output>(model);

Exception Log Constructor Details: Error during class instantiationINNERMESSAGE System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.InvalidOperationException: Error during class instantiation ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.InvalidOperationException: Error during class instantiation ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.FormatException: Tensorflow exception triggered while loading model. ---> System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception. at Tensorflow.c_api.TF_GraphImportGraphDef(IntPtr graph, SafeBufferHandle graph_def, SafeImportGraphDefOptionsHandle options, SafeStatusHandle status) at Tensorflow.Graph.Import(Byte[] bytes, String prefix) at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSession(IExceptionContext ectx, Byte[] modelBytes, String modelFile) --- End of inner exception stack trace --- at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadTFSession(IExceptionContext ectx, Byte[] modelBytes, String modelFile) at Microsoft.ML.Vision.ImageClassificationModelParameters..ctor(IHostEnvironment env, ModelLoadContext ctx) at Microsoft.ML.Vision.ImageClassificationModelParameters.Create(IHostEnvironment env, ModelLoadContext ctx) --- End of inner exception stack trace --- at System.RuntimeMethodHandle.InvokeMethod(Object target, Span1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) --- End of inner exception stack trace --- at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes,TSig](IHostEnvironment env, TRes& result, String name, String options, Object[] extra) at Microsoft.ML.ModelLoadContext.TryLoadModelCore[TRes,TSig](IHostEnvironment env, TRes& result, Object[] extra) at Microsoft.ML.ModelLoadContext.TryLoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModelOrNull[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModelOrNull[TRes,TSig](IHostEnvironment env, TRes& result, String name, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, String name, Object[] extra) at Microsoft.ML.Data.MulticlassPredictionTransformer.Create(IHostEnvironment env, ModelLoadContext ctx) --- End of inner exception stack trace --- at System.RuntimeMethodHandle.InvokeMethod(Object target, Span1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at 
Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) --- End of inner exception stack trace --- at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes,TSig](IHostEnvironment env, TRes& result, String name, String options, Object[] extra) at Microsoft.ML.ModelLoadContext.TryLoadModelCore[TRes,TSig](IHostEnvironment env, TRes& result, Object[] extra) at Microsoft.ML.ModelLoadContext.TryLoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModelOrNull[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, String dir, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModelOrNull[TRes,TSig](IHostEnvironment env, TRes& result, String name, Object[] extra) at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, String name, Object[] extra) at Microsoft.ML.Data.TransformerChain1..ctor(IHostEnvironment env, ModelLoadContext ctx) at Microsoft.ML.Data.TransformerChain.Create(IHostEnvironment env, ModelLoadContext ctx) --- End of inner exception stack trace --- at System.RuntimeMethodHandle.InvokeMethod(Object target, Span1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs)

luisquintanilla · enhancement · 0 comments

  • When setting evaluation metrics using methods like SetBinaryClassificationMetric, the default value for the labelColumn parameter is label. This is inconsistent with the predictColumn parameter and the rest of ML.NET. Suggest changing it to Label.
  • When setting evaluation metrics using methods like SetBinaryClassificationMetric, the parameter names are labelColumn and predictColumn. In ML.NET, parameters referring to column names typically include "Name". Suggest changing the parameter names to labelColumnName and predictColumnName.

wil70 · needs-further-triage · 8 comments

System Information (please complete the following information):

  • OS & Version: Win8, latest version as of this bug entry
  • ML.NET Version: 16.13.9
  • .NET Version:6.0.303

Describe the bug: When I start AutoML in C#, I get an OutOfMemoryException after the memory reaches the maximum of 64GB. I have 64GB of RAM and a 330GB CSV file of data.

Note: I couldn't do it with the ML.NET CLI due to this bug https://github.com/dotnet/machinelearning/issues/6288, so I tried to do it with the C# AutoML package. I'm totally new at ML.NET, sorry in advance for the code quality.

To Reproduce Steps to reproduce the behavior:

  1. Generate a 330GB file with 4209 columns with random data
  2. create a c# project and paste the code below
  3. See error log at the end of this message with the OutOfMemoryException

Expected behavior: I expect to be able to handle 2TB files and 100K columns without any issue with the ML.NET CLI, and also with C# on a 64GB RAM computer, by streaming the data instead of loading it all in memory.

Screenshots, Code, Sample Projects

Additional context

I have a 330GB file (64GB RAM). I tried the ML.NET CLI but hit a bug (see above). So I'm now trying with C#; this bug is different from the ML.NET CLI issue, as it seems to try to load everything in memory.

IDataView trainingData = mlContext.Data.LoadFromTextFile<ModelInput>(
    "c:\data.csv", separatorChar: ',', hasHeader: true, trimWhitespace: true);

var cts = new CancellationTokenSource();
var experimentSettings = new MulticlassExperimentSettings();
//experimentSettings.TrainingData = trainingData;
experimentSettings.MaxExperimentTimeInSeconds = 3600;
experimentSettings.CancellationToken = cts.Token;
experimentSettings.CacheBeforeTrainer = CacheBeforeTrainer.Auto;

// Cancel experiment after the user presses any key
//CancelExperimentAfterAnyKeyPress(cts);
experimentSettings.CacheDirectoryName = null;

MulticlassClassificationExperiment experiment = mlContext.Auto().CreateMulticlassClassificationExperiment(experimentSettings);
ExperimentResult<MulticlassClassificationMetrics> experimentResult = experiment.Execute(trainingData, "Entry(Text)"); //, progressHandler: progressHandler);

.....

public class ModelInput
{
    [LoadColumn(0), NoColumn]
    public string _data0 { get; set; }

    [LoadColumn(1), NoColumn]
    public float ignoreData1 { get; set; }

    [LoadColumn(2, 4205)]
    public float _data { get; set; }

    [LoadColumn(4206), NoColumn] //(4206,4208)]
    public float _ignoreData4206 { get; set; }

    [LoadColumn(4207), NoColumn] //(4206,4208)]
    public float _ignoreData4207 { get; set; }

    [LoadColumn(4208), NoColumn] //(4206,4208)]
    public float _ignoreData4208 { get; set; }

    [LoadColumn(4209), ColumnName("Entry(Text)")]
    public string _label { get; set; }
}

There is a Exception of type 'System.OutOfMemoryException' was thrown. (new System.Collections.Generic.Mscorlib_CollectionDebugView<Microsoft.ML.AutoML.RunDetail<Microsoft.ML.Data.MulticlassClassificationMetrics>>(experimentResult.RunDetails).Items[0]).Exception.StackTrace at Microsoft.ML.Internal.Utilities.OrderedWaiter.Wait(Int64 position, CancellationToken token) at Microsoft.ML.Data.CacheDataView.GetPermutationOrNull(Random rand) at Microsoft.ML.Data.CacheDataView.GetRowCursorSetWaiterCore[TWaiter](TWaiter waiter, Func2 predicate, Int32 n, Random rand) at Microsoft.ML.Data.CacheDataView.GetRowCursorSet(IEnumerable1 columnsNeeded, Int32 n, Random rand) at Microsoft.ML.Data.OneToOneTransformBase.GetRowCursorSet(IEnumerable1 columnsNeeded, Int32 n, Random rand) at Microsoft.ML.Data.DataViewUtils.TryCreateConsolidatingCursor(DataViewRowCursor& curs, IDataView view, IEnumerable1 columnsNeeded, IHost host, Random rand) at Microsoft.ML.Data.TransformBase.GetRowCursor(IEnumerable1 columnsNeeded, Random rand) at Microsoft.ML.Trainers.TrainingCursorBase.FactoryBase1.Create(Random rand, Int32[] extraCols) at Microsoft.ML.Trainers.OnlineLinearTrainer2.TrainCore(IChannel ch, RoleMappedData data, TrainStateBase state) at Microsoft.ML.Trainers.OnlineLinearTrainer2.TrainModelCore(TrainContext context) at Microsoft.ML.Trainers.TrainerEstimatorBase2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor) at Microsoft.ML.Trainers.OneVersusAllTrainer.TrainOne(IChannel ch, ITrainerEstimator2 trainer, RoleMappedData data, Int32 cls) at Microsoft.ML.Trainers.OneVersusAllTrainer.Fit(IDataView input) at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input) at Microsoft.ML.Data.EstimatorChain1.Fit(IDataView input) at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)

There is a Exception of type 'System.OutOfMemoryException' was thrown. (new System.Collections.Generic.Mscorlib_CollectionDebugView<Microsoft.ML.AutoML.RunDetail<Microsoft.ML.Data.MulticlassClassificationMetrics>>(experimentResult.RunDetails).Items[0]).Exception.InnerException.StackTrace at Microsoft.ML.Internal.Utilities.ArrayUtils.EnsureSize[T](T[]& array, Int32 min, Int32 max, Boolean keepOld, Boolean& resized) at Microsoft.ML.Internal.Utilities.BigArray1.AddRange(ReadOnlySpan1 src) at Microsoft.ML.Data.CacheDataView.ColumnCache.ImplVec`1.CacheCurrent() at Microsoft.ML.Data.CacheDataView.Filler(DataViewRowCursor cursor, ColumnCache[] caches, OrderedWaiter waiter)

prodanovic · enhancement · 1 comment

Is your feature request related to a problem? Please describe. Our data scientists are training LightGBM models in Python, and our inference runtime is in C#. We are very much interested in using ML.NET to run inference; however, loading the model from a file is not yet supported in ML.NET. Is there an obstacle to adding this additional binding, which is already available in the LightGBM C++ API?

Describe the solution you'd like: Add an LGBM_BoosterLoadModelFromString binding to WrappedLightGbmInterface; the function is available in https://github.com/Microsoft/LightGBM/blob/master/include/LightGBM/c_api.h
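As a rough illustration only (the DllImport library name and marshalling are assumptions, not ML.NET's actual interop layer), the requested binding might look like:

    using System;
    using System.Runtime.InteropServices;

    internal static class LightGbmNative
    {
        // Mirrors LGBM_BoosterLoadModelFromString from LightGBM's c_api.h.
        [DllImport("lib_lightgbm")]
        public static extern int LGBM_BoosterLoadModelFromString(
            string model_str,
            out int out_num_iterations,
            out IntPtr out_handle);
    }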

Describe alternatives you've considered: An alternative is to convert our models to ONNX and not use the ML.NET runtime.

kitosd · untriaged · 0 comments

Model Builder Version: 16.1.0.20.27905
Visual Studio Version: 16.11.18

Bug description: With a freshly updated [as of today, 2022-08-19] VS 2019 and Model Builder, I tried to use the new ML builder wizard "Value prediction". It failed with an exception: Method not found: 'System.Collections.Generic.IEnumerable`1<System.ValueTuple`2<Microsoft.ML.AutoML.RunDetail`1<!!0>,Int32>> Microsoft.ML.AutoML.BestResultUtil.GetTopNRunResults

Steps to Reproduce

  1. Install the latest VS 2019 and update Model Builder to the latest version.
  2. Create a new project. Right click on the project -> Add -> "Machine Learning".
  3. Choose image classification, "Local ML", Next.
  4. As input, choose an appropriate folder; observe that the "data preview" looks as expected, with images being shown under various classifications.
  5. Train, Start Training.

Expected Experience: Pretty much what I'd expect from the tutorial, and of a wizard.

Actual Experience: Shortly after clicking "Start Training", I get this: "GPU Service not found. Falling back to CPU AutoML Service."

Moments later, this exception: Method not found: 'System.Collections.Generic.IEnumerable1<System.ValueTuple2<Microsoft.ML.AutoML.RunDetail1<!!0>,Int32>> Microsoft.ML.AutoML.BestResultUtil.GetTopNRunResults(System.Collections.Generic.IEnumerable1<Microsoft.ML.AutoML.RunDetail1<!!0>>, Microsoft.ML.AutoML.IMetricsAgent1<!!0>, Int32, Boolean)'. at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.d__21.MoveNext() at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine) at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.ExecuteAsync(IDataView trainData, IDataView validateData, ColumnInformation columnInformation, Nullable1 userCancellationToken, Nullable`1 timeout) at Microsoft.ML.ModelBuilder.AutoMLEngine.d__30.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147

luisquintanilla · enhancement · 0 comments

Description

TrialResult gives you important information about each trial AutoML runs as part of an experiment. Although today we have the NotebookMonitor for real-time visualizations and feedback during training, once training is done, to get similar information, users have to write their own code to access each of these pieces of information.

Proposal

Given an AutoML experiment called experiment, when training is done, users should be able to call ToDataFrame to convert the results of an AutoML experiment into a DataFrame.

The result should display columns with information for each of the trials in a DataFrame similar to the following:

TrialId | DurationInMilliseconds | Metric | Pipeline | Parameter
21      | 8.5                    | 0.98   | ...      | ...
5       | 3.8                    | 0.83   | ...      | ...

From there, if users want to sort / query the results using the built-in DataFrame methods, they can. Additionally, they can export to CSV using the WriteCsv method.

wil70 · untriaged · 24 comments

System Information (please complete the following information):

  • OS & Version: Win8, latest version as of this bug entry
  • ML.NET Version: 16.13.9
  • .NET Version:6.0.303

Describe the bug: When I start ML.NET from the CLI, I get an OutOfMemoryException. I have 64GB of RAM and a 330GB CSV file of data.

To Reproduce. Steps to reproduce the behavior:

  1. Generate a 330GB file with 4209 columns with random data
  2. open prompt
  3. type in command line: mlnet classification --train-time 75600 --name SampleClassification --log-file-path c:\Log_data.txt --has-header true --label-col 4209 --ignore-cols 0,1,4206,4207,4208 --dataset "c:\data.csv" --test-dataset "c:\test_data.csv"
  4. See error log at the end of this message with the OutOfMemoryException

Expected behavior: I expect ML.NET to continue and feed the data as it streams it, so there should be no OutOfMemoryException. When I monitor the mlnet.exe process with Task Manager, it doesn't go high at all, less than ~14GB. So something is not right, as I have 64GB, and it also shouldn't matter anyway, should it?

Screenshots, Code, Sample Projects Additional context Here is the log Start Training start nni training Experiment output folder: C:\Users\W\AppData\Local\Temp\AutoML-NNI\Experiment-GET3JS System.FormatException: Parsing failed with an exception: Stream reading encountered exception ---> System.FormatException: Stream reading encountered exception ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown. at System.Text.StringBuilder.ToString() at System.IO.StreamReader.ReadLine() at Microsoft.ML.Data.TextLoader.Cursor.LineReader.ThreadProc() --- End of inner exception stack trace --- at Microsoft.ML.Data.TextLoader.Cursor.LineReader.GetBatch() at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.Parse(Int32 tid) at Microsoft.ML.Data.TextLoader.Cursor.ParallelState.ThreadProc(Object obj) --- End of inner exception stack trace --- at Microsoft.ML.Data.TextLoader.Cursor.ParseParallel(ParallelState state)+MoveNext() at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore() at Microsoft.ML.Data.RootCursorBase.MoveNext() at Microsoft.ML.ModelBuilder.AutoMLService.Proposer.Controller.CountRows(IDataView data, Int64 maxRows) in //src/Microsoft.ML.ModelBuilder.AutoMLService/Proposer/Controller.cs:line 174 at Microsoft.ML.ModelBuilder.AutoMLService.Proposer.Controller.Initialize() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Proposer/Controller.cs:line 111 at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.LocalAutoMLExperiment.ExecuteAsync(IDataView trainData, IDataView validateData, ColumnInformation columnInformation, CancellationToken cancellationToken, CancellationToken timeout) in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalAutoMLExperiment.cs:line 138 at Microsoft.ML.ModelBuilder.AutoMLEngine.StartTrainingAsync(TrainingConfiguration config, PathConfiguration pathConfig, CancellationToken userCancellationToken) in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 160 at Microsoft.ML.CLI.Runners.AutoMLRunner.ExecuteAsync() in //src/mlnet/Runners/AutoMLRunner.cs:line 88 at Microsoft.ML.CLI.Program.TrainAsync(TrainingConfiguration trainingConfiguration, PathConfiguration pathConfig, AutoMLServiceLogLevel logLevel) in //src/mlnet/Program.cs:line 348 at Microsoft.ML.CLI.Program.AutoMLCommandRunner(AutoMLCommand command, Boolean skipGenerateConsoleApp) in //src/mlnet/Program.cs:line 329 at Microsoft.ML.CLI.Program.<>c.<b__4_0>d.MoveNext() in //src/mlnet/Program.cs:line 89 --- End of stack trace from previous location --- at System.CommandLine.Invocation.CommandHandler.GetExitCodeAsync(Object value, InvocationContext context) at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context) at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<b__0>d.MoveNext() --- End of stack trace from previous location --- at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<b__0>d.MoveNext() --- End of stack trace from previous location --- at Microsoft.ML.CLI.Program.<>c__DisplayClass4_0.<b__9>d.MoveNext() in /_/src/mlnet/Program.cs:line 290 --- End of stack trace from previous location --- at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__24_0>d.MoveNext() --- End of stack trace from previous location --- at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<b__0>d.MoveNext() --- End of stack trace from previous location --- at 
System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass11_0.<b__0>d.MoveNext() --- End of stack trace from previous location --- at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__10_0>d.MoveNext() --- End of stack trace from previous location --- at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<b__0>d.MoveNext() Check out log file for more information: c:\Log_data.txt Exiting ...

C:\Users\W>'

wil70 · enhancement · 1 comment

Hello, is there a way to end a long training run earlier? For example, I set 600000 seconds, and I already have good results, so I would like to finish sooner. How do I do this? A way to end earlier and also be able to resume/continue from there would be awesome (like adding x extra minutes of training). TY! w

wil70 · enhancement · 1 comment

Hello, is there a way to resume an ML.NET CLI training from where it was before a crash? I have a lot of data in the folder C:\Users\wwww\AppData\Local\Temp\AutoML-NNI\Experiment-9K67B4, but I do not know how to make mlnet start from there.

Detail: I used the CLI, i.e. "mlnet classification ....". I trained for a few days, but I made a mistake which used a lot of memory on my computer, which stopped the mlnet process. I would like to start mlnet where it left off so it can continue from there.

Thanks w

2022-08-10 15:03:24.3091 DEBUG System.InvalidOperationException: Event we were waiting on was subject to an exception ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown. at System.Array.Resize[T](T[]& array, Int32 newSize) at Microsoft.ML.Internal.Utilities.ArrayUtils.EnsureSize[T](T[]& array, Int32 min, Int32 max, Boolean keepOld, Boolean& resized) at Microsoft.ML.Data.CacheDataView.ColumnCache.ImplOne`1.CacheCurrent() at Microsoft.ML.Data.CacheDataView.Filler(DataViewRowCursor cursor, ColumnCache[] caches, OrderedWaiter waiter)

Versions

A quick list of the latest released versions

v1.7.1 - Mar 09, 2022

Minor servicing update with dependency updates and a PFI bug fix for finding the correct transformer to use.

v1.7.0-rc.1 - Oct 21, 2021

ML.NET 1.7.0 RC 1

Moving forward, we are going to be aligning more with the overall .NET release schedule. As such, this is a smaller release since we had a larger one just about 3 months ago but it aligns us with the release of .NET 6.

New Features

ML.NET

  • Switched to getting the version from assembly custom attributes (#4512). Removes reliance on getting the product version for model.zip/version.txt from FileVersionInfo and replaces it with assembly custom attributes. This will help in supporting single-file applications. (Thanks @r0ss88)
  • Can now optionally not dispose of the underlying model when you dispose a prediction engine. (#5964) A new prediction engine options class has been added that lets you determine if the underlying model should be disposed of or not when the prediction engine itself is disposed of.
  • Can now set the number of threads that onnx runtime uses (#5962) This lets you specify the number of parallel threads ONNX runtime will use to execute the graph and run the model. (Thanks @yaeldekel)
  • The PFI API has been completely reworked and is now much more user friendly (#5934) You can now get the output from PFI as a dictionary mapping the column name (or the slot name) to its PFI result.

DataFrame

  • Can now merge using multiple columns in a JOIN condition (#5838) (Thanks @asmirnov82)

Enhancements

ML.NET

  • Run formatting on all src projects (#5937) (Thanks @jwood803)
  • Added BufferedStream for reading from DeflateStream - reduces loading time for .NET core (#5924) (Thanks @martintomasek)
  • Update editor config to match Roslyn and format samples (#5893) (Thanks @jwood803)
  • Few more minor editor config changes (#5933)

DataFrame

  • Use Equals and = operator for DataViewType comparison (#5942) (Thanks @thoron)

Bug Fixes

  • Initialize _bestMetricValue when using the Loss metric (#5939) (Thanks @MiroslavKabat)

Build / Test updates

  • Changed the queues used for building/testing from Ubuntu 16.04 to 18.04 (#5970)
  • Add in support for building with VS 2022. (#5956)
  • Codecov yml token was added (#5950)
  • Move from XliffTasks to Microsoft.DotNet.XliffTasks (#5887)

Documentation Updates

  • Fixed up Readme, updated the roadmap, and new doc detailing some platform limitations. (#5892)

Breaking Changes

  • None

v1.6.0 - Jul 15, 2021

ML.NET 1.6.0

New Features

  • Support for Arm/Arm64/Apple Silicon has been added. (#5789) You can now use most ML.NET on Arm/Arm64/Apple Silicon devices. Anything without a hard dependency on x86 SIMD instructions or Intel MKL are supported.
  • Support for specifying a temp path ML.NET will use. (#5782) You can now set the TempFilePath in the MLContext that it will use.
  • Support for specifying the recursion limit to use when loading an ONNX model (#5840) The recursion limit defaults to 100, but you can now specify the value in case you need to use a larger number. (Thanks @Crabzmatic)
  • Support for saving Tensorflow models in the SavedModel format added (#5797) You can now save models that use the Tensorflow SavedModel format instead of just the frozen graph format. (Thanks @darth-vader-lg)
  • DataFrame Specific enhancements
  • Extended DataFrame GroupBy operation (#5821) Extend DataFrame GroupBy operation by adding new property Groupings. This property returns collection of IGrouping objects (the same way as LINQ GroupBy operation does) (Thanks @asmirnov82)

Enhancements

  • Switched from using a fork of SharpZipLib to using the official package (#5735)
  • Let user specify a temp path location (#5782)
  • Clean up ONNX temp models by opening with a "Delete on close" flag (#5782)
  • Ensures the named model is loaded in a PredictionEnginePool before use (#5833) (Thanks @feiyun0112)
  • Use indentation for 'if' (#5825) (Thanks @feiyun0112)
  • Use Append instead of AppendFormat if we don't need formatting (#5826) (Thanks @feiyun0112)
  • Cast by using is operator (#5829) (Thanks @feiyun0112)
  • Removed unnecessary return statements (#5828) (Thanks @feiyun0112)
  • Removed code that could never be executed (#5808) (Thanks @feiyun0112)
  • Remove some empty statements (#5827) (Thanks @feiyun0112)
  • Added in short-circuit logic for conditionals (#5824) (Thanks @feiyun0112)
  • Update LightGBM to v2.3.1 (#5851)
  • Raised the default recursion limit for ONNX models from 10 to 100. (#5796) (Thanks @darth-vader-lg)
  • Speed up the inference of the Tensorflow saved_models. (#5848) (Thanks @darth-vader-lg)
  • Speed-up bitmap operations on images. (#5857) (Thanks @darth-vader-lg)
  • Updated to latest version of Intel MKL. (#5867)
  • AutoML.NET specific enhancements
  • Offer suggestions for possibly mistyped label column names in AutoML (#5624) (Thanks @Crabzmatic)
  • DataFrame Specific enhancements
  • Improve csv parsing (#5711)
  • IDataView to DataFrame (#5712)
  • Update to the latest Microsoft.DotNet.Interactive (#5710)
  • Move DataFrame to machinelearning repo (#5641)
  • Improvements to the sort routine (#5776)
  • Improvements to the Merge routine (#5778)
  • Improve DataFrame exception text (#5819) (Thanks @asmirnov82)
  • DataFrame csv DateTime enhancements (#5834)

Bug Fixes

  • Fix erroneous use of TaskContinuationOptions in ThreadUtils.cs (#5753)
  • Fix a few locations that can try to access a null object (#5804) (Thanks @feiyun0112)
  • Use return value of method (#5818) (Thanks @feiyun0112)
  • Adding throw to some exceptions that weren't throwing them originally (#5823) (Thanks @feiyun0112)
  • Fixed a situation in the CountTargetEncodingTransformer where it never reached the stop condition (#5822) (Thanks @feiyun0112)
  • DataFrame Specific bug fixes
  • Fix issue with DataFrame Merge method (#5768) (Thanks @asmirnov82)

Build / Test updates

  • Changed default branch from master to main (#5715) (#5717) (#5719)
  • Fix for libomp in the CI process for MacOS 11 (#5771)
  • Minor code cleanup. (#5770)
  • Updated arcade to the latest version (#5783)
  • Switched signing certificate to use dotnet certificate (#5794)
  • Building natively and cross targeting for Arm/Arm64/Apple Silicon is now supported. (#5789)
  • Upload classic pdb to symweb (#5816)
  • Fix MacOS CI issue (#5854)
  • Added in a Helix Integration for testing. (#5837)
  • Added in Helix Integration for arm/arm64/Apple Silicon for testing (#5860)

Documentation Updates

  • Fixed markdown issues in MulticlassClassificationMetrics and CalibratedBinaryClassificationMetrics (#5732) (Thanks @R0Wi)
  • Update unix instructions for x-compiling on ARM (#5811)
  • Update Contribution.MD with description of help wanted tags (#5815)
  • Add Korean translation for repo readme.md (#5780) (Thanks @metr0jw)
  • Fix spelling error in MLContext class summary (#5832) (Thanks @Crabzmatic)
  • Update issue templates (#5846)

Breaking Changes

  • None

v1.5.5 - Mar 03, 2021

New Features

  • New API allowing confidence parameter to be a double.(#5623) . A new API has been added to accept double type for the confidence level. This helps when you need to have higher precision than an int will allow for. (Thank you @esso23)
  • Support to export ValueMapping estimator to ONNX was added (#5577)
  • New API to treat TensorFlow output as batched/not-batched (#5634) A new API has been added so you can specify if the output from TensorFlow is batched or not.

Enhancements

  • Make ColumnInference serializable (#5611)

Bug Fixes

  • AutoML.NET specific fixes.
    • Fixed an AutoML aggregate timeout exception (#5631)
    • Offer suggestions for possibly mistyped label column names in AutoML (#5624) (Thank you @Crabzmatic)
  • Update some ToString conversions (#5627) (Thanks @4201104140)
  • Fixed an issue in SRCnnEntireAnomalyDetector (#5579)
  • Fixed nuget.config multi-feed issue (#5614)
  • Remove references to Microsoft.ML.Scoring (#5602)
  • Fixed Averaged Perceptron default value (#5586)

Build / Test updates

  • Fixing official build by adding homebrew bug workaround (#5596)
  • Nuget.config url fix for roslyn compilers (#5584)
  • Add SymSgdNative reference to AutoML.Tests.csproj (#5559)

Documentation Updates

  • Updated documentation for the correct version of CUDA for TensorFlow. (#5635)
  • Updates documentation for an issue with brew and installing libomp. (#5635)
  • Updated an ONNX url to the correct url. (#5635)
  • Added a note in the documentation that the PredictionEngine is not thread safe. (#5583)

Breaking Changes

  • None

v1.5.4 - Jan 05, 2021

New Features

  • New API for exporting models to Onnx. (#5544). A new API has been added to Onnx converter to specify the output columns you care about. This will export a smaller and more performant model in many cases.

Enhancements

  • Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (#5395) (Thank you @jasallen)
  • Update OnnxRuntime to 1.6 (#5529)
  • Updated tensorflow.net to 0.20.0 (#5404)
  • Added in DcgTruncationLevel to AutoML api and increased default level to 10 (#5433)

Bug Fixes

  • AutoML.NET specific fixes.
    • Fixed AutoFitMaxExperimentTimeTest (#5506)
    • Fixed code generator tests failure (#5520)
    • Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (#5445)
    • Handled exception during GetNextPipeline for AutoML (#5455)
    • Fixed internationalization bug(#5162) in AutoML parameter sweeping caused by culture dependent float parsing. (#5163)
    • Fixed MaxModels exit criteria for AutoML unit test (#5471)
    • Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548)
  • Fixed bug in Tensorflow Transforer with handling primitive types (#5547)
  • Fixed MLNet.CLI build error (#5546)
  • Fixed memory leaks from OnnxTransformer (#5518)
  • Fixed memory leak in object pool (#5521)
  • Fixed Onnx Export for ProduceWordBags (#5435)
  • Upgraded boundary calculation and expected value calculation in SrCnnEntireAnomalyDetector (#5436)
  • Fixed SR anomaly score calculation at beginning (#5502)
  • Improved error message in ColumnConcatenatingEstimator (#5444)
  • Fixed issue 5020, allow ML.NET to load tf model with primitive input and output column (#5468)
  • Fixed issue 4322, enable lda summary output (#5260)
  • Fixed perf regression in ShuffleRows (#5417)
  • Change the _maxCalibrationExamples default on CalibratorUtils (#5415)

Build / Test updates

  • Migrated to the Arcade build system that is used by multiple dotnet projects. This will give increased build/CI efficiencies going forward. Updated build instructions can be found in the docs/building folder
  • Fixed MacOS builds (#5467 and #5457)

Documentation Updates

  • Fixed Spelling on stopwords (#5524)(Thank you @LeoGaunt)
  • Changed LoadRawImages Sample (#5460)

Breaking Changes

  • None

v1.5.2 - Sep 14, 2020

New Features

  • New API and algorithms for time series data. In this release ML.NET introduces new capabilities for working with time series data.
    • Detecting seasonality in time series (#5231)
    • Removing seasonality from time series prior to anomaly detection (#5202)
    • Threshold for root cause analysis (#5218)
    • RCA for anomaly detection can now return multiple dimensions(#5236)
  • Ranking experiments in AutoML.NET API. ML.NET now adds support for automating ranking experiments. (#5150, #5246) Corresponding support will soon be added to Model Builder in Visual Studio.
  • Cross validation support in ranking (#5263)
  • CountTargetEncodingEstimator. This transforms a categorical column into a set of features that includes the count of each label class, the log-odds for each label class and the back-off indicator (#4514)

Enhancements

  • Onnx Enhancements
    • Support more types for ONNX export of HashEstimator (#5104)
    • Added ONNX export support for NaiveCalibrator (#5289)
    • Added ONNX export support for StopWordsRemovingEstimator and CustomStopWordsRemovingEstimator (#5279)
    • Support onnx export with previous OpSet version (#5176)
    • Added a sample for Onnx conversion (#5195)
  • New features in old transformers
    • Robust Scaler now added to the Normalizer catalog (#5166)
    • ReplaceMissingValues now supports Mode as a replacement method. (#5205)
    • Added in standard conversions to convert types to string (#5106)
  • Output topic summary to model file for LDATransformer (#5260)
  • Use Channel Instead of BufferBlock (#5123, #5313). (Thanks @jwood803)
  • Support specifying command timeout while using the database loader (#5288)
  • Added cross entropy support to validation training, edited metric reporting (#5255)
  • Allow TextLoader to load empty float/double fields as NaN instead of 0 (#5198)

Bug Fixes

  • Changed default value of RowGroupColumnName from null to GroupId (#5290)
  • Updated AveragedPerceptron default iterations from 1 to 10 (#5258)
  • Properly normalize column names in Utils.GetSampleData() for duplicate cases (#5280)
  • Add two-variable scenario in Tensor shape inference for TensorflowTransform (#5257)
  • Fixed score column name and order bugs in CalibratorTransformer (#5261)
  • Fix for conditional error in root cause analysis additions (#5269)
  • Ensured Sanitized Column Names are Unique in AutoML CLI (#5177)
  • Ensure that the graph is set to be the current graph when scoring with multiple models (#5149)
  • Uniform onnx conversion method when using non-default column names (#5146)
  • Fixed multiple issues related to splitting data. (#5227)
  • Changed default NGram length from 1 to 2. (#5248)
  • Improve exception msg by adding column name (#5232)
  • Use model schema type instead of class definition schema (#5228)
  • Use GetRandomFileName when creating random temp folder to avoid conflict (#5229)
  • Filter anomalies according to boundaries under AnomalyAndMargin mode (#5212)
  • Improve error message when defining custom type for variables (#5114)
  • Fixed OnnxTransformer output column mapping. (#5192)
  • Fixed version format of built packages (#5197)
  • Improvements to "Invalid TValue" error message (#5189)
  • Added IDisposable to OnnxTransformer and fixed memory leaks (#5348)
  • Fixes #4392. Added AddPredictionEnginePool overload for implementation factory (#4393)
  • Updated codegen to make it work with mlnet 1.5 (#5173)
  • Updated codegen to support object detection scenario. (#5216)
  • Fix issue #5350, check file lock before reload model (#5351)
  • Improve handling of infinity values in AutoML.NET when calculating average CV metrics (#5345)
  • Throw when PCA generates invalid eigenvectors (#5349)
  • RobustScalingNormalizer entrypoint added (#5310)
  • Replace whitelist terminology to allow list (#5328) (Thanks @LetticiaNicoli)
  • Fixes (#5352) issues caused by equality with non-string values for root cause localization (#5354)
  • Added catch in R^2 calculation for case with few samples (#5319)
  • Added support for RankingMetrics with CrossValSummaryRunner (#5386)

Test updates

  • Refactor of OnnxConversionTests.cs (#5185)
  • New code coverage (#5169)
  • Test fix using the breast cancer dataset and test cleanup (#5292)

Documentation Updates

  • Updated ORT version info for OnnxScoringEstimator (#5175)
  • Updated OnnxTransformer docs (#5296)
  • Improve VectorTypeAttribute(dims) docs (#5301)

Breaking Changes

  • None

v1.5.0 - May 26, 2020

New Features

  • New anomaly detection algorithm (#5135). ML.NET has previously supported anomaly detection through DetectAnomalyBySrCnn, which operates in a streaming manner, computing anomalies around each arriving point by examining a window around it. The new DetectEntireAnomalyBySrCnn function instead computes anomalies by considering the entire dataset, and adds the ability to set the sensitivity and output margin (a usage sketch follows after this list).
  • Root Cause Detection (#4925). ML.NET now also supports root cause detection for anomalies detected in time series data.
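
Below is a minimal sketch of calling the new batch detector, assuming the Microsoft.ML.TimeSeries package; the TimeSeriesData and SrCnnAnomalyResult row classes and the specific parameter values are illustrative and not taken from the release notes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.TimeSeries;

var mlContext = new MLContext();

// Hypothetical series: a flat signal with one obvious spike.
IEnumerable<TimeSeriesData> series = Enumerable.Range(0, 100)
    .Select(i => new TimeSeriesData { Value = i == 50 ? 10.0 : 1.0 });
IDataView data = mlContext.Data.LoadFromEnumerable(series);

// Score the whole series in one call instead of streaming point-by-point.
IDataView result = mlContext.AnomalyDetection.DetectEntireAnomalyBySrCnn(
    data,
    outputColumnName: nameof(SrCnnAnomalyResult.Prediction),
    inputColumnName: nameof(TimeSeriesData.Value),
    threshold: 0.35,
    batchSize: -1,               // -1: treat the entire series as a single batch
    sensitivity: 90.0,
    detectMode: SrCnnDetectMode.AnomalyAndMargin);

foreach (var row in mlContext.Data.CreateEnumerable<SrCnnAnomalyResult>(result, reuseRowObject: false))
    Console.WriteLine(string.Join(", ", row.Prediction));

public class TimeSeriesData
{
    public double Value { get; set; }
}

public class SrCnnAnomalyResult
{
    // In AnomalyAndMargin mode the output vector holds (roughly) IsAnomaly, RawScore, Mag,
    // ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary.
    [VectorType]
    public double[] Prediction { get; set; }
}
```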

Enhancements

  • Updates to TextLoader (a loader options sketch follows after this list)
    • Enable TextLoader to accept new lines in quoted fields (#5125)
    • Add escapeChar support to TextLoader (#5147)
    • Add public generic methods to TextLoader catalog that accept Options objects (#5134)
    • Added decimal marker option in TextLoader (#5145, #5154)
  • Onnxruntime updated to v1.3 (#5104). This brings support for additional data types for the HashingEstimator.
  • Onnx export for OneHotHashEncodingTransformer and HashingTransormer (#5013, #5152, #5138)
  • Support for Categorical features in CalculateFeatureContribution of LightGBM (#5018)
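
As a rough illustration of how the TextLoader options above fit together, the sketch below builds a loader from an Options object. The property names ReadMultilines, EscapeChar, and DecimalMarker are my best guesses at the options these PRs introduced and should be checked against the TextLoader.Options reference.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

var loader = mlContext.Data.CreateTextLoader(new TextLoader.Options
{
    Separators = new[] { ',' },
    HasHeader = true,
    AllowQuoting = true,
    ReadMultilines = true,   // assumed name: accept new lines inside quoted fields (#5125)
    EscapeChar = '\\',       // assumed name: escape character support (#5147)
    DecimalMarker = '.',     // assumed name: decimal marker option (#5145, #5154)
    Columns = new[]
    {
        new TextLoader.Column("Label", DataKind.Boolean, 0),
        new TextLoader.Column("Text", DataKind.String, 1),
    }
});

IDataView data = loader.Load("reviews.csv");   // hypothetical file
```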

Bug Fixes

In this release we tracked down and fixed many subtle bugs, including several that only occurred randomly and sporadically. As a result, we were also able to re-enable a number of previously disabled tests, listed in the Test updates section below.

  • Fixed race condition for test MulticlassTreeFeaturizedLRTest (#4950)
  • Fix SsaForecast bug (#5023)
  • Fixed x86 crash (#5081)
  • Fixed and added unit tests for EnsureResourceAsync hanging issue (#4943)
  • Added IDisposable support for several classes (#4939)
  • Updated libmf and corresponding MatrixFactorizationSimpleTrainAndPredict() baselines per build (#5121)
  • Fix MatrixFactorization trainer's warning (#5071)
  • Update CodeGenerator's console project to netcoreapp3.1 (#5066)
  • Let ImageLoadingTransformer dispose the last image it loads (#5056)
  • [LightGBM] Fixed bug for empty categorical values (#5048)
  • Converted potentially large variables to type long (#5041)
  • Made resource downloading more robust (#4997)
  • Updated MultiFileSource.Load to fix inconsistent behavior with multiple files (#5003)
  • Removed WeakReference already cleaned up by GC (#4995)
  • Fixed Bitmap(file) locking the file. (#4994)
  • Remove WeakReference list in PredictionEnginePoolPolicy. (#4992)
  • Added the assembly name of the custom transform to the model file (#4989)
  • Updated constructor of ImageLoadingTransformer to accept empty imageFolder paths (#4976)

Onnx bug fixes

  • ColumnSelectingTransformer now infers ONNX shape (#5079)
  • Fixed KMeans scoring differences between ORT and OnnxRunner (#4942)
  • CountFeatureSelectingEstimator no selection support (#5000)
  • Fixes OneHotEncoding Issue (#4974)
  • Fixes multiclass logistic regression (#4963)
  • Adding vector tests for KeyToValue and ValueToKey (#5090)

AutoML fixes

  • Handle NaN optimization metric in AutoML (#5031)
  • Add projects capability in CodeGenerator (#5002)
  • Simplify CodeGen - phase 2 (#4972)
  • Support sweeping multiline option in AutoML (#5148)

Test updates

  • Fix libomp installation for macOS builds (#5143, #5141)
  • Address TensorFlow test download failures by using the resource manager with retry downloads (#5102)
  • Adding OneHotHashEncoding Test (#5098)
  • Changed Dictionary to ConcurrentDictionary (#5097)
  • Added SQLite database to test loading of datasets in non-Windows builds (#5080)
  • Added ability to compare configuration-specific baselines, updated baselines for many tests, and re-enabled disabled tests (#5045, #5059, #5068, #5057, #5047, #5029, #5094, #5060)
  • Fixed TestCancellation hanging (#4999)
  • Fix benchmark test hanging issue (#4985)
  • Added working version of checking whether file is available for access (#4938)

Documentation Updates

  • Update OnnxTransformer Doc XML (#5085)
  • Updated build docs for .NET Core 3.1 (#4967)
  • Updated OnnxScoringEstimator's documentation (#4966)
  • Fix xrefs in the LDSVM trainer docs (#4940)
  • Clarified parameters on time series (#5038)
  • Update ForecastBySsa function specifications and add seealso (#5027)
  • Add see also section to TensorFlowEstimator docs (#4941)

Breaking Changes

  • None

v1.5.0-preview2 - Mar 12, 2020

New Features (IN-PREVIEW, please provide feedback)

  • TimeSeriesImputer (#4623) This data transformer can be used to impute missing rows in time series data.
  • LDSVM Trainer (#4060) The "Local Deep SVM" uses trees as its SVM kernel to create a non-linear binary trainer. A sample can be found here, and a usage sketch follows after this list.
  • Onnxruntime updated to v1.2 This also includes support for GPU execution of onnx models
  • Export-to-ONNX for below components:
    • SlotsDroppingTransformer (#4562)
    • ColumnSelectingTransformer (#4590)
    • VectorWhiteningTransformer (#4577)
    • NaiveBayesMulticlassTrainer (#4636)
    • PlattCalibratorTransformer (#4699)
    • TokenizingByCharactersTransformer (#4805)
    • TextNormalizingTransformer (#4781)
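
For the LDSVM trainer mentioned above, a minimal training sketch might look like the following; the input schema, column names, and hyperparameter values are illustrative assumptions rather than recommended settings.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);
IDataView trainData = mlContext.Data.LoadFromTextFile<ModelInput>("train.tsv", hasHeader: true);

// LdSvm builds a tree of local linear SVMs, giving a non-linear binary classifier.
var pipeline = mlContext.Transforms.Concatenate("Features",
        nameof(ModelInput.Feature1), nameof(ModelInput.Feature2))
    .Append(mlContext.BinaryClassification.Trainers.LdSvm(
        labelColumnName: nameof(ModelInput.Label),
        featureColumnName: "Features",
        numberOfIterations: 15000,
        treeDepth: 3));

var model = pipeline.Fit(trainData);

public class ModelInput
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public float Feature1 { get; set; }
    [LoadColumn(2)] public float Feature2 { get; set; }
}
```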

Bug Fixes

  • Fix issue in WaiterWaiter caused by race condition (#4829)
  • Onnx Export change to allow for running inference on multiple rows in OnnxRuntime (#4783)
  • Data splits to default to MLContext seed when not specified (#4764)
  • Add Seed property to MLContext and use as default for data splits (#4775)
  • Onnx bug fixes
    • Updating onnxruntime version (#4882)
    • Calculate ReduceSum row by row in ONNX model from OneVsAllTrainer (#4904)
    • Several onnx export fixes related to KeyToValue and ValueToKey transformers (#4900, #4866, #4841, #4889, #4878, #4797)
    • Fixes to onnx export for text related transforms (#4891, #4813)
    • Fixed bugs in OptionalColumnTransform and ColumnSelecting (#4887, #4815)
    • Alternate solution for ColumnConcatenatingTransformer (#4875)
    • Added slot names support for OnnxTransformer (#4857)
    • Fixed output schema of OnnxTransformer (#4849)
    • Changed Binarizer node to be cast to the type of the predicted label … (#4818)
    • Fix for OneVersusAllTrainer (#4698)
    • Enable OnnxTransformer to accept KeyDataViewTypes as if they were UInt32 (#4824)
    • Fix off by 1 error with the cats_int64s attribute for the OneHotEncoder ONNX operator (#4827)
    • Updated handling of missing values with LightGBM, and added ability to use (0) as missing value (#4695)
    • Double cast to float for some onnx estimators (#4745)
    • Fix onnx output name for GcnTransform (#4786)
  • Added support to run PFI on uncalibrated binary classification models (#4587)
  • Fix bug in WordBagEstimator when training on empty data (#4696)
  • Added Cancellation mechanism to Image Classification (through the experimental nuget) (fixes #4632) (#4650)
  • Changed F1 score to return 0 instead of NaN when Precision + Recall is 0 (#4674)
  • TextLoader, BinaryLoader and SvmLightLoader now check the existence of the input file before training (#4665)
  • ImageLoadingTransformer now checks the existence of input folder before training (#4691)
  • Use random file name for AutoML experiment folder (#4657)
  • Using invariant culture when converting to string (#4635)
  • Fix NullReferenceException when it comes to Recommendation in AutoML and CodeGenerator (#4774)

Enhancements

  • Added in support for System.DateTime type for the DateTimeTransformer (#4661)
  • Additional changes to ExpressionTransformer (#4614)
  • Optimize generic MethodInfo for Func (#4588)
  • Data splits to default to MLContext seed when not specified (#4764)
  • Added in DateTime type support for TimeSeriesImputer (#4812)

Test updates

  • Code analysis updates
    • Update analyzer test library (#4740)
    • Enable the internal code analyzer for test projects (#4731)
    • Implement MSML_ExtendBaseTestClass (Test classes should be derived from BaseTestClass) (#4746)
    • Enable MSML_TypeParamName for the full solution (#4762)
    • Enable MSML_ParameterLocalVarName for the full solution (#4833)
    • Enable MSML_SingleVariableDeclaration for the full solution (#4765)
  • Better logging from tests
    • Ensure tests capture the full log (#4710)
    • Fix failure to capture test failures (#4716)
    • Collect crash dump upload dump and pdb to artifact (#4666)
  • Enable Conditional Numerical Reproducibility for tests (#4569)
  • Changed all MLContext creation to include a fixed seed (#4736)
  • Fix incorrect SynchronizationContext use in TestSweeper (#4779)

Documentation Updates

Breaking Changes

  • None

v1.5.0-preview - Jan 01, 2020

New Features (IN-PREVIEW, please provide feedback)

  • Export-to-ONNX for below components:

    • WordTokenizingTransformer (#4451)
    • NgramExtractingTransformer (#4451)
    • OptionalColumnTransform (#4454)
    • KeyToValueMappingTransformer (#4455)
    • LbfgsMaximumEntropyMulticlassTrainer (4462)
    • LightGbmMulticlassTrainer (4462)
    • LightGbmMulticlassTrainer with SoftMax (4462)
    • OneVersusAllTrainer (4462)
    • SdcaMaximumEntropyMulticlassTrainer (4462)
    • SdcaNonCalibratedMulticlassTrainer (4462)
    • CopyColumn Transform (#4486)
    • PriorTrainer (#4515)
  • DateTime Transformer (#4521)

  • Loader and Saver for SVMLight file format (#4190) Sample

  • Expression transformer (#4548)
    The expression transformer takes an expression as text, written in the syntax of a simple expression language, and performs the operation defined by that expression on the input columns of each row of the data. The transformer supports vector input columns, in which case it applies the expression to each slot of the vector independently. The expression language is extensible with user-defined operations. Sample
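
As a rough sketch of the transformer's shape (the column names and the expression below are made up for illustration, not taken from the docs):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Computes a new column from two inputs using the expression language.
// Vector-valued inputs would be processed slot by slot.
var pipeline = mlContext.Transforms.Expression(
    outputColumnName: "Score",
    expression: "(x, y) => log(x) + y",
    inputColumnNames: new[] { "Clicks", "Bias" });

// var transformed = pipeline.Fit(data).Transform(data);   // data: IDataView with Clicks/Bias columns
```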

Bug Fixes

  • Fix using permutation feature importance with Binary Prediction Transformer and CalibratedModelParametersBase loaded from disk. (#4306)
  • Fixed model saving and loading of OneVersusAllTrainer to include SoftMax. (#4472)
  • Ignore hidden columns in AutoML schema checks of validation data. (#4490)
  • Ensure BufferBlocks are completed and empty in RowShufflingTransformer. (#4479)
  • Create methods not being called when loading models from disk. (#4485)
  • Fixes onnx exports for binary classification trainers. (#4463)
  • Make PredictionEnginePool.GetPredictionEngine thread safe. (#4570)
  • Memory leak when using FeaturizeText transform. (#4576)
  • System.ArgumentOutOfRangeException issue in CustomStopWordsRemovingTransformer. (#4592)
  • Image Classification low accuracy on EuroSAT Dataset. (4522)

Stability fixes by Sam Harwell

  • Prevent exceptions from escaping FileSystemWatcher events. (#4535)
  • Make local functions static where applicable. (#4530)
  • Disable CS0649 in OnnxConversionTest. (#4531)
  • Make test methods public. (#4532)
  • Conditionally compile helper code. (#4534)
  • Avoid running API Compat for design time builds. (#4529)
  • Pass by reference when null is not expected. (#4546)
  • Add Xunit.Combinatorial for test projects. (#4545)
  • Use Theory to break up tests in OnnxConversionTest. (#4533)
  • Update code coverage integration. (#4543)
  • Use std::unique_ptr for objects in LdaEngine. (#4547)
  • Enable VSTestBlame to show details for crashes. (#4537)
  • Use std::unique_ptr for samplers_ and likelihood_in_iter_. (#4551)
  • Add tests for IParameterValue implementations. (#4549)
  • Convert LdaEngine to a SafeHandle. (#4538)
  • Create SafeBoosterHandle and SafeDataSetHandle. (#4539)
  • Add IterationDataAttribute. (#4561)
  • Add tests for ParameterSet equality. (#4550)
  • Add a test handler for AppDomain.UnhandledException. (#4557)

Breaking Changes

None

Enhancements

  • Hash Transform API that takes in advanced options. (#4443)
  • Image classification performance improvements and option to create validation set from train set. (#4522)
  • Upgraded OnnxRuntime to v1.0 and Google Protobuf to 3.10.1. (#4416)

CLI and AutoML API

  • None.

Remarks

  • Thank you, Sam Harwell for making a series of stability fixes that has substantially increased the stability of our Build CI.

v1.4.0 - Jan 01, 2020

New Features

  • General Availability of Image Classification API
    Introduces the Microsoft.ML.Vision package that enables image classification by leveraging an existing pre-trained deep neural network model. The API trains the last classification layer using TensorFlow via its C# bindings from TensorFlow.NET. This is a high-level API that is simple yet powerful. Below are some of the key features:

    • GPU training: Supported on Windows and Linux, more information here.
    • Early stopping: Saves time by stopping training automatically once the model has stabilized.
    • Learning rate scheduler: The learning rate is an integral and often tricky part of deep learning. By providing learning rate schedulers, we give users a way to start with a high initial learning rate that decays over time. A high initial learning rate introduces enough randomness for the loss function to better approach the global minimum, while the decayed learning rate stabilizes the loss later in training. We have implemented the Exponential Decay and Polynomial Decay learning rate schedulers.
    • Pre-trained DNN Architectures: The supported DNN architectures used internally for transfer learning are below:
      • Inception V3.
      • ResNet V2 101.
      • ResNet V2 50.
      • MobileNet V2.

    Example code:

    Samples

    Defaults

    Learning rate scheduling

    Early stopping

    ResNet V2 101 train-test split

    End-to-End

  • General Availability of Database Loader
    The database loader loads data from relational databases into an IDataView, enabling model training directly against a database. It supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning you can use RDBMSs such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, etc. (A usage sketch follows after this list.)

    As with training from files, training against a database streams the data: the whole database does not need to fit into memory. ML.NET reads from the database as it needs the data, so very large databases (e.g. 50GB, 100GB or larger) can be handled.

    Example code:

    Design specification

    Sample

    How to doc

  • General Availability of PredictionEnginePool for scalable deployment
    When deploying an ML model into multi-threaded, scalable .NET Core web applications and services (such as ASP.NET Core web apps, Web APIs, or Azure Functions), it is recommended to use the PredictionEnginePool instead of creating a PredictionEngine directly on every request, for performance and scalability reasons. For further background on why the PredictionEnginePool is recommended, read this blog post.

    Sample

  • General Availability of Enhanced .NET Core 3.0 Support
    This means ML.NET can take advantage of new features when running in a .NET Core 3.0 application. The first new feature we are using is hardware intrinsics, which allows .NET code to accelerate math operations by using processor-specific instructions.
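
For the database loader described above, a minimal sketch (using SQL Server via System.Data.SqlClient; the connection string, query, and HouseRow class are hypothetical) looks roughly like this:

```csharp
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Map the query's result set onto a .NET type; rows are streamed as training needs them.
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<HouseRow>();

string connectionString = "<your connection string>";
string query = "SELECT Size, Price FROM dbo.HouseData";   // hypothetical table

var dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, query);
IDataView trainingData = loader.Load(dbSource);

public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}
```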

Bug Fixes

  • Adds a reasonable exception when the user tries to use the OnnxSequenceType attribute without specifying the sequence type. (#4272)
  • Image Classification API: Fix processing of incomplete batches (< batchSize) and of images processed per epoch; enable EarlyStopping without a validation set. (#4289)
  • Exception is thrown if NDCG > 10 is used with LightGbm for evaluating ranking. (#4081)
  • DatabaseLoader error when using attributes (i.e. ColumnName). (#4308)
  • Recommendation experiment got SMAC local search exception during training. (#4358)
  • TensorFlow exception triggered: input ended unexpectedly in the middle of a field. (#4314)
  • PredictionEngine breaks after saving/loading a Model. (#4321)
  • Data file locked even after TextLoader goes out of context. (#4404)
  • ImageClassification API should save cache files/meta files in user temp directory or user provided workspace path. (#4410)

Breaking Changes

None

Enhancements

  • Publish latest nuget to public feed from master branch when commits are made. (#4406)
  • Defaults for ImageClassification API. (#4415)

CLI and AutoML API

  • Recommendation Task. (#4246, 4391)
  • Image Classification Task. (#4395)
  • Move AutoML CodeGen to master from feature branch. (#4365)

Remarks

  • None.

1.4.0-preview2 - Oct 09, 2019

New Features

  • Deep Neural Networks Training (0.16.0-preview2)

    Improves the in-preview ImageClassification API further:

    • Early stopping feature stops the training when optimal accuracy is reached (#4237)
    • Enables inferencing on in-memory images (#4242)
    • PredictedLabel output column now contains actual class labels instead of uint32 class index values (#4228)
    • GPU support on Windows and Linux (#4270, #4277)
    • Upgraded TensorFlow .NET version to 0.11.3 (#4205)

    In-memory image inferencing sample
    Early stopping sample
    GPU samples

  • New ONNX Exporters (1.4.0-preview2)

    • LpNormNormalizing transformer (#4161)
    • PCA transformer (4188)
    • TypeConverting transformer (#4155)
    • MissingValueIndicator transformer (#4194)

Bug Fixes

  • OnnxSequenceType and ColumnName attributes together doesn't work (#4187)
  • Fix memory leak in TensorflowTransformer (#4223)
  • Enable permutation feature importance to be used with model loaded from disk (#4262)
  • IsSavedModel returns true when loaded TensorFlow model is a frozen model (#4262)
  • Exception when using OnnxSequenceType attribute directly without specify sequence type (#4272, #4297)

Samples

  • TensorFlow full model retrain sample (#4127)

Breaking Changes

None.

Obsolete API

  • OnnxSequenceType attribute that doesn't take a type (#4272, #4297)

Enhancements

  • Improve exception message in LightGBM (#4214)
  • FeaturizeText should allow only outputColumnName to be defined (#4211)
  • Fix NgramExtractingTransformer GetSlotNames to not allocate a new delegate on every invoke (#4247)
  • Resurrect broken code coverage build and re-enable code coverage for pull request (#4261)
  • NimbusML entrypoint for permutation feature importance (#4232)
  • Reuse memory when copying outputs from TensorFlow graph (#4260)
  • DateTime to DateTime standard conversion (#4273)
  • CodeCov version upgraded to 1.7.2 (#4291)

CLI and AutoML API

None.

Remarks

None.

v1.4.0-preview - Oct 09, 2019

New Features

  • Deep Neural Networks Training (0.16.0-preview) (#4151)

    Improves the in-preview ImageClassification API further:

    • Increases DNN training speed by ~10x compared to the same API in 0.15.1 release.
    • Prevents repeated computations by caching featurized image values to disk from intermediate layers to train the final fully-connected layer.
    • Reduced and constant memory footprint.
    • Simplifies the API by not requiring the user to pre-process the image.
    • Introduces callback to provide metrics during training such as accuracy, cross-entropy.
    • Improved image classification sample.

    Design specification

    Sample

  • Database Loader (0.16.0-preview) (#4070,#4091,#4138)

    Additional DatabaseLoader support:

    • Support DBNull.
    • Add CreateDatabaseLoader<TInput> to map columns from a .NET Type.
    • Read multiple columns into a single vector

    Design specification

    Sample

  • Enhanced .NET Core 3.0 Support

    • Use C# hardware intrinsics detection to support AVX, SSE and software fallbacks
    • Allows for faster training on AVX-supported machines
    • Allows for scoring core ML.NET models on ARM processors. (Note: some components do not support ARM yet, e.g. FastTree, LightGBM, OnnxTransformer)

Bug Fixes

None.

Samples

  • DeepLearning Image Classification Training sample (DNN Transfer Learning) (#633)
  • DatabaseLoader sample loading an IDataView from SQL Server localdb (#611)

Breaking Changes

None

Enhancements

None.

CLI and AutoML API

  • AutoML codebase has moved from feature branch to master branch (#3882).

Remarks

None.

v1.3.1 - Aug 06, 2019

New Features

  • Deep Neural Networks Training (PREVIEW) (#4057)
    Introduces the in-preview 0.15.1 Microsoft.ML.DNN package that enables full DNN model retraining and transfer learning in .NET, using C# bindings for TensorFlow provided by TensorFlow.NET. The goal of this package is to allow high-level DNN training and scoring tasks such as image classification, text classification, and object detection using simple yet powerful APIs that are framework agnostic, but that currently use TensorFlow as the backend. The APIs below are in early preview, and we hope to get customer feedback that we can incorporate in the next iteration.

    DNN stack

    Design specification

    Image classification (Inception V3) sample

    Image classification (Resnet V2 101) sample

  • Database Loader (PREVIEW) (#4035)
    Introduces the database loader, which enables training on databases. This loader supports any relational database supported by System.Data in .NET Framework or .NET Core, meaning you can use many RDBMSs such as SQL Server, Azure SQL Database, Oracle, PostgreSQL, MySQL, etc. This feature is in early preview and can be accessed via the Microsoft.ML.Experimental nuget.

    Design specification

    Sample

Bug Fixes

Serious

  • SaveOnnxCommand appears to ignore predictors when saving a model to ONNX format: This broke export to ONNX functionality. (3974)

  • Unable to use fasterrcnn onnx model. (3963)

  • PredictedLabel is always true for Anomaly Detection: This bug disabled scenarios like fraud detection using binary classification/PCA. (#4039)

  • Update build certifications: This bug broke the official builds because of outdated certificates that were being used. (#4059)

Other

  • Stop LightGbm warning for default metric input: Fixes the LightGBM warning "Unknown parameter metric=" that was produced when the default metric was used. (#3965)

Samples

  • Fraud Detection using the anomaly detection PCA trainer

Breaking Changes

None

Enhancements

  • Farewell to the Static API (4009)

  • AVX and FMA intrinsics in Factorization Machine (3940)

CLI and AutoML API

  • Bug fixes.

Remarks

  • Machine Learning at Microsoft with ML.NET was presented at KDD 2019 and appears in the conference proceedings

v1.2.0 - Jul 03, 2019

General Availability

  • Microsoft.ML.TimeSeries

    • Anomaly detection algorithms (Spike and Change Point):
      • Independent and identically distributed.
      • Singular spectrum analysis.
      • Spectral residual from Azure Anomaly Detector/Kensho team.
    • Forecasting models:
      • Singular spectrum analysis.
    • Prediction Engine for online learning
      • Enables updating the time series model with new observations at scoring time, so the user does not have to re-train the time series model on old data each time.

    Samples

  • Microsoft.ML.OnnxTransformer
    Enables scoring of ONNX models in the learning pipeline. Uses ONNX Runtime v0.4. (A scoring sketch follows after this list.)

    Sample

  • Microsoft.ML.TensorFlow
    Enables scoring of TensorFlow models in the learning pipeline. Uses TensorFlow v1.13. Very useful for image and text classification. Users can featurize images or text using DNN models and feed the result into a classical machine learning model like a decision tree or logistic regression trainer.

    Samples
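
As a rough example of what ONNX model scoring looks like in an ML.NET pipeline (the model file and tensor names below are placeholders for whatever your exported model declares):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Wraps an exported ONNX model as a pipeline stage scored through ONNX Runtime.
var pipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "softmaxout_1" },
    inputColumnNames: new[] { "data_0" },
    modelFile: "model.onnx");

// var scored = pipeline.Fit(data).Transform(data);   // data: IDataView matching the model's inputs
```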

New Features

  • Tree-based featurization (#3812)

    Generating features from a tree structure has been a popular technique in data mining. It is useful for capturing feature interactions when creating a stacked model, for dimensionality reduction, or for featurizing towards an alternative label. ML.NET's tree featurization trains a tree-based model and then maps each input feature vector to several non-linear feature vectors. Those generated feature vectors are:

    • The leaves the input falls into: a binary vector with ones at the indexes of the reached leaves,
    • The paths that the input vector passes through before hitting the leaves, and
    • The values of the reached leaves.

    Here are two references.

    Samples

  • Microsoft.Extensions.ML integration package. (#3827)

    This package makes it easier to use ML.NET with application models that support Microsoft.Extensions, such as ASP.NET and Azure Functions.

    Specifically it contains functionality for:

    • Dependency Injection
    • Pooling PredictionEngines
    • Reloading models when the file or URI has changed
    • Hooking ML.NET logging to Microsoft.Extensions.Logging
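
A minimal registration sketch for the pooled prediction engine described above; the SentimentInput/SentimentPrediction types, model name, and file path are invented for illustration, and the FromFile parameter names are as I recall them from the Extensions.ML builder, so verify against the package docs.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;
using Microsoft.ML.Data;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Pools PredictionEngine instances (they are not thread-safe) and
        // reloads the model when the file on disk changes.
        services.AddPredictionEnginePool<SentimentInput, SentimentPrediction>()
            .FromFile(modelName: "SentimentModel",
                      filePath: "MLModels/sentiment.zip",
                      watchForChanges: true);
    }
}

public class SentimentInput
{
    public string SentimentText { get; set; }
}

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
}

// Later, inject PredictionEnginePool<SentimentInput, SentimentPrediction> and call
// pool.Predict(modelName: "SentimentModel", input) from a controller or function.
```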

Bug Fixes

Serious

  • Time series Sequential Transform needs to have a binding mechanism: This bug made it impossible to use time series in NimbusML. (#3875)

  • Build errors resulting from upgrading to VS2019 compilers: The default CMAKE_C_FLAG for debug configuration sets /ZI to generate a PDB capable of edit and continue. In the new compilers, this is incompatible with /guard:cf which we set for security reasons. (#3894)

  • LightGBM evaluation metric parameters: In LightGbm, if a user specified EvaluateMetricType.Default, the metric would not get added to the options dictionary, and LightGbmWrappedTraining would throw because of that. (#3815)

  • Change default EvaluationMetric for LightGbm: In ML.NET, the default EvaluationMetric for LightGbm is set to EvaluateMetricType.Error for multiclass, EvaluationMetricType.LogLoss for binary etc. This leads to inconsistent behavior from the user's perspective. (#3859)

Other

  • CustomGains should allow multiple values in argument attribute. (#3854)

Breaking Changes

None

Enhancements

  • Fixes the Hardcoded Sigmoid value from -0.5 to the value specified during training. (#3850)

  • Fix TextLoader constructor and add exception message. (#3788)

  • Introduce the FixZero argument to the LogMeanVariance normalizer. (#3916)

  • Ensembles trainer now work with ITrainerEstimators instead of ITrainers. (#3796)

  • LightGBM Unbalanced Data Argument. (#3925)

  • Tree based trainers implement ICanGetSummaryAsIDataView. (#3892)

  • CLI and AutoML API

    • Internationalization fixes to generate proper ML.NET C# code. (#3725)
    • Automatic Cross Validation for small datasets, and CV stability fixes. (#3794)
    • Code cleanup to match .NET style. (#3823)

Documentation and Samples

  • Samples for applying ONNX model to in-memory images. (#3851)
  • Reformatted all ~200 samples to 85 character width so the horizontal scrollbar does not appear on docs webpage. (#3930, 3941, 3949, 3950, 3947, 3943, 3942, 3946, 3948)

Remarks

  • Roughly 200 GitHub issues were closed; the open issue count decreased from ~550 to 351. Most of the issues were resolved due to the release of the stable API and the availability of samples.

v1.1.0 - Jun 04, 2019

New Features

  • Image type support in IDataView
    PR#3263 added support for in-memory images as a type in IDataView. Previously it was not possible to use an image directly in IDataView; the user had to specify the file path as a string and load the image using a transform. The feature resolved the following issues: 3162, 3723, 3369, 3274, 445, 3460, 2121, 2495, 3784.

    Image type support in IDataView was a much requested feature by the users.

    Sample to convert gray scale image in-Memory | Sample for custom mapping with in-memory using custom type

  • Super-Resolution based Anomaly Detector (preview, please provide feedback)
    PR#3693 adds a new anomaly detection algorithm to the Microsoft.ML.TimeSeries nuget. The algorithm is based on Super-Resolution using Deep Convolutional Networks and was accepted at the KDD 2019 conference as an oral presentation. One advantage of this algorithm is that it does not require any prior training, and in benchmarks (using grid parameter search to find upper bounds) it outperforms the Independent and Identically Distributed (IID) and Singular Spectrum Analysis (SSA) based anomaly detection algorithms in accuracy. This contribution comes from the Azure Anomaly Detector team.

    | Algo | Precision | Recall | F1 | #TruePositive | #Positives | #Anomalies | Fine tuned parameters |
    |------|-----------|--------|----|---------------|------------|------------|-----------------------|
    | SSA (requires training) | 0.582 | 0.585 | 0.583 | 2290 | 3936 | 3915 | Confidence=99, PValueHistoryLength=32, Season=11, and use half the data of each series to do the training. |
    | IID | 0.668 | 0.491 | 0.566 | 1924 | 2579 | 3915 | Confidence=99, PValueHistoryLength=56 |
    | SR | 0.601 | 0.670 | 0.634 | 2625 | 4370 | 3915 | WindowSize=64, BackAddWindowSize=5, LookaheadWindowSize=5, AveragingWindowSize=3, JudgementWindowSize=64, Threshold=0.45 |

    Sample for anomaly detection by SRCNN | Sample for anomaly detection by SRCNN using batch prediction

  • Time Series Forecasting (preview, please provide feedback)
    PR#1900 introduces a framework for time series forecasting models and exposes an API for a Singular Spectrum Analysis (SSA) based forecasting model in the Microsoft.ML.TimeSeries nuget. This framework allows forecasting with or without confidence intervals, updating the model with new observations, and saving/loading the model to/from persistent storage. This closes issues 929 and 3151 and was a much requested feature by the GitHub community since September 2018. With this change the Microsoft.ML.TimeSeries nuget is feature complete for RTM.

    Sample for forecasting | Sample for forecasting using confidence intervals
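
The forecasting API roughly follows the pattern below; the row classes, column names, and SSA window/series/train sizes are illustrative assumptions, not values from the release notes.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;

var mlContext = new MLContext();

// Hypothetical daily history with a weekly pattern.
var history = Enumerable.Range(0, 365).Select(i => new SalesRow { Sales = 100 + i % 7 });
IDataView data = mlContext.Data.LoadFromEnumerable(history);

var pipeline = mlContext.Forecasting.ForecastBySsa(
    outputColumnName: nameof(SalesForecast.Forecast),
    inputColumnName: nameof(SalesRow.Sales),
    windowSize: 7,
    seriesLength: 30,
    trainSize: 365,
    horizon: 7,
    confidenceLevel: 0.95f,
    confidenceLowerBoundColumn: nameof(SalesForecast.LowerBound),
    confidenceUpperBoundColumn: nameof(SalesForecast.UpperBound));

var model = pipeline.Fit(data);

// The time-series engine keeps state, so it can be updated with new observations
// and checkpointed to disk instead of being retrained from scratch.
var engine = model.CreateTimeSeriesEngine<SalesRow, SalesForecast>(mlContext);
SalesForecast next = engine.Predict();

public class SalesRow
{
    public float Sales { get; set; }
}

public class SalesForecast
{
    public float[] Forecast { get; set; }
    public float[] LowerBound { get; set; }
    public float[] UpperBound { get; set; }
}
```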

Bug Fixes

Serious

  • Math Kernel Library fails to load with latest libomp: Fixed by PR#3721. This bug made it impossible for anyone to check code into the master branch because it was causing build failures.

  • Transform Wrapper fails at deserialization: Fixed by PR#3700. This bug affected a first-party (1P) customer: a model trained using NimbusML (the Python bindings for ML.NET) and then loaded for scoring/inferencing using ML.NET would hit this bug.

  • Index out of bounds exception in KeyToVector transformer: Fixed by PR#3763. This fix closes the following GitHub issues: 3757, 1751, 2678. It affected a first-party customer as well as GitHub users.

Other

  • Download images only when not present on disk and print warning messages when converting unsupported pixel format by PR#3625
  • ML.NET source code does not build in VS2019 by PR#3742
  • Fix SoftMax precision by utilizing double in the internal calculations by PR#3676
  • Fix to the official build due to API Compat tool change by PR#3667
  • Check for number of input columns in concat transform by PR#3809

Breaking Changes

None

Enhancements

  • API Compat tool by PR#3623 ensures future changes to ML.NET will not break the stable API released in 1.0.0.
  • Upgrade the TensorFlow version from 1.12.0 to 1.13.1 by PR#3758
  • API for saving time series model to stream by PR#3805

Documentation and Samples

  • L1-norm and L2-norm regularization documentation by PR#3586
  • Sample for data save and load from text and binary files by PR#3745
  • Sample for LoadFromEnumerable with a SchemaDefinition by PR#3696
  • Sample for LogLossPerClass metric for multiclass trainers by PR#3724
  • Sample for WithOnFitDelegate by PR#3738
  • Sample for loading data using text loader using various techniques by PR#3793

Remarks

v1.0.0 - May 03, 2019

ML.NET is now 1.0.0. 🍰

This is our stable API. In this final sprint we have worked mainly on improving the documentation. Please let us know what you like about ML.NET and what we can improve to make your use of machine learning easier in .NET. With this release we are committed to staying backward compatible.

Release Notes Download and Install

v1.0.0-preview - Apr 03, 2019

This is the RC1 release for ML.NET version 1.0.0. The work on the API project has concluded. The focus before releasing version 1.0.0 will be on enhancing documentation and samples as well as addressing any critical issues. Please note that NuGet packages now carry either 1.0.0-preview or 0.12.0-preview versions, depending on which ones will become part of the stable release. Also, IDataView is now in the Microsoft.ML namespace. As always, thank you so much for being an awesome community of machine learning enthusiasts.

Release Notes Download and Install

v0.11.0 - Mar 06, 2019

A lot more API clean-up as well as many fixes are packed into this release! We are quickly approaching the RC1 release for ML.NET, and our first priority is to complete the API-related work. Thank you for being patient while we get closer to our stable surface. We are super excited to work through the remaining issues and ship v1.0. In fact, we were so excited that the release notes mentioned that FastTree now has its own package. That is only partially true: you can see it in our nightly builds, but 0.11 still does not ship a separate FastTree package. Oh well! :)

Release Notes Download and Install

v0.10.0 - Feb 05, 2019

More API clean up as well as many fixes are in this release. We are preparing for our stable API in 1.0 release and greatly appreciate the community feedback and engagement. Please note that IDataView is now in Microsoft.Data.DataView #2220. Also please note that #2239 has changed the order of parameters and your existing code needs to be updated.

Release Notes Download and Install

v0.9.0 - Jan 09, 2019

This release brings many fixes as well as significant API clean up. We have removed the API that was marked obsolete. Explainability features of ML.NET have also got some improvements as originally planned. Thanks to all the great support as we improve the API for 1.0 release.

Release Notes Download and Install

v0.8.0 - Dec 04, 2018

ML.NET 0.8 is here with some very exciting features. Explainability, stateful time series, implicit feedback in recommendations and better debuggability as well as many bug fixes are included in this release. Please note that the legacy API has been marked obsolete and will be removed in the next release. Many thanks to the awesome users and community contributors for your continuous support.

Release Notes Download and Install

v0.7.0 - Nov 06, 2018

ML.NET 0.7 brings multiple enhancements such as anomaly detection, matrix factorization, x86 builds, as well as custom transforms. We continue to refine our API with many exciting extensions. Thanks to everyone for your massive support and contributions in this release.

Release Notes Download and Install

v0.6.0 - Oct 02, 2018

ML.NET 0.6 is a milestone release as it introduces the new API. There is major work to improve the internals of the library. This release is also bringing massive performance gains in predictions. We are also excited to introduce ONNX transform.

Release Notes Download and Install

Known issues:

[The Trainers list for training context is empty](https://github.com/dotnet/machinelearning/issues/1054)

v0.5.0 - Sep 05, 2018

ML.NET 0.5 makes TensorFlow transform available. We continue to work towards new API. We have also introduced many enhancements and bug fixes. Our sincere gratitude to all the amazing users and contributors.

Release Notes Download and Install

v0.4.0 - Aug 07, 2018

ML.NET 0.4 is the first release where we have started an overall work on API for ML.NET. We are working on a much more flexible and extensible API. We have also introduced enhancements including new word-embeddings transform and SymSGD learner. Many thanks to all the amazing help from the contributors and users.

Release Notes Download and Install

v0.3.0 - Jul 03, 2018

ML.NET 0.3 introduces many enhancements including Field-Aware Factorization Machines, LightGBM, Ensembles, LightLDA transform, OVA and support for ONNX. Many thanks to our users and contributors for amazing help with filing issues and submitting PRs and engaging on gitter.

Release Notes Download and Install

Known issues:

LightGBM doesn't work during F5 of a .NET Core application

v0.2.0 - Jun 05, 2018

ML.NET 0.2 is the second preview release of the package. It addresses many issues and adds a few enhancement including addition of clustering. Thank you for helping shape the package by using its many learners and engaging through submitting issues and pull requests on GitHub.

Release Notes Download and Install

v0.1.0 - May 07, 2018

ML.NET 0.1 is the first preview release of ML.NET. Thank you for trying it out and we look forward to your feedback! Try training, scoring, and using machine learning models in your app and tell us how it goes.

Release Notes Download and Install
