atifaziz/csharpminifier

C# Minifier

CSharpMinifier filters comments and unnecessary whitespace from valid C# source code in order to arrive at a compressed form without changing the behaviour of the code. Unlike JavaScript minifiers, the goal is not to reduce the download size or parsing effort. Instead, it is best used for computing hashes or digests in order to detect potentially meaningful changes, as opposed to any physical change.

It is available as a .NET Standard library as well as a .NET Core console application that can be installed as a global tool.

It is a minifier but not an obfuscator or an uglifier; that is, private details like local variable names are not abbreviated.

Before minification:

After minification:

A monitor that actively watches source files for changes and triggers a re-compilation of binaries is one application that could benefit from minification of sources. The monitor could compute a hash from the minified sources and trigger a re-compilation only when that hash differs from the one computed for the previous version (instead of whenever any physical change occurs). Minification removes C# comments and extraneous whitespace from the source because they do not constitute a useful change that could affect the behaviour of the code at run-time.
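The monitor idea above can be sketched as a small C# program that hashes the minified text with SHA-256 and reports a change only when the digest differs from the last one seen. This is not CSharpMinifier's API: `NaiveMinify` is a hypothetical stand-in that only collapses whitespace (it neither removes comments nor protects string literals), and `DigestOf` and `ShouldRebuild` are illustrative names. In practice the minified text would come from CSharpMinifier itself.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

var lastDigest = string.Empty;

Console.WriteLine(ShouldRebuild("class C { }"));       // True: first version seen
Console.WriteLine(ShouldRebuild("class   C\n{\n}\n")); // False: whitespace-only edit
Console.WriteLine(ShouldRebuild("class D { }"));       // True: the code changed

// Returns true when the digest of the minified source differs from the last one.
bool ShouldRebuild(string source)
{
    var digest = DigestOf(source, NaiveMinify);
    var changed = digest != lastDigest;
    lastDigest = digest;
    return changed;
}

// Hashes the minified form of the source so that cosmetic edits yield the same digest.
static string DigestOf(string source, Func<string, string> minify)
{
    using var sha = SHA256.Create();
    byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(minify(source)));
    return BitConverter.ToString(hash).Replace("-", "");
}

// Hypothetical stand-in for a real minifier: it only collapses whitespace runs
// and trims the ends. Unlike CSharpMinifier, it keeps comments and would also
// (wrongly) collapse whitespace inside string literals.
static string NaiveMinify(string source) => Regex.Replace(source, @"\s+", " ").Trim();
```

Passing the minifier as a delegate keeps the hashing logic independent of how minification is actually done.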

Minification

CSharpMinifier assumes that the input C# source is syntactically sound; otherwise the results are undefined (even if they may appear to resemble some defined behaviour).

For the purpose of minification, especially that of whitespace and comments, CSharpMinifier needs to ensure that it is not misled, for example, by comment markers appearing inside a string or by a string that has been commented out. Therefore it parses a minimal lexical grammar of C# source, consisting of:

  • Horizontal whitespace (space or tab)
  • New-line sequences like CR, LF or CRLF
  • Single-line comments, starting with // and running until a new-line sequence or end of input
  • Multi-line comments; that is, everything between /* and */
  • Pre-processor directives
  • String literals of all sorts:
    • regular, e.g. "..."
    • verbatim, e.g. @"..."
    • interpolated, e.g. $"..."
    • interpolated verbatim, e.g. $@"..." (or @$"..." starting with C# 8)

Everything surrounding or in-between the above is treated as raw and unparsed text. As a consequence, the C# source does not have to be a full C# program or library code. You can minify C# snippets like scripts and expressions as long as they are syntactically valid.
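To illustrate why a scanner has to track strings in order to remove comments safely, here is a deliberately small state machine in C#. It is a hypothetical sketch, not CSharpMinifier's scanner: `StripComments` handles regular strings, verbatim strings and char literals, replaces each multi-line comment with one space, and does not handle interpolated strings, raw string literals or pre-processor directives.

```csharp
using System;
using System.Text;

var input = "var url = \"http://example.com\"; // trailing comment\nint /* inline */ x = 1;";
Console.WriteLine(StripComments(input));

// Removes // and /* */ comments while copying string and char literals verbatim.
static string StripComments(string src)
{
    const int Code = 0, Str = 1, Verbatim = 2, CharLit = 3, LineComment = 4, BlockComment = 5;
    var sb = new StringBuilder(src.Length);
    int state = Code;
    for (int i = 0; i < src.Length; i++)
    {
        char c = src[i];
        char next = i + 1 < src.Length ? src[i + 1] : '\0';
        switch (state)
        {
            case Code:
                if (c == '/' && next == '/') { state = LineComment; i++; }
                else if (c == '/' && next == '*') { state = BlockComment; i++; }
                else if (c == '@' && next == '"') { state = Verbatim; sb.Append("@\""); i++; }
                else
                {
                    if (c == '"') state = Str;
                    else if (c == '\'') state = CharLit;
                    sb.Append(c);
                }
                break;
            case Str:
            case CharLit:
                sb.Append(c);
                // Keep escape pairs such as \" together so they do not end the literal.
                if (c == '\\' && i + 1 < src.Length) { sb.Append(next); i++; }
                else if (c == (state == Str ? '"' : '\'')) state = Code;
                break;
            case Verbatim:
                sb.Append(c);
                if (c == '"' && next == '"') { sb.Append(next); i++; } // "" is an escaped quote
                else if (c == '"') state = Code;
                break;
            case LineComment:
                // The comment ends at the new-line, which is itself kept.
                if (c == '\r' || c == '\n') { state = Code; sb.Append(c); }
                break;
            case BlockComment:
                // Replace the comment with a space so that "a/**/b" does not become "ab".
                if (c == '*' && next == '/') { state = Code; sb.Append(' '); i++; }
                break;
        }
    }
    return sb.ToString();
}
```

The "//" inside the URL survives because it is seen while the machine is in the string state, whereas both real comments are dropped. Collapsing the leftover whitespace would be a separate step.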

All whitespace within the following types of lexical tokens is preserved:

  • string literals
  • pre-processor directive text

CSharpMinifier preserves all pre-processor directives except #region and #endregion. These two are removed, but the content between them is still subject to minification.
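The #region rule can be pictured as a line filter. The following C# sketch is hypothetical (`DropRegions` and `IsDirective` are not part of CSharpMinifier): it drops #region and #endregion lines and keeps every other line, including other directives such as #if. It does not handle whitespace between # and the directive name, nor directive-like text inside verbatim strings or multi-line comments, which a real scanner has to account for.

```csharp
using System;
using System.Linq;

var src = new[] { "#region Fields", "int x;", "#if DEBUG", "int y;", "#endif", "#endregion" };
Console.WriteLine(string.Join("\n", DropRegions(src)));

// Keeps every line except #region and #endregion directives.
static string[] DropRegions(string[] lines) =>
    lines.Where(line =>
    {
        var trimmed = line.TrimStart();
        return !(IsDirective(trimmed, "#region") || IsDirective(trimmed, "#endregion"));
    }).ToArray();

// True if the trimmed line is the given directive, optionally followed by text
// such as a region name ("#regionX" does not count).
static bool IsDirective(string trimmed, string name) =>
    trimmed.StartsWith(name, StringComparison.Ordinal)
    && (trimmed.Length == name.Length || char.IsWhiteSpace(trimmed[name.Length]));
```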

Horizontal (e.g. space and tab) and vertical (e.g. carriage-return and line-feed) whitespace is eliminated in all cases except:

  • a single horizontal space is maintained between words and in some obscure cases of operators (e.g. x = i+++ +2) to prevent minification from introducing ambiguities.
  • after a pre-processor directive; since a directive must appear on a line of its own, it is always followed by a new-line sequence.
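The first exception can be approximated by a character-level rule that decides whether two adjacent tokens may be joined without a space. The C# sketch below is a hypothetical rule, not CSharpMinifier's actual logic, and its list of cases is not exhaustive. Because it only looks at the two touching characters, it is conservative: for the example above it keeps a space between ++ and + that the minifier's own output (i+++ +2) shows to be unnecessary.

```csharp
using System;
using System.Text;

Console.WriteLine(JoinTokens(new[] { "x", "=", "i", "++", "+", "+", "2", ";" })); // x=i++ + +2;

// Decides whether a space must separate a token ending in `left` from one
// starting with `right`, so that joining them cannot change tokenization.
static bool NeedsSpace(char left, char right)
{
    static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // "int x" must not become "intx".
    if (IsWordChar(left) && IsWordChar(right)) return true;
    // Doubling one of these characters forms a different operator, e.g. the
    // unary "+" of "+2" must stay apart from a preceding "+", or "i++ + +2"
    // would become "i++++2", which lexes as i ++ ++ 2.
    if (left == right && "+-&|<>=:?".IndexOf(left) >= 0) return true;
    // "x / *p" must not become "x/*p", which would open a comment.
    if (left == '/' && (right == '/' || right == '*')) return true;
    return false;
}

// Joins tokens without whitespace except where NeedsSpace requires a space.
static string JoinTokens(string[] tokens)
{
    var sb = new StringBuilder();
    foreach (var token in tokens)
    {
        if (sb.Length > 0 && token.Length > 0 && NeedsSpace(sb[sb.Length - 1], token[0]))
            sb.Append(' ');
        sb.Append(token);
    }
    return sb.ToString();
}
```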

CSharpMinifier provides offset, line and column information about all lexical tokens it recognizes.
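Line and column information can be derived from an offset by counting new-line sequences. The C# helper below is a hypothetical sketch, not the library's API: it assumes a zero-based offset, one-based line and column numbers, columns counted in UTF-16 characters, and treats CR, LF and CRLF each as a single new-line, matching the new-line sequences listed earlier.

```csharp
using System;

var sample = "ab\r\ncd\ne";
var (line, column) = LineColumnOf(sample, 7);
Console.WriteLine($"line {line}, column {column}"); // line 3, column 1

// Converts a zero-based offset into a one-based (line, column) pair.
static (int Line, int Column) LineColumnOf(string text, int offset)
{
    if (offset < 0 || offset > text.Length)
        throw new ArgumentOutOfRangeException(nameof(offset));
    int line = 1, column = 1;
    for (int i = 0; i < offset; i++)
    {
        char c = text[i];
        // A CR immediately followed by LF is one sequence; the LF is counted instead.
        if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
            continue;
        if (c == '\r' || c == '\n') { line++; column = 1; }
        else column++;
    }
    return (line, column);
}
```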

The scanner/parser is hand-written. It does not use Roslyn, so it is lightweight both as a dependency and at run time. It is implemented as a simple state machine and makes practically no heap allocations.

While CSharpMinifier does detect some syntactic errors, like unterminated strings and comments, and reports them by throwing exceptions, there should be no expectation that such checks will be maintained in future versions. After all, as stated earlier, the input C# source is expected to be syntactically correct, and the results are otherwise undefined.

The minification process can be customized through options to prevent the following source code (single- or multi-line) comments from being subject to minification:

  • those matching a user-defined regular expression pattern
  • those that appear at the start of the source (typically copyright notices, terms and conditions)
  • those that are marked important, either /*! ... */ or //! ...
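The three options above amount to a predicate applied to each comment. The following C# sketch is hypothetical: `ShouldKeep` and its parameters are illustrative and are not the library's MinificationOptions API. It assumes the caller passes the full comment text including its delimiters, and knows whether the comment is part of the leading block at the start of the source.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(ShouldKeep("//! keep this", isLeading: false, keepPattern: "",
                             keepLeading: false, keepImportant: true)); // True

// Returns true if a comment should survive minification under the given options.
static bool ShouldKeep(string comment, bool isLeading, string keepPattern,
                       bool keepLeading, bool keepImportant)
{
    // Option 1: comments matching a user-supplied regular expression.
    if (!string.IsNullOrEmpty(keepPattern) && Regex.IsMatch(comment, keepPattern))
        return true;
    // Option 2: comments at the very start of the source, e.g. a copyright notice.
    if (keepLeading && isLeading)
        return true;
    // Option 3: "important" comments, marked as /*! ... */ or //! ...
    if (keepImportant && (comment.StartsWith("/*!", StringComparison.Ordinal)
                          || comment.StartsWith("//!", StringComparison.Ordinal)))
        return true;
    return false;
}
```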

For C# syntax validation, consider using CSharpSyntaxValidator.

Installation

To install the library for use in a project, do either:

nuget install CSharpMinifier

or for projects based on .NET Core SDK:

dotnet add package CSharpMinifier

See also the various installation instructions on the library package page on nuget.org

To install the command-line application as .NET Core global tool, do:

dotnet tool install -g CSharpMinifierConsole

Usage

Suppose the following C# source text is loaded in a string variable called source:

To minify, do:

Alternatively, to analyse and visit the tokens identified by the scanner:

The output from the above will be:

Command-Line Usage

csmin < example.cs > example.min.cs

For more information, run csmin help or see the on-line documentation.

Building

The .NET Core SDK is the minimum requirement.

To build just the binaries on Windows, run:

.\build.cmd

On Linux or macOS, run instead:

./build.sh

To build the binaries and the NuGet packages on Windows, run:

.\pack.cmd

On Linux or macOS, run instead:

./pack.sh

Testing

To exercise the unit tests on Windows, run:

.\test.cmd

On Linux or macOS, run:

./test.sh

This will also build the binaries if necessary.

Issues

atifaziz

Consider the following breaking change introduced with C# 11:

Format specifiers can't contain curly braces

Introduced in .NET SDK 6.0.200, Visual Studio 2022 version 17.1.

Format specifiers in interpolated strings can’t contain curly braces (either { or }). In previous versions {{ was interpreted as an escaped { and }} was interpreted as an escaped } char in the format specifier. Now the first } char in a format specifier ends the interpolation, and any { char is an error. This makes interpolated string processing consistent with the processing for System.String.Format:

X is the format for uppercase hexadecimal and C is the hexadecimal value for 12.

The workaround is to remove the extra braces in the format string.

You can learn more about this change in the associated roslyn issue.
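The X format specifier mentioned in this issue can be seen in the following small C# program. It only demonstrates hexadecimal formatting of 12; it does not reproduce the original example with braces, which is not preserved here.

```csharp
using System;

// "X" formats an integer as uppercase hexadecimal, so 12 becomes "C".
Console.WriteLine($"{12:X}");                  // C
Console.WriteLine(string.Format("{0:X}", 12)); // C
// A precision specifier pads the result with leading zeros.
Console.WriteLine($"{12:X4}");                 // 000C
```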

atifaziz

bug

The C# language allows skipped sections of conditional compilation directives to contain any text that does not need to follow the C# grammar. More specifically:

The remaining conditional_sections, if any, are processed as skipped_sections: except for pre-processing directives, the source code in the section need not adhere to the lexical grammar; no tokens are generated from the source code in the section; and pre-processing directives in the section must be lexically correct but are not otherwise processed. Within a conditional_section that is being processed as a skipped_section, any nested conditional_sections (contained in nested #if...#endif and #region...#endregion constructs) are also processed as skipped_sections.

Feeding the following example from the C# specification to CSharpMinifier's Scanner:

ends in the syntax error “Unterminated multi-line comment. The syntax error is at line 9 and column 60 (or offset 218). The last anchor was at line 9 and column 13 (or offset 170)”, whereas the C# compiler simply ignores the unterminated comment.

Taking another obscure example from the specification:

gives the following tokens:

Token Kind               Text
Preprocessor directive   #if X
New line                 <EOL>
White space
Multi line comment       /*<EOL> #else<EOL> /* */
White space
Text                     class
White space
Text                     Q
White space
Text                     {
White space
Text                     }
New line                 <EOL>
White space
Preprocessor directive   #endif
New line                 <EOL>
White space

It should instead behave as the specification describes: the example always produces the same token stream (class Q { }), regardless of whether or not X is defined. If X is defined, the only processed directives are #if and #endif, due to the multi-line comment. If X is undefined, then three directives (#if, #else, #endif) are part of the directive set.

This requires adding processing and interpretation of conditional compilation directives so that the scanner can deliver skipped sections as tokens in their own right.
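As a rough idea of the directive processing this would involve, here is a hypothetical, line-based C# sketch (not part of CSharpMinifier). It tracks multi-line comments only in active sections, since skipped sections are not lexed, and returns the directives that are actually recognized. It supports only #if SYMBOL, #else and #endif; #elif, expressions such as A && B, and comment markers inside strings are ignored. Applied to the specification's example above, it reproduces the two directive sets described.

```csharp
using System;
using System.Collections.Generic;

var example = new[] { "#if X", "    /*", "#else", "    /* */ class Q { }", "#endif" };
Console.WriteLine(string.Join(", ", RecognizedDirectives(example, new HashSet<string> { "X" }))); // #if X, #endif
Console.WriteLine(string.Join(", ", RecognizedDirectives(example, new HashSet<string>())));       // #if X, #else, #endif

// Returns the directive lines that are recognized given the defined symbols.
static List<string> RecognizedDirectives(string[] lines, ISet<string> defined)
{
    var directives = new List<string>();
    var stack = new Stack<(bool ParentActive, bool Taken)>();
    bool active = true, inComment = false;
    foreach (var line in lines)
    {
        var trimmed = line.Trim();
        // A '#' inside an open multi-line comment does not start a directive.
        if (!inComment && trimmed.StartsWith('#'))
        {
            directives.Add(trimmed);
            var parts = trimmed.Split(' ', 2, StringSplitOptions.RemoveEmptyEntries);
            switch (parts[0])
            {
                case "#if":
                    bool taken = active && parts.Length > 1 && defined.Contains(parts[1].Trim());
                    stack.Push((active, taken));
                    active = taken;
                    break;
                case "#else":
                    var (parentActive, wasTaken) = stack.Pop();
                    active = parentActive && !wasTaken;
                    stack.Push((parentActive, true));
                    break;
                case "#endif":
                    active = stack.Pop().ParentActive;
                    break;
            }
        }
        else if (active)
        {
            // Only active sections are lexed, so only they can open or close comments.
            inComment = UpdateCommentState(line, inComment);
        }
    }
    return directives;
}

// Returns whether a /* ... */ comment is still open at the end of the line.
static bool UpdateCommentState(string line, bool inComment)
{
    for (int i = 0; i + 1 < line.Length; i++)
    {
        if (inComment)
        {
            if (line[i] == '*' && line[i + 1] == '/') { inComment = false; i++; }
        }
        else if (line[i] == '/' && line[i + 1] == '/') break; // rest is a single-line comment
        else if (line[i] == '/' && line[i + 1] == '*') { inComment = true; i++; }
    }
    return inComment;
}
```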

Versions


v1.3.0 - May 26, 2020


This release only contains an update for the console application (tool). It drops support for .NET Core 2.2 and 3.0, which have reached end of life, and now requires one of the long-term support releases: .NET Core 3.1 or 2.1. Other than that, it has no functional changes.

The library remains at its latest version of 1.2.1.


v1.2.1 - Nov 29, 2019


This is a minor bug fix release that contains the fix for the bug reported in issue #12.


v1.2.0 - Sep 27, 2019


This release primarily adds support for interpolated verbatim strings that, starting with C# 8, can be expressed as @$"..." in addition to $@"..." (which was the only allowed form when they were originally introduced).

The console tool used to require the .NET Core 2.1 run-time but now also supports 2.2 and 3.0.


v1.1.1 - May 24, 2019


This is a bug fix release for the console tool, where the hash command would ignore the --keep-important-comment option. There are no changes to the library, which remains at version 1.1.0.


v1.1.0 - May 22, 2019


What's New

The minification process can now be customized through options to prevent the following source code (single- or multi-line) comments from being subject to minification:

  • those matching a user-defined regular expression pattern
  • those that appear at the start of the source (typically copyright notices, terms and conditions)
  • those that are marked important, either /*! ... */ or //! ...

See the new CSharpMinifier.Minifier.Minify overloads that accept a MinificationOptions argument. The command-line version has three new options for controlling the filtering of comments for the min and hash commands:


v1.0.3 - May 21, 2019


This release primarily contains a fix for issue #3.


v1.0.2 - Mar 28, 2019


Library Changes

Pre-processor directive recognition is tightened, even in the face of some invalid C#. See #1 for more details.

Console Utility Changes

Apart from picking up the improvements made for #1, the version shown in the help text is much more SemVer-ish (i.e. a 3-part version number where it can be helped). For example, with 1.0.1, the first line displayed when running csmin -h read:

CSharpMinifierConsole (version 1.0.1.0)

After upgrading to this release, the same line omits the trailing zero:

CSharpMinifierConsole (version 1.0.2)

v1.0.1 - Mar 27, 2019


Library Changes

Token.Traits is marked obsolete. Use the GetTraits or HasTraits extensions for TokenKind instead.

Console Utility Changes

Fixed help text inaccuracies for various commands.

v1.0.0 - Mar 26, 2019


This is the initial release! 🚀

Library Stats (Sep 18, 2022)

Subscribers: 2
Stars: 30
Forks: 2
Issues: 4
