Wednesday, 23 April 2014

Reactive Extensions Observables Versus Regular .NET Events Part 1

Since its original release, .NET has provided its own support for raising events and subscribing to them. This is essentially an implementation of the “Observer pattern”, and for the most part it works well, although the way it was implemented was less than ideal.

The shortcomings include the need to copy the event handler to a temporary variable before raising it to avoid a nasty (but rare) race condition, and not being able to pass events as objects (resulting in hacky workarounds in mocking and unit testing frameworks to deal with them). The standard event handler function signature is also rather cumbersome: it has the “sender” object as the first parameter (requiring you to cast it to the concrete type you already know it is if you want to make use of it), and the event arguments parameter needs to derive from a rather pointless EventArgs base class.

But there is another option, which doesn’t require you to create your own custom implementation of the observer pattern, and that is to make use of the Reactive Extensions (Rx) framework. This has been around for some time now, but perhaps hasn’t gained the attention it deserves. Rx essentially provides a much more powerful way of working with events.

In this post I’ll show how you can take .NET code that both raises and consumes regular .NET events and convert it to using Rx. And I plan to follow up with a future post that looks at the benefits of making this transition.

Raising .NET Events

First, let’s look at how you raise events the standard .NET way. This simple demo class has a Start method that kicks off a background task. It raises two types of event – Progress events, and a Finished event when it’s done which can contain an exception if something went wrong.

class EventDemo
{
    public event EventHandler<ProgressEventArgs> Progress;
    public event EventHandler<FinishedEventArgs> Finished;

    protected virtual void OnFinished(FinishedEventArgs e)
    {
        var handler = Finished;
        if (handler != null) handler(this, e);
    }

    protected virtual void OnProgress(ProgressEventArgs e)
    {
        var handler = Progress;
        if (handler != null) handler(this, e);
    }

    public void Start()
    {
        Task.Run((Action)DoStuff);
    }

    private void DoStuff()
    {
        try
        {
            foreach (var n in Enumerable.Range(1, 10))
            {
                Thread.Sleep(1000);
                OnProgress(new ProgressEventArgs(n));
            }
            OnFinished(new FinishedEventArgs());
        }
        catch (Exception exception)
        {
            OnFinished(new FinishedEventArgs(exception));
        }
    }
}

You can see here that we’ve created a couple of helper functions (OnFinished and OnProgress) to avoid the race condition issue. We also needed to create a couple of event argument classes:

class ProgressEventArgs : EventArgs
{
    public ProgressEventArgs(int progress)
    {
        Progress = progress;
    }

    public int Progress { get; private set; }
}

class FinishedEventArgs : EventArgs
{
    public FinishedEventArgs(Exception exception = null)
    {
        Exception = exception;
    }

    public Exception Exception { get; private set; }
}

Subscribing to .NET events

To subscribe to the events raised by this class, we simply use the += operator and give it either the name of the event handler method or a lambda function to run. Here we use the first technique, since it makes unsubscribing easier if we need to do that later.

class EventConsumer
{
    private readonly EventDemo eventDemo;
    public EventConsumer()
    {
        eventDemo = new EventDemo();
        eventDemo.Progress += OnProgress;
        eventDemo.Finished += EventDemoOnFinished;
        eventDemo.Start();
    }

    private void EventDemoOnFinished(object sender, FinishedEventArgs finishedEventArgs)
    {
        Console.WriteLine("Finished {0}", finishedEventArgs.Exception == null ? "success" : finishedEventArgs.Exception.Message);
    }

    private void OnProgress(object sender, ProgressEventArgs progressEventArgs)
    {
        Console.WriteLine("Progress {0}", progressEventArgs.Progress);
    }
}

Raising Rx Events

Now let’s have a look at how we might implement the same thing, but using Reactive Extensions instead. We might be tempted to think that the Rx version of our class should mirror as closely as possible the .NET version, so we’d add two IObservable properties, which were the equivalent of the Progress and Finished events. The easy way to create IObservables is to make instances of the Rx Subject class, and the “events” can be fired by calling the OnNext method on the subject. The event argument can be any object – there is no need to make it inherit from EventArgs.

class ObservableDemoNaive
{
    private readonly Subject<Progress> progressSubject;
    private readonly Subject<Finished> finishedSubject;

    public ObservableDemoNaive()
    {
        progressSubject = new Subject<Progress>();
        finishedSubject = new Subject<Finished>();
    }

    public IObservable<Progress> Progress { get { return progressSubject; } }

    public IObservable<Finished> Finished { get { return finishedSubject; } }

    public void Start()
    {
        Task.Run(() => DoStuff());
    }

    private void DoStuff()
    {
        try
        {
            foreach (var n in Enumerable.Range(1, 10))
            {
                Thread.Sleep(1000);
                progressSubject.OnNext(new Progress(n));
            }
            finishedSubject.OnNext(new Finished());
        }
        catch (Exception exception)
        {
            finishedSubject.OnNext(new Finished() { Exception = exception });
        }
    }
}

Whilst this works, it isn’t a particularly good example of using Reactive Extensions. For starters, the Finished event is trying to solve a problem that Rx already solves for us. An observable can tell us when it is completed, as well as when it has errored. So we can simplify this to work with a single Observable:

class ObservableDemoBetter
{
    private readonly Subject<Progress> progressSubject;

    public ObservableDemoBetter()
    {
        progressSubject = new Subject<Progress>();
    }

    public IObservable<Progress> Progress { get { return progressSubject; } }

    public void Start()
    {
        Task.Run(() => DoStuff());
    }

    private void DoStuff()
    {
        try
        {
            foreach (var n in Enumerable.Range(1, 10))
            {
                Thread.Sleep(1000);
                progressSubject.OnNext(new Progress(n));
            }
            progressSubject.OnCompleted();
        }
        catch (Exception exception)
        {
            progressSubject.OnError(exception);
        }
    }
}

Note that the OnCompleted method of the Subject class doesn’t take any parameters. So if your original FinishedEventArgs contained additional data, a different way of passing it back would be needed.

However, in making this change, we’ve actually modified the behaviour. An observable sequence can only complete once, so if you were to call Start twice, you’d not see any results after the first sequence finished. In fact, this probably highlights a design flaw in our original class – if two operations are going on simultaneously, you have no idea which one the Progress and Finished events relate to.

This can be solved by making our Start method return the IObservable stream of progress events for the particular operation that was started:

class ObservableDemo
{
    public IObservable<Progress> Start()
    {
        var subject = new Subject<Progress>();
        Task.Run(() => DoStuff(subject));
        return subject;
    }

    private void DoStuff(Subject<Progress> subject)
    {
        try
        {
            foreach (var n in Enumerable.Range(1, 10))
            {
                Thread.Sleep(1000);
                subject.OnNext(new Progress(n));
            }
            subject.OnCompleted();
        }
        catch (Exception exception)
        {
            subject.OnError(exception);
        }
        finally
        {
            subject.Dispose();
        }
    }
}

Now we have a much cleaner, simpler interface, and there can be no confusion about which background task is raising the events if we call Start twice.

Subscribing to Rx Events

Subscribing to Rx events is nice and simple. You can provide an action that is called for each event, and optionally for when it completes successfully or with an error as shown here:

var observable = eventDemo.Start();
observable.Subscribe(
    p => Console.WriteLine("Progress {0}", p.Value),
    e => Console.WriteLine("Error {0}", e.Message),
    () => Console.WriteLine("Finished success"));

As can be seen, the Rx versions of both the event producer and consumer code are slightly more succinct than the equivalent regular .NET event code.

The Story So Far

So far we’ve seen that it is fairly straightforward to switch from .NET events to Rx Observables, but you may want to consider changing how you work with them to fit more nicely with the way Rx works. In the next post we’ll look at other benefits that Observables bring to the table such as better unsubscribing, and the ability to manipulate and filter the events with a LINQ-style syntax, as well as seeing how to turn existing .NET events into Observables.

If you want to play with Reactive Extensions yourself, the easy way is to simply add the NuGet package to your project.

Monday, 14 April 2014

Roundtrip Serialization of DateTimes in C#

Say you need to serialize a DateTime object to string (perhaps for an XML or JSON file), and want to ensure that you get back exactly the same time you serialized. Well, you might think that you can just use DateTime.ToString and DateTime.Parse:

var testDate = DateTime.Now;
var serialized = testDate.ToString();
Console.WriteLine("Serialized: {0}",serialized);
var deserialized = DateTime.Parse(serialized);
Console.WriteLine("Deserialized: {0}",deserialized);

// prints:
Serialized: 14/04/2014 16:32:00
Deserialized: 14/04/2014 16:32:00

At first glance it seems to work, but actually this is a very bad idea. What’s wrong with this approach? Well, two things. For starters, the millisecond component of the time has been lost. More seriously, you are entirely dependent on the deserializing system’s short date format being the same as the serializing system’s. For example, if you serialize in a UK culture with a short date format of dd/MM/yyyy and then deserialize in a US culture with a short date format of MM/dd/yyyy you can end up with completely the wrong date (e.g. 1st of March becomes the 3rd of January).

So how can you ensure you get exactly the same DateTime object back that you serialized? Well the answer is to use ToString with the “O” format specifier which is called the “roundtrip format specifier”. This will not only include the milliseconds, but also ensures that the DateTime.Kind property is represented in the string (under the hood a .NET DateTime is essentially a tick count plus a DateTimeKind).

Here are three dates formatted with the roundtrip format specifier, each with a different DateTime.Kind value:

Console.WriteLine("Local: {0}", DateTime.Now.ToString("O"));
Console.WriteLine("UTC: {0}", DateTime.UtcNow.ToString("O"));
var unspecified = new DateTime(2014,3,11,16,13,56,123,DateTimeKind.Unspecified);
Console.WriteLine("Unspecified: {0}", unspecified.ToString("O"));

// prints:
Local: 2014-04-14T16:36:01.5305961+01:00
UTC: 2014-04-14T15:36:01.5345961Z
Unspecified: 2014-03-11T16:13:56.1230000

As you can see, UTC times end in Z, local times include the current offset from UTC, and unspecified times have no suffix. You can also see that we have fractions of a second to 7 decimal places. If you’ve serialized your DateTime objects like this, you can then get them back to exactly what they were before by using DateTime.Parse, and passing in the DateTimeStyles.RoundtripKind argument:

DateTime.Parse("2014-03-11T16:14:27.8660648+01:00", CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);

The only thing to note here is that the serialization of a local time includes the offset from UTC, which means that if you deserialize it in a different timezone, you will get the correct local time for that timezone. That will of course be a different local time to the one originally serialized. This is almost certainly what you want, but it does mean that technically, the binary representation of the deserialized DateTime object is not identical to the one that was serialized (different number of ticks).

Wednesday, 9 April 2014

Thoughts on Windows RT Audio APIs and BUILD 2014

One of the sessions I was most excited about at BUILD this year was Jason Olson and Pete Brown’s session on Sequencers, Synthesizers, and Software, Oh My! Building Great Music Creation Apps for Windows Store.

I had been hoping before the conference that we might see MIDI support for WinRT...

...and I wasn't disappointed. The talk covered a brand new MIDI API for Windows RT (requires 8.1 Update 1), as well as giving some guidance on low latency WASAPI usage.

Let's look first at the MIDI support…

Basically, it allows you to access MIDI in and out devices (MIDI over USB), receiving and sending messages. This is great news, and enables the creation of several types of application not previously possible in Windows Store apps (I’d wanted to make a software synthesizer for quite some time, so that’s first on my list now).

It’s distributed as a NuGet package allowing access from C#. This is huge in my opinion. One of the most painful things about doing audio work in C# has been endlessly creating wrappers for COM based APIs such as WASAPI and Media Foundation. It's really hard to get right in C# and you can run into all kinds of nasty threading issues. I had hoped when WinRT originally came out that it would wrap WASAPI and Media Foundation in a way that eliminated the need for all the interop in NAudio, but sadly that was not to be the case. But this represents a great step in the right direction, and I hope it becomes standard.

What can't it do:

It doesn't read or write MIDI files, but that's not a particularly big deal to me. NAudio and MIDI.NET both can do that.

It doesn't provide a MIDI sequencer. You'd need to schedule out your MIDI out events at exactly the right time to play back a MIDI file from your app.

It doesn't offer up a software synthesizer as a MIDI out. That would be a great addition, especially if you could also load the software synthesizer up with SoundFonts.

There is no RTP MIDI support but they suggested it might be a feature under consideration. This would be useful for wireless MIDI, and could also form the basis for sending MIDI streams between applications which could also be useful.

And now onto WASAPI

There was some really useful information in the session on how to set up your audio processing to work at a pro audio thread priority, and how to use a “raw mode” to avoid global software based effects (e.g. a graphic equaliser) introducing additional latency. Getting good information on WASAPI is hard to come by, so it was good to see recommendations as well on using event driven mode and when to use exclusive mode.

What I would love to see with WASAPI is exactly the same approach as has been taken with MIDI. Give us a NuGet package that presents both the Render and Capture devices in a way that can be used from C#. Let me easily open an output device, specify if I need pro audio thread priority, raw mode, exclusive mode, and then give me a callback to fill the output buffer on demand with new samples. Ideally it should incorporate resampling too, as it is a huge pain to resample yourself if you are playing a 44.1kHz file while your soundcard is operating at 48kHz.

Likewise with capture. Let me easily open any capture device (or loopback capture), and get a callback whenever the next audio buffer is received containing the raw sample data. This might be possible already with the MediaCapture class but I'm unclear how you get at the raw samples (somehow create a custom sink?). Again, I’d like to be able to specify my capture sample rate and have the API resample on the fly for me, as often with voice recordings, 16kHz is quite acceptable. WASAPI is much harder to use than the legacy waveIn APIs in this regard.

Now Jason Olson did provide a link to a cool sample application on GitHub, showing the use of these APIs in C++. However, there seems to be a general assumption that if you're using WASAPI you are going to be working in C++. This is probably true for pro audio music applications. Despite having invested a huge amount of my time over the last 12 years in creating a C# audio library, I still recommend to people needing the lowest possible latency that they should consider C++.

But I also know that there is a huge amount of audio related development that does not require super low latency, particularly when dealing with voice recordings such as telephony, radio or VOIP. Recording and replaying voice is something that many businesses want to create software for, and a C# API greatly simplifies the development process. Both my experience from supporting NAudio, and having worked a decade in the telecoms industry tells me that there is a huge market for applications that deal with speech in various formats.

Finally, I'd also like to see Microsoft give some thought to creating an audio buss that allows WinRT apps to transmit real-time audio between themselves, allowing you to link together virtual synthesizers with effects. This could form the basis of solving the audio plugin problem for DAW-like applications in the Windows store.

Anyway, they repeatedly asked for feedback during the talk, so that’s the direction I’d like to see WinRT audio APIs going in. Seamless support for using audio capture and render devices from C# with transparent resampling on playback and record.

Wednesday, 2 April 2014

Stop Making Monoliths

Martin Fowler and James Lewis recently published an excellent article on microservices, which describes a new approach to architecting large enterprise systems, by composing them out of many independently deployable services, rather than producing a "monolith", which is usually one giant application that must be deployed/installed in one hit, and usually runs as a single process.

What's wrong with monoliths?

The trouble with monoliths is they become increasingly complex and unwieldy as the project grows larger. Despite our best intentions they usually become very tightly coupled, meaning components cannot be replaced or reused. Performance also becomes an issue, and there is no easy way to scale this approach other than to buy a more powerful server.

Why do we build them?

So why do we keep making monoliths? Two reasons. The first is simply because we can. I wrote a while ago about the fact that the vast increase in computing power has enabled us to build applications that are bigger than they ought to be.

The second reason is because it is the path of least resistance. It's far easier to put new code into the existing service, than to create a new service, which will require us to set up a new build and deployment process. So with a microservice architecture, you have to make it really easy to create a new service, or developers will end up bloating the existing services with additional features. Once the friction of creating new services is eliminated, you can begin to migrate away from your monolithic architecture.

What are the benefits of microservices over monoliths?

What benefits do microservices give over a monolithic architecture? Let me briefly mention two.

First, microservices give us real decoupling. Components in microservice architectures are actually replaceable rather than theoretically replaceable. The service boundary forces a radical decoupling of parts of our application that is hard to enforce in a monolithic codebase, as it is too easy to circumvent (deliberately or unintentionally).

Second, it gives freedom to adopt new technology. Monoliths typically have to be built with one programming language, one database access technology, one dependency injection framework etc. It makes it easy to get stuck on legacy technology. But a microservice architecture allows you to incrementally improve your application to make use of better technologies, tools and practices.

There are plenty more benefits (not least the implications for scalability). But I'm hoping to blog more about other aspects of microservices, so I'll save them for next time.

Friday, 28 March 2014

How to record and play audio at the same time with NAudio

Quite often I get questions from people who would like to play audio that they are receiving over the network, or recording from the microphone, but also want to save that audio to WAV at the same time. This is actually quite easy to achieve, so long as you think in terms of a “signal chain” (which is something I talk about a lot in my audio courses on Pluralsight).

Basically, the usual strategy I recommend for playing audio that you receive over the network or from the microphone is to put it into a BufferedWaveProvider. You fill it with (PCM) audio as it becomes available, and then in its Read method, it returns the audio, or silence if the buffer is empty.

Normally, you’d pass the BufferedWaveProvider directly to the IWavePlayer device (such as WaveOut), but to implement save to WAV, we’ll first wrap it in a new signal chain component that we’ll create for this purpose. We’ll call it “SavingWaveProvider” and it will implement IWaveProvider. In its Read method, it will read from its source wave provider (the BufferedWaveProvider in our case), and write to a WAV file before we pass it on.

We’ll dispose the WaveFileWriter if we read 0 bytes from the source wave provider, which should normally indicate we have reached the end of playback. But we also make the whole class Disposable, since BufferedWaveProvider is set up to always return the number of bytes we asked for in Read, so it will never reach the end itself.

Here’s the code for SavingWaveProvider:

class SavingWaveProvider : IWaveProvider, IDisposable
{
    private readonly IWaveProvider sourceWaveProvider;
    private readonly WaveFileWriter writer;
    private bool isWriterDisposed;

    public SavingWaveProvider(IWaveProvider sourceWaveProvider, string wavFilePath)
    {
        this.sourceWaveProvider = sourceWaveProvider;
        writer = new WaveFileWriter(wavFilePath, sourceWaveProvider.WaveFormat);
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        var read = sourceWaveProvider.Read(buffer, offset, count);
        // check the number of bytes actually read, not the number requested
        if (read > 0 && !isWriterDisposed)
        {
            writer.Write(buffer, offset, read);
        }
        if (read == 0)
        {
            Dispose(); // auto-dispose in case users forget
        }
        return read;
    }

    public WaveFormat WaveFormat { get { return sourceWaveProvider.WaveFormat; } }

    public void Dispose()
    {
        if (!isWriterDisposed)
        {
            isWriterDisposed = true;
            writer.Dispose();
        }
    }
}

And here’s how you use it to both play and save audio at the same time (note this is a very simplified WPF app with two buttons and no checks that you don’t press Start twice in a row etc). The key is that we pass the BufferedWaveProvider into the constructor of the SavingWaveProvider, and then pass that to waveOut.Init. Then all we need to do is make sure we dispose SavingWaveProvider so that the WAV file header gets written correctly:

public partial class MainWindow : Window
{
    private WaveIn recorder;
    private BufferedWaveProvider bufferedWaveProvider;
    private SavingWaveProvider savingWaveProvider;
    private WaveOut player;

    public MainWindow()
    {
        InitializeComponent();
    }

    private void OnStartRecordingClick(object sender, RoutedEventArgs e)
    {
        // set up the recorder
        recorder = new WaveIn();
        recorder.DataAvailable += RecorderOnDataAvailable;

        // set up our signal chain
        bufferedWaveProvider = new BufferedWaveProvider(recorder.WaveFormat);
        savingWaveProvider = new SavingWaveProvider(bufferedWaveProvider, "temp.wav");

        // set up playback
        player = new WaveOut();
        player.Init(savingWaveProvider);

        // begin playback & record
        player.Play();
        recorder.StartRecording();
    }

    private void RecorderOnDataAvailable(object sender, WaveInEventArgs waveInEventArgs)
    {
        bufferedWaveProvider.AddSamples(waveInEventArgs.Buffer,0, waveInEventArgs.BytesRecorded);
    }

    private void OnStopRecordingClick(object sender, RoutedEventArgs e)
    {
        // stop recording
        recorder.StopRecording();
        // stop playback
        player.Stop();
        // finalise the WAV file
        savingWaveProvider.Dispose();
    }
}

This technique isn’t only for saving audio that you record or receive over the network. It’s also a great way to get a copy of the audio you just played, which is very handy when you want to troubleshoot audio issues and want to get a copy of the exact audio that was sent to the soundcard. You can even insert many of these at different places in your signal chain, to hear what the audio sounded like earlier in the signal chain.

Friday, 14 March 2014

Detecting Mouse Hover Over ListBox Items in WPF

I’m a big fan of using MVVM in WPF and for the most part it works great, but still I find myself getting frustrated from time to time that data binding tasks that ought to be easy seem to require tremendous feats of ingenuity as well as encyclopaedic knowledge of the somewhat arcane WPF data binding syntax. However, my experience is that you can usually achieve whatever you need to if you create an attached behaviour.

In this instance, I wanted to detect when the mouse was hovered over an item in a ListBox, so that my ViewModel could perform some custom actions. The ListBox was of course bound to items in a ViewModel, and the items had their own custom template. You may know that unfortunately you can’t bind a method on your ViewModel to an event handler, or this would be straightforward.

My solution was to create an attached behaviour that listened to both the MouseEnter and MouseLeave events for the top level element in my ListBoxItem template. Both will call the Execute method of the ICommand you are bound to. When the mouse enters the ListBoxItem, it passes true as the parameter, and when it exits, it passes false. Here’s the attached behaviour:

public static class MouseOverHelpers
{
    public static readonly DependencyProperty MouseOverCommand =
        DependencyProperty.RegisterAttached("MouseOverCommand", typeof(ICommand), typeof(MouseOverHelpers),
                                                                new PropertyMetadata(null, PropertyChangedCallback));

    private static void PropertyChangedCallback(DependencyObject dependencyObject, DependencyPropertyChangedEventArgs args)
    {
        var ui = dependencyObject as UIElement;
        if (ui == null) return;

        if (args.OldValue != null)
        {
            ui.RemoveHandler(UIElement.MouseLeaveEvent, new RoutedEventHandler(MouseLeave));
            ui.RemoveHandler(UIElement.MouseEnterEvent, new RoutedEventHandler(MouseEnter));
        }

        if (args.NewValue != null)
        {
            ui.AddHandler(UIElement.MouseLeaveEvent, new RoutedEventHandler(MouseLeave));
            ui.AddHandler(UIElement.MouseEnterEvent, new RoutedEventHandler(MouseEnter));
        }
    }

    private static void ExecuteCommand(object sender, bool parameter)
    {
        var dp = sender as DependencyObject;
        if (dp == null) return;

        var command = dp.GetValue(MouseOverCommand) as ICommand;
        if (command == null) return;

        if (command.CanExecute(parameter))
        {
            command.Execute(parameter);
        }
    }

    private static void MouseEnter(object sender, RoutedEventArgs e)
    {
        ExecuteCommand(sender, true);
    }

    private static void MouseLeave(object sender, RoutedEventArgs e)
    {
        ExecuteCommand(sender, false);
    }

    public static void SetMouseOverCommand(DependencyObject o, ICommand value)
    {
        o.SetValue(MouseOverCommand, value);
    }

    public static ICommand GetMouseOverCommand(DependencyObject o)
    {
        return o.GetValue(MouseOverCommand) as ICommand;
    }
}

And here’s how you would make use of it in a ListBoxItem template:

<ControlTemplate TargetType="ListBoxItem">
    <Border Name="Border"
           my:MouseOverHelpers.MouseOverCommand="{Binding MouseOverCommand}">
        <Image Source="{Binding ImageSource}" Width="32" Height="32" Margin="2,0,2,0"/>
    </Border>
</ControlTemplate>

This is essentially a specialised version of the generic approach described here for binding an ICommand to any event. My version simply saves you needing a separate command for MouseEnter and MouseLeave.

Friday, 7 March 2014

Python Equivalents of LINQ Methods

In my last post, I looked at how Python’s list comprehensions and generators allow you to achieve many of the same tasks that you would use LINQ for in C#. In this post, we’ll look at Python equivalents for some of the most popular LINQ extension methods. We’ll mostly be looking at Python’s built-in functions and itertools module.

For these examples, our test data will be a list of fruit. But all of these techniques work with any iterable, including the output of generator functions. Here’s our Python test data:

fruit = ['apple', 'orange', 'banana', 'pear', 
         'raspberry', 'peach', 'plum']

Which of course in C# is:

var fruit = new List<string>() { "apple", "orange",
 "banana", "pear", "raspberry", "peach", "plum" };

Any & All

LINQ’s Any method allows you to test whether any of the items in a sequence fulfil a certain requirement, while All checks if all of them do. Python’s built-in functions are named the same, so it’s really straightforward. Let’s see if any of our fruit contain the letter “e”, then see if all of them do:

>>> any("e" in f for f in fruit)
True
>>> all("e" in f for f in fruit)
False

In LINQ:

fruit.Any(f => f.Contains("e"));
fruit.All(f => f.Contains("e"));

Min & Max

Again, Python has built-in functions similarly named to LINQ. Let’s find the minimum and maximum fruit lengths:

>>> max(len(f) for f in fruit)
9
>>> min(len(f) for f in fruit)
4

which are the equivalents of:

fruit.Max(f => f.Length);
fruit.Min(f => f.Length);
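If you want the fruit itself rather than its length, the built-in min and max functions also accept a key argument. Here is a minimal sketch; the fruit list is repeated so that it runs on its own, and the variable names longest and shortest are just for illustration. When several items tie, Python returns the first one it encounters.

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# key=len compares items by length but returns the item itself
longest = max(fruit, key=len)
shortest = min(fruit, key=len)

print(longest)   # raspberry
print(shortest)  # pear ('plum' is also 4 letters, but 'pear' comes first)
```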

Take, Skip, TakeWhile & SkipWhile

LINQ’s Take and Skip methods are very useful for paging data, or limiting the amount you process, and TakeWhile and SkipWhile come in handy from time to time as well (TakeWhile can be a good way of checking for user cancellation).

Take and Skip can be implemented using the itertools islice function. We can specify an end index, or a start and end index. If the end index is None, that means keep going to the end of the iterable. I’d prefer methods actually called “skip” and “take” as I think that makes for more readable code, but they could be easily created if needed.

Here’s Take(2) and Skip(2) implemented with Python. Since islice returns an iterator rather than a list, I turn it into a list for debugging purposes:

>>> from itertools import islice
>>> list(islice(fruit, 2))
['apple', 'orange']
>>> list(islice(fruit, 2, None))
['banana', 'pear', 'raspberry', 'peach', 'plum']

islice does have the benefit though of letting you combine a skip and a take into one step rather than chaining them like you would in C#:

fruit.Skip(2).Take(2);

with islice:

>>> list(islice(fruit, 2, 4))
['banana', 'pear']
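
The “skip” and “take” helpers mentioned earlier could be sketched as thin wrappers around islice. These names are my own invention, not part of itertools, and this is only one way you might write them:

```python
from itertools import islice

def take(iterable, count):
    """Return an iterator over at most the first `count` items (like LINQ's Take)."""
    return islice(iterable, count)

def skip(iterable, count):
    """Return an iterator over everything after the first `count` items (like LINQ's Skip)."""
    return islice(iterable, count, None)

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Reads much like fruit.Skip(2).Take(2) in C#
page = list(take(skip(fruit, 2), 2))
print(page)
```

Because both helpers return lazy iterators, chaining them doesn’t create any intermediate lists.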

The itertools module also includes a “takewhile” function, and its equivalent of LINQ’s SkipWhile is called “dropwhile”. With these functions you will usually want to use Python’s lambda syntax, which is a rare example of Python being less succinct than C#.

>>> from itertools import takewhile
>>> list(takewhile(lambda c: len(c) < 7, fruit))
['apple', 'orange', 'banana', 'pear']
>>> from itertools import dropwhile
>>> list(dropwhile(lambda c: len(c) < 7, fruit))
['raspberry', 'peach', 'plum']

Here’s the same TakeWhile and SkipWhile in C#:

fruit.TakeWhile(f => f.Length < 7);
fruit.SkipWhile(f => f.Length < 7);

First, FirstOrDefault, & Last

With LINQ you can easily get the first item from an IEnumerable using First. This throws an exception if the sequence is empty, so FirstOrDefault can be used instead. With Python, the built-in “next” function can be used on an iterator (but not directly on a list, which would first need to be wrapped with iter). Without a default value, next raises StopIteration if the iterator is empty. Let’s use Python to get the first fruit starting with “p”, and to return a default value when our generator looking for the first fruit starting with “q” doesn’t find any elements.

>>> next(f for f in fruit if f.startswith("p"))
'pear'
>>> next((f for f in fruit if f.startswith("q")), "none")
'none'
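
To use next with a plain list, you can wrap the list with iter first. This short sketch shows that, along with what happens when no default is supplied (the variable names are purely illustrative):

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# A list is iterable but is not itself an iterator,
# so next(fruit) would raise a TypeError. Wrapping it with iter works.
first = next(iter(fruit))

# The second argument plays the role of FirstOrDefault's default value
first_or_default = next(iter([]), None)

# Without a default, an empty iterator raises StopIteration
try:
    next(f for f in fruit if f.startswith("q"))
    found = True
except StopIteration:
    found = False

print(first, first_or_default, found)
```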

There does not seem to be any built-in Python function to implement LINQ’s “Last” or “LastOrDefault” methods, but you could quite easily create one. Here’s a fairly rudimentary one:

>>> def lastOrDefault(sequence, default=None):
...     lastItem = default
...     for s in sequence:
...         lastItem = s
...     return lastItem
...
>>> lastOrDefault((f for f in fruit if f.endswith("e")))
'orange'
>>> lastOrDefault((f for f in fruit if f.startswith("x")), "no fruit found")
'no fruit found'

You could do the same if you really needed the LINQ “Single” or “SingleOrDefault” methods, which also have no direct equivalent.
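
As a rough sketch, a “Single” style helper might look something like this. The function names and the choice of ValueError are my own assumptions (LINQ throws an InvalidOperationException in these cases):

```python
_MISSING = object()  # sentinel, so that None can be a legitimate item

def single_or_default(sequence, default=None):
    """Return the only item, or `default` if the sequence is empty.

    Raises ValueError if there is more than one item.
    """
    iterator = iter(sequence)
    item = next(iterator, _MISSING)
    if item is _MISSING:
        return default
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("sequence contains more than one element")
    return item

def single(sequence):
    """Return the only item; raise ValueError if there are none or several."""
    item = single_or_default(sequence, _MISSING)
    if item is _MISSING:
        raise ValueError("sequence contains no elements")
    return item

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

print(single(f for f in fruit if f.startswith("r")))
print(single_or_default((f for f in fruit if f.startswith("x")), "no fruit found"))
```

Only the first two items are ever pulled from the iterator, so this stays cheap even for long sequences.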

Count

The LINQ Count extension method lets you count how many items are in a sequence. For example, how many fruit begin with “p”?

fruit.Count(f => f.StartsWith("p"))

Probably the most logical expectation would be that Python’s “len” function would do the same, but you can’t call len on a generator or other lazy iterator, only on sized collections such as lists. There is a neat trick you can use with the “sum” built-in function, though:

>>> sum(1 for f in fruit if f.startswith("p"))
3
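
If you find yourself doing this a lot, the sum trick could be wrapped up in a helper. This is just a sketch, and “count” here is a hypothetical name of my own rather than a built-in:

```python
def count(iterable, predicate=None):
    """Count all items, or only those matching `predicate` (like LINQ's Count)."""
    if predicate is None:
        return sum(1 for _ in iterable)
    return sum(1 for item in iterable if predicate(item))

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

print(count(fruit, lambda f: f.startswith("p")))
# Also works on a generator, where len would fail
print(count(f for f in fruit if "e" in f))
```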

Select & Where

We saw in the last blog post that a list comprehension already includes the capabilities of LINQ’s Select and Where, but there may be times you want them to be available as functions. Python’s “map” and “filter” functions take a function (often a lambda) and an iterable, and return an iterator (this is Python 3 only; in Python 2 they returned lists). Here are a couple of simple examples of them in action, with the output turned into a list for debug purposes:

>>> list(map(lambda x: x.upper(), fruit))
['APPLE', 'ORANGE', 'BANANA', 'PEAR', 'RASPBERRY', 'PEACH', 'PLUM']
>>> list(filter(lambda x: "n" in x, fruit))
['orange', 'banana']
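
Since both functions return iterators, they can be chained in much the same way as Where followed by Select. Here’s a small sketch comparing that with the equivalent generator expression (the variable names are only for illustration):

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Roughly fruit.Where(f => f.Contains("n")).Select(f => f.ToUpper())
# str.upper can be passed directly, which avoids needing a lambda for the map
chained = list(map(str.upper, filter(lambda f: "n" in f, fruit)))

# The same query as a generator expression
comprehension = list(f.upper() for f in fruit if "n" in f)

print(chained)
print(chained == comprehension)
```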

GroupBy

At first glance it might appear that the itertools groupby function behaves the same as LINQ’s GroupBy, but there is a gotcha. Python’s groupby only groups consecutive items that share a key, so it expects the incoming data to be sorted by that key, which means you have to call sorted first. This example shows us first trying to group without sorting (resulting in two “p” groups), and then doing it the right way. We’re grouping by the first letter of the fruit, and I’m using a helper function to print out the contents of the grouped data:

>>> def printGroupedData(groupedData):
...     for k, v in groupedData:
...         print("Group {} {}".format(k, list(v)))
...
>>> from itertools import groupby
>>> keyFunc = lambda f: f[0]
>>> printGroupedData(groupby(fruit, keyFunc))
Group a ['apple']
Group o ['orange']
Group b ['banana']
Group p ['pear']
Group r ['raspberry']
Group p ['peach', 'plum']
>>> sortedFruit = sorted(fruit, key=keyFunc)
>>> printGroupedData(groupby(sortedFruit, keyFunc))
Group a ['apple']
Group b ['banana']
Group o ['orange']
Group p ['pear', 'peach', 'plum']
Group r ['raspberry']
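
If you would rather not sort first, one alternative is to collect the groups into a dictionary. This is only a sketch, and “group_by” is a hypothetical helper name. It relies on dictionaries preserving insertion order (Python 3.7 and later), so the groups come out in the order each key was first seen, which is how LINQ’s GroupBy orders them too:

```python
from collections import defaultdict

def group_by(iterable, key_func):
    """Group items by key without needing the input to be sorted."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key_func(item)].append(item)
    return dict(groups)

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

grouped = group_by(fruit, lambda f: f[0])
for key, items in grouped.items():
    print("Group {} {}".format(key, items))
```

The trade-off is that this has to read the whole sequence before returning anything, whereas itertools groupby is lazy.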

OrderBy

As we saw above, the “sorted” built-in function in Python can be used to order a sequence. It returns a list, but this is understandable since to implement OrderBy it must iterate through the entire sequence first. Here we sort the fruit by their string length:

>>> sorted(fruit, key=len)
['pear', 'plum', 'apple', 'peach', 'orange', 'banana', 'raspberry']
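
The same function can also cover OrderByDescending and ThenBy. Here’s a quick sketch using the reverse argument and a tuple key (variable names are illustrative only):

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Like OrderByDescending(f => f.Length). The sort is stable, so fruit
# of equal length keep their original relative order.
by_length_desc = sorted(fruit, key=len, reverse=True)

# Like OrderBy(f => f.Length).ThenBy(f => f): tuples compare element by element
by_length_then_name = sorted(fruit, key=lambda f: (len(f), f))

print(by_length_desc)
print(by_length_then_name)
```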

Distinct

As far as I can tell there isn’t a built-in function in Python to emit a distinct iterable sequence, but the easiest way is probably to just construct a set (bearing in mind that a set does not keep the original order). If you wanted to preserve the order, or to be able to abort early before reaching the end of a sequence, you could create your own generator function:

def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s
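
Here’s a sketch of that helper in use. The generator is repeated so the example runs on its own, and the endless sequence built with itertools cycle is just my way of showing that it can stop early:

```python
from itertools import cycle, islice

def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Distinct first letters, in the order they first appear
first_letters = list(distinct(f[0] for f in fruit))
print(first_letters)

# Because distinct is lazy, it can be combined with islice to stop early,
# even when the underlying sequence never ends
first_three = list(islice(distinct(cycle("abcabc")), 3))
print(first_three)
```

Note that the items need to be hashable for this to work, since they are stored in a set.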

Zip

The last example I’ll look at is the Zip method. Python has an equivalent zip function, and it is actually a little simpler, as it assumes you want a tuple, whereas with LINQ’s Zip you need to explicitly supply a result selector function. It also supports zipping more than two sequences together, which is nice. As with LINQ’s Zip, the resulting sequence is the length of the shortest input. Here’s a quick example of the Python zip function in action:

>>> recipes = ['pie','juice','milkshake']
>>> list(zip(fruit,recipes))
[('apple', 'pie'), ('orange', 'juice'), ('banana', 'milkshake')]
>>> list(f + " " + r for f,r in zip(fruit,recipes))
['apple pie', 'orange juice', 'banana milkshake']
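
To illustrate zipping more than two sequences, here’s a short sketch with a third list. The prices are made-up values purely for the example. It also shows itertools zip_longest, which pads the shorter inputs instead of stopping at the shortest one:

```python
from itertools import zip_longest

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']
recipes = ['pie', 'juice', 'milkshake']
prices = [2.50, 1.75, 3.00]  # hypothetical data for illustration

# Three sequences zipped into three-item tuples, stopping at the shortest
menu = list(zip(fruit, recipes, prices))
print(menu)

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(fruit[:4], recipes, fillvalue="?"))
print(padded)
```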

Conclusion

As can be seen, most of the main LINQ extension methods have fairly close Python equivalents, and those that don’t could be quite easily recreated. I don’t pretend to be an expert on Python, so if I’ve missed any cool tricks, let me know in the comments.