System.Linq.Parallel Is Now Open Source (github.com/dotnet)
146 points by fekberg on Dec 3, 2014 | 59 comments


Sorry to be "that guy", but what is System.Linq.Parallel? Can anyone provide a brief description/ list of links where to read more about it? Why is open sourcing it such a big deal?


Linq is a library that adds a whole bunch of functional-style operations on top of list-like objects. Technically speaking, it's implementing a type-safe query language (think SQL) on top of them, but the difference is very minimal in practice.

What this does is allow certain operations to be run in parallel automatically. If you wanted to add 1 to every element in an array, you could do that with Linq:

    IEnumerable<int> numbers = new List<int>() { 1, 2, 3, 4, 5 };
    var newList = from num in numbers
                    select num + 1;

but by adding ".AsParallel()" to the chain:

    var newList = from num in numbers.AsParallel()
                    select num + 1;
your code is automatically run on each element in a parallel fashion and the runtime handles marshalling it all into a queue for processing... or however it does its magic.
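
To make the comparison above concrete, here is a minimal sketch of the same query in both forms. The class and method names (PlinqSketch, AddOneSequential, AddOneParallel) are made up for illustration. One assumption worth flagging: the parallel version calls AsOrdered(), because PLINQ does not promise to keep source order otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named only for this example.
public static class PlinqSketch
{
    // Sequential LINQ: add one to each element, in source order.
    public static List<int> AddOneSequential(IEnumerable<int> numbers) =>
        (from num in numbers
         select num + 1).ToList();

    // PLINQ: the same query over AsParallel(). AsOrdered() asks PLINQ to
    // preserve source order; without it the result order is not guaranteed.
    public static List<int> AddOneParallel(IEnumerable<int> numbers) =>
        (from num in numbers.AsParallel().AsOrdered()
         select num + 1).ToList();
}
```

Calling either method with the list 1..5 should give the same elements; for work this small the parallel version is unlikely to be faster, since partitioning has its own cost.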

The MSDN document on the way it does it is quite clear: http://msdn.microsoft.com/en-us/library/dd997399(v=vs.110).a... - describing many linq operations as 'delightfully parallel'.

I personally love Linq. To me it feels like using a Functional Relational Mapping system when combined with Linq-To-SQL, and the same semantics can be used to create a reactive programming model similar to what is intended with kefir.js and Akka ( http://msdn.microsoft.com/en-us/data/gg577609.aspx )

The idea itself isn't new, Perl 6 has something like this planned for trivial cases like for (1..10) { $_ = $_ +1; }

----

I don't know the full story, but for me the significance of open sourcing this code is that it will potentially encourage forks and extensions that allow running stuff upon GPGPUs or other massively parallel architectures while still using Linq style syntax.

TL;DR: With Linq, anything that implements IEnumerable (lists, arrays, streams, etc.) can be made to automatically support all sorts of nifty transforms. The Parallel bits let many of them run in parallel quite trivially.


I hate to be nit-picky, but what you're presenting is non-idiomatic C#; I would guess you're not a C# programmer. No-one really uses the SQL-esque language. The simple methods read better, are quicker to type, are clearer and are more succinct. 'p', 'q' also tends to be the de facto standard operator in lambda expressions, rather than 'x', 'y' or 'i', 'j'. And why would you use IEnumerable<int> where you used it? It makes no sense.

Your code should have read:

    var numbers = new List<int>() { 1, 2, 3, 4, 5 };
    var newList = numbers.AsParallel().Select(p => p + 1);


> 'p', 'q' also tends to be the de facto standard operator in lambda expressions, rather than 'x', 'y' or 'i', 'j'.

The authors of the plinq library are clearly not C# developers then. https://github.com/dotnet/corefx/search?p=1&q=select&utf8=%E...


I tend to use a letter somewhat related to the object I'm running the query against. Like 'c' for a List<Customer>.

While using i,j,k as loop variables has a long tradition behind it (from FORTRAN77, IIRC), there's no reason to limit myself when using Linq.


>I hate to be nit-picky

Then don't :). What you've presented is very much like saying "I don't encounter many people with your style of programming and thus, no one uses that style and you're not an actual programmer in this language".

Don't forget that there are varying styles of doing things. You use 'p'? great! Others will use foo, i, j, lalal, etc. No need to get worked up over such a minor issue and detract from the actual conversation at hand.


> No-one really uses the SQL-esque language.

Of course we do. I wrote this just the other day:

    // "return" is a reserved word in C#, so the range variable is named ret here
    var salesAndReturns = from sale in sales
                          join ret in returns
                              on new { StockId = sale.StockId, ClientStockId = sale.ClientStockId, SellPointId = sale.SellPointId }
                              equals new { StockId = ret.StockId, ClientStockId = ret.ClientStockId, SellPointId = ret.SellPointId } into gpu
                          from subreturn in gpu.DefaultIfEmpty()
                          select new
                          {
                              sale.ClientStockId,
                              sale.StockId,
                              Quantity = -sale.Quantity + (subreturn ?? new SellPointStock()).Quantity
                          };

There are no rules restricting how we use LINQ.


No one uses the SQL-esque language? Where do you get that idea from? I've been a C# dev for 7 years and have seen plenty. For complicated queries it's significantly more compact (and readable, in my opinion).

And using IEnumerable was probably making the point that it didn't have to be a list, it could be anything implementing that interface.


Ach, come on, the briefest trawl through recent SO questions and you'll see the % of DSLs has dropped to a small amount since it launched.

Downvoters be damned, they know not what they talk about. ultimape is clearly not a C# developer.


I know people with 10 years experience with the language that still use the DSL for simple cases.

There is a lot of discussion about the usage of var, and ultimape's non-usage of it in his example is a personal preference, as well as making sense in an educational post.

And lastly, I think you'll find many developers, young and old alike, use x, y and z as their lambda parameters. I do too. It comes from the mathematical function origin of the syntax. In most early examples when it was introduced, and loads of educational materials now, lambda syntax is still presented akin to a mathematical formula:

x => x+1

In that context, using x makes a lot more sense than p, and has stuck around as a habit. It's a style thing, and everybody has their own style. You don't get to claim superior experience or skill based on a 2 line snippet that doesn't match your style.

And as an aside: you're not being downvoted by "the ignorant masses". You're being downvoted for being confrontational and not contributing to a conversation about the topic, which is Parallel LINQ and not coding style.


"Ach"? "Trawl"? No one uses those words. You're clearly not an English speaker.


They're reasonably prevalent in Scotland


Tch, come on, the briefest trawl through recent blog posts and you'll see the % of Scottish idioms has dropped to a small amount since people have been blogging. Downvoters be damned, they know not what they talk about. mattmanser is clearly not an English speaker.


For simpler stuff, ya, DSL is not handy at all. But for the complex ones, shame on you if you don't DSL it. And don't come with percentage of SO questions bullshit, that's a bad programmer's excuse.

For longer queries, that little DSL syntax is a blessing. It's way more pleasant to read, write or think about than the alternative.

Ah, and since you care so much about it, I've been developing in C# for so many freaking years that I don't bother to count them anymore.


SO is as much about making yourself look as smart as possible while still needing help. I would bet half of the folks submitting questions over linq use resharper to "nerdify" their code before they submit it...


Not a fan of the DSL myself, but plenty of people in this C# shop here use it. Plus, replacing 'let' becomes ugly using methods rather quickly, for example.


I use the query syntax. And I have since .net 3 came out. I also use extension methods. In particular "let" can be much easier to accomplish in the query syntax. And I've usually used the first letter of the type I'm dealing with for my lambda parameters. I've never seen any standard that favored p or q.
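
For anyone wondering what the "let" point looks like in practice, here is a small sketch under made-up names (LetSketch, FirstWordsQuery, FirstWordsMethods). It returns the first word of every line that has more than two words, once with "let" and once with method calls, where the intermediate value has to be carried along in an anonymous type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named only for this example.
public static class LetSketch
{
    // Query syntax: "let" names an intermediate value once.
    public static List<string> FirstWordsQuery(IEnumerable<string> lines) =>
        (from line in lines
         let words = line.Split(' ')
         where words.Length > 2
         select words[0]).ToList();

    // Method syntax: the same value is threaded through an anonymous type.
    public static List<string> FirstWordsMethods(IEnumerable<string> lines) =>
        lines.Select(line => new { line, words = line.Split(' ') })
             .Where(t => t.words.Length > 2)
             .Select(t => t.words[0])
             .ToList();
}
```

Both forms compute the same thing; which one reads better is the style question being argued in this thread.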


I prefer the non SQL-esque way also, I find it prettier, and I rarely see it at work. As for q and p, I always use i, and I see lots of other random letters. Does the letter used really matter?


How do you write joins? I find them readable with the SQL-esque way, but I haven't found a nice way to write them with the 'code' way.
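
As a rough answer in code: the method form of a join is Enumerable.Join, which takes the inner sequence, two key selectors and a result selector. The sketch below uses invented types (SaleRow, ReturnRow) and an invented class name (JoinSketch) and shows the same inner join in both styles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types, invented for this example.
public class SaleRow { public int StockId; public int Quantity; }
public class ReturnRow { public int StockId; public int Quantity; }

public static class JoinSketch
{
    // Query syntax: inner join on StockId.
    public static List<int> NetQuantitiesQuery(
        IEnumerable<SaleRow> sales, IEnumerable<ReturnRow> returns) =>
        (from sale in sales
         join ret in returns on sale.StockId equals ret.StockId
         select sale.Quantity - ret.Quantity).ToList();

    // Method syntax: outer key selector, inner key selector, result selector.
    public static List<int> NetQuantitiesMethods(
        IEnumerable<SaleRow> sales, IEnumerable<ReturnRow> returns) =>
        sales.Join(returns,
                   sale => sale.StockId,
                   ret => ret.StockId,
                   (sale, ret) => sale.Quantity - ret.Quantity)
             .ToList();
}
```

For a single join the method form is manageable; with several joins or a left join (GroupJoin plus SelectMany) it tends to get noisier than the query form.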


I'd have to disagree. Sure, for the simple one-liner this is true. But start doing let bindings, group by, etc., and the standard methods can get very messy.

There is nothing wrong with the DSL syntax at all.


http://msdn.microsoft.com/en-us/library/dd460688(v=vs.110).a...

Parallel LINQ is an extension to LINQ (Language Integrated Querying). The "parallel" hints that it is a way to process results in parallel - using multiple threads, possibly on multiple CPUs/cores.

It allows you to express patterns like fork/join in a very elegant, fluent way.

Say I have a set of Orders, each with a collection of OrderItems, and each OrderItem has a Consolidate method (I know, contrived, but bear with me). Then

    Orders.SelectMany(o => o.OrderItems).AsParallel().ForAll(item => item.Consolidate());
will "consolidate" all of the order items in parallel (or with whatever degree of parallelism PLINQ picks for the available CPUs/cores).
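
A self-contained version of that one-liner might look like the sketch below. The Order and OrderItem classes and the Consolidated flag are stand-ins invented here so the example compiles; only SelectMany, AsParallel and ForAll are real library calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain types, invented for this example.
public class OrderItem
{
    public bool Consolidated;
    public void Consolidate() { Consolidated = true; }
}

public class Order
{
    public List<OrderItem> OrderItems = new List<OrderItem>();
}

public static class ForAllSketch
{
    // Flattens all items, then runs Consolidate on each one in parallel.
    // ForAll returns only after every item has been processed.
    public static void ConsolidateAll(IEnumerable<Order> orders)
    {
        orders.SelectMany(o => o.OrderItems)
              .AsParallel()
              .ForAll(item => item.Consolidate());
    }
}
```

This is safe here only because each item mutates its own state; if Consolidate touched shared data, it would need its own synchronization.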


This is actually becoming my chief complaint about submissions. It appears to me as if folks now believe that it's cool to submit links which resolve as deeply and closely as possible to the subject (in this case a subdirectory inside the project's repo) with total disregard to the fact that there is insufficient (or no) information describing why we should be interested in the link or what, if anything, we should discuss.

I'm willing to bet the submitter went through a series of links beginning with some sort of announcement that something changed to get to the link they submitted. Why do folks think that we would not also benefit from that information or that it would not help to foster more interesting discussion?


Why? Probably because the "blog spam" or "only submit the original source" comments will show up and submissions get flagged. Happens enough that I would imagine the submitter probably followed all the links in response to prior submissions.


It would be better if we could change titles, but more often than not mods revert them back to whatever the page title is.


LINQ = Enumeration Monad

Parallel = executes tasks on multiple cores simultaneously if possible


Hmm, I wonder why this was added:

    #if DEBUG
                    currentKey = unchecked((int)0xdeadbeef);
    #endif
https://github.com/dotnet/corefx/blob/master/src/System.Linq... line 110 :P


It is a magic marker http://en.wikipedia.org/wiki/Magic_number_%28programming%29#....

Both its 16-bit parts have odd values, it is an illegal address (at least on the systems it was introduced for), and it is recognizable in hex dumps. That makes it a good value for uninitialized memory.


looks like a marker to see if the code got past two previous lines?


It seems to be to avoid using currentKey inadvertently. If someone refers it elsewhere they get deadbeef.
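
For the curious, here is a tiny sketch of what that expression evaluates to. The class and method names are invented; the cast itself is the one from the quoted snippet. The literal 0xdeadbeef does not fit in a signed 32-bit int, so the constant cast only compiles inside unchecked, and the result wraps around to a negative number.

```csharp
using System;

// Hypothetical helper class, named only for this example.
public static class MagicSketch
{
    // 0xdeadbeef is 3735928559 as a uint; reinterpreted as a signed
    // 32-bit int it becomes 3735928559 - 4294967296 = -559038737.
    public static int Marker() => unchecked((int)0xdeadbeef);

    // Formatting the int as hex shows the recognizable bit pattern again.
    public static string MarkerHex() => Marker().ToString("x8");
}
```

So a debugger watching currentKey in a DEBUG build would show -559038737 in decimal, or deadbeef in a hex view.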


would be awesome if any of the dotnet open sourcing allowed porting linq to python. i would use it in a heartbeat.


There is more than one Python for .NET AFAIK. And an awesome VS support as well: https://pytools.codeplex.com


I can't use Python on .NET, I use it with numpy and the rest of the great native ecosystem. There are great ORMs, but I also really like LINQ.


I hardly do any Python nowadays, but why do you want LINQ if you already have itertools and comprehensions?


LINQ is way more than just itertools and comprehensions.


Only if you are speaking about things like LINQ2SQL, LINQ2XML and similar.

Which can be done if the right ABC are implemented.


Also LINQ Rx.


Which is just reactive programming.


...readily available and idiomatic as part of the LINQ universe.


A simple google search gave me

https://rxpy.codeplex.com/


I guess that's useful for people who are already bought in to the .NET world, but isn't LINQ encumbered by patents that pose a risk to anyone building on top of it?


No. That was/is just FSF FUD.

You have always been able to build anything on top of it without concerns about patents from the implementation. This claim was so hilarious that it is incomprehensible that anyone ever believed it.

You have also been able to reimplement the .NET CLR and core libraries without fear of patent litigation (from Microsoft). They have placed the CLR and core libraries under the legal estoppel of the community promise since 2007 (IIRC), in addition to publicly granting a patent license to anyone creating an implementation of the .NET CLR and core libraries from the specifications. The latter was part of the process under which C#, the .NET CLR and the core libraries were standardized under ISO. A precondition for the standardization was that any necessary patents (for implementation) be offered on RAND terms (reasonable and non-discriminatory). Microsoft has always offered the patent grants free.

The community promise was created in response to FUD from (among others) the FSF that Microsoft would just sue anyway (despite patent grants), and that with their vast army of lawyers and deep coffers they could bury you in court. The community promise creates legal estoppel, whereby a case by Microsoft would be dismissed if you acted "in good faith" by relying on the promise.

Open sourcing Parallel LINQ has no bearing on the patent status of anything building on top of it. If you believe the FSF FUD, you will be ensnared in .NET technology and Microsoft will sue you out of existence if you ever become successful. If you do not believe the FUD, you can continue to take advantage of LINQ, Parallel LINQ, .NET, C#, F# etc.


No, it's not just FUD, and it doesn't just apply to LINQ either. Microsoft only promised not to sue over certain kinds of patent infringement in core libraries; that promise didn't extend to code you wrote on top of those libraries, which is unsurprising. Unfortunately they've patented a bunch of common ways of using .NET functionality, including many of the things you might use LINQ for and one of the most common idioms for using delegates and events. (Both of the commonly used GUI libraries - Windows.Forms and Gtk# - infringe the latter patent. So if you're running a GUI-based .NET app on anything but Windows then Microsoft can sue you if they ever decide it's to their advantage to do so.)


If Microsoft is willing to spend millions of dollars and take a huge PR hit to sue you, it's safe to assume that whatever you've built is successful/popular enough to make you a very wealthy man.

At that point, who cares?


Possibly it matters if it's not you that's sued, but you have similar usage as the company that is. Investors may or may not find that an acceptable risk if MegaCorp XYZ is currently embroiled in a large lawsuit.

For example, if your company had the same strategy as Google (and/or made some similar mistakes) with regard to circumventing Java licensing/restrictions, I could see investors or possible investors that were aware of that being somewhat spooked at certain points in the past.


When I read the community promise, it seemed to still allow Microsoft to go after competitors and use patents "defensively". So it's not impossible that you end up fighting MS on an unrelated matter, then they could use their promised patents against you. Probably incredibly unlikely, but a strict interpretation might view it as a risk.

Or did I misunderstand?


Microsoft is explicit about not doing that: https://github.com/dotnet/corefx/blob/master/PATENTS.TXT

However, as far as I understand it, the patent grant only holds for .NET runtimes you write yourself or programs to be run in a .NET runtime.


Interesting that the patent grant from Microsoft is much more sane than that from Facebook:

https://github.com/dotnet/corefx/blob/master/PATENTS.TXT :

If you file, maintain, or voluntarily participate in any claim in a lawsuit alleging direct or contributory patent infringement by any Covered Code, or inducement of patent infringement by any Covered Code, then your rights under this promise will automatically terminate.

vs

https://github.com/facebook/react/blob/master/PATENTS :

The license granted hereunder will terminate, automatically and without notice, for anyone that makes any claim (including by filing any lawsuit, assertion or other action) alleging (a) direct, indirect, or contributory infringement or inducement to infringe any patent: (i) by Facebook or any of its subsidiaries or affiliates, whether or not such claim is related to the Software, (ii) by any party if such claim arises in whole or in part from any software, product or service of Facebook or any of its subsidiaries or affiliates, whether or not such claim is related to the Software, or (iii) by any party relating to the Software; or (b) that any right in any patent claim of Facebook is invalid or unenforceable.


Which, TBH, is all anyone should really care about. We can use dotnet for free if we like and it's getting officially open-sourced and supported across platforms; this is great.

-Not directing anything at you specifically


The part that has been open sourced is under the MIT license.

According to the wiki (http://en.wikipedia.org/wiki/MIT_License),

> Whether or not a court might imply a patent grant under the MIT license therefore remains an open question.

This probably means MSFT is unlikely to enforce those patents.


Can anyone knowledgeable tell us how this compares to Java's parallel streams?


LINQ/PLINQ are the obvious inspirations for Java 8's stream API. I don't know if they are exactly functionally equivalent, but streams bring to Java's iterable collections the declarative-style programming that C# has had for years.

But, don't be confused by the .NET Rx libraries which provide LINQ style methods to observable event streams and have been ported as RxJava. Personally, I think that is a better use of the term streams and find the Java 8 standard streams name confusing.


As far as I am concerned, all of the BCL has pretty much been 'open source' since Reflector came around in 200..2?


There's a big BIG difference between "source readable" and "open source".


By that logic, essentially every binary is open source. After all, you can dump a listing for any binary. And decompilers will create C source code (though it isn't always compilable). It's much different to getting the local var names, comments, actual source structure.


No, you still have to deliberately decompile it. JavaScript web sites on the other hand are always open-source (readable with a standard browser).


Having access to the source code doesn't mean it's "open source". Open source has to do with being able to reuse the code yourself, not to being able to read it.


But the server literally pushes the source to me, and I don't typically explicitly accept any agreements. Why can't I simply use a readily available feature of my browser and take a piece of code and just use it? Especially if there is no licence agreement right in that JS file.


Unlicensed code does not grant permission to use, in fact it's the exact opposite.

The server is not pushing the source to you so that you can take it and use it in your own projects. It's pushing the source because that's how the browser will run the program.

Of course you CAN take and use the code. But you MAY not.


For much the same reason you can't make VHS recordings of TV shows and sell them?



