I signed up to write apps for my Amazon Echo, Amazon sent me an NDA
85 points by Sidnicious on March 13, 2015 | 49 comments
I ordered an Amazon Echo as soon as it was announced. Privacy concerns aside, a device like this could be the perfect UI for a smart home. My main interest in the Echo was (and remains) developing for it. Could I ask it about the status of my backups? Turn on lights and air conditioning? Use it to verbally send email while cooking? The possibilities were enticing.

I signed up for the beta API here:

https://developer.amazon.com/public/solutions/devices/echo

This week I got the below email. It promises access, but only if I agree to an NDA. I was planning to do all of my development on GitHub, permissively licensed (or public domain), and of course I'd want to get feedback and ideas from friends and other developers. It also makes me question how friendly the Echo will be to hackers and open development in the future — do I really want to invest my time in it?

I understand that NDAs have their place, but I don't think this is one. I have experience developing for Apple devices, where any information relating to prerelease OSs is covered by NDA, and it's easy to find questions on Stack Overflow where someone asks an innocent question and is immediately shot down by another developer: "This API is embargoed." I don't want to encourage that spirit in the developer community.

Have any other developers signed up and gotten this email? If you were in this situation, would you fight it?

The email and NDA: https://gist.github.com/Sidnicious/c0483c73653b3c2c619f



This is typical for a high-importance API that a company starts as private or closed.

Generally these are big B2B partnerships.

They might eventually take it public, but until then this is what most people will experience.

Source: Have run (and do currently help run) a couple of private API programs for large companies.

EDIT: Part of the reason they are doing it is that their API might be crap right now, or rife with potential security/use-case holes. They don't want open apps out there yet, either. Basically they are trying to control the experience.


Warning, dirty, dirty, quick hack idea ahead... use it as a voice interface alone, with all thinking and actions done by some other computer or machine(s). Encode commands for the other computer as sound files which that computer can listen for. Using just a few easy-to-bandpass-filter tones in the sound files, one can encode a great many commands that are easy for the actual automation machine(s) to distinguish. (One bit of beauty to this hack is that it doesn't even have to be a single automation computer hooked to everything; little embedded systems throughout the house can each listen for just the commands they need to worry about.)

Then one says "Echo, play 'Start my heating'". The Echo plays what it thinks is a short song named 'Start my heating' by that band "My Hacked Home Controls" which you seem to like so much, which goes "beep beep boop whistle click", and the other machine(s) hears that easy-to-interpret command and does what you want. If you need information back, have the (or a) machine put that information into a "Results" sound file and add it to (or overwrite it in) the command/music library the Echo can see and play from, then tell the Echo to play that file for you to hear. You say, "Echo, play 'Backups Status'" and it plays the voice you assigned to your backup monitor telling you whatever.

Basically, the Echo becomes a translation droid between you and the computer(s) that you have deeper/more control of. And as a bonus, depending on how you encode the commands into sounds, you might be able to learn some of the more common whistle sequences from your translation droid and just do them yourself.
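
A rough sketch of the encoding side, just to make it concrete (the command names, frequencies, and file path are made up for illustration, and none of this is an Echo API; it's purely the workaround described above):

    # Encode a small command ID as a sequence of pure tones and save it as a WAV
    # the Echo can "play" like a song while another machine listens for the tones.
    import math
    import struct
    import wave

    SAMPLE_RATE = 44100
    TONE_SECONDS = 0.25
    # A few well-separated frequencies that are easy to pick out with a bandpass
    # filter (or a Goertzel detector) on the listening side; each tone carries 2 bits.
    FREQS = [1000, 1500, 2000, 2500]  # Hz

    def tone(freq, seconds=TONE_SECONDS, amplitude=0.6):
        """Return raw 16-bit PCM samples for a single sine tone (freq=0 gives silence)."""
        n = int(SAMPLE_RATE * seconds)
        return b"".join(
            struct.pack("<h", int(amplitude * 32767 *
                                  math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
            for i in range(n)
        )

    def encode_command(command_id, path):
        """Write a WAV whose tone sequence spells out command_id in base 4."""
        digits = []
        while command_id:
            digits.append(command_id % 4)
            command_id //= 4
        digits = list(reversed(digits)) or [0]
        with wave.open(path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(SAMPLE_RATE)
            for d in digits:
                w.writeframes(tone(FREQS[d]))
                w.writeframes(tone(0, seconds=0.05))  # short gap between tones

    # Pretend "Start my heating" is command 7; this writes the "song" to disk.
    encode_command(7, "start_my_heating.wav")

On the listening side, a little script (or a microcontroller per room) running one Goertzel filter per frequency can recover the digits and fire off whatever home-automation action you've mapped them to.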


You just turned it into the Rube Goldberg version of Home Automation. Hahahaha.


On the Echo page: "If you are interested in being part of a limited-participation beta ahead of the SDK's public release, please provide your contact information."

I don't see any problem here: RIGHT NOW they're not ready to show anything PUBLICLY (and to be bound to an unstable API, for example), but once everything is completely cooked, they'll make it public.

The NDA is just a way to ensure that the API won't become a de facto liability and won't be made public by good souls... ;)


Not wanting people to depend on an unfinished version of the API makes sense, but an NDA as the mechanism for accomplishing this goal seems really strange. The idea that Amazon would be "bound" by API docs shared by early participants but will not be bound by the same API docs if early participants don't share them is weird.


The one thing that fellow big companies never get is that people find it anyway (see: Snapchat).


I don't think they care so much about people finding it, as they do all of the people who will try to build their own businesses around the API and then try to sue when the API changes.


That's also true.


I don't think it's worth a fight. Either you are willing to agree to Amazon's terms for access or not. I agree that NDAs are usually dumb, and this one is as well, but I also doubt that access to this developer program is worth the effort it would take to change the practice.

My guess is that Amazon realizes the capabilities of this device are right at the edge of what most people find acceptable, and they are taking whatever steps they can to avoid bad press (such as an app developer revealing how much they can eavesdrop, or what security, or lack thereof, exists).


Disclaimer: I'm an AWS employee

"If you are interested in being part of a limited-participation beta ahead of the SDK's public release"

It says right there that the SDK is going to go public. Can you blame them for wanting to slowly ramp up their API capacity and not have a sea of devs whining every time their early beta API changes? Why must everything be evil?


That's a valid concern, but what does a non-disclosure agreement have to do with that? If they don't want people to use their API, can't they just not authorize them to use the API? I don't see how a full NDA is necessary for that.


Maybe so the initial impressions of an incomplete API won't make others think that it isn't a platform worth developing for? I really don't know the answer, but I could see it from a "brand image" perspective. You don't want people spouting all over the internet how much v0.0.01-alpha of your API is absolutely terrible and turning other developers off to your platform in the future.


You're an exception. They don't want people hacking around with it making apps for themselves and friends. They want proper shops making apps for a marketplace, with a specific business directive and opportunity for monetization.

I know that sucks, but if you want something more open, perhaps there is something you can do with a Raspberry Pi w/ some kind of voice module?


> I know that sucks, but if you want something more open,

Raspberry Pi isn't all that "open", either, but I suppose at least it doesn't have an NDA. See the Issues section here:

https://wiki.debian.org/RaspberryPi


This page is rather out of date since the release of the VideoCore IV documentation and open source driver work for it. I would consider it an order of magnitude more open than the Echo.


100% Correct.

They're afraid people will write an app for themselves, and distribute it.

They want to control access/experience.


"If you are interested in being part of a limited-participation beta ahead of the SDK's public release"

Then why would they make it public?


I'm talking about the early life of the program. Some people start private, get a feel for what people want, make changes and then go public.

Some people start public and go private (Netflix, LinkedIn)


Take a look at this for open-source voice recognition + control: https://jasperproject.github.io/


What I think a lot of people don't realize is just how easy speech recognition is these days. By that I mean, creating an application with voice recognition capability as good as, say, Google Now is not that much harder than setting up any other event-driven input system. I'm sure the actual recognition of speech is quite difficult, but there are a fair number of APIs available that make it easy to use in your applications.

Almost every programming language has a canonical API that is extremely simple to set up. In .NET and Google Chrome, it's even first-party, and I only mention those because they happen to be the ones I've used.

And text-to-speech is no different. You can get to "good enough" so easily that I'm somewhat perplexed why more people don't do it.

I just think there is a perception that adding a speech recognition feature to your app is "hard". It's difficult to design a good speech-based user experience (I've found longer phrases work better than single words, and it's good to try to match on homophones as well), but actually integrating the code is not that tough.

So I would encourage you to ignore Amazon. Seriously, you could make this in a weekend in a Google Chrome browser on an Android device that you plug a nice microphone into.
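
For a flavor of how small the integration surface is, here's a rough sketch in Python using the third-party SpeechRecognition package (my choice for illustration, not one of the first-party APIs I mentioned; it needs PyAudio for microphone access):

    # pip install SpeechRecognition pyaudio
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # sample background noise first
        print("Say something...")
        audio = recognizer.listen(source)

    try:
        # Ships the audio off to Google's free web recognizer, much like Chrome does.
        print("Heard:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand the audio")
    except sr.RequestError as e:
        print("Recognition service error:", e)

That's essentially the whole event loop; everything else is deciding what to do with the text.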


In Google Chrome, it reaches out to Google to do the recognition.

.NET seems to be built on the technology that has shipped with Windows for a while, and yet Cortana etc. still reach out to the mothership for their speech recognition, so there are probably significant benefits to doing so; otherwise someone would have made theirs work offline and touted it.

I'm guessing that matching a pre-specified pattern with some amount of error is easier than transcribing arbitrary speech, which you need for search.

But I definitely agree; I regularly wish I could stop Google Maps' directions with a voice command when I get to a part of a route that I already know well enough to complete myself.
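
To illustrate the pattern-matching point, the same SpeechRecognition package can run CMU Sphinx entirely offline against a short, pre-specified list of phrases (just a sketch, assuming pocketsphinx is installed and keyword_entries behaves as documented):

    # pip install SpeechRecognition pocketsphinx pyaudio
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)

    try:
        # keyword_entries takes (phrase, sensitivity) pairs; only these phrases are
        # considered, and the matching runs on-device with no cloud round trip.
        result = recognizer.recognize_sphinx(
            audio,
            keyword_entries=[("stop navigation", 0.8), ("resume navigation", 0.8)],
        )
        print("Matched:", result)
    except sr.UnknownValueError:
        print("No keyword matched")

Matching a couple of fixed phrases like that is a far smaller problem than transcribing arbitrary speech for search.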


With the cloud-based systems, it seems they made them from the perspective of their mobile devices, then ported the APIs to the desktop experience. The idea they are trying to put across is that it costs less energy on the phone to fire up the radio and send the audio to The Cloud (waiting a whole second just for network and processing-queue latency) than it does to process it locally. I don't think that is actually the case; it's just a sell to get all data to go through their systems.

There aren't more popular client-side APIs because there aren't more popular client-only applications, and there aren't more popular client-only applications because developing the user experience is hard. I think, if you used one of these systems to figure out a good UX, then it would give you the impetus to chase down a good client-side solution for speech recognition.


It is not just a sell. I build neural networks as part of some work I am doing, and the kinds of multi-layer deep networks that do the recognition just don't perform well on small devices. Also, the best systems often have many different trained networks and specialized routing to find the best network for your accent, cadence, noise level, etc. Keeping all of those weights on your local device, running all the tests, running through the required preprocessing, asking the network for the answer, converting the results, and so on is going to be too much for your phone or low-powered device. While there might be other reasons for them to send data to the cloud, to get real accuracy you need beefy servers (potentially with GPUs or something more exotic like FPGAs), and potentially multiple servers working in concert.


Since I don't know the factors involved and you do, I must then ask how to reconcile this with the fact that the first version of Dragon NaturallySpeaking was released in 1997. My smartphone is definitely more powerful than even a top-of-the-line 1997 desktop PC.

Now, I am sure Dragon V1 would be quite underwhelming to my modern notions of how computers should work. But I would even say my smartphone is better than a middle-of-the-road PC from 2010. It's better than my PS2, and even my PS2 had a few games that had speech interface (some Navy Seals game, it worked pretty well).

What factors are at play? Is it the noise cancellation? Seems like a stationary device in a home has the ideal chance to create a reusable noise cancellation profile.


I was thinking about making my own as well! I haven't done any research, but you'd need a smartphone dock and a good microphone/speaker (wireless microphones/speakers around the house?!?). Better yet, mount a tablet on the wall somewhere convenient/attractive (living room, next to the thermostat, etc.) and wire everything behind it.

Change Android settings to always listen when screen is on, then keep screen always on (dims when not active). Voice-controlled App support galore! Google Now! Develop whatever you need/want!

The Echo is a neat idea put into action, but it can be hacked together in a much better way, very easily.


I've thought about doing it, but I don't have devices in my home that are capable of being automated, and as it is I've already got enough money dumped into unfinished projects that required bits like this.

I would, however, be interested in trying to create a room simulator. I did work on living-space automation for a while, a few years ago (specifically for hotels, not homes). One of the most difficult parts of designing good user experiences was the inability to test rapidly, due to the need to set up a ton of devices, associate them together, and maybe even flash them with new ROMs. You could get in two, maybe three tests a day if you were really cooking. I wanted to be able to test as fast as I could compile code, you know, like I had gotten used to in every other software project I had ever worked on. But electrical engineers work with primitive tools and seem to like it.


This just came up in the Github Explore Today newsletter: https://github.com/jhauswald/sirius


It's not as hard as it used to be, yes, but only if you're willing to depend on closed APIs. On .NET, I guess you're depending on WSR?


It's a sight better than signing a whole NDA. And it's no different a situation than all the other APIs we have to deal with in Android or iOS. You gotta fight the right battles at the right time.

But yes, I believe the System.Speech.Recognition namespace wraps WSR. But given that it's in the System namespace, it should be a part of the ECMA standard for .NET. Aaaaaand, yep, it appears it is available in Mono, so I assume they are managing the cross-platform dependency issues for you.


Could also check out api.ai; they do a pretty good job of turning voice into machine commands.


Interesting, thanks.


Correct me if I'm wrong, but isn't every single early access or developer agreement from Apple just as restrictive in wording?

I'd say sign it and start developing.

Either the product will reach critical mass and the NDA will be much more loosely enforced, or it will fail and the NDA won't matter. With big companies you'll always have to sign something like this to get into their walled garden. Try not to let it get in your way.


As of this last year's WWDC, I think Apple is finally backing down on that. None of the presentations carried the usual alert not to share what you had seen there, and they have started (slowly) opening new OS X and iOS betas to the public.


Yeah, it's actually pretty neat of Amazon to even consider a non-business type of developer IMO.

Many times they (the company running this type of API) just kick them to the side.


> Correct me if I'm wrong, but isn't every single early access or developer agreement from Apple just as restrictive in wording?

Two wrongs don't make a right.


I'd probably think about returning the device, if I were you. Keeping an API closed is their decision, and they can build proper authentication around it if they want to, but keeping all knowledge of the API behind bars is a bit excessive.


Not in the short term. They want to limit exposure, and probably collect feedback and make changes before throwing it out to everyone.

There are version guarantee considerations to make.

Also, as I've told my current company, people will find the API and expose it anyway.


Given how difficult it is to obtain an Echo, selling it on eBay would probably be quite profitable.


I don't think this is too bad until the official release. Amazon probably doesn't want public discussions about unfinished products.


Unsurprising. Personally, I won't have an Echo in my home, and I wouldn't really feel comfortable around one in someone else's - I wouldn't sign up for the dev program or the NDA for same. :)


This NDA has no expiration, neither date nor conditions. You're arguably bound by it after the API is public. Yuck.


That's not how NDAs work; publicly available information can be talked about. You may be held to any constraints in the NDA about disparaging comments, but as for the actual information: public is public.


To be fair, the agreement is clear that it only applies to 'confidential information' as opposed to reviews or things you create (and as opposed to feedback "provided to Amazon", which they own).

But 'confidential information' is not defined in the agreement, and without expiration, I'd still be nervous. Perhaps I overworry.


Once the SDK is publicly available, putting your code on Github should be fine. I work for the Echo team and we want to engage hackers and open source and create a community around the product. But right now the API is a work in progress and we don't want to build too much hype around it until we are confident that we've got it down right.


I wouldn't fight it, but I wouldn't sign it either. I won't develop for a platform with that level of restriction.


I can't wait to get my Echo. All I want is for the wake up word to be "computer". Is that so much to ask?


That is not too much to ask. I support you!


This is how they rolled out the Kindle SDK. That app platform was a flop, but I'm sure there were a number of contributing factors.


Sign the NDA and don't disclose your work until the API is made public. This is really not a big deal.



