Does this support using the Parakeet model locally? I'm a MacWhisper user and I find that Parakeet is way better and faster than Whisper for on-device transcription. I've been using push-to-transcribe with MacWhisper through Parakeet for a while now and it's quite magical.
Not yet, but I want it too! Parakeet looks incredible (saw that leaderboard result). My current roadmap is: finish stabilizing whisper.cpp integration, then add Parakeet support. If anyone has bandwidth to PR the connector, I’d be thrilled to merge it.
Some lovely folks have linked some other open-source projects that implement Parakeet. I would recommend checking those out! I'll also work on my own implementation in the meantime :D
Parakeet is amazing - 3000x real-time on an A100 and 5x real-time even on a laptop CPU, while being more accurate than whisper-large-v3 (https://huggingface.co/spaces/hf-audio/open_asr_leaderboard). NeMo is a little awkward though; I'm amazed it runs locally on Mac (for MacWhisper).
Yeah, Parakeet runs great locally on my M1 laptop (through MacWhisper). Transcription of recordings feels at least 10x faster than Whisper, and the accuracy is better as well. Push-to-talk for dictation is pretty seamless since the model is so fast. I've observed no downside to Parakeet if you're speaking English.
A bit of a tangent about Parakeet and other Nvidia NeMo models: I never found actual architecture implementations as PyTorch/TF code. It seems like all such models are instantiated from a binary blob, making it difficult to experiment! Maybe I missed something; does anyone here have more experience with .nemo models to shed some more light on this?
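For what it's worth, my understanding (could be wrong) is that a .nemo file is just a tar archive bundling a YAML model config and a weights checkpoint, so the "binary blob" is at least inspectable with nothing but the standard library. A minimal sketch; the function name is mine, and the exact member names inside the archive may differ between models:

```python
import tarfile

def list_nemo_contents(path):
    """List the files packed inside a .nemo checkpoint.

    Assumes the .nemo file is a (possibly compressed) tar archive;
    tarfile.open() autodetects the compression.
    """
    with tarfile.open(path) as tar:
        return [member.name for member in tar.getmembers()]
```

Extracting the config YAML this way is usually enough to see which architecture class the model instantiates, which answers at least part of the "where is the code" question.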
Yeah, that's the way that I cut onions: you make vertical cuts followed by one single horizontal cut slightly above the cutting board.
This way of calculating doesn't take into account the creative ways you can make cuts. You could also do mostly vertical slices, and then slightly angle inwards when you do the final few cuts. That would get you a more optimal distribution as well.
A lot of people insist that this is the way. However, at some point, I figured out that making the horizontal cut (or cuts) before you make vertical cuts is a lot easier. You can do it by simply putting the onion with the root on the board and cutting down at an angle of about 5-10 degrees. When the tip of the knife hits the board, simply don't press down all the way to keep the root intact. Then put it down normally and make the vertical cuts. You can easily manage 3 or 4 horizontal cuts this way. And there's no awkward cutting towards yourself with a sharp knife. All this business of first making lots of vertical cuts and then attempting a horizontal cut is a lot more fiddly. The vertical cuts affect the structural integrity of the onion. This makes the horizontal cuts harder. And it also makes the process of dicing harder.
Of course, as the article points out, the horizontal cuts don't really do much that a chef should care about. You can dice an onion super fine with just vertical cuts very close together. And it's a lot faster and easier. You might angle some of the cuts towards the edges. But honestly, even that is unnecessary and a bit overkill. With a good knife, you can put the vertical cuts really close together. So close that any kind of angle would mean the cuts cross each other. Once you are that close, a horizontal cut really does not matter. And if you do a rough cut, the size matters even less.
If you are interested in this topic, there's a French chef on Youtube called Jean Pierre who is full of practical wisdom and techniques. You can learn a lot from him. And he's highly entertaining to watch too. He's very opinionated on onions. Or Onyo as he pronounces it. You won't see him making horizontal cuts, ever.
The attitude of the blog writer in their interactions also feels off. Just reading this blog post makes me think that this person is difficult to work with, requires extremely clear guidelines and instructions, and has a hard time making their own decisions. Maybe this is a good fit for a large, established company, but startups have their own needs.
"Create a terminal inspired email client so we can do an alpha test with some customers" is a reasonable ask for an engineer at an early stage startup. Of course, there would be a bit more specification, but a lot of the details would still be up to the engineer. This applicant wants more certainty than they can get.
This is illustrated by the line: "I would like to know what kind of response I could expect from Kagi if I drive it to completion." This is not a great request to make. There's no way they can answer that question, because there is no certainty available. They're probably getting a few hundred or a few thousand more submissions to evaluate.
In your comments history someone replied to a job posting "Just an FYI, if you do a takehome project for William, he won't respond to you (not even with an auto reject)."
Seriously? A candidate puts in a week of work and he can’t be guaranteed a 30 minute discussion?
Nobody is getting a few hundred or few thousand submissions to evaluate. Nobody. If you are getting 1k applicants, at best 50 are asked to do a take-home and even then, not all at once.
If by some miracle 100 people did this to completion at the same time, there should be a notice to the effect that due to high volume, blah blah blah.
…which is totally understandable… if the hiring manager had communicated that. They could’ve easily mailed back “Hi this proposal seems much more detailed than what we need for evaluation, please save yourself some time and energy.”
The author may have had issues (I personally don’t count “need clear instructions” as an issue - edit - I see they didn’t adhere to the TUI prompt), but the hiring manager definitely did.
> …which is totally understandable… if the hiring manager had communicated that.
I agree that the hiring manager could have handled it much better, but as a rule: If at any point during any hiring process you feel like you need to spend even close to a full time week of work on anything without being very explicitly told so, you are wrong.
I am a hiring manager and we do take-home assignments. I fully agree that this is a key piece of communication. I always take the time to tell candidates that although they have as much time as they want, we expect 2-3 hours of effort at most, to be respectful of their time.
Without that, take home problems would seem predatory.
Yes on this: It reminds me of when candidates ask me how they did at the end of the interview. It shows an extreme lack of decorum and empathy. What if you did terribly and I wouldn't hire you in a million years? Do you really want me to tell you that?
There's no good answer and asking a question like that shows real narcissism imo.
Conversely, if you can't handle some straightforward feedback to a candidate that took the time to interview you without violating decorum or hurting their feelings, then how can I expect you to be a good manager or supervisor? How are you possibly going to be able to handle minor personnel conflicts or provide guidance during the training period? It comes across as a complete lack of basic managerial skills.
Okay, but basic interpersonal skills are a prerequisite for anybody in a senior or team lead position, or any position that will involve code reviews.
I'm sympathetic to how awkward it can feel to provide honest feedback to a candidate, but look: we're all people here. I think we forget that sometimes when we're assembling hiring processes. As a candidate, you need some kind of feedback mechanism that allows you to improve even if you're not a good fit for a particular organization. And if you're involved in the hiring process in any way, you ought to be equipped to handle that.
> you need some kind of feedback mechanism that allows you to improve even if you're not a good fit for a particular organization
"Need". That is a strong term. I disagree. It would be nice, but it is not a need.
This topic has been discussed ad nauseam on HN. In most companies, there is specific company policy that prohibits providing feedback to candidates. There is literally no upside for these companies to provide feedback to candidates that they reject (except Fake/Feel-Good Internet Points, only redeemable on HN forums). Really: There is no way around it, no matter how many tears are spilled about it on HN.
This is simply a defense of bad policy couched in unnecessarily dehumanizing language.
There is widespread resentment of this and many other common hiring practices in the tech sector, and that is further impacting both the quality of candidates as well as employee motivation and satisfaction. The upside for companies is higher quality candidates whose first experience with the company is a hiring process that makes the candidate want to work there.
I broadly agree with this being an unfortunate outcome, but you do understand that making candidates who failed your interview want to work at your company is fundamentally limited in how much it actually helps you. Yes, yes, I know some of them may come back and pass the next time, or they tell their friends about how you were super nice and gave them great feedback, but this is pretty rare. If you're doing this, you're doing it out of the goodness of your heart, not because it helps your recruiting pipeline. And even though I agree with the idea of providing feedback, assuming that people will have positive feelings when you tell them why you didn't accept them is misguided. I personally know people who have gotten interview feedback and not taken it well. Of course I tell them to shut up and stop poisoning the well for everyone else, but the point is that this is largely not the picture you are presenting it as.
Sorry, I wasn't clear. Providing constructive feedback to a candidate is unlikely to have a direct positive impact on the relationship between that specific candidate and that specific company. It's more of ... whatever the opposite of the tragedy of the commons is. A policy that, if improved, would broadly improve the quality of many candidates for many companies.
Companies have been optimizing for candidates that are an immediate ideal cultural and technological fit. They are all competing for candidates that are the idealized developer, with perfect social skills, a brilliant CV, and deep technical experience that is an exact match for whatever the company is doing at the moment.
That's fine and rational and all, but a necessary consequence of this is that that pool is quite small and there are lots of companies competing for those people. Meanwhile, there are a lot of very good candidates who are underemployed because they aren't getting the opportunity or resources needed to become those idealized employees. This is a game theory outcome where both parties are optimizing themselves into a losing position.
I've been employed in this industry, off and on, for a long time. I assure you that companies didn't always behave this way. There has been a clear, obvious, and severe decline in the hiring experience, and these policies are hurting the entire industry.
It's generally socially frowned-upon to go on a couple of dates with someone and then ghost them. It happens, but it's not considered good practice. We recognize that it's cruel but also leads to a more cynical, detached, overall worse dating experience for everyone. Saying "I don't think this will work out, you seem nice but you're not what I'm looking for right now" is difficult and awkward, but it's also a necessary skill that needs to be maintained. Sometimes people don't react well, but that doesn't make it less necessary: it closes a feedback loop that ultimately allows earnest people who are looking for relationships to learn and grow and become better candidates for the next relationship.
I agree, but my point is that the tragedy of the commons here is more divorced than usual. Companies can barely understand that doing layoffs hurts morale, and that connection is really easy to demonstrate. Trying to convince them to take on some liability in exchange for a slightly better applicant pool seems difficult.
> In most companies, there is specific company policy that prohibits providing feedback to candidates. There is literally no upside for these companies to provide feedback to candidates that they reject
This is the long and short of it.
In the US at least, discrimination laws are expansive. You can -very- easily end up saying something that violates this and putting your company at risk, no matter how good hearted you were attempting to be.
How do you "accidentally" end up saying something that implicates you in discrimination on the basis of legally protected characteristics - what are some examples of that?
This has always felt like an excuse used by people who just don't want to be caught in their own lies when asked to come up with a real, non-discriminatory reason.
The other comments gave good answers. A lot of people think it means saying something horrible and racist or something, but not at all.
As one pointed out, there's a "well you said it was X, but person Y who got hired did that too. And they're a different race or gender or religion, so that leads me to believe discrimination."
There's also you trying to be helpful, saying something along the line of "well you hesitated a bit and sounded unsure in your answers", only to find out they have some disability that caused that and now have admitted you're discriminating based on it.
Maybe you'll say "well, if I had known, I wouldn't have noticed it or cared." And a lot of candidates would likely say as much up front. But they don't have to tell you about it at all. See how that creates a weird dynamic?
Is it common? Probably not. But it obviously happened or else such rules wouldn't exist. It's one of those things that the bad actors ruin it for everybody. Bigots are never going to admit their reasons - good people will. But bad people will always try to take advantage, regardless.
I think it's more of a case for legal and HR being conservative and super defensive. Not sure if you've ever handled a contract with an internal lawyer, but in my experience they often go for crazy suggestions that the other side would never accept for the sake of protecting the company as much as possible. Might be the same here - HR/legal being super protective and the hiring manager not caring enough to fight back.
> How do you "accidentally" end up saying something that implicates you in discrimination on the basis of legally protected characteristics - what are some examples of that?
Say you say it was for failure to meet a specific performance standard (because that is the documented reason); then the ex-employee has a starting point for a discrimination claim by looking for evidence that tends to support the claim that people who differ on some protected-from-discrimination axis who failed to meet that standard were not fired. No reason given, no starting point. In theory, this policy helps make false nuisance claims more work and less likely, but a substantive reason for it is that HR knows that they cannot eliminate all prohibited acts by managers that would create liability, so making it harder to get a starting point for gathering evidence is important to prevent valid claims from materializing. HR policy does not exist to protect employees from unlawful treatment; it exists to protect the company from liability for such treatment. Sometimes those two interests align, but when it comes to information about firing decisions they do not.
There are similar things that can be done with other prohibited reasons for dismissal, like retaliation; but the idea is that any information you give makes it easier for them to make a case against you.
This is also, in reverse, why, as a departing employee (whether departing voluntarily or not), you should never participate in an exit interview or, if you must as a condition of some severance or other pay or benefit, never volunteer any information beyond the bare minimum necessary; one significant purpose of such interviews is to document information useful either for potential claims against you or to defend against any potential claims you might have, including those you have not yet discovered, against the company.
Part of it is, if anything can be taken slightly out of context to imply something discriminatory, there are those who will abuse the system and sue. At a large enough scale this can become a real problem. If the company policy is "never say anything" there's nothing to be taken out of context, reducing the chance of a lawsuit.
I bet you this comes back to insurance, as many things do in the corporate world. Sufficiently large companies probably have insurance coverage for discrimination lawsuits, or at least employment disputes in general. The coverage probably costs less if you have a "no feedback" policy.
Who are you trusting as a technical interviewer if you don't already trust them to give negative feedback internally?
Do you not code review? Are you a rubber stamp "LGTM" shop that should just be pushing to main but cargo culted the ceremony because github has it built in?
I had someone email me after being rejected at the final round of an interview. "Everything seemed to mesh just perfectly, and I'm at a loss to understand."
I broke it down for them. "This was nothing to do with you, and we would have had no objection to hiring you. However, the candidate who beat you out simply had more domain experience in XYZ area" and went on to say "For what it's worth, we had 500+ applications, of which we in-depth reviewed 100 resumes, had 40 first-round interviews, 15 second-round, and three final round."
They emailed me back to express appreciation and that though this didn't work out, it renewed their confidence to know they didn't "mess something up".
Since then, if we're at that point in a process and I'm rejecting you, I'll at least give you something to work with.
This is so important for people to understand, and its why I give people feedback.
People, being humans and prone to pattern seeking, assume that if they didn't get the job, it's something specific they did, or failed to do.
And sometimes, that's true. But for a lot of candidates, it just came down to another candidate being slightly better, or slightly cheaper, or some combination of value markers.
A lot of my interview feedback comes down to "I don't see any reason you wouldn't be a good fit, but we have other interviews and it's going to come down to value."
Some people will take this as me saying "Don't ask for what you're worth," or "we're gonna low-ball your salary." The reality is, we're a business, and if I can produce the same widget with person X or person Y and person X costs 10K less a year, I'm going with person X. Every time.
Yes, we want to know. Framing this as an empathy issue when in reality you're just too afraid to be honest, or afraid of any kind of conflict, IS an empathy issue. At that point they're not a person; they're an annoyance that you want gone immediately.
Don't take this the wrong way, but I deliberately ask how I did because it helps me weed out interviewers who think like this. Not so much "how did I do?" as "now we're close to the end of the interview, do you think we're a good fit for each other?" I give my own feedback and talk honestly about points of friction.
I interview pretty well, but if I go into an interview with a company that wants hungry hustlers and I've spent the whole interview talking about kindness and team spirit, or if you think I don't know enough pl/pgsql to deal with your gnarly legacy backend, or I'm getting the vibe that none of the engineers seem to like working here, then we need to speak honestly about that.
No need. Just walk away. Remember: You are interviewing them, just as they are interviewing you. Any company worth its weight will not allow red flags to leak into the interview process, e.g., "getting the vibe that none of the engineers seem to like working here". So many times, I have reached the final round of an interview process, met the senior manager... and thought: "Barf, I don't want to work for that person. What a waste of my time."
I know I'm interviewing them. That's why we need to talk.
If they wanted to hire me enough to interview me, but at the end of a half-day of interviewing I'm going to walk away without a job, then they need to rewrite their position description so I know not to apply, deal with their morale problem, or directly ask me how much PL/pgSQL I've done. We both stand to benefit from talking about how the interview went.
But you also need to factor in their position in the situation right?
Like suppose they do hate their job. Do you expect them to speak that plainly and honestly to every candidate who asks "So how do you like working here?" and risk getting that posted to the front page of HN?
You're asking them to risk their own livelihood so you get a better signal for your own job search, that doesn't seem like a proportional trade to me.
Obviously I'm not advocating for complete opaqueness, but your interviewer is hardly ever in a good position to part with their true feelings towards questions like "How did I do compared to other candidates? How is it truly working here?"
I've basically almost always given direct and obvious non-answer to the first question: "I cannot tell you right now, because I'll need to write down and collate my thoughts. And I'm not allowed to share feedback directly, so your recruiter will be in touch with the feedback afterwards."
> What if you did terribly and I wouldn't hire you in a million years? Do you really want me to tell you that?
Yes, that sounds like extremely valuable feedback.
Why do you suppose asking a question like that shows narcissism? To me it shows a willingness to ingest feedback and improve.
I will add the caveat that if someone asked me that in an interview I would likely give a non-answer because I’m not totally sure what all I’m even allowed to say.
A simple request for feedback is not evidence of narcissism or lack of empathy. Could be anxiety. Could be curiosity. Could be zeal. Could be any number of things. It's certainly not an "extreme lack of decorum" though.
It's okay to avoid giving feedback if you don't want to. I can think of a few ways to answer that question in a neutral or positive fashion to defuse the situation and legally protect the company.
Not to mention there are legal liabilities with sharing interview performance with candidates. "Oh but the interviewers told me I did extremely well on their interviews. Therefore it must be the case that I was rejected because of ${protected attribute X}."
Really?? I always appreciated candidates that would ask that at the end - being willing to step aside from the pretense of professionalism to ask a real question and listen to my answer is a signal to me that this is someone who is willing to be real with me, not pretentious or perfunctory.
I do get what you’re saying, but I disagree, there is a good answer; and as is often the case, it’s an honest one.
> asking a question like that shows real narcissism imo.
Precisely the opposite. Asking for criticism and genuinely being interested in what others think of you with the goal of taking the feedback on board and improving is the polar opposite of typical narcissistic behavior. As far as I'm aware that sort of self-reflection is inherently incompatible with NPD.
Maestro AI | Senior Full Stack Engineer | REMOTE (US) Seattle onsite preferred, Remote OK | https://www.getmaestro.ai/
Hi, I'm William, the co-founder and CEO of Maestro AI. We're building an all-knowing chief of staff for engineering and product leaders. Maestro provides real-time, comprehensive insights, synthesized from Slack conversations, Jira tickets, and more. We allow leaders to stay in control of deadlines, accelerate their team, and understand and prioritize work.
Our goal is to eliminate information silos and become the OS that helps run engineering teams (and later every type of team).
We're an early-stage, venture-backed startup, and you'll have the opportunity to shape our product and culture from the ground up.
We're looking for someone who takes pride in their craft, pushes their limits, and takes action to create something new. Our stack is Python on the backend and React/Redux/Typescript on the frontend. We're training our own models in addition to using existing LLMs.
We've been thinking about the things you need to be careful about when building AI products for businesses and put them in this article. Are there important things that we missed?
It's insane that that this works, and that it works fast enough to render at 20 fps. It seems like they almost made a cross between a diffusion model and an RNN, since they had to encode the previous frames and actions and feed it into the model at each step.
Abstractly, it's like the model is dreaming of a game that it played a lot of, and real time inputs just change the state of the dream. It makes me wonder if humans are just next moment prediction machines, with just a little bit more memory built in.
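That conditioning loop is conceptually simple even if the model itself is huge. Here's a hedged pure-Python sketch of the autoregressive, action-conditioned scheme described in the paper (all names are mine, and `model` stands in for the whole diffusion sampler; nothing here is from the actual codebase):

```python
from collections import deque

def run_interactive(model, seed_frames, actions, context_len=32):
    """Autoregressive generation: each new frame is predicted from a
    sliding window of past frames plus the current player action.

    model(frames, action) -> next_frame; frames and actions are
    opaque tokens here (the real thing runs diffusion denoising steps).
    """
    history = deque(seed_frames, maxlen=context_len)
    generated = []
    for action in actions:
        frame = model(list(history), action)
        history.append(frame)  # the prediction feeds back in as context
        generated.append(frame)
    return generated
```

The feedback of generated frames into the context window is what makes it feel RNN-like even though each step is a diffusion model call.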
It makes good sense for humans to have this ability. If we flip the argument, and see the next frame as a hypothesis for what is expected as the outcome of the current frame, then comparing this "hypothesis" with what is sensed makes it easier to process the differences, rather than the totality of the sensory input.
As Richard Dawkins recently put it in a podcast[1], our genes are great prediction machines, as their continued survival rests on it. Being able to generate a visual prediction fits perfectly with the amount of resources we dedicate to sight.
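To make the predictive-coding idea concrete, here's a toy sketch: if the brain's "next frame" is a hypothesis, then only the residual between prediction and sensation needs further processing, and a perfect prediction leaves almost nothing to do. All names are mine and purely illustrative:

```python
def residual(predicted, sensed):
    """Prediction error per element: what's left to process after
    subtracting the hypothesis from the actual sensory input."""
    return [s - p for p, s in zip(predicted, sensed)]

def surprise(predicted, sensed):
    """Total prediction error: a cheap proxy for how much attention
    the mismatch demands."""
    return sum(abs(s - p) for p, s in zip(predicted, sensed))
```

When the world matches the hypothesis, the residual is all zeros and surprise is 0, which is the efficiency argument in a nutshell.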
If that is the case, what does aphantasia tell us?
Worth noting that aphantasia doesn't necessarily extend to dreams. Anecdotally - I have pretty severe aphantasia (I can conjure millisecond glimpses of barely tangible imagery that I can't quite perceive before it's gone - but only since learning that visualisation wasn't a linguistic metaphor). I can't really simulate object rotation. I can't really 'picture' how things will look before they're drawn / built etc. However I often have highly vivid dream imagery. I also have excellent recognition of faces and places (e.g.: can't get lost in a new city). So there clearly is a lot of preconscious visualisation and image matching going on in some aphantasia cases, even where the explicit visual screen is all but absent.
> Many people with aphantasia reports being able to visualize in their dreams, meaning that they don't lack the ability to generate visuals. So it may be that the [aphantasia] brain has an affinity to rely on the abstract representation when "thinking", while dreaming still uses the "stable diffusion mode".
(I obviously don't know what I'm talking about, just a fellow aphant)
Obviously we're all introspecting here - but my guess is that there's some kind of cross talk in aphantasic brains between the conscious narrating semantic brain and the visual module. Such that default mode visualisation is impaired. It's specifically the loss of reflexive consciousness that allows visuals to emerge. Not sure if this is related, but I have pretty severe chronic insomnia, and I often wonder if this in part relates to the inability to drift off into imagery.
Pretty much the same for me. My aphantasia is total (no images at all) but still ludicrously vivid dreams and not too bad at recognising people and places.
What’s the aphantasia link? I’ve got aphantasia. I’m convinced though that the bit of my brain that should be making images is used for letting me ‘see’ how things are connected together very easily in my head. Also I still love games like Pictionary and can somehow draw things onto paper than I don’t really know what they look like in my head. It’s often a surprise when pen meets paper.
I agree, it is my own experience as well. Craig Venter In one of his books also credit this way of representing knowledge as abstractions as his strength in inventing new concepts.
The link may be that we actually see differences between “frames”, rather than the frames directly. That in itself would imply that a form of sub-visual representation is being processed by our brain. For aphantasia, it could be that we work directly on this representation instead of recalling imagery through the visual system.
Many people with aphantasia report being able to visualize in their dreams, meaning that they don't lack the ability to generate visuals. So it may be that the brain has an affinity to rely on the abstract representation when "thinking", while dreaming still uses the "stable diffusion mode".
I’m nowhere near qualified to speak of this with certainty, but it seems plausible to me.
We are. At least that's what Lisa Feldman Barrett [1] thinks. It is worth listening to this Lex Fridman podcast: Counterintuitive Ideas About How the Brain Works [2], where she explains among other ideas how constant prediction is the most efficient way of running a brain as opposed to reaction. I never get tired of listening to her, she's such a great science communicator.
Interesting talk about the brain, but the stuff she says about free will is not a very good argument. Basically it's a version of the argument the ancient Greeks made, which leaves the discussion at a point where you can take it in either direction.
So are gravity and friction. I don't know how well tested or accepted it is, but being just a theory doesn't tell you much about how true it is without more info
It's unclear how that compares to a high-end consumer GPU like a 3090, but they seem to have similar INT8 TFLOPS. The TPU has less memory (16 GB vs. 24 GB), and I'm unsure of the other specs.
Something doesn't add up, in my opinion, though. SD usually takes (at minimum) seconds to produce a high-quality result on a 3090, so I can't comprehend how they are roughly two orders of magnitude faster, which would indicate that the TPU vastly outperforms a GPU for this task. They seem to be producing low-res (320x240) images, but it still seems too fast.
There's been a lot of work in optimising inference speed of SD - SD Turbo, latent consistency models, Hyper-SD, etc. It is very possible to hit these frame rates now.
Also recursion and nested virtualization. We can dream about dreaming and imagine different scenarios, some completely fictional or simply possible future scenarios all while doing day to day stuff.
Penrose (Nobel Prize in Physics) speculates that quantum effects in the brain may allow a certain amount of time travel and back propagation to accomplish this.
Image is 2D. Video is 3D. The mathematical extension is obvious. In this case, low resolution 2D (pixels), and the third dimension is just frame rate (discrete steps). So rather simple.
This is not "just" video, however. It's interactive in real time. Sure, you can say that playing is simply video with some extra parameters thrown in to encode player input, but still.
I think you're mistaken. The abstract says it's interactive, "We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction"
Further - "a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions." specifically "and actions"
User input is being fed into this system and subsequent frames take that into account. The user is "actually" firing a gun.
Okay, I think you're right. My mistake. I read through the paper more closely and I found the abstract to be a bit misleading compared to the contents. Sorry.
Academic authors are consistently better at editing away unclear and ambiguous statements which make their work seem less impressive compared to ones which make their work seem more impressive. Maybe it's just a coincidence, lol.
It's interactive, but can it go beyond what it learned from the videos? As in, can the camera break free and roam around the map from different angles? I don't think it will be able to do that at all. There are still a few hallucinations in this rendering; it doesn't look like it understands 3D.
You might be surprised. Generating views from novel angles based on a single image is not novel, and if anything, this model has more than a single frame as input. I’d wager that it’s quite able to extrapolate DOOM-like corridors and rooms even if it hasn’t seen the exact place during training. And sure, it’s imperfect but on the other hand it works in real time on a single TPU.
Then why do monsters become blurry, smudgy messes when shot? That looks like a video-compression-style artifact of a neural network attempting to replicate a low-structure image (the source material contains exploding guts — a very unstructured visual).
Uh, maybe because monster death animations make up a small part of the training material (i.e., gameplay), so the model hasn't learned to reproduce them very well?
There cannot be "video compression artifacts" because it hasn’t even seen any compressed video during training, as far as I can see.
Seriously, how is this even a discussion? The article is clear that the novel thing is that this is real-time frame generation conditioned on the previous frame(s) AND player actions. Just generating video would be nothing new.
In a sense, poorly reproducing rare content is a form of compression artifact. That is, since this content occurs rarely in the training set, it has less impact on the gradients and thus less impact on the final form of the model. Roughly speaking, the model allocates fewer bits to this content — storing less information about it in its parameters — than to content it sees more often during training. I don't think this is too different from how certain aspects of images, videos, music, etc. get distorted in different ways depending on how a particular codec allocates its available bits.
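The bit-allocation point can be made concrete with a deliberately contrived scalar model (my own construction, not anything from the paper): a single parameter trained on data that is 95% "common" and 5% "rare" settles near the common value, so rare content ends up reproduced with far higher error.

```python
# One parameter w fit by gradient descent on squared error over a dataset
# that is 95% common (target 0.0) and 5% rare (target 1.0).
common_target, rare_target = 0.0, 1.0
data = [common_target] * 95 + [rare_target] * 5

w, lr = 0.5, 0.05
for step in range(1000):
    # full-batch gradient of mean (w - y)^2
    grad = sum(2 * (w - y) for y in data) / len(data)
    w -= lr * grad

# w converges to the dataset mean (0.05): near-perfect for common content,
# badly wrong for rare content.
err_common = abs(w - common_target)
err_rare = abs(w - rare_target)
print(f"w={w:.3f}  common err={err_common:.3f}  rare err={err_rare:.3f}")
```

A real network has far more capacity than one scalar, of course, but the same pressure applies per unit of capacity: gradients from rare content are simply outvoted.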
I simply cannot take seriously anyone who claims that monster death animations are a minor part of Doom. It's literally a game about slaying demons. Gameplay consists almost entirely of explosions and gore; killing monsters IS THE GAME. If you can't even get that right, then what nonsense are we even looking at?
I guess you are being sarcastic, except this is precisely what it is doing. And it's not hard: player movement is low information and probably not the hardest part of the model.
Uff, I guess you’re right. Mea culpa. I misread their diagram as representing inference when it was about training instead. The latter is conditioned on actions, but… how do they generate the actual output frames then? What’s the input? Is it just image-to-image based on the previous frame? The paper doesn’t seem to explain the inference side well at all :(
Video also effectively carries higher resolution: as you move through the high-resolution world, the way pixels change encodes detail beyond any single frame. Swivel your head without your glasses on and even the blurry world conveys extra information in the curve of pixel change.
Correct, for the sprites. However, the walls in Doom are texture-mapped and so have the same issue as videos. Interesting, though, because I assume the antialiasing is approximate, given the extreme demands on CPUs of the era.
Maestro AI | Senior Full Stack Engineer, Senior AI Engineer | REMOTE (US) Seattle onsite preferred, Remote OK | https://www.getmaestro.ai/
Hi, I'm William, the co-founder and CEO of Maestro AI. Throughout my career in software engineering, I've repeatedly witnessed how fragmented workflows and broken processes can stifle even the most brilliant teams. That's why I started Maestro AI. Our mission is to streamline the process of software development, turning chaos into clarity so that teams can accomplish more.
Maestro AI provides real-time, comprehensive insights for teams so everyone can easily see what's happening, how projects are progressing, and where people are blocked. We eliminate information silos by aggregating data from multiple collaborative tools like Slack, Jira, Notion, and GitHub, enabling seamless communication and informed decision-making across the entire organization.
We're an early-stage, venture-backed startup, and you'll have the opportunity to shape our product and culture from the ground up.
We're looking for someone who takes pride in their craft, pushes their limits, and takes action to create something new. Our stack is Python on the backend and React/Redux/TypeScript on the frontend. We're training our own models in addition to using existing LLMs.
Does anyone know whether the 128K is input tokens only? A lot of models have a large context window for input but a small output limit. If this actually has 128K tokens shared between input and output, that would be a game changer.
This may be why things like Substack and Beehiiv have taken off. The only way to combat reposting through Google is to deliver content directly to email inboxes before it gets ranked and reposted.
There is something additional at play with TechCrunch, though. Lately I feel like they haven't been posting as many articles about smaller startups as they used to. They tend to post more about Google, Nvidia, Intel, etc. I find myself reading it less and less because it's mostly news you can find elsewhere.
Yeah, you're right. Data breaches essentially amount to slaps on the wrist for companies like AT&T. Maybe they could be fined based on the proportion of the user base affected and the profits generated over a certain period.
I wonder if this will push companies to stop using external vendors to store and process data. If companies stored all of their info in house, it would prevent the case where compromising one vendor compromises everyone's data. But it would also mean that each individual company needs to do a good job securing their data, which seems like a tall ask.
I propose that the fines should be based on what the data would be sold for on a dark web forum. These breaches should be exponentially more expensive, which would incentivize companies to retain less sensitive data.