
> People don’t develop video codecs for fun like they do with software. And the reason is that it’s almost impossible to do without support from the industry.

As someone who led an open source team (of majority volunteers) for nearly a decade at Mozilla, I can tell you that people do work on video codecs for fun; see https://github.com/xiph/daala

Working with fine people from Xiph.Org and the IETF (and later AOM) on royalty free formats Theora, Opus, Daala and AV1 was by far the most fun, interesting and fulfilling work I've had as a professional engineer.


Daala had some really good ideas. I only understand the coding tools at the level of a curious codec enthusiast, far from an expert, but it was really fascinating to follow its progress.

Actually, are Xiph people still involved in AVM? It seems like it's being developed a little bit differently than AV1. I might have lost track a bit.


Someone asked about benchmarks, so I ran these just now to bring data to the discussion:

  nathan@arm1:~/git/rav1d.new/target$ hyperfine --warmup 2 "release/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
  Benchmark 1: release/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
    Time (mean ± σ):     31.532 s ±  1.971 s    [User: 244.512 s, System: 1.644 s]
    Range (min … max):   28.498 s … 34.270 s    10 runs

  nathan@arm1:~/git/dav1d.new/build$ hyperfine --warmup 2 "tools/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
  Benchmark 1: tools/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
    Time (mean ± σ):     29.696 s ±  2.308 s    [User: 230.507 s, System: 1.479 s]
    Range (min … max):   26.618 s … 35.105 s    10 runs
It shows that as of this moment rav1d is (31.532 - 29.696)/29.696 * 100 = 6.2% slower to decode this Netflix test sequence. Note, this is an improvement from the (32.775 - 29.696)/29.696 * 100 = 10.4% when Prossimo posted a bounty for improving rav1d [1]:

  nathan@arm1:~/git/rav1d.old/target$ hyperfine --warmup 2 "release/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8"
  Benchmark 1: release/dav1d -q -i ~/Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 8
    Time (mean ± σ):     32.775 s ±  2.694 s    [User: 254.120 s, System: 1.659 s]
    Range (min … max):   28.847 s … 37.606 s    10 runs
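The percentage figures quoted above can be reproduced with a quick sanity check (a minimal sketch; the mean wall-clock times are taken straight from the hyperfine output above):

```python
# Compute the relative slowdown of rav1d vs dav1d from the hyperfine means above.
def slowdown_pct(rav1d_mean: float, dav1d_mean: float) -> float:
    """Percent by which rav1d is slower than dav1d."""
    return (rav1d_mean - dav1d_mean) / dav1d_mean * 100

# Mean times (seconds) from the runs above.
current = slowdown_pct(31.532, 29.696)    # rav1d today vs dav1d
at_bounty = slowdown_pct(32.775, 29.696)  # rav1d when the bounty was posted

print(f"current gap:    {current:.1f}%")    # ~6.2%
print(f"bounty-era gap: {at_bounty:.1f}%")  # ~10.4%
```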
None of this is particularly new, I reported this gap in performance over a year ago [2]. Here are some questions for HN:

Even if rav1d closes the gap, who will ship rav1d over dav1d? More than 85% of dav1d is hand-written assembly that rav1d takes as-is and calls via unsafe blocks. Is this really memory safety?

The dav1d improvements must continuously be backported to rav1d; see the 82 closed PRs here [3]. Who is going to pay for this maintenance in perpetuity?

VideoLAN fuzzes dav1d and fixes bugs extremely quickly, typically in under 24 hours. The rav1d developers are Rust experts, but not codec experts. Is it reasonable to expect the same support?

[1] https://www.memorysafety.org/blog/rav1d-perf-bounty/

[2] https://github.com/memorysafety/rav1d/issues/804

[3] https://github.com/memorysafety/rav1d/pulls?q=backport+is%3A...


> Even if rav1d closes the gap, who will ship rav1d over dav1d? More than 85% of dav1d is hand-written assembly that rav1d takes as-is and calls via unsafe blocks. Is this really memory safety?

> The dav1d improvements must continuously be backported to rav1d; see the 82 closed PRs here [3]. Who is going to pay for this maintenance in perpetuity?

No one?

Isn't the point of rav1d to show that it's possible to write a performant decoder in semi-idiomatic Rust and that it doesn't have to be very different from one written in C?

It seemed pretty clear to me that rav1d was supposed to be a fun "research" side project from its author, shared with everyone, with an unclear future that will depend on what the community wants to actually do with it, but I might be completely wrong.


Thanks! Is this the same hardware used here http://paste.debian.net/1374884?

Can you share what system was used (ARM CPU, OS, etc.)?


> The dav1d improvements must continuously be backported to rav1d, see the 82 closed PR's here

so basically the work is done in dav1d, but claimed in rav1d.

It really makes rust look like a parasite.


It is actually quite a bit more misleading. I was not able to reproduce these numbers on Zen2 hardware, see https://people.videolan.org/~unlord/dav1d_6tap.png. I spoke with the slide author and he confirmed he was using an -O0 debug build of the checkasm binary.

What's more, the C code is running an 8-tap filter where the SIMD for that function (in all of SSSE3, AVX2 and AVX512) is implemented as 6-tap. Last week I posted MR !1745 (https://code.videolan.org/videolan/dav1d/-/merge_requests/17...) which adds 6-tap to the C code and brings improved performance to all platforms dav1d supports.

This, of course, also closes the gap in these numbers, but it is a more accurate representation of the speed-up from hand-written assembly.
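To illustrate the 8-tap vs 6-tap distinction in plain code (the coefficients below are made up for illustration and are not dav1d's actual filter taps, though real AV1 filters are likewise normalized to sum to 128):

```python
# Sketch of subpel interpolation as a 1-D FIR filter. An 8-tap filter reads
# 8 neighbouring pixels per output sample; a 6-tap filter reads only 6,
# saving loads and multiply-accumulates per pixel.
def fir(pixels, taps):
    """Apply a FIR filter over all positions with full support."""
    half = len(taps) // 2
    out = []
    for i in range(half - 1, len(pixels) - half):
        acc = sum(t * pixels[i - half + 1 + k] for k, t in enumerate(taps))
        out.append(acc)
    return out

eight_tap = [-1, 3, -7, 69, 69, -7, 3, -1]  # sums to 128 (illustrative)
six_tap   = [3, -11, 72, 72, -11, 3]        # sums to 128 (illustrative)

row = [10, 12, 15, 20, 30, 50, 80, 120, 160, 190]
print(fir(row, eight_tap))
print(fir(row, six_tap))  # fewer taps -> fewer memory reads per sample
```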


The thing I found interesting is the AVX512 gains over AVX2. That's a pretty nice gain from the wider instruction set, which has often been ignored in the video community.


The important thing to understand about why AVX512 is a big deal is not the width. AVX512 adds new instructions and new instruction encodings. It doubled the number of registers (16 -> 32) and added mask registers that let you remove special cases at the end of loops when the array length is not a multiple of the vector width. And there are piles of new permutation and integer operations that make it useful in more cases.
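A scalar sketch of what mask registers buy you (this simulates the predication idea in plain Python; real code would use AVX-512 masked intrinsics such as _mm512_mask_add_epi32):

```python
# Without masks, a SIMD loop needs a separate scalar tail for the last
# len % VLEN elements. AVX-512 mask registers let the final iteration run the
# same vector code with lanes past the array end disabled. Simulated here
# with a boolean list standing in for a k-register.
VLEN = 8  # pretend vector width (lanes)

def masked_add(dst, src):
    """dst[i] += src[i], processed in vector-width chunks with a lane mask."""
    for base in range(0, len(dst), VLEN):
        # Mask register: one bit per lane, cleared past the end of the array.
        mask = [base + lane < len(dst) for lane in range(VLEN)]
        for lane in range(VLEN):
            if mask[lane]:  # predicated lane: no separate scalar tail loop
                dst[base + lane] += src[base + lane]
    return dst

print(masked_add([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10] * 10))
```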

The part Intel struggles with is that, in many cases, if they had the 256-bit max width but all the new operations, they could build a machine that is faster than the 512-bit version (assuming the same code was written for both vector widths). The reason is that the ALUs could be faster and you could have more of them.


For most operations, on most CPUs, you can get the same results with twice as many AVX2 instructions. And that's excluding the CPUs with no AVX512 at all.

But the number of situations where AVX512 has a significant advantage is growing, so interest will grow alongside it.


Sadly AVX512 is still very easy to ignore given how badly Intel has botched its rollout across consumer processors.

Ironically AMD has stronger AVX512 support at this point despite the spec originating at Intel.


The article claims that "Google wants control, and JPEG XL could take that away from them"; however, this is disingenuous. The JPEG XL format was co-authored by Google employees, and they have just as much say over the direction of that format as they do over WebP.

The JPEG XL authors make claims about its superiority over formats like AVIF, but there is no support, or even a timeline, for hardware encode or decode on important platforms like mobile.

By contrast, Qualcomm is adding support for AV1 encoding to Snapdragon X (https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-...) which could lead to efficient encoding of AVIF photos and animations.


Google most definitely had input in the creation of the format, but from all I can research, only at an early stage. As the project grew away from Pik they seem to have dropped interest and moved away from it.

I have updated the article with more details about their involvement for transparency.

If you have any further information regarding Google's input, I'd be open to hearing it, as evidence of their involvement as a company beyond the preliminary stages has been difficult to find.

---

Regarding JPEG XL's mobile support, it makes sense that it would see limited development if the company that manages one of the biggest mobile players has been the greatest restriction on its success. The lack of support also disincentivises manufacturers from prioritising support.


> Regarding JPEG XL's mobile support, it makes sense that it would see limited development if the company that manages one of the biggest mobile players has been the greatest restriction on its success. The lack of support also disincentivises manufacturers from prioritising support.

There was literally no involvement from any hardware vendor in the standardization of JPEG XL. It went from a Call for Proposals in Sept 2018 to Committee Draft in Aug 2019 with very little time for industry feedback. Contrast this with AV1 which had involvement from hardware vendors Intel, NVIDIA, Arm, AMD, Broadcom, Amlogic from the beginning as well as companies who ship media on hardware at scale such as Cisco, Netflix, Samsung and yes Google. These companies reviewed and provided significant feedback on the format that made it suitable for hardware implementation.

https://news.ycombinator.com/threads?id=JyrkiAlakuijala is a lead on the project and a Google employee, and active in JPEG XL development https://github.com/libjxl/libjxl/commits?author=jyrkialakuij...


I very much agree with your observation that the involvement from hardware vendors was minimal. It definitely would have been advantageous to slow down, and it's beyond disappointing that it wasn't pursued further. They absolutely dropped the ball in that regard.

AV1, however, is first and foremost a video format. A very popular one at that. It's perfect for video, and that explains the great industry support. The fact it is the most promising option for video is why it's seen hardware vendor support, not because AVIF is ideal. JPEG XL unfortunately doesn't have this luxury, but still could have done better. This doesn't mean that JPEG XL can't see support now, though, and there are plenty of opportunities for hardware support now it's been proven viable.

While there are certainly employees at Google that have contributed to JPEG XL recently, I'm still yet to see any evidence that the company itself has provided any direct support lately.


Isn't that kind of the point though? AV1/AVIF has an extremely strong case for hardware implementations: you need efficient (both in terms of processing and compression) video decoding on modern devices because the battery cost/bandwidth cost is so high otherwise.

Image decoding is nice to have, but less important. When you are deciding to put custom hardware into a device, that's a huge investment in something only useful for that one task. Being compatible with a video format so you can share that hardware between the two tasks is a huge win for hardware manufacturers who get two-for-one, and for the rate of adoption.

With AV1 already very efficient and rolled out in a lot of hardware already, it just has a huge advantage.


While that most definitely is a consideration, AVIF lacks many features and much of the functionality offered by JPEG XL. An investment in JPEG XL, while only being beneficial for one task like you say, would have huge benefits given just how many images we see and process on the web.

AVIF certainly gets the hardware support advantage, but it fails in other regards where JPEG XL shines. Looking at it either way, JPEG XL is far from slow, and I'd argue that the other benefits outweigh that single shortcoming. Bandwidth should also be treated as a consideration, as you mentioned, and JPEG XL generally leads to smaller file sizes.

Realistically, this boils down to AVIF being largely worse than JPEG XL, with the exception of performance. Performance that could be improved for JPEG XL should hardware vendors choose to provide better support.


Any features that matter to consumers more than slightly better battery life? I haven't seen one yet despite people always touting "better features".

It feels to me like JPEG XL's advantages are mostly hypothetical, and practically AVIF is the format that has more value. I'm no expert on this, to be clear, but I have yet to be convinced despite all the JPEG XL hype on HN.


Disclosure: I worked on JPEG XL, opinions are my own.

The radio (network) on phones can consume more power than the SoC (CPU). Thus, smaller sizes can translate into energy savings.
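As a back-of-the-envelope sketch of that claim (the power, throughput, and file-size numbers here are assumptions for illustration, not measurements):

```python
# If the radio draws on the order of a watt while receiving, every megabyte
# saved by better compression is energy the phone never spends.
# All numbers below are illustrative assumptions.
RADIO_POWER_W = 1.0     # assumed average radio power while active
THROUGHPUT_MBPS = 20.0  # assumed effective download rate

def joules_per_megabyte():
    seconds_per_mb = 8.0 / THROUGHPUT_MBPS  # 8 megabits per megabyte
    return RADIO_POWER_W * seconds_per_mb

# A format that is 15% smaller, applied to an assumed 2 MB photo:
saved_mb = 2.0 * 0.15
print(f"{saved_mb * joules_per_megabyte():.2f} J saved per photo")
```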

As to hypothetical advantage, we are talking about HW _potentially_ being used for image decode. AFAIK this does not happen in practice, despite WebP non-lossless being a VP8 frame and the hardware being plentifully available.

As evidence for the advantages of JPEG XL being real, consider the fact that it is increasingly being adopted in serious SW including ACR, Darktable, Krita, and Lightroom.


JPEG XL is significantly smaller? All the stuff I've seen has shown a slight edge for the same visual quality, but not enough that I'd suspect it'd offset hardware decode; although if software doesn't actually do hardware decode, then yeah, the argument falls apart.

I can 100% see JXL being adopted in production tools, where the motivations are different, I was mainly talking about the web adoption perspective for end users (which is the context of Google's supposed war on it, of course).


> While that most definitely is a consideration, AVIF lacks many features and much of the functionality offered by JPEG XL.

What features are these? You have not named a single concrete advantage. The places where AVIF outperforms JPEG XL are exactly where it makes sense as an image format for the web: high fidelity images with bit rates at or below 1 bit/pixel. Nobody is browsing the web on a 16-bit panel, and AVIF supports 12bpc images anyway.

Unfortunately, the JPEG XL authors chose not to include AVIF when they performed a subjective image comparison in 2020 under controlled viewing conditions (https://research.google/pubs/benchmarking-jpeg-xl-lossylossl...), but previous subjective studies showed AVIF outperforming PIK and FUIF over the evaluated bit-rates.


I'm curious what the evidence is for AVIF outperforming below 1bpp?

Have you seen this more recent data that includes AVIF? https://cloudinary.com/labs/cid22


> Have you seen this more recent data that includes AVIF? https://cloudinary.com/labs/cid22

The graph from Cloudinary uses libaom to do the encoding at speed preset 7 (aom s7), which is far from speed preset 0 and disables many AVIF coding tools. I do not know why this was chosen by the author, but it does not reflect AVIF performance. According to https://github.com/AOMediaCodec/libavif/issues/440#issuecomm... speed preset 8 loses 20-35% compression efficiency.


At a guess, probably because it matches or at least is in the same order of magnitude encode speed as JPEG XL? It starts to feel like bait and switch if we say "look what we can do with >1000x slower encode", without noting that JPEG XL can also do a bit better with more encode time, and yet practical encoders, especially HW, use much higher speed settings.

Note: the 20-35% is BDrate, which in this context likely(?) involves some form of PSNR, which has almost nothing to do with human perception and IMO should not be used to guide such decisions.


Encoding speed is not really a concern for web uses, where the image is decoded potentially millions or even billions of times more than it is encoded.

I agree, PSNR is a terrible measure of quality. The study "Benchmarking JPEG XL lossy/lossless image compression" (https://research.google/pubs/benchmarking-jpeg-xl-lossylossle...), which you are an author on, included a controlled subjective evaluation done by EPFL using an ITU-recommended methodology. The subjective results concluded:

"HEVC with SCC generally results in better rate/distortion performance when compared to JPEG XL at low bitrates (< 0.75), better preserving sharp lines and smooth areas"

It is known that AVIF performs better than HEVC. Can you say why it was not included in your 2020 subjective evaluation? It would be nice to not need to speculate on the relative quality of AVIF v JPEG XL at web bitrates.


> PSNR is a terrible measure of quality

Glad we can agree on that. Unfortunately many evals use that or plain SSIM, which is still L2 at heart.

> Encoding speed is not really a concern for web uses

hm, in many discussions with Jon of Cloudinary, I did not get the impression that this is the case. Imagine their enthusiasm about 100x-ing their compute costs.

> Can you say why it was not included in your 2020 subjective evaluation?

The paper's comment on this is: "The selection of anchors is based on the general popularity of the codecs and the availability of the software implementations. We only intended to use codecs which are standardized internationally."

> using an ITU-recommended methodology

While this has some helpful guidance on viewing conditions, it is unfortunately still subjective (what parts are observers looking at, what counts as "annoying") and is more useful for detecting severe artifacts, which is less relevant in practice because that's hopefully not the quality range people are using.

Also, these results are quite old and both encoders have changed since then.

> need to speculate on the relative quality of AVIF v JPEG XL at web bitrates.

No need to speculate :) We just need to first agree on what actual web bitrates are. From Chrome metrics, IIRC it was over 1bpp. Here's some newer data: https://discuss.httparchive.org/t/what-compression-ratio-ran... Even for AVIF, the median is 0.96 and q3 is 1.79(!).
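Bits per pixel is just file size relative to resolution; a quick sketch of the conversion:

```python
# bpp = (file size in bits) / (pixel count).
def bits_per_pixel(file_bytes: int, width: int, height: int) -> float:
    return file_bytes * 8 / (width * height)

def file_kib_at_bpp(bpp: float, width: int, height: int) -> float:
    """File size in KiB that a given bpp implies for a given resolution."""
    return bpp * width * height / 8 / 1024

w, h = 1920, 1080
# The median AVIF figure of 0.96 bpp quoted above corresponds to roughly:
print(f"{file_kib_at_bpp(0.96, w, h):.0f} KiB")  # 243 KiB
```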

Jon has written several articles on comparisons, including https://cloudinary.com/blog/contemplating-codec-comparisons and https://cloudinary.com/blog/jpeg-xl-and-the-pareto-front#med....


> The paper's comment on this is: "The selection of anchors is based on the general popularity of the codecs and the availability of the software implementations. We only intended to use codecs which are standardized internationally."

how very convenient, looks like politics outweigh any benchmarking

I have no skin in the game, but from a large website provider's perspective, handling millions of images and processing thousands per day, I am glad not to have to deal with yet another format that would double our cache costs and force eternal support on the web. Not everybody has infinite Google money to afford every image format existing on the planet, and Google can get only so much leeway after poisoning the web with WebP.

j2c


small typo in the article https://research.google/pubs/benchmarking-jpeg-xl-lossylossl... (very interesting read!)


How about a conversion path for regular legacy JPEG? And look at Cloudinary: JPEG XL is superior in quality to AVIF.


> it makes sense it would see limited development if the company that manages one of the biggest mobile players has been the greatest restriction on their success

Huh? Isn't Apple pro-JPEG XL?


I was referring to Google with Android. You're correct in thinking Apple is pro JPEG XL.


> which could lead to efficient encoding of AVIF photos

Efficient in what sense? HW encoders usually only explore a fraction of the search space, and in the case of JPEG often produce images of 3 to 5 bits per pixel.


> If you avoid it then you won't be using RISC-V at all, right now.

I'm sorry this just isn't true. The K230 has RVV 1.0 hardware and has been available for 5 months.

> Which matters much less than it used to, as gcc 14 can compile C code with RVV intrinsics to either.

This is still a huge problem for fragmentation. Multimedia libraries in FFmpeg and VideoLAN use hand-written assembly and only support standards-compliant RVV 1.0.

There is no reason to ever produce a binary for RVV 0.7.1; it will simply fail if run on standards-compliant hardware.


Note that AVIF is not just AV1 video keyframes. The entire complement of AV1 video coding tools (including inter prediction with motion vectors) is available. This includes spatial and temporal scaling.

Note this means that animated images on the web (like GIF) are significantly smaller with AVIF than with JPEG XL, which has no inter prediction.


Yes, for animation a video codec like AV1 is much more suitable than a still image codec like JPEG XL.

JPEG XL does have some weak forms of inter prediction though (but they were designed mostly for still image purposes). One of them is patches: you can take any rectangle from a previously 'saved' frame (there are four 'slots' for that) and blit it at some arbitrary position in the current frame, using some blend mode of choice (just replace, add, alpha blend over, multiply, alpha blend under, etc). This is obviously not as powerful as full motion vectors etc, but it does bring some possibilities for something like a simple moving sprite animation.

This coding tool is currently only used in the encoder for still images, namely to extract repeating text-like elements in an image (individual letters, icons etc) and store them in a separate invisible frame, encoded with non-DCT methods (which are more effective for that kind of thing) and then patch-add them to the VarDCT image. The current jxl encoder is not even trying to be good at animation because this is not quite its purpose (it can do it, but 'reluctantly').
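The patch mechanism described above can be sketched in a few lines (frames as nested lists of samples; this mimics the idea only and is not the libjxl API):

```python
# Sketch of JPEG XL-style patches: copy a rectangle from a saved reference
# frame and blend it into the current frame at an arbitrary position.
# Only two of the blend modes mentioned above are shown, for illustration.
def blit(dst, ref, src_xy, size, dst_xy, mode="replace"):
    sx, sy = src_xy
    dx, dy = dst_xy
    w, h = size
    for y in range(h):
        for x in range(w):
            s = ref[sy + y][sx + x]
            if mode == "replace":
                dst[dy + y][dx + x] = s
            elif mode == "add":
                dst[dy + y][dx + x] += s
    return dst

saved = [[9] * 4 for _ in range(4)]  # a 'slot' holding a reference frame
frame = [[0] * 6 for _ in range(6)]  # current frame
blit(frame, saved, (0, 0), (2, 2), (3, 3))  # blit a 2x2 sprite at (3, 3)
print(frame)
```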

Anyway, I think that animation is in any case best done with video codecs (this is what video codecs are made for), and I wish browsers would just start accepting in an <img> tag all the video codecs they accept in a <video> tag (just played looping, muted, autoplay), so we can once and for all get rid of GIF.


I don't think this should be solved with tags (or other OOB methods). Just set a "loop" flag in the container (or metadata). GIF is perfect because it doesn't require ANY additional info when it comes to looping or animation.

Any format that doesn't have this is doomed to fail as a GIF replacement.


++ to making it easier to use other codecs in place of .gif!


Good point. AV1 and AVIF could improve on how some sites (like Twitter) turn actual GIFs into video now.

Also a plus for saving phone snaps, since the camera often saves a short video these days anyway.


Yes, and in fact you can see a demo of dav1d compiled to wasm here:

http://jabberfr.org/tmp/ogv.js/

Note this works fine in desktop and mobile Safari.


> It really is the state of the art Video Codec, at a decoding complexity that is quite manageable. ( Lower than AV1 )

I am afraid this has not been substantiated by any of the public decoder demonstrations I've seen. Please see the most recent VVC technical update presented at the MC-IF meeting on March 2nd of this year:

https://a7dce6fd-e8f0-45f7-b0b0-255c5c9a28e1.filesusr.com/ug...

On slide 10 is a graph of VVC performance showing the VTM (VVC) decoder at 2.0x the complexity of HM (HEVC). On slide 12, the Ittiam production decoder boasts 1920x1080 @ 24fps on a 4-core Cortex A75 @ 2.5GHz.

Compare that with this recent study of dav1d (AV1) decoder complexity on a broad set of mobile SoCs, where 1920x1080 @ 24fps was easily reached by a Google Pixel 1 from 2016! Using just the two LITTLE cores! Even higher frame rates were achieved with more modern devices:

https://www.reddit.com/r/AV1/comments/gncplq/av1_multithread...

Full disclosure, I contribute to the dav1d project and performed this study.


Yes. But I think it should be noted that HM and VTM are the reference encoder and decoder; they are not meant to be used in production nor optimised in any way.

But dav1d is a truly amazing piece of work!


> But I think it should be noted the HM and VTM are reference encoder and decoder, they are not meant to be used for production nor in any way optimised.

Sure, but my comparison was to the Ittiam production decoder.


You may be interested in reading the license page for the royalty-free Opus audio codec:

https://opus-codec.org/license/

Full disclosure, I am a member of the Xiph.Org Foundation.


Importantly, Xiph.org is a 501(c)(3).


Please see the W3C Patent Policy regarding Web standards.

https://www.w3.org/Consortium/Patent-Policy-20040205/

> The goal of this policy is to assure that Recommendations produced under this policy can be implemented on a Royalty-Free (RF) basis.

