I don't think so. pdf.js is able to render PDF content, which is different from extracting "text".
Text in a PDF can be encoded in many ways: as an actual image, as shapes (think segments, quadratic Bézier curves...), or in an XML format (really easy to process).
PDF viewers render text much like a printer works: processing commands that end up putting pixels on the screen.
But paragraphs, text layout, columns, and tables are often lost in the process.
You can still see them on screen: so close, yet so far.
That is why AI is quite strong at this task.
You are wrong. Pdf.js can extract text and has all the facilities required to render and extract formatting. The latest version can also edit PDF files. It's basically the same engine as the Firefox PDF viewer, which also has a document outline, search, linking, print preview, scaling, a scripting sandbox… it does not simply "render" a file.
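For instance, text extraction with pdf.js looks roughly like this (a minimal sketch using the pdfjs-dist npm package in Node; the import path, options, and file name here are assumptions and vary between versions):

```typescript
import { readFile } from 'node:fs/promises';
// The legacy build is the one that works in Node; the exact import path
// differs between pdfjs-dist versions.
import * as pdfjsLib from 'pdfjs-dist/legacy/build/pdf.mjs';

async function extractFirstPageText(path: string): Promise<string> {
  const data = new Uint8Array(await readFile(path));
  const doc = await pdfjsLib.getDocument({ data }).promise;
  const page = await doc.getPage(1);
  const content = await page.getTextContent();
  // Each item carries the string plus positioning info (transform, width...),
  // which is what a viewer uses to reconstruct layout, not just raw characters.
  return content.items
    .map((item: any) => (typeof item.str === 'string' ? item.str : ''))
    .join(' ');
}

extractFirstPageText('example.pdf').then(console.log);
```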
The purpose of my original comment was simply to say: there's an existing implementation, so if you're building a PDF file viewer/editor and need inspiration, have a look. One of the reasons Mozilla is doing this is to be a reference implementation. I'm not sure why people are upset with this. Though, I could have explained it better.
What about all the designers who used to handle this issue by tweaking their font weights, colors...?
Changing a long-standing behavior with a lot of users will break things for some people (e.g. those on Edge with an eye for these details...).
Web designers must simply accept that they are working in a medium that is not amenable to (sub)pixel-perfect reproducibility. It is not—and never has been—safe to assume that future versions of a given browser will render identically to the current one. It is not safe to assume that the browsers you are fine-tuning for will remain popular in the future. It is not safe to assume that other users with the same browser version (or a browser claiming to be the same version) as you are using will get an identical rendering as you get on your system.
As a web designer, you must accept that the browser is fundamentally not yours to control. It is an agent acting on behalf of the user, not on behalf of you.
This could be used as an incredibly low-bitrate codec for some streaming use cases
(video conferencing/podcasts on <3G, for example: just use some keyframes + the audio).
If you understand how LLMs work, you should disregard tests such as:
- How many 'r's are in Strawberry?
- Finding the fourth word of the response
These tests are at odds with the tokenizer and next-word prediction model.
They do not accurately represent an LLM's capabilities.
It's akin to asking a blind person to identify colors.
Ask an LLM to spell "Strawberry" one character per line. Claude's output, for example:
> Here's "strawberry" spelled out one character per line:
> s
> t
> r
> a
> w
> b
> e
> r
> r
> y
Most LLMs can handle that perfectly, meaning they can abstract over tokens down to individual characters. Yet most lack the ability to perform the multi-level inference needed to count the individual 'r's.
From this perspective, I think it's the opposite: something like the strawberry test is a good indicator of how well an LLM can connect steps that are individually easy but not readily interconnected.
The funny thing about those "tests" is that LLMs are judged on their ability to do the task themselves, as opposed to their ability to write code that does it. The best LLMs still fail at doing the task directly, because they fundamentally are not designed to do anything except predict tokens. But they absolutely can write code that does it perfectly, and can write code that does many harder things besides.
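For example, this is the kind of trivial snippet any of them will happily produce on request (plain TypeScript, nothing model-specific):

```typescript
// Count occurrences of a character in a string: the task models struggle
// to do "in their head" but can trivially write code for.
function countChar(text: string, ch: string): number {
  return [...text.toLowerCase()].filter((c) => c === ch.toLowerCase()).length;
}

console.log(countChar('Strawberry', 'r')); // 3
```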
I'm not going to argue these are good tests; if you asked a coworker these questions they'd look at you weird. But what surprised me is how well you can take a sentence never written down before, put it through base64 encoding, and then ask an LLM to decode it. The good models can do this surprisingly well.
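If you want to reproduce that kind of test, generating the input is trivial (Node/TypeScript; the sentence below is just a made-up example, use your own):

```typescript
// Make up a sentence that has never been written down, base64-encode it,
// and paste the encoded string into the model with "decode this".
const sentence = 'the violet heron audits seventeen umbrellas at dusk'; // made-up example
const encoded = Buffer.from(sentence, 'utf8').toString('base64');
console.log(encoded);

// Decode it locally to check the model's answer.
console.log(Buffer.from(encoded, 'base64').toString('utf8'));
```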
> Minor rant incoming: Something's not working? Maybe a service is down. docker-compose ps. Yep, it's that microservice that's still buggy. No issue, I'll just restart it: docker-compose restart. Okay, now let's try again. Oh wait, the issue is still there. Hmm. docker-compose ps. Right, so the service must have just stopped immediately after starting. I probably would have known that if I was reading the log stream, but there is a lot of clutter in there from other services. I could get the logs for just that one service with docker compose logs --follow myservice, but that dies every time the service dies, so I'd need to rerun that command every time I restart the service. I could alternatively run docker-compose up myservice, and in that terminal window, if the service is down, I could just up it again, but now I've got one service hogging a terminal window even after I no longer care about its logs. I guess when I want to reclaim the terminal real estate I can do ctrl+P,Q, but... wait, that's not working for some reason. Should I use ctrl+C instead? I can't remember if that closes the foreground process or kills the actual service.
> What a headache!
> Memorising docker commands is hard. Memorising aliases is slightly less hard. Keeping track of your containers across multiple terminal windows is near impossible. What if you had all the information you needed in one terminal window, with every common command living one keypress away (and the ability to add custom commands as well)? Lazydocker's goal is to make that dream a reality.
Yeah, I wonder if there's room for a different networking abstraction that could address most complex orgs' networking issues. I certainly don't think we should still be reasoning about CIDR range limitations when designing networks, for example.
That said, I'm not sure the Tailscale approach scales well in typical modern corporate environments, where you've got a small army of junior devops often overlooking security or cost implications (don't forget about egress costs!).
The traditional, meticulous approach of segmenting networks into VPCs, subnets, etc., with careful planning of auth, firewall rules, and routes, helps limit the blast radius of mistakes.
Tailscale's networking and security model feels simple and flat, which is great for usability, but it lacks the comforting "defense in depth" that will be expected in most big corps.
Oh, I see the links now, thanks! But they reference four different licenses, and those are the licenses just for model weights I think?
If the intention was to make something that you can only use with Llama models, stating that clearly in a separate code license file would be better IMO. (Of course, this would also mean that the code still isn’t open source.)
> How would I monitor the replica?
A simple cron task that pings a health check if everything is OK (lag is < x) would be a good start.
There are many solutions, highly dependent on your context and the scale of your business. Options range from simple cron jobs with email alerts to more sophisticated setups like ELK/EFK, or managed services such as Datadog.
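As an illustration only, a cron-driven check might look like this sketch, assuming PostgreSQL streaming replication, the node-postgres `pg` package, and a healthchecks.io-style ping URL (all names and thresholds here are placeholders):

```typescript
import { Client } from 'pg';

// Placeholders -- adapt to your environment.
const MAX_LAG_SECONDS = 30;
const HEALTHCHECK_URL = 'https://hc-ping.com/<your-check-id>';

// Run this from cron; it only pings the health check when the replica looks OK,
// so a missed ping means either the check failed or the lag is too high.
async function checkReplica(): Promise<void> {
  const client = new Client({ connectionString: process.env.REPLICA_URL });
  await client.connect();
  try {
    // On a PostgreSQL standby, this is roughly "how many seconds behind the
    // primary is replay" (it can over-report when the primary is idle).
    const res = await client.query(
      'SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds'
    );
    const lag = Number(res.rows[0].lag_seconds ?? Infinity);
    if (lag < MAX_LAG_SECONDS) {
      await fetch(HEALTHCHECK_URL); // global fetch, Node 18+
    }
  } finally {
    await client.end();
  }
}

checkReplica().catch((err) => {
  console.error(err);
  process.exit(1);
});
```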
> How do I failover to the replica if the primary goes down?
> Should I handle failover automatically or manually?
> Do I need two replicas to avoid a split-brain scenario? My head hurts already.
While it may be tempting to automate failover with a tool, I strongly recommend manual failover if your business can tolerate some downtime.
This approach allows you to understand why the primary went down, preventing the same issue from affecting the replica. It's often not trivial to restore the primary or convert it to a replica.
YOU become the consensus algorithm, the observer, deciding which instance becomes the primary.
Two scenarios to avoid:
* Failing over to a replica only for it to fail as well (e.g., due to a full disk).
* Switching over so transparently that you don't notice you're now running without a replica.
> After a failover (whether automatic or manual), how do I reconfigure the primary to be the primary again, and the replica to be the replica?
It's easier to switch roles and configure the former primary as the new replica. It will then automatically synchronize with the current primary.
You might also want to use the replica for:
* Some read-only queries. However, for long-running queries, you will need to configure the replication delay to avoid timeouts.
* Backups or point-in-time recovery.
If you manage a database yourself, I strongly recommend first gaining confidence in your backups and in your ability to restore them quickly. Then you can play with replication; there are tons of little settings to configure (async for performance, a large enough WAL size to restore quickly, ...).
It's not that hard, but you want to have the confidence and the procedure written down before you have to do it in a production incident.
This is easily the most appealing thing to me about Go. I learned Go through the "Learn Go with Tests" way and I had a ton of fun.
It is hard for me to recommend using Go internally since .NET/Java are just as performant and have such a mature ecosystem, but I crave simplicity in the core libraries.
The OpenJDK and .NET compilers run circles around Go's. It's not even close. The second you go beyond "straight-line" code, where a function body has a limited number of locals and doesn't make many calls, the difference becomes absolutely massive. Go also doesn't do any sort of "advanced" devirtualization, which is the bread and butter of both for coping with codebase complexity and the inevitable introduction of abstractions. Hell, .NET has surpassed Go at compiling native binaries too. Here's a recent example: https://news.ycombinator.com/item?id=41234851
In terms of GC, Go has a specialized design that makes tradeoffs to allow consistent latency and low memory usage. However, this comes with very low sustained allocation and garbage-collection throughput, and the Go language itself does not necessarily make it obvious where allocations happen. As sibling discussions here and under the Go iterators submission indicate, this results in an amount of effort spent getting rid of every allocation in a hot path that would be unthinkable in C#, which makes allocations much more straightforward and is also able to cope with high allocation throughput with ease, much like Java.
It is indeed true that Java makes different design choices when tuning its GC implementations, but you might see much closer to Go-like memory usage from .NET's back-end services now that DATAS is enabled by default, without the tradeoffs Go comes with.
Note that the article's findings from 2018 need to be re-evaluated on up-to-date versions before drawing conclusions, because in the last 6 years (and especially in the last 3 or so for .NET) the garbage-collector implementations of both Go and .NET have evolved quite significantly. The sustained multi-core allocation throughput graph more or less holds, but other numbers will differ significantly.
One of the major factors that plays in Go's favour is the right attitude to architecting the libraries: zero-copy slicing is much more at the forefront in Go than in .NET (technically incorrect, but not in terms of how the average implementation looks), and the flexible nature of C#, combined with it being seen as "be glad we even support this Microsoft Java" by many vendors, leads to poor-quality vendor libraries. The result is that developers see Go applications be more efficient, without realizing that it's the massively worse implementation of a dependency their .NET solution has to deal with. (There was a recent comparison video where .NET was estimated to be slower, but the reality was that it wasn't .NET but the AWS SDK dependency, plus the benchmark author being most familiar with Go and making optimal choices with significant impact there, like using DB connection pooling.)
I'm often impressed by how much punishment the GC and compiler can take, continuing to provide competitive performance despite the massive amount of data reallocation and abstraction bloat thrown at them by developers who won't even consider approaching C# in an idiomatic way (at the very least by listening to IDE suggestions and warnings). In some areas, I even recommend looking at community libraries first; they are likely to provide a far superior experience if the documentation and a brief code audit indicate that the authors care(tm), which is one of the most important metrics.
> Go also does not do any sort of "advanced" devirtualization
Depends on the implementation. gc doesn't put a whole lot of effort into optimization, but it isn't the only implementation. In fact, the Go project insists that there must be more than one implementation as part of its mandate.
Go's gc is the fastest overall implementation and the one used in >95% of cases, with the alternatives not being up to date and producing slower code, aside from select interop scenarios.
Until this changes, the "Depends on the implementation" statement is not going to be true in the context of better performance.
That's not me, and I use it. I like TS, but in the browser. It doesn't have much use elsewhere, certainly not in the backend. Go is not only simple and stable, it's quite flexible, has a good ecosystem, a wonderful build system, and is really fast and light at runtime.
- Prioritize security: get patches ASAP.
- Prioritize availability: get patches after a cooldown period.
Because ultimately, it's a tradeoff that cannot be handled by Cloudflare. It depends on your business and your threat model.