I did not (OpenAI doesn't let users in China pay for Plus for political reasons, and I couldn't be bothered to try circumventing that), and I'm curious about it because it does seem like a very difficult job.
I did try the latest Qwen though, and it was able to locate the correct city, but it was still tens of km off (it guessed a tourist attraction in the city center instead of the actual district).
o3 is a massive difference indeed. I just took a random picture outside my country house (so the house isn't in it) in the middle of nowhere, removed all the metadata, set my VPN to the other side of the world, and tried it; it guessed very close, from a few far-away landmarks. Very impressive.
Nice, these “Markdown for X” tools are super neat. Wish this worked nicely with mobile view, though; a lot of the text is overflowing and the margins are squished in the demo.
Even assuming this new computer-interactivity feature runs as fast as Cursor Composer (which I don't think it does), it still doesn't support codebase indexing, inline edits, or references to other variables and files in the codebase. I can see someone using this to build some sort of Cursor competitor, but out of the box there's a very low likelihood it makes Cursor obsolete.
I really want Cursor to integrate this so it can look at the results of a code change in the browser and then make edits as needed until it's accomplished what I asked of it. Same for errors in the console, etc. Right now I have to manually describe the issue or copy and paste the error message, and it'd be nice for it to just iterate more on its own.
Great work! One thing that would be incredibly useful/interesting would be having the LLM generate a reusable script, instead of just grabbing the data. In theory, this should yield a massive cost reduction (no need to call the LLM every time) as long as the page's source doesn't change, which would make it sustainable for constant, frequent monitoring.
This approach was studied in a paper on Evaporate+ (https://www.vldb.org/pvldb/vol17/p92-arora.pdf). They used active learning to pick the best function among candidate functions generated by the LLM on a sampled set of data.
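The pick-the-best-candidate step described above can be sketched roughly like this (a simplified illustration, not the paper's actual algorithm; the helper names and scoring rule are invented, and the "labels" here stand in for LLM-extracted reference answers on the sampled rows):

```python
from typing import Callable, List, Tuple

def score(fn: Callable[[str], str], labeled: List[Tuple[str, str]]) -> float:
    """Fraction of sampled rows where a candidate extractor agrees
    with the reference answer for that row."""
    hits = 0
    for html, expected in labeled:
        try:
            hits += fn(html) == expected
        except Exception:
            pass  # LLM-generated code can crash on odd inputs; count as a miss
    return hits / len(labeled)

def pick_best(candidates: List[Callable[[str], str]],
              labeled: List[Tuple[str, str]]) -> Callable[[str], str]:
    """Keep the candidate function that scores highest on the sample."""
    return max(candidates, key=lambda fn: score(fn, labeled))
```

The winning function then runs for free on the full dataset, with the LLM only paid for labeling the small sample.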
I’ve worked on this exact problem when extracting feeds from news websites. Yes, calling an LLM each time is costly, so I use the LLM once to extract robust CSS selectors, and on subsequent runs I just rely on those instead of incurring further LLM cost.
I'm working on this problem now. It's possible for some sources, wherever the HTML structure is regular enough to map onto the feature of interest, but it can also happen that the information is buried in free text, which makes it virtually impossible.
Cool, but why does clicking both “see what people have added” and “add an entry” take me to the same page for adding an entry? Is there no way to view the collection without adding an entry?