Suppose there's a PDF with lots of tables I want to scrape. I mention the PDF's URL in my message, and with Gemini's URL context tool, I now have access to the PDF.
I can ask Gemini to give me the PDF's content as JSON, and it complies most of the time. But sometimes there's an introductory line like "Here's your json:". Those introductory lines are there one run and gone the next, and they interfere with using the output programmatically.
If I could have structured output at the same time as tool use, I could reliably consume what Gemini spits out, since it would be pure JSON with no annoying intro lines.
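For reference, here's roughly what that could look like with the google-genai Python SDK, assuming the API lets you combine the URL context tool with a JSON response type (the model name and PDF URL below are just placeholders):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # placeholder model name and PDF URL
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Read https://example.com/report.pdf and return every table as JSON.",
        config=types.GenerateContentConfig(
            tools=[types.Tool(url_context=types.UrlContext())],
            response_mime_type="application/json",  # the structured-output part
        ),
    )
    print(response.text)  # would be bare JSON, no "Here's your json:" preamble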
Try with https://aistudio.google.com
I think the page limit is a Vertex thing.
The only real limit is the number of input tokens it takes to parse the PDF.
If those tokens plus the tokens for the rest of your prompt are under the context window limit, you're good.
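You can check that up front with the SDK's count_tokens call. A minimal sketch, assuming you have a local copy of the PDF (the file and model names are placeholders):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    with open("report.pdf", "rb") as f:  # placeholder file name
        pdf_part = types.Part.from_bytes(data=f.read(), mime_type="application/pdf")

    resp = client.models.count_tokens(
        model="gemini-2.0-flash",
        contents=[pdf_part, "Give me every table in this PDF as JSON."],
    )
    print(resp.total_tokens)  # keep this under the model's context window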
This is for anyone coming across this link later. In the latest SDKs, if you want to completely switch off the safety settings, the flag to use is 'OFF' and not 'BLOCK_NONE' as mentioned in the docs in the link above.
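In the google-genai Python SDK, that looks something like this (only one category shown, as an example):

    from google.genai import types

    config = types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HARASSMENT",  # example category
                threshold="OFF",  # per the note above: 'OFF', not 'BLOCK_NONE'
            ),
        ],
    )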
The Indian Express page should explicitly state that the comparison was done using AI. Maybe it's because I am on mobile but I don't see anything of that sort.
I am writing about this on Hacker News, have posted a Twitter thread about it, and have also put up my code on GitHub. Nothing is being hidden here, mate; don't know how much more transparent I can get :)
You could be a lot more transparent by mentioning on the actual page that the main content is AI-generated, considering that neither the GitHub repo nor the Twitter thread nor this thread is mentioned on the page, which is what's meant for public consumption.
There is a disclaimer for this on the desktop page. But when rendered on mobile it isn't there.
I'm a data journalist and I live in a village in Kerala, a state in southern India.
We're getting into summer here, so I wanted to see how hot villages in Kerala get in general, and how the temperatures now compare to 10 or so years ago.
Made use of satellite data for my analysis, mainly land surface temperature data from MODIS.
Was thinking of using Google Earth Engine for this story, but decided to go with Planetary Computer since it's Python-centric and I'm more familiar with that language. (GEE is more JavaScript-oriented.)
Pretty sure earth/geo/environment science guys won't be happy with my methodology, but hopefully there's enough logic to it for the story to hold up as a basic analysis.
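For anyone who wants to try something similar, a query against the Planetary Computer STAC API looks roughly like this. This is a sketch, not my actual code; the collection id, bounding box and date range are stand-ins:

    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,  # signs asset URLs for access
    )

    search = catalog.search(
        collections=["modis-11A2-061"],    # MODIS 8-day land surface temperature
        bbox=[74.8, 8.2, 77.5, 12.8],      # rough bounding box around Kerala
        datetime="2024-03-01/2024-05-31",  # stand-in date range
    )
    for item in search.items():
        print(item.id)
        # the LST_Day_1km band stores Kelvin * 0.02, so multiply values
        # by 0.02 and subtract 273.15 to get degrees Celsius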
Hi, my name is Shijith and I'm a freelance data journalist from India.
Just wanted to plug my new story on streaming services in India and how well they cover Western music.
I specifically look at how much access they give to critically acclaimed albums from the past and present.
I used features like 'Best Albums Ever', 'Best Albums of 2021' and so on from top music publications and websites to come up with a list of albums in each genre (rock, EDM, etc.) that these services should have.
Services were rated out of 10 in each genre, with the rating corresponding to what percent of the album list they have in their library. Guess it won't be a surprise for people in India that Apple or Spotify came first in most lists.
In the overall rating that looks at performance across all genres, the top four spots were taken by global services — Spotify (9.3/10), Apple Music (9.3), YouTube Music (8.9) and Amazon Music (8.5). Of services based in India, only JioSaavn (8.1) came anywhere close.
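The genre rating is just the availability share scaled to 10. In code it comes down to something like this (the names are mine, and the numbers in the comment are illustrative):

    def genre_rating(library: set[str], genre_albums: list[str]) -> float:
        """Rating out of 10 = percent of the genre's album list the service has."""
        matched = sum(1 for album in genre_albums if album in library)
        return round(10 * matched / len(genre_albums), 1)

    # e.g. a service carrying 93 of 100 acclaimed rock albums scores 9.3/10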
Hi, my name is Shijith, and I'm a freelance data journalist from India (worked previously at Hindustan Times and IndiaSpend).
Just posting a data story I did recently about Wikipedia abuse in India. Such abuse is an old problem, but it's getting more media attention with users distorting facts on pages about the Delhi riots or the farmer protests. Sometimes users engage in outright vandalism, deleting whole sections from a page.
I tried to determine which Wikipedia pages faced the most abuse this year, and also introduce a Twitter account that lets people track Wikipedia abuse weekly.
(Am in the process of re-working the code. Right now it's querying the Wikipedia API every week for edit histories of over 150k articles, and the whole run takes two days. I've discovered an API endpoint for recent changes that should make things more efficient.)
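For anyone curious, that recent-changes endpoint is part of the MediaWiki Action API. A rough sketch of a query (the parameter choices here are mine, not necessarily what the final code will use):

    import requests

    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": 0,  # main/article namespace only
        "rcprop": "title|ids|sizes|user|timestamp",
        "rclimit": 500,    # maximum per request
        "format": "json",
    }
    resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
    for change in resp.json()["query"]["recentchanges"]:
        # large negative size changes can flag section blanking
        print(change["timestamp"], change["title"],
              change["newlen"] - change["oldlen"])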
Have any questions or feedback, do let me know below!
Just submitting a link here for visibility; hopefully it'll come up in someone's Google search later on and help them. The blog post is about a list of filters I came up with to try and make websites like Twitter and Reddit less stressful. (List of filters at https://github.com/shijithpk/hide_like_counts_with_ublock )
You copy the filters into the adblocker uBlock Origin, after which all elements on their webpages that show stats (e.g. like counts, retweet counts, karma points, upvotes) get hidden.
This is mainly for desktop users. You could possibly use these filters in adblockers other than uBlock Origin, though I haven't tested that.
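To give a sense of the syntax: uBlock cosmetic filters take the form domain##css-selector. The selectors below are made-up placeholders, not the actual ones from the repo:

    ! hide the like count next to the like button (hypothetical selector)
    twitter.com##[data-testid="like"] span

    ! hide post scores on reddit (hypothetical selector)
    reddit.com##.post-score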
Hopefully, Twitter and Reddit will introduce settings like Instagram/Facebook have to hide like counts, but till that happens, this filter list is a handy option.
twitter_list_mgmt is a Python package I created to make it easier to add users to your Twitter list from other lists, among other things.
Say you've created a covid Twitter list to keep track of news around the pandemic. You've just found another list on covid curated by an epidemiologist in London, and you want to add members from that list to your own covid list. This is the package you use for it.
Now, for most basic operations, like retrieving the current membership of a Twitter list, adding users to it, removing them, etc., the Tweepy library is good enough. twitter_list_mgmt just adds extra functionality on top of Tweepy to make working with lists easier.
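As an illustration, copying members from one list to another with plain Tweepy looks something like this (Tweepy 4.x method names; the credentials and list ids are placeholders):

    import tweepy

    auth = tweepy.OAuth1UserHandler(
        "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"
    )
    api = tweepy.API(auth)

    SOURCE_LIST_ID = 111111  # the curated list you found (placeholder id)
    MY_LIST_ID = 222222      # your own list (placeholder id)

    # copy every member of the source list into your own list
    for member in tweepy.Cursor(api.get_list_members, list_id=SOURCE_LIST_ID).items():
        api.add_list_member(list_id=MY_LIST_ID, user_id=member.id)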
This package will help heavy Twitter and TweetDeck users, especially those who use lists to manage the firehose of information from social media.
Have any questions or comments, do let me know! Also, not a professional developer/programmer, so be nice :)