Suppose there's a PDF with lots of tables I want to scrape. I mention the PDF's URL in my message, and with Gemini's URL context tool, I now have access to the PDF.
I can ask Gemini to give me the PDF's content as JSON, and it complies most of the time. But sometimes there's an introductory line like "Here's your json:". Those introductory lines are there one run and gone the next, and they interfere with using the output programmatically.
If I could have structured output at the same time as tool use, I could reliably consume what Gemini spits out, since it would be pure JSON with no annoying intro lines.
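For reference, here's roughly what that could look like with the google-genai Python SDK, assuming the API lets you combine the URL context tool with a JSON response type (the model name and PDF URL below are just placeholders):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # placeholder model name and PDF URL
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Read https://example.com/report.pdf and return every table as JSON.",
        config=types.GenerateContentConfig(
            tools=[types.Tool(url_context=types.UrlContext())],
            response_mime_type="application/json",  # the structured-output part
        ),
    )
    print(response.text)  # would be bare JSON, no "Here's your json:" preamble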
Try with https://aistudio.google.com
I think the page limit is a Vertex thing.
The only real limit is the number of input tokens it takes to parse the PDF.
If those tokens plus the tokens for the rest of your prompt are under the context window limit, you're good.
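You can check that up front with the SDK's count_tokens call. A minimal sketch, assuming you have a local copy of the PDF (the file and model names are placeholders):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    with open("report.pdf", "rb") as f:  # placeholder file name
        pdf_part = types.Part.from_bytes(data=f.read(), mime_type="application/pdf")

    resp = client.models.count_tokens(
        model="gemini-2.0-flash",
        contents=[pdf_part, "Give me every table in this PDF as JSON."],
    )
    print(resp.total_tokens)  # keep this under the model's context window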
This is for anyone coming across this link later. In the latest SDKs, if you want to completely switch off the safety settings, the flag to use is 'OFF' and not 'BLOCK_NONE' as mentioned in the docs in the link above.
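In the google-genai Python SDK, that looks something like this (only one category shown, as an example):

    from google.genai import types

    config = types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HARASSMENT",  # example category
                threshold="OFF",  # per the note above: 'OFF', not 'BLOCK_NONE'
            ),
        ],
    )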
The Indian Express page should explicitly state that the comparison was done using AI. Maybe it's because I am on mobile but I don't see anything of that sort.
I am writing about this on Hacker News, have posted a Twitter thread about it, and have also put up my code on GitHub. Nothing is being hidden here, mate; don't know how much more transparent I can get :)
You could be a lot more transparent by mentioning on the actual page that the main content is AI-generated, considering that neither the GitHub repo nor the Twitter thread nor this thread is mentioned on the page, which is what's meant for public consumption.
There is a disclaimer for this on the desktop page. But when rendered on mobile it isn't there.
I'm a data journalist and I live in a village in Kerala, a state in southern India.
We're getting into summer here, so I wanted to see how hot villages in Kerala get in general, and how the temperatures now compare to 10 or so years ago.
Made use of satellite data for my analysis, mainly land surface temperature data from MODIS.
Was thinking of using Google Earth Engine for this story, but decided to go with Planetary Computer since it's Python-centric and I'm more familiar with that language. (GEE is more JavaScript-oriented.)
Pretty sure earth/geo/environment science guys won't be happy with my methodology, but hopefully there's enough logic to it for the story to hold up as a basic analysis.
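For anyone who wants to try something similar, a query against the Planetary Computer STAC API looks roughly like this. This is a sketch, not my actual code; the collection id, bounding box and date range are stand-ins:

    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,  # signs asset URLs for access
    )

    search = catalog.search(
        collections=["modis-11A2-061"],    # MODIS 8-day land surface temperature
        bbox=[74.8, 8.2, 77.5, 12.8],      # rough bounding box around Kerala
        datetime="2024-03-01/2024-05-31",  # stand-in date range
    )
    for item in search.items():
        print(item.id)
        # the LST_Day_1km band stores Kelvin * 0.02, so multiply values
        # by 0.02 and subtract 273.15 to get degrees Celsius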
Hi, my name is Shijith and I'm a freelance data journalist from India.
Just wanted to plug my new story on streaming services in India and how well they cover Western music.
I specifically look at how much access they give to critically acclaimed albums from the past and present.
I used features like 'Best Albums Ever', 'Best Albums of 2021' and so on from top music publications and websites to come up with a list of albums in each genre (rock, EDM, etc.) that these services should have.
Services were rated out of 10 in each genre, with the rating corresponding to what percent of the album list they have in their library. Guess it won't be a surprise for people in India that Apple or Spotify came first in most lists.
In the overall rating that looks at performance across all genres, the top four spots were taken by global services — Spotify (9.3/10), Apple Music (9.3), YouTube Music (8.9) and Amazon Music (8.5). Of services based in India, only JioSaavn (8.1) came anywhere close.
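The genre rating is just the availability share scaled to 10. In code it comes down to something like this (the names are mine, and the numbers in the comment are illustrative):

    def genre_rating(library: set[str], genre_albums: list[str]) -> float:
        """Rating out of 10 = percent of the genre's album list the service has."""
        matched = sum(1 for album in genre_albums if album in library)
        return round(10 * matched / len(genre_albums), 1)

    # e.g. a service carrying 93 of 100 acclaimed rock albums scores 9.3/10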
Hi, my name is Shijith, and I'm a freelance data journalist from India (worked previously at Hindustan Times and IndiaSpend).
Just posting a data story I did recently about Wikipedia abuse in India. Such abuse is an old problem, but it's getting more media attention with users distorting facts on pages about the Delhi riots or the farmer protests. Sometimes users engage in outright vandalism, deleting whole sections from a page.
I tried to determine which Wikipedia pages faced the most abuse this year, and also introduce a Twitter account that lets people track Wikipedia abuse weekly.
(Am in the process of re-working the code. Right now it's querying the Wikipedia API every week for edit histories of over 150k articles, and the whole run takes two days. I've discovered an API endpoint for recent changes that should make things more efficient.)
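For anyone curious, that recent-changes endpoint is part of the MediaWiki Action API. A rough sketch of a query (the parameter choices here are mine, not necessarily what the final code will use):

    import requests

    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": 0,  # main/article namespace only
        "rcprop": "title|ids|sizes|user|timestamp",
        "rclimit": 500,    # maximum per request
        "format": "json",
    }
    resp = requests.get("https://en.wikipedia.org/w/api.php", params=params)
    for change in resp.json()["query"]["recentchanges"]:
        # large negative size changes can flag section blanking
        print(change["timestamp"], change["title"],
              change["newlen"] - change["oldlen"])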
Have any questions or feedback, do let me know below!
Just submitting a link here for visibility; hopefully it'll come up in someone's Google search later on and help them. The blog post is about a list of filters I came up with to try and make websites like Twitter and Reddit less stressful. (List of filters at https://github.com/shijithpk/hide_like_counts_with_ublock )
You copy the filters into the adblocker uBlock Origin, after which all elements on their webpages that show stats (e.g. like counts, retweet counts, karma points, upvotes) get hidden.
This is mainly for desktop users. You could possibly use these filters in adblockers other than uBlock Origin, though I haven't tested that.
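To give a sense of the syntax: uBlock cosmetic filters take the form domain##css-selector. The selectors below are made-up placeholders, not the actual ones from the repo:

    ! hide the like count next to the like button (hypothetical selector)
    twitter.com##[data-testid="like"] span

    ! hide post scores on reddit (hypothetical selector)
    reddit.com##.post-score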
Hopefully, Twitter and Reddit will introduce settings like Instagram/Facebook have to hide like counts, but till that happens, this filter list is a handy option.
twitter_list_mgmt is a Python package I created to make it easier to add users to your Twitter list from other lists, among other things.
Say you've created a covid Twitter list to keep track of news around the pandemic. You've just found another list on covid curated by an epidemiologist in London, and you want to add members from that list to your own covid list. This is the package you use for it.
Now, for most basic operations, like retrieving the current membership of a Twitter list, adding users to it, removing them, etc., the Tweepy library is good enough. twitter_list_mgmt just adds extra functionality on top of Tweepy to make working with lists easier.
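As an illustration, copying members from one list to another with plain Tweepy looks something like this (Tweepy 4.x method names; the credentials and list ids are placeholders):

    import tweepy

    auth = tweepy.OAuth1UserHandler(
        "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"
    )
    api = tweepy.API(auth)

    SOURCE_LIST_ID = 111111  # the curated list you found (placeholder id)
    MY_LIST_ID = 222222      # your own list (placeholder id)

    # copy every member of the source list into your own list
    for member in tweepy.Cursor(api.get_list_members, list_id=SOURCE_LIST_ID).items():
        api.add_list_member(list_id=MY_LIST_ID, user_id=member.id)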
This package will help heavy Twitter and TweetDeck users, especially those who use lists to manage the firehose of information from social media.
Have any questions or comments, do let me know! Also, not a professional developer/programmer, so be nice :)