Hacker News | tallytarik's comments

Working on improving the data pipeline for https://iplocate.io - an IP intelligence service I've worked on since 2017. A couple of recent focuses:

1. VPN and proxy detection. We already track dozens of providers, but we can do better here. There's also a bunch of metadata we collect as part of this process which we don't currently surface, so I'm looking at what else we can bring to our databases and free API.

2. Better detail and evidence on how we build and test our own geolocation database, which we create from scratch. There's been a recent trend of misinformation about geo accuracy, including from some other providers, so I want to better explain the accuracy (and inaccuracy) of various techniques, our policy for when we prefer certain data, and so on.

(Open to partnerships for any folks looking for a new provider!)


There are plenty of VPN and proxy detection services, either as a service (API) or downloadable database, which are surprisingly comprehensive. Disclaimer: I’ve run one since 2017. Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.

There are also other methods, like using zmap/zgrab to probe for servers that respond to VPN software handshakes, which can in theory be run against the entire IP space. (This also highlights non-commercial VPNs, which are not generally the target of our detection, so we use it sparingly.)

It will never cover every VPN or proxy in existence, but it gets pretty close.


> Years on, our primary data source is literally holding dozens of subscriptions to every commercial provider we can find, and enumerating the exit node IP addresses they use.

Assuming your VPN identification service operates commercially, I trust that you are in full compliance with all contractual agreements and Terms of Service for the services you utilize. Many of these agreements specifically prohibit commercial use, which could encompass the harvesting of exit node IP addresses and the subsequent sale of such information.


TOS are pretty meaningless in cases like this. It amounts to getting rejected as a customer and your account canceled.


I think ToS violations can also run afoul of CFAA.


Those are pretty old cases that I think the courts have moved away from, and even in those cases it was a ToS violation plus an explicit C&D that the company ignored.


I don't think they can any longer, I think there is case law on this.

Illinois law makes it a misdemeanor to violate web site ToS, though. And felony for the second time IIRC. Other states probably also.


Maybe the tables could be turned and we can build a service with dozens of subscriptions to every VPN detection service and report them for ToS violations ;)


> I trust that you are in full compliance with all contractual agreements and Terms of Service

Why? It's not like there's any real moral (or, likely, legal) reason to care beyond avoiding the service's ban hammer.


In Illinois you could, in theory, be jailed for up to three years for violating a web site ToS. (classified as "Computer Tampering")


I don't think that would hold up in court anymore.


It's a statutory offense, so you could get lucky and the prosecutor wouldn't prosecute it, but it's there for them to use:

https://www.ilga.gov/Documents/legislation/ilcs/documents/07...

... "the owner authorizes patrons, customers, or guests to access the computer network and the person accessing the computer network is an authorized patron, customer, or guest and complies with all terms or conditions for use of the computer network that are imposed by the owner;"


There's a little secret that most of the business world knows but individuals do not know: You don't have to follow Terms of Service. In most cases, the maximum penalty the company can impose for a ToS violation is a termination of your account. And it's not illegal to make a new account. They can legally ban you from making a new account, and you can legally evade the ban.

Unless you're the one-in-a-million unlucky user who gets prosecuted under the CFAA's very generic "unauthorized access to a protected computer" clause, like Aaron Swartz. It seems the general consensus is this doesn't apply to breaking a website ToS, and Aaron was only in so much trouble because he broke into a network closet, as well as for copyright violation. But consult a lawyer if unsure. (That's another difference: A business will ask a lawyer if it wants to do something shady, while an individual will simply avoid doing it)


Tangent: if you hold access to all VPN providers, have you thought about also releasing benchmarks for them? I would be interested in knowing which ones offer the best bandwidth / peering (ping).


> which are surprisingly comprehensive

How does the buyer even know what the precision and recall rates might be?


Probably contrary to the stealth aspect.


This will also cause problems with anyone that happens to (even accidentally/unknowingly) use apps that integrate services from companies such as BrightData/Luminati/HolaVPN/etc. where they sell idle time on your device/connection to their VPN/proxy customers.

The legitimate end-user will then no longer be able to use e.g. SoundCloud.


I fail to see the problem if people who allow their internet connection to be used by scammers/AI crawlers are banned from every service.


I’m with you on this one. Some of my projects are flooded with sus traffic from Brazil. I don’t believe there are a million eager Brazilian hackers targeting me in particular. It’s pretty clear from analysis that they’re all residential hosts running proxies, knowingly or otherwise.

The more concise word for this is “botnet”. Computers participating in one should be quarantined until they stop.


> unknowingly

Oftentimes random shovelware apps will have these proxy SDKs embedded in them, and the only mention of it being part of the software is buried in some long ToS that nobody reads.


Sort of valid today.

But the more sites that require a residential VPN for normal use, the less legitimate that argument becomes.


You might want to learn how internets work today: https://en.wikipedia.org/wiki/Network_address_translation


Interesting. I assumed all VPNs switched to IPv6 by now, making detection much harder.


IPv6 isn't magically unrouteable, it just routes much larger blocks of "end IP addresses."

You just track and block the enclosing /64 or /48 (or larger) as necessary.


Much of the internet still does not support IPv6, so most providers will give you an IPv4 address. In fact only a few providers even support IPv6 at all.

Even with IPv6 it's not a huge problem. With a few samples we can know that a provider is operating in a given /64 or /48 or even /32 space, and can assign a confidence level that the range is used for VPNs.
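
As a rough illustration of the idea above (not our actual pipeline), here is a minimal sketch that groups a handful of observed exit IPs by their enclosing prefix and assigns a score. The `prefix_confidence` function, the `saturation` parameter, and the scoring rule are all hypothetical, and the addresses are documentation-range placeholders:

```python
import ipaddress
from collections import Counter

def prefix_confidence(samples, prefix_len=48, saturation=5):
    """Group observed exit IPs by enclosing prefix and score each prefix.

    The score is an illustrative heuristic only: observations / saturation,
    capped at 1.0. A real system would weigh many more signals.
    """
    counts = Counter(
        # strict=False masks the host bits, giving the enclosing network
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in samples
    )
    return {net: min(1.0, n / saturation) for net, n in counts.items()}

# Placeholder samples from the IPv6 documentation range (2001:db8::/32)
observed = [
    "2001:db8:1:10::5",
    "2001:db8:1:20::9",
    "2001:db8:1:30::1",
    "2001:db8:2::1",
]
scores = prefix_confidence(observed)
for net, score in scores.items():
    print(net, score)
```

Running the same samples with `prefix_len=64` or `prefix_len=32` shows how the grouping, and therefore the score, changes with the assumed allocation size.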


Many websites, including SoundCloud, are still only accessible through IPv4, so this is moot. Even if VPNs support IPv6, it's enough for SoundCloud to block their IPv4 exit nodes.


Just out of curiosity: if I'm located in Spain and I set up an EC2 or DigitalOcean instance in Germany and use it as a SOCKS proxy over SSH, will you detect me?


It is even easier to block hosting providers. They typically publish official lists. Here's the full list for both of those providers:

https://ip-ranges.amazonaws.com/ip-ranges.json

https://digitalocean.com/geo/google.csv

(And even if they don't publish them, you can just look up the ranges owned by any autonomous network with the appropriate registry.)
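
For anyone curious what checking an address against such a list might look like, here is a minimal sketch. It assumes the `prefixes` / `ipv6_prefixes` layout used by the AWS file; the prefixes in `SAMPLE` are documentation-range placeholders rather than real AWS ranges, and `load_networks` / `is_hosting_ip` are hypothetical helper names:

```python
import ipaddress
import json

# Tiny stand-in for the published file, with placeholder prefixes.
SAMPLE = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24", "region": "eu-central-1", "service": "EC2"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2001:db8::/32", "region": "eu-central-1", "service": "EC2"}
  ]
}
""")

def load_networks(doc):
    """Collect IPv4 and IPv6 prefixes from the parsed JSON document."""
    nets = [ipaddress.ip_network(p["ip_prefix"]) for p in doc.get("prefixes", [])]
    nets += [ipaddress.ip_network(p["ipv6_prefix"]) for p in doc.get("ipv6_prefixes", [])]
    return nets

def is_hosting_ip(ip, networks):
    """Return True if the address falls inside any listed prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in networks)

networks = load_networks(SAMPLE)
print(is_hosting_ip("192.0.2.55", networks))
print(is_hosting_ip("198.51.100.1", networks))
```

A linear scan is fine for a sketch; with the full lists you would probably want a prefix tree or sorted-interval lookup instead.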


It won’t end up in our proxy detection database, but we track hosting provider ranges separately: https://www.iplocate.io/data/hosting-providers/


That's a hosting service IP block. Some sites block them already. Netflix for instance.


Who's buying your service?


Sounds like snitching as a service


Most of these providers are in fact open about the fact that these locations are “virtual”, so it’s misleading to say they don’t match where they claim to be.

There is however an interesting question about how VPNs should be considered from a geolocation perspective.

Should they record where the exit server is located, or the country claimed by the VPN (even if this is a “virtual” location)? In my view there is useful information in where the user wanted to be located in the latter case, which you lose if you only ever report the location of servers.

(Disclaimer: I run a competing service. We currently provide the VPN-reported locations because the majority of our customers expect it to work that way, and we clearly flag them as VPNs.)


Yeah, Proton is quite explicit about that: https://protonvpn.com/support/how-smart-routing-works


I work for IPinfo, and I appreciate your comment.

Our product philosophy is centered on accuracy and reliability. We intentionally diverge from the broader IP geolocation industry's trust-based model. Instead of relying primarily on "aggregation and echo", we focus on evidence-backed geolocation.

Like others in the industry, we do ingest self-reported IP geolocation data, and we do that well. Given our scale and reputation, we receive a significant volume of feedback and guidance from network operators worldwide. We actively conduct outreach, and exchange ideas with ISPs, IXPs, and ASNs. We attend NOG events, participate in research conferences, and collaborate with academia. We have a community and launch hackathon events, which allow us to talk to all the stakeholders involved.

Where we differ is in who our core users are. Our primary user base operates at a critical scale, where compromises on data accuracy are simply not acceptable. For these users, IP geolocation cannot be a trust-based model. It must be backed by verifiable data and evidence.

We believe the broader internet ecosystem benefits from this approach. That belief is reflected in our decision to provide free data downloads, a free API with unlimited requests, and active collaboration with multiple platforms to make our data widely accessible. Our free datasets are licensed under CC-BY-SA 4.0, without an EULA, which makes integration, even for commercial use straightforward.

I appreciate you recognizing that our product philosophy is different. We are intentionally trying to differentiate ourselves from the industry at large, and it is encouraging to see competing services acknowledge that they are focused on a different model.


If we can pay them in virtual dollars, no problem


Working on improving the data pipeline for https://iplocate.io - an IP intelligence service I've worked on since 2017.

Recent focus has been on geolocation accuracy, and in particular being able to share more data about why we say a resource is in a certain place.

Lots of folks seem to be interested in this data, and there's very little out there. Most other industry players don't talk about their methodology, and those that do aren't overly honest about how X or Y strategy actually leads to a given prediction, or the realistic scale or inaccuracies of a given strategy, and so on. So this is an area I'm very interested in at the moment and I'm confident we can do better in. And it's overall a fascinating data challenge!


Our government has been paying Deloitte & co. to produce slop for years before AI was being used to generate said slop.

Can we get a refund for all of the others too?


ISPs have no obligation, although the ubiquity of sites and apps relying on IP geolocation means that ISPs are incentivized to provide correct info these days.

I run a geolocation service, and over the years we've seen more and more ISPs providing official geofeeds. The majority of medium-large ISPs in the US now provide a geofeed, for example. But there's still an ongoing problem in geofeeds being up-to-date, and users being assigned to a correct 'pool' etc.

Mobile IPs are similar but are still certainly the most difficult (relative lack of geofeeds or other accurate data across providers)
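
For context, a geofeed is just a CSV of prefix, country, region, city, and postal code (the RFC 8805 layout). A minimal sketch of reading one and doing a most-specific-prefix lookup might look like the following; the sample rows are made-up placeholders and `parse_geofeed` / `lookup` are hypothetical helper names, not how any particular provider does it:

```python
import csv
import io
import ipaddress

# Placeholder feed using documentation prefixes, not a real ISP's data.
SAMPLE_GEOFEED = """\
# prefix,country,region,city,postal_code
192.0.2.0/24,US,US-CA,San Francisco,
2001:db8::/32,DE,DE-BE,Berlin,
"""

def parse_geofeed(text):
    """Parse geofeed CSV text into (network, country, region, city) tuples."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and comment lines
        if not row or row[0].lstrip().startswith("#"):
            continue
        # Pad so that feeds omitting trailing fields still unpack cleanly
        row = [field.strip() for field in row] + [""] * 5
        network = ipaddress.ip_network(row[0], strict=False)
        entries.append((network, row[1], row[2], row[3]))
    return entries

def lookup(ip, entries):
    """Return the most specific matching entry, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    matches = [e for e in entries if e[0].version == addr.version and addr in e[0]]
    return max(matches, key=lambda e: e[0].prefixlen, default=None)

entries = parse_geofeed(SAMPLE_GEOFEED)
print(lookup("192.0.2.10", entries))
print(lookup("203.0.113.1", entries))
```

Parsing is the easy part; the staleness and pool-assignment problems mentioned above are exactly what a parser like this cannot see.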


Mobile IPs reflect the user's "registered area" at best, not their actual location.

This is mostly because of how APN / GGSN / P-GW systems work. E.g. you may have an APN that puts you straight in a corporate network, and the mobile network needs you to keep using that APN when roaming. This is why your roaming IP is usually in the country you're from, not the one you're currently in.

I've heard of local breakout being possible, but never actually seen it in practice.


I thought this was going to be an analysis of articles that are clearly AI-generated.

I feel like that’s an increasing ratio of top posts, and they’re usually an instant skip for me. Would be interested in some data to see if that’s true.


G2, Sourceforge (yes, that one), and Gartner’s Capterra/GetApp/SoftwareAdvice all have the same business plan: charge vendors $x,xxx+ per month to outrank other vendors in their made up categories.

Of course, you can technically list for free.

But look! For the low low price of $x,xxx per month, now you can show one of 40 tailor-made award icons on your site!

Or, unlock the privilege of showing “user reviews” from our site on your site! (of course if you had managed to get reviews independently, you’re not allowed to use the widget without paying)

Don’t have reviews? Ah, I forgot to mention. The $x,xxx plan also comes with “review generation” — we’ll pay users to write reviews for you!

Oh, and on an unrelated note, the $x,xxx plan just also happens to unlock dofollow links across each of those 40 made up categories, which all rank highly in google. And the $xx,xxx plan means that - user ratings aside - you can end up at the top of those categories.

It’s hard to describe it other than the way the author does: a grift. Seeing those logos on other companies’ sites is now a huge turn-off to me personally, and I haven’t yet capitulated for my own SaaS, but I suspect this isn’t the feeling of the execs they seek to target. Or maybe it is, and it’s just the price of doing business.


I think this is the same model that the NYTimes book reviews had back in the 1990s. Pay us money and we'll say nice things.

It'll be interesting to see how AI Agents approach things. My prediction is that more of our media is going to be controlled by our AI Agent's Algorithm instead of Google, Twitter, and Facebook's algorithm or some distant editors who decided what went on the front page of the newspaper.


I've tried variations of this. I find it will often cause it to include cringey bullshit phrases like:

"Here's your brutally honest answer–just the hard truth, no fluff: [...]"

I don't know whether that's better or worse than the fake flattery.


You need a system prompt to get that behaviour? I find ChatGPT does it constantly as its default setting:

"Let's be blunt, I'm not gonna sugarcoat this. Getting straight to the hard truth, here's what you could cook for dinner tonight. Just the raw facts!"

It's so annoying it makes me use other LLMs.


Its response is still flattery, just packaged in a different form. Patronizing, actually.


Similar experience, feels very ironic


Curious whether you find this on the best models available. I find that Sonnet 4 and Gemini 2.5 Pro are much better at following the spirit of my system prompt rather than the letter. I do not use OpenAI models regularly, so I’m not sure about them.


That is neither the spirit nor the letter, though.


That is a good point. I guess the reason that distinction came to mind is that what’s happening here is the LLM trying to manifest its obedience in letter (i.e., by saying it).


SEEKING FREELANCER | Remote | Integration Engineers, Content Writers

IPLocate is on a mission to provide developers with reliable, affordable, and easy-to-use IP address intelligence - geolocation, threat data, network information and more.

We're looking for engineers to write SDKs and integrations to use our APIs with popular programming languages, frameworks, and tools. We would prefer to work with multiple folks who are experts in their respective language/framework rather than a single engineer to write 20 integrations, so we'd love to hear about your experience.

We're also looking for content writers to help write practical tutorials, step-by-step guides, and real-world use cases for our website and blog, and for publication elsewhere (e.g. Medium, Dev.to).

Details and contact links: https://www.iplocate.io/build-for-iplocate

(We've recently launched this page as an open offer to interested folks. Get in touch with your details and we're happy to formalize an offer.)

