Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Analyzing iMessage with SQL (arctype.com)
142 points by tosh on April 20, 2022 | hide | past | favorite | 89 comments


If you are on Mac, open up your Messages and go to any thread with some historical messages. Now scroll up to load past messages, and do that a few times. How slow is that for you?

For me, when I scroll up to load historical data, it takes always multiple _seconds_. This is on the newest, highest specced m1 macbook. What the hell is it doing in that time? Multiple seconds to do read-only queries a <100MB sqlite DB (and sqlite can be incredibly fast), on an SSD with >5GB/s read speeds, with 64GB of ram (where the whole sqlite can/should be loaded into memory), to render a blue bubble with some text inside.

What is Messages doing, and why is it so bad and inefficient. I'm certain the local sqlite db has the data I need because the above bothered me so much i wrote some scripts to read my historical chats from the cli. Quality of Apple software is certainly not increasing but going downhill, you see that across many of their apps.


We're hardly the first people to complain about this, but it drives me nuts. I rarely delete messages (for better or worse), so I have an ever-growing iMessage database that is basically inaccessible to me. I don't know why I pay to back up 20 gigs of messages that I can't even access, but here we are. For instance, I wanted to figure out the exact date I met my partner. The only way to do this on iMessage is to scroll up...all the way up. With a few seconds of god-knows-what in between every page. The endeavor is usually futile, though; because when you get a ways back, the iMessage app will crash.


The worst part: if the person sends you a text while you're scrolled up it will jump all the way to present time.


Btw, if you double click to open a new window, that window won't have the new message jump


I recently figured out that iMazing allows you to export PDFs (among other formats) of all your text convos. It’s pretty perfect for my needs, which were precisely the same scenario (plus printing out a nice history)


I've always wondered about this, but it's not limited to Messages on macOS. WhatsApp is painful when you want to do the same thing. I don't know about other messaging apps, but I'm not sure why there isn't UI/UX where you can easily/quickly scrub through months/years.


But at least in WhatsApp I can scroll as fast as I want while the App never is stuck.

With iMessage it loads for seconds every 20 messages or so. I also really don't get it.


I've been tempted to cobble together a simple GUI that lets you search text, images, sender etc etc. But it would only be desktop since I doubt it would make it through approval for iOS.


To be fair it’s been this bad as far as I can remember with iMessage.

Incredibly frustrating and slow, but not new.


The search is also terrible.


Must be just me or an American thing, but imessage isn't used nearly as much here; probably because Android takes up 80% market share, so people stick to cross-OS applications like Whatsapp.


I think iMessage is uniquely popular in the US since they all got hooked on SMS since their plans got "unlimited free texting" pretty early on, and iMessage is a mostly-transparent layer onto of SMS. Whereas in the rest of the world it was more common to be charged by the message so people adopted apps for "free texting". Also outside of the US it's probably more common to send international texts, which cost even more.


Not really, in Europe you have unlimited sms’ from like forever. Think people went to different platforms because communication was less clumsy and it was easier to do it also with Android friends. So all in one


SMS used to be an important revenue source for telcos. Basically they were charging for a best effort service on a signalling side channel. Free money.


Living in multiple European countries in the beginning of Smartphones (2007 - 2012) I can tell you that your experience is not universal.

In many countries carriers did offer unlimited sms plans, but they would cost a significant amount more over base plans which would say include 100 free texts a month. Also until EU regulations hit if you had any family or friends in other countries or ever went on holiday the cost of SMS was astronomical.

It took many carriers awhile to realize that "free" messaging apps were a significant threat, I assume because they were also charging heavily for data and just assumed because these used data they would make just as much money. Ignoring things like WiFi and how in the long run these apps would use such a small % of the data a user demanded they would be effectively free.


It's a very country-specific thing. In the Netherlands approximately everyone switched to WhatsApp long before unlimited SMS became the norm. WhatsApp was ubiquitous for years before Facebook bought them in 2014, but even 5 or so years ago my cheap monthly plan did not include unlimited SMS.


SMS fees are definitely the main reason a lot of people use apps today in Europe. They might not still be charging the same fees, but the damage is done.

The US phased out their SMS fees about the same time smart phones started to become popular. I think it was only due to regulation.


I thought it was the other way: iMessage lets you “trade” data, which might be ~free when on WiFi, for SMS, which were initially pay-per-use.


I paid ~$10/mo for unlimited SMS back in 2006.


I'm sure unlimited plans existed, but I remember them being "top of the line" and friends certainly got into (minor) trouble for exceeding their plan.

Here's a contemporary article (2008) about the "Rising Cost of Texting" (voice+data+200 SMS/month for $75): https://www.cnet.com/culture/the-rising-cost-of-texting/


It's very popular in Australia and New Zealand, the majority of people I know and work with have iPhones.


Speaking as an American iMessage user, it is very pervasive.

I know a lot of people that cringe when one non-iPhone user is added to a group chat and it turns everything you send green instead of blue.


It also sends it all basically unencrypted at that point, too. (Not that iMessage is much better, as its end-to-end crypto has been backdoored by Apple and kept insecure so that the US feds can read it.)


If you think that any of the consumer services are “secure” from the Feds, I got a bridge with great views of Brooklyn for sale.


I care less about the "Feds" than Comcast, my mobile provider, or literally any other actor on the SMS chain. Not to mention potentially corrupt local law enforcement or litigious former-friends.

Encryption, like deadbolts, helps society function.


https://www.rollingstone.com/politics/politics-features/what...

Signal doesn't keep logs, and is end-to-end encrypted without a key escrow or plaintext escrow backdoor for the transiting middle service. You can review their actual response documents to law enforcement.


love the nihilism


I don't think you can even do iMessage groups with non-iOS users. Does it not send the message directly to that single person and their reply goes directly to the sender (not to the group)?


You can. It’s annoying af when people start liking photos or videos. Also annoying if you’re paying for international texts.


It converts the entire group to SMS when you have a single non-iOS user.


if Apple allowed an iMessage compatible app to be developed on android, r even sold it themselves, I would buy it for this reason. I hate thinking negatively of my friends for buying into this type of attitude instead of getting a 3rd party app that provides the same experience for everyone - but it just didnt happen

so now i have to drop $700 on an iphone for a text messaging app. fucking ridiculous, but i want better communication with my friends without telling them what to do


iPhone is popular and defaults matter. People are not going out of their way to install iMessage.


I use iMessage and WhatsApp with my contacts, but if everyone has iMessage, then we use that so pictures and videos are sent in full quality. But the iMessage groups are usually 4 to 6 people at most.

For the bigger groups, or groups with older people, we use WhatsApp since there might be an Android user or less risk of having to exclude someone because they switched to Android.


Pictures and videos are sent in full quality if they are sent over WiFi. If not, the images and videos could be compressed depending on what your cellular carrier requires.

https://discussions.apple.com/thread/250501900


That is news to me. I had peace of mind thinking that iMessage sends in full quality. At least it should make it known that it was not sent as the original.

Edit: it seems like the way to ensure sharing photos/video at original quality in iOS/macOS is to select the media in photos app, and then share, and then select “copy iCloud link”.


AFAIK, anything shared via iCloud is not original quality

> When shared, photos are reduced to 2048 pixels on the long edge, except panoramic photos, which can be up to 5400 pixels wide. You can share GIFs that are 100MB or smaller.

https://support.apple.com/en-us/HT202786


Damn, I am wrong again. Guess the only quick, integrated way is Airdrop if the other person is nearby.

I do not see a material advantage of iMessage over WhatsApp now.


Same, literally everyone I talk to in the UK uses WhatsApp


In the US your choice of type of phone (iOS or Android) is extremely correlated with the choice of phone of the people you communicate with.


It's because of two reasons:

1. iMessage interacts with SMS just like normal

2. It gives you the "blue bubble" which is a sign of higher class, because you have the more expensive iphone platform.


This has been beat to death, but another perspective is that the blue bubble indicates a better level of service, and is thus preferred. More reliable, faster, better media sharing, etc. Green bubble means the opposite because sms is a vastly inferior user experience. The shunning of green bubble people is because if one person on a group chat is a green bubble person, that ruins the blue bubble experience for everyone else. It's not just arbitrary status signaling for me at least.


I typically use an Android phone. If I use any of the multitude of "over the internet" instant messaging platforms, we all get the full level of service.

But if someone messages me from iMessage, I can only reply via old school SMS/MMS because Apple won't allow any non-Apple users access to their IM platform. This is incredibly annoying and well established as an intentional selling point.

All I hear is "if you want me to really listen to you, then you need to buy a specific brand of device."

A "green bubble person" is just someone forced to use crappy fallback measures because your preferred messaging platform wants to be exclusive. I'd gladly install iMessage (as I've installed many other IM apps) if it was an option, if only so I could stop the endless complaints about "green bubble people" ruining everyone's day.


My experience as an iPhone user in the US is that it's like pulling teeth to convince older Android users to install a third party chat app. They refuse to participate in the solution; they don't even know there's a problem. Most Android users I know are old people. I'm basically on my knees begging them to pick an app, I'll install literally anything. They won't do it. It's not worth the hassle to even try convincing them. Green bubbles it is.


I have an Android phone and if you're desperate you can buy an old Mac Mini and set up an iMessage bridge over Matrix (https://github.com/mautrix/imessage) or BlueBubbles to work around this. I do not recommend it though as it locks you into a facet of the Apple ecosystem.


> More reliable, faster, better media sharing, etc

All my friends on Telegram have iPhones and Android. Oh, and Telegram is objectively a better experience from a UI and UX standpoint than iMessage. And it's more reliable and much faster when sending images or videos, and does not randomly convert my links into obnoxious previews. I don't mind spending ten seconds installing the app _and_ making an account (since that's literally how long it takes) to get that experience.


On the other hand, iMessage is a vastly inferior user experience to Messages, WhatsApp, or Discord. Which is why I leave it disabled, despite having an iPhone. I leave group chats and media sharing to the platforms that are built for it, and use texting for logistics while out and about.


IMO the best messaging app is Telegram. Tons of thoughtful features, constant maintenance, doesn't seem bloated, a desktop and tablet clients that work as well as the phone app, groups, notes, easy bots and stickers, etc. Compared to it, most messaging apps seem primitive and stale.

If only it was fully e2e encrypted, but maybe that's why it's so frictionless. OTOH, e2e encryption is weak when you have groups, as anyone can enter public groups, and a compromised participant in a private group can read everyone's messages.


I suppose if it's that much of a bother to a group of people they could club together and buy me a blue bubble approved device.


> 1. iMessage interacts with SMS just like normal

iMessage doesn't interact with SMS just like normal — the Messages app does. It just switches between SMS and iMessage based on the recipient.


Heads up: Arctype makes a series of outbound requests when launching that, when blocked at the network level, cripple the app and won't allow it to function.


I'm not sure that figuring out who among them you chat most frequently with constitutes "surprising facts about the people you text along the way," I was expecting some lurid metadata leakage not exposed through the UI and left feeling disappointed and a bit clickbaited.


I think that's just a parsing issue/ ambiguity

(surprising facts about the people) you text

Vs

surprising facts about (the people you text)

While i don't use iMessage and therefore have no intention of doing this myself, I find exploring on-device sqlite pretty interesting and enjoyed this read. Doesn't seem at all like click bait


How on Earth does the blogger have only 1GB of iMessage data since 2009?

Mine gets bloated with images and videos to multi gigabytes in days, and it's unlikely that it's used only for text for 13 years. (I mean, it's either never used at all, or used for all sorts of media not just text, and since it has been used, it should be the latter)

Even if the actual media isn't stored in the database, all the metadata for all those messages should exceed 1GB IMO.


> How on Earth does the blogger have only 1GB of iMessage data since 2009?

Anecdatum: My `chat.db` on this laptop is 44MB for 30863 messages going back just under 4 years (basically just a handful of family.) I can see that being 1GB even for triple the length of time and many more interactions, sure.

The `Attachments` folder next to it, on the other hand, is 6.4GB.


Yeah for some reason I assumed the size is for both (db + attachments) while writing the comment, which is wrong.


In the iOS settings under Messages you can choose how long you want to meep your messages and content.

https://www.macrumors.com/how-to/auto-delete-old-messages-ip...


Yeah but I think the author has all the messages retained dating back to the beginning.


Author here. I have sent/received 643,576 messages since 2009 which totaled 590MB. To my knowledge media is never stored in the SQL database (for iMessage).


Wow. I thought it would be much bigger with all the indexes and many columns. (Even if the media itself isn't in the DB)

Thanks for the clarification.


> it's either never used at all, or used for all sorts of media not just text

This seems to be where you went wrong.


This post needs some caveats/updates: 1) You have to enable full disk access to be able to read the db, on Monterey at least. 2) The schema has changed a bit so the queries presented don't work.


A query that works on my Monterey box:

https://news.ycombinator.com/item?id=31103404


Nice writeup. I've been self-studying SQL lately for kicks -- I can bumble my way through a SELECT, INSERT, CREATE TABLE, etc. but things like joins and subqueries are foreign to me. Always neat to see something I'm studying turn up on the HN front page so I can see how far I've come :)

The writeup makes me wonder what other Apple software uses SQLite under the hood. In particular if the iTunes library is backed by a SQLite file, I could actually use that for some future projects. Cursory inspection shows that it's hdf-compressed so I'll need to get past that first... anyway, nice task for after-work hours.


https://selectstarsql.com/ may help you for JOIN I recommend that your do it in its entirety even if you understand the first concepts


`join` isn't so bad -- you are connecting two tables based on a common value.

If you have `person` and `address`:

    create table person (
        id serial primary key,
        first_name text not null,
        last_name text not null,
        date_of_birth date not null
    );

    create table address (
        id serial primary key,
        person_id integer not null references person (id),
        street_number text not null,
        apartment_number text,
        city text not null,
        state text,
        postal_code text not null
    );
Then you'd join using `person.id` to `address.person_id`. An inner join hides rows from the joining table if there is no corresponding row in the joined-to table, while an outer join includes rows from the joining table if there are no corresponding rows in the joined-to table.

If we've got:

    insert into person (first_name, last_name, date_of_birth)
    values ('Brian', 'Kernighan', '1950-01-01'),  -- gets id 1
           ('Dennis', 'Ritchie', '1948-01-01');   -- gets id 2

    insert into address (person_id, street_number, city, postal_code)
    values (1, '123 Main St.', 'Berkeley', '94701');
Then these two queries will return one and two rows, respectively:

    -- an inner join, only returns `person` rows with a corresponding
    -- row in the `address` table
    select person.first_name,
           person.last_name,
           address.street_number,
           address.city,
           address.postal_code
    from person
      inner join address on person.id = address.person_id
    -- output:
    --  first_name | last_name | street_number |   city   | postal_code
    --  Brian      | Kernighan | 123 Main St.  | Berkeley | 94701



    -- a left outer join, returns all `person` rows, regardless of
    -- the existence of a corresponding row in the `address` table
    select person.first_name,
           person.last_name,
           address.street_number,
           address.city,
           address.postal_code
    from person
      left outer join address on person.id = address.person_id

    -- output:
    --  first_name | last_name | street_number |   city   | postal_code
    --  Brian      | Kernighan | 123 Main St.  | Berkeley | 94701
    --  Dennis     | Ritchie   |               |          |
One important thing to know is if you are joining to multiple tables, and there are multiple corresponding rows (i.e., a `favorite_food` table with three rows per person, and a `favorite_color` table with three rows per person), you will get _multiplicative_ data -- `n_favorite_foods * n_favorite_colors` rows will be returned per person! This is where things like `group by` and `distinct` come into play.

There are other types of joins, but these two are the usual suspects in most queries.


This was interesting and surprisingly fun too, especially the "rate your friendships bit" based on reply ratio.


For vanity measures, I'd like to see which of my friends thinks I'm the funniest and therefore gives me the most "Haha"s on my messages.


A similar thing: I've scraped the 'sms.db' from iPhone backups to create messages in my IMAP mailbox for years. It's nice to have all my textual correspondence in one place.


It's worth noting that if you have the "Messages in iCloud" feature turned on, the local sqlite database can't be relied on any more. Or, rather, you become able to access messages in the client that aren't actually stored on-disk, and querying the local database will only turn up messages received since you set up the particular computer you're using.


That seems not true for me. I have that setting enabled, and I last updated my computer November last year, and before that in 2019. Just checked my imessage db file and it goes back to 2014.

What you experienced sounds worrying though, I wouldn't want to lose that db on a potential computer upgrade. Do you have any more details on it or what you think happened?


Actually, I might be thinking of the ~/Library/Messages/Archive folder, which used to contain a readable backup of your messages and which stops updating with the iCloud sync.


iMessage is next to useless for me due to SMS spam from political fundraising groups that send texts from so many different numbers that manually blocking each phone number is practically impossible.

I want an app that will prowl my pool of SMS texts on my phone every couple of minutes and delete every SMS text that matches a regexp I give it, without sending all of my texts off-device.


I use TextKiller on iPhone to do that, but it might be sending all my unknown sender texts to them for analysis. The iPhone framework won’t allow a text filtering app access to all your texts, just Unknown Senders. https://developer.apple.com/documentation/sms_and_call_repor...

Unfortunately there doesn’t seem to be any solution for MMS image spam from politicians that follows the same pattern of fake numbers and not honoring a “STOP” response.


Stop donating to political entities and giving them your number.

I made the mistake of giving my real phone number to an organization giving a relative an award 8 years ago. (Had to buy tickets) I still get SMS spam as a result.


That isn't sufficient. I have never done that, yet I still receive the texts (oddly, they usually address my sister, and sometimes my mother). It's my understanding that voter registration lists in the US are public.


If you select a party affiliation when you register, There Will Be Spam.


Your family probably fell for some sort of scam where their contacts were skimmed.

Voter registration lists are public, your phone number is not.

I get texts from Trumpsters because I worked at a place with a public sector union and some ass gave the GAL to a anti-union group, which subsequently sold it to some other groups.


I had the same problem until I discovered the “Filter Unknown Senders” setting that appears about 80% of the way down the settings screen.


That doesn't block them though, you'll still receive their messages.


Time to switch to Android, we have all sorts of productive stuff like that.


Maybe it's just me, but I'm surprised that no one has posted the query that works on latest macOS, as it appears that the schema on my machine differs from the author's. The below works on my Monterey machine; you might want to add back the limit 50:

  select
     h.id as sender_name,
     m.text as message_body
  from
     message m
     join handle h on h.rowid = m.handle_id
  order by
     m.date;


Funny timing on that, I just spent some time last week doing a very basic "sentiment analysis" (basically emoji analysis) on the iMessage db. I was a bit surprised by the database layout, but it's pretty easy to get into and could make for some more interesting analysis and visualization like this.


TBH I'm surprised iMessage hasn't been reverse engineered more--it's a very popular app yet it seems like the only thing backing Apple's claim that it's secure and end-to-end encrypted is "trust me"


There has been some research around the APN protocol and iMessage in the past by NowSecure[0] and otherse (see links in the article). I would guess the usefullness of a reverse engineered variant is just too low given the requirement for a valid serial number for each account.

[0]: https://www.nowsecure.com/blog/2021/01/27/reverse-engineerin...


Or, people who do this sort of work don't want to encourage more people to stay in the walled garden of Apple.


I don't think that's the case. There are plenty of open source projects that open up Airdrop to devices other than Apple's.


Isn't it just a simple question of the ROI for sich a project? If you need an iPhone anyway why should you use an alternative app on another phone?

The only reasonable usecase I can identify are people writing iMessage bots/ spammers but those groups most likely wouldn't want to open-source their stuff.


For some pretty thorough reverse engineering, there's a Matrix bridge for iMessage [1] that's being used in production by Beeper [2].

[1] https://github.com/tulir/mautrix-imessage

[2] https://beeper.com/


Hmm so their schema contains both camelCase and snake_case ? ewww




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: