I'm happy to act as a reference for Alex as well. I found him on HN in November 2022 and have worked with him frequently over the last year on a number of projects.
While junior, Alex is very smart and often worked without oversight while I was busy with other concerns.
For reference: I have 23 years of development experience and am now building a startup. My details can be found below (and on my profile) if you want to discuss Alex further.
Thank you, everyone. I've been contacted by some great people, and for the time being I'm no longer available for freelance work. However, if you'd like to establish contact for possible collaboration at a later date, please feel free to reach out and let's have a talk.
Super, this inspired me to look into using a keyboard with my Kindle to write on an external server, and now I have a much better writing setup.
I've found I enjoy writing on the Kindle, despite how limited it is. I used to take notes by highlighting a word in a book and writing into the note attached to that highlight. Clunky, but it works.
Using an external server is much better, though, as it works with my keyboard and lets me keep the writing elsewhere. Very happy with it.
> We start by parsing documents into chunks. A sensible default is to chunk documents by token length, typically 1,500 to 3,000 tokens per chunk. However, I found that this didn’t work very well. A better approach might be to chunk by paragraphs (e.g., split on \n\n).
Hmm, good insight there. I've experimented with chunking by token length before, and it was pretty troublesome due to missing context.
You don't do a sliding window? That seems like the logical way to maintain context while still allowing lookup by 'chunks': embed, say, 3 paragraphs at a time, advancing 1 paragraph per embedding.
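Roughly what I have in mind, as a Python sketch (`embed` is a placeholder for whatever embedding call you use; the window and stride sizes are just the example numbers above):

```python
# Sliding-window chunking: embed `size` paragraphs at a time,
# advancing `stride` paragraphs per window.
def window_chunks(text, size=3, stride=1):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i in range(0, max(len(paragraphs) - size + 1, 1), stride):
        yield "\n\n".join(paragraphs[i:i + size])

# embeddings = [(chunk, embed(chunk)) for chunk in window_chunks(doc)]
```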
If you're concatenating after chunking, then the overlapping windows add quite a lot of repetition. Also, if a chunk cuts off mid-JSON / mid-structured output, then overlapping windows again cause issues.
Define a custom recursive text splitter in LangChain, and do the chunking heuristically. It works a lot better.
That being said, it is useful to maintain some global and local context. But, I wouldn't use overlapping windows.
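For example, something along these lines (a minimal sketch; the separator list and sizes are example heuristics to tune for your documents, and `doc.txt` is a hypothetical input file):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("doc.txt") as f:  # hypothetical input file
    document_text = f.read()

splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],  # try larger units first
    chunk_size=1000,   # measured in characters by default
    chunk_overlap=0,   # per the above, no overlapping windows
)
chunks = splitter.split_text(document_text)
```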
Instead of simply concatenating after chunking, a more effective approach might be to retrieve and return the corresponding segments from the original documents that are relevant to the context. For instance, if we're dealing with short pieces of text such as Hacker News comments, it's fairly straightforward: any partial match can return the entire comment as-is.
When working with more extensive documents, the process gets a bit more intricate. In this case, your embedding database might need to hold more information per entry. Ideally, for each document, the database should store identifiers like the document ID, the starting token number, and the ending token number. This way, even if a document appears more than once among the top results from a query, it's possible to piece together the full relevant excerpt accurately.
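As a rough sketch of that bookkeeping (the field names are made up; assume each top-k hit carries its document ID and token range):

```python
from collections import defaultdict

# Each index entry might look like:
#   {"doc_id": "post-123", "start_tok": 0, "end_tok": 512}

def merge_hits(hits):
    """Merge overlapping (start, end) token ranges per document."""
    by_doc = defaultdict(list)
    for h in hits:
        by_doc[h["doc_id"]].append((h["start_tok"], h["end_tok"]))
    merged = {}
    for doc_id, spans in by_doc.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:  # overlaps the previous span
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[doc_id] = out
    return merged
```

You'd then slice the merged ranges out of the original documents rather than gluing retrieved chunks together.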
I don't think the repetition is a problem. He's using a local model for human-assisted writing with pre-generated embeddings - he can use essentially an arbitrary number of embedding calls, as long as it's more useful for the human. So it's just a question of whether that improves the quality or not. (Not that the cost would be more than a rounding error to embed your typical personal wiki with something like the OA API, especially since they just dropped the prices of embeddings again.)
I've thought about doing this as well, but I haven't tried it yet. Are there any resources/blogs/information on various strategies on how to best chunk & embed arbitrary text?
I've been experimenting with sliding-window chunking using SRT files. SRT is the subtitle format for television: each chunk has a sequence number from 1 to _n_, along with timestamps for when the chunk should appear on the screen. Traditionally it's two lines of text per chunk, but you can make chunks of other line counts and sizes. Much of my work with this has been with SRT files exported as transcriptions from Otter.ai; GPT-3.5 & 4 natively understand the SRT format and the concepts of sequence numbers and timestamps, so you can refer to them or ask for confirmation of them in a prompt.
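A sketch of the approach (hand-rolled SRT parsing; the window and stride sizes are arbitrary examples, not anything I've tuned):

```python
def srt_cues(path):
    """Parse an SRT file into cues of (sequence, timestamps, text)."""
    with open(path, encoding="utf-8") as f:
        for block in f.read().strip().split("\n\n"):
            lines = block.splitlines()
            if len(lines) >= 3:  # sequence number, timestamps, text line(s)
                yield {"seq": lines[0], "time": lines[1],
                       "text": " ".join(lines[2:])}

def srt_windows(path, size=6, stride=2):
    """Sliding window over cues, keeping the first timestamp per chunk."""
    cues = list(srt_cues(path))
    for i in range(0, max(len(cues) - size + 1, 1), stride):
        window = cues[i:i + size]
        yield window[0]["time"], " ".join(c["text"] for c in window)
```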
<center>
<b>notice</b>
<p>javascript required to view this site</p>
<b>why</b>
<p>measured improvement in server performance</p>
<p>awesome incremental search</p>
</center>
It does load faster now that it doesn't display anything.
That's a surprisingly great idea. A mobile phone can be used as a server, and for their capabilities, phones are cheaper than a Raspberry Pi, especially ones with some issues. Out of curiosity, I just found an offer for a used Pixel 6 Pro for 70 EUR, supposedly with only a broken screen and everything else working. It has 12 GB of RAM and an octa-core CPU (copy-pasting: 2x2.80 GHz Cortex-X1 & 2x2.25 GHz Cortex-A76 & 4x1.80 GHz Cortex-A55). That's a fairly good offer.
Good question. I did this once by plugging in a keyboard and mouse and using a combination of them to unlock the phone and enable on-screen dictation (meant for blind people: you move your mouse over something and it tells you what is under it).
It was mostly blind luck: pressing the Windows key, then typing out the unlock PIN code and enabling the voice detection via "Ok Google", I think, although many things were tried.
Once the on-screen dictation was enabled, we were able to navigate the phone by voice to do what we wanted (take all the photos off it).
Edit: also, some phones support video output via USB-C, and then it is much easier. Unfortunately, the one I was working on did not support that.
I'm also wondering this. I tried connecting a USB keyboard and a TV via HDMI, but I couldn't get the phone to unlock, let alone the screen to show up on the TV.
Hi, I'm Alex Sheiko, a freelance developer with well over a year of experience working with companies worldwide.
Tech: Node.js, TypeScript, React, Next.js, SvelteKit, PostgreSQL, Tailwind, Prisma, GraphQL, Git, Drizzle Kit, tRPC, etc.
Experience: Among other things, I have developed APIs, contributed to frontend designs, provided database assistance, and helped with production deployment. I do my best to follow industry best practices and deliver clean, maintainable code.
Languages: English, Ukrainian, Russian, German
Location: Ukraine (GMT +03:00)
Rate: 15 EUR/h
LinkedIn: https://www.linkedin.com/in/oleksiisheiko
Github: https://github.com/Kelamir
Email: kelamir@protonmail.com