This isn’t “I want to believe”, this is “it would be irresponsible to not consider”.

One of many.

  • 0 Posts
  • 13 Comments
Joined 2 years ago
Cake day: September 3rd, 2023

  • Try asking one to write a sentence that ends with the letter “r”, or a poem that rhymes.

    They know words as black boxen with weights attached for how likely they are to appear in certain contexts. Prediction happens by comparing the chain of these boxes leading up to the current cursor and using weights and statistics to fill in the next box.

    They don’t understand that those words are made of letters unless they’ve been trained to break each word down into its component letters or syllables. Mainstream models aren’t built that way, because it inflates the already astronomical compute and training costs.

    About a decade ago I played with a language model whose Markov chain made predictions based on which letter came next instead of which word came next (a pretty easy modification of the base code). Working at the letter scale, it was surprisingly good at putting sentences and grammar together, comparable to word-level prediction. It was also horribly less efficient to train (which is saying something compared to word-level prediction) because it has to consider many more units (letters instead of words) leading up to the current one to maintain the same coherence. If the Markov chain looks at the past 10 words, word-level prediction has 10 boxes to factor into its calculations and training. If those words average 5 letters each, letter-level prediction needs at least 50 boxes to maintain the same awareness of context within a sentence or paragraph. That’s a five-fold increase in memory footprint, and an even larger increase in compute time, since most of the operations scale at least linearly with context length and sometimes worse.

    Paying that efficiency cost would let LLMs understand sub-word concepts like alphabetization, rhyming, and root words, but the extra expense and energy requirements aren’t considered worth this modest expansion of understanding.

    Adding a Generative Pre-trained Transformer (GPT) just adds some plasticity to those weights and statistics beyond the Markov-chain example I use above. A couple of toy sketches of what I mean follow below.
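
    To make the “black boxes” point concrete, here’s a minimal, made-up illustration (the vocabulary, the IDs, and the greedy lookup are invented for this sketch, not taken from any real tokenizer): the model’s input is a stream of integer IDs standing for whole chunks of text, so a question like “does this sentence end in r?” isn’t directly visible in what it receives.

    ```python
    # Toy vocabulary: made-up subword pieces and IDs, purely for illustration.
    toy_vocab = {"A ": 17, "sentence ": 903, "ending ": 2210, "in ": 44, "car": 7712, ".": 13}

    def encode(text, vocab):
        """Greedy longest-match lookup -- a crude stand-in for real subword tokenization."""
        pieces = sorted(vocab, key=len, reverse=True)  # try longer pieces first
        ids, i = [], 0
        while i < len(text):
            for piece in pieces:
                if text.startswith(piece, i):
                    ids.append(vocab[piece])
                    i += len(piece)
                    break
            else:
                i += 1  # skip characters the toy vocab doesn't cover
        return ids

    print(encode("A sentence ending in car.", toy_vocab))
    # -> [17, 903, 2210, 44, 7712, 13]
    # The "car" chunk is just the number 7712; nothing in the input says it ends in "r".
    ```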

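    And here’s a rough sketch of the word-level vs. letter-level Markov chain comparison (a toy reconstruction of the idea, not the original code I played with; the corpus and order values are arbitrary): covering the same stretch of text at the letter level takes several times more units of context, which is where the memory and compute blow-up comes from.

    ```python
    import random
    from collections import defaultdict, Counter

    def train(units, order):
        """Count how often each unit follows each context of `order` preceding units."""
        table = defaultdict(Counter)
        for i in range(len(units) - order):
            context = tuple(units[i:i + order])
            table[context][units[i + order]] += 1
        return table

    def generate(table, seed, length, order):
        """Sample one unit at a time, weighted by the observed frequencies."""
        out = list(seed)
        for _ in range(length):
            counts = table.get(tuple(out[-order:]))
            if not counts:
                break  # unseen context: stop rather than guess blindly
            choices, weights = zip(*counts.items())
            out.append(random.choices(choices, weights=weights)[0])
        return out

    corpus = "the quick brown fox jumps over the lazy dog " * 50  # toy corpus

    # Word-level: a context of 2 units spans 2 whole words.
    words = corpus.split()
    word_model = train(words, order=2)
    print(" ".join(generate(word_model, words[:2], 15, 2)))

    # Letter-level: covering those same 2 words (~5 letters each plus spaces)
    # takes an order of ~10 units, so the table and the lookups grow accordingly.
    letters = list(corpus)
    letter_model = train(letters, order=10)
    print("".join(generate(letter_model, letters[:10], 80, 10)))
    ```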

  • I love Organic Maps and OpenStreetMap. The biggest thing missing is a satellite view. I like to wander around and explore an area on a map before visiting. OSM has more interesting and relevant details, and better visual color coding, than Google’s vector street map. Google has a satellite map, which is non-negotiable for me, especially if I need to quickly orient myself while driving in a new place. For planning trips I use three layers loaded into QGIS: OSM, Google Maps satellite, and a USGS topographic map. I sometimes use Organic Maps on my phone if I don’t have access to a computer with QGIS, but I rely on Google while on location because Organic Maps lacks a satellite layer.
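
    For anyone who wants to reproduce that three-layer QGIS setup, here’s a minimal sketch to run from the QGIS Python console. The tile URLs are commonly cited public XYZ endpoints rather than anything specified in this comment, so treat them as assumptions and check each provider’s terms of use and current availability.

    ```python
    from urllib.parse import quote
    from qgis.core import QgsProject, QgsRasterLayer

    # Assumed XYZ tile endpoints -- swap in whatever sources you actually use.
    XYZ_SOURCES = {
        "OpenStreetMap": "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
        "Google Satellite": "https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
        "USGS Topo": "https://basemap.nationalmap.gov/arcgis/rest/services/USGSTopo/MapServer/tile/{z}/{y}/{x}",
    }

    for name, url in XYZ_SOURCES.items():
        # XYZ layers go through the "wms" provider; percent-encoding keeps the
        # tile URL's own '&' and '{}' characters intact through URI parsing.
        uri = f"type=xyz&zmin=0&zmax=19&url={quote(url, safe='')}"
        layer = QgsRasterLayer(uri, name, "wms")
        if layer.isValid():
            QgsProject.instance().addMapLayer(layer)
        else:
            print(f"Could not load {name}")
    ```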