GaryS Posted February 16, 2016
At first glance, this looks like a character encoding issue. The old table uses Latin encoding and the new one uses UTF-8, and I noticed some dodgy characters in some of the new posts that may have been line breaks in the old... I'll have a play later and see if I can get anything useful to happen.
staff0rd Posted February 16, 2016
Lucky you have that backup. It looks like the conversion process was indeed destructive - there are characters stripped from the new one.
staff0rd Posted February 17, 2016
On 16/02/2016 at 11:30 PM, rich said:
"Feel free to see what you can do with this - I will need a SQL script (or php script is ok) back from anyone who wants to take this on, as obviously sending me the SQL back is useless."

Because the upgrade process was destructive, I've written a script that will just replace <pre> blocks in the new version with <pre> blocks from the old version. Please excuse the PHP, I haven't written that language in 15+ years.

https://gist.github.com/staff0rd/56ae30ae26888377ef4e

Notes:

The first SQL statement is longer than it needs to be because originally I was counting the occurrences of <pre> blocks per post at the SQL level.

The script will exit the moment it sees any post where the new version of the post does not have exactly the same number of <pre> blocks as the old version. The first such mismatch in the dataset occurs on this post, because the author edited the post after the original backup was made.

This implies that the script should not be run against any post edited after the backup was produced, as it does not check the contents of <pre> blocks and could revert edits. As such, the initial SQL SELECT should be updated to include a WHERE clause that limits the output to posts created/edited prior to the original backup.
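The replacement step described above can be sketched roughly as follows. This is a Python illustration, not the PHP from the gist: the function name `restore_pre_blocks` and the regular expression are assumptions made for the example, and it works on two strings, leaving out the database queries.

```python
import re

# Matches one <pre ...>...</pre> block, including attributes and embedded newlines.
# (Hypothetical pattern for this sketch; the gist may locate blocks differently.)
PRE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)


def restore_pre_blocks(old_post: str, new_post: str) -> str:
    """Return new_post with its n-th <pre> block replaced by old_post's n-th block.

    Raises ValueError when the two versions hold a different number of blocks,
    which is the condition under which the script in the post gives up.
    """
    old_blocks = PRE_BLOCK.findall(old_post)
    new_blocks = PRE_BLOCK.findall(new_post)
    if len(old_blocks) != len(new_blocks):
        raise ValueError(
            f"<pre> count mismatch: old={len(old_blocks)}, new={len(new_blocks)}"
        )
    replacements = iter(old_blocks)
    # A function replacement is used so backslashes inside the old code
    # are not interpreted as regex template escapes.
    return PRE_BLOCK.sub(lambda _match: next(replacements), new_post)


old = '<p>Hi</p><pre class="_prettyXprint">var a = 1;\nvar b = 2;</pre>'
new = '<p>Hi</p><pre class="prettyprint">var a = 1;var b = 2;</pre>'
print(restore_pre_blocks(old, new))
```

Note that copying the whole block also carries over the old tag's attributes; whether the original script does that is not stated in the post. As with the script, the contents of the blocks are never compared, so a post edited after the backup would silently be reverted.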
staff0rd Posted February 17, 2016 (edited)
A couple of additional comments on that script: <moved into post above>
Edited March 6, 2016 by staff0rd: moved notes
GaryS Posted February 24, 2016
So I've had a dig into the data and it seems to me that there simply aren't any line breaks in the code blocks - in either the old or the new versions of the data.

There may be a caveat here: the dump contains both the new and the old tables in a single file. That file is necessarily saved as UTF-8 to accommodate the new table, but the old table is set to latin1_swedish_ci. It's not infeasible that this messes with the characters in the old table - however, I think that's a red herring.

In the old table, code blocks are marked up with a class of '_prettyXprint', vs the standard 'prettyprint' in the new table. I scoured the internetz for reference to that class, but came up blank. I'm wondering if perhaps the old class was used by the previous version of the forum software to run the code through a modified version of PrettyPrint that somehow included 'beautification' - though again, that's a long shot!

In any case, the situation as it stands is that I can't find any line breaks to work with... However, I've tried running a few sample posts through a JavaScript beautifier and it seems capable of re-inserting line breaks at sensible points. I may be able to write a script that would loop through all the posts, pick out code blocks and run them through a server-side JS beautifier, saving the result back into the table... I could then either provide a dump of the data for reimport, a script containing a butt-load of 'UPDATE' statements, or I could give access to my own database server and you could write a script to join the tables and update that way. Would any of this be acceptable @rich?

It's worth noting, I'm a ColdFusion developer... all the server-side beautifiers seem to be written in NodeJS or Python. I'm sure I'll be able to get something working, but someone with some NodeFu or PythonPower may be able to get this sorted with considerably less pissing about.
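A small sketch of why the encoding mismatch is plausibly a red herring for line breaks specifically. Python's `latin-1` codec stands in here for MySQL's latin1 character set (the two are close but not identical), and the helper name `misread_utf8_as_latin1` is made up for the example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were latin-1."""
    return text.encode("utf-8").decode("latin-1")


# Non-ASCII characters are mangled by the mismatch...
print(misread_utf8_as_latin1("café"))  # cafÃ©

# ...but a newline is plain ASCII and is the same single byte in both encodings,
# so a charset mix-up alone would not remove it.
print("\n".encode("utf-8") == "\n".encode("latin-1"))  # True
print(misread_utf8_as_latin1("a;\nb;") == "a;\nb;")    # True
```

This fits the dodgy characters seen in some posts being an encoding effect, while the missing line breaks would need a separate cause.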
staff0rd Posted February 24, 2016
@GaryS how are you checking for new lines? There are definitely new lines in the old tables that have been stripped from the new tables... Check pid 91887. If you are able to run the PHP script I posted with that ID, it will print to the console the old post with line breaks, the new post without, and the fix.
GaryS Posted February 24, 2016
Hmm, yes... you're quite right! It seems that there are new lines in some of the posts, but not in others... To be clear, there are line breaks in all the posts, but not in all the code blocks. Take a look at pid 96 for instance.

I'd still advocate running each post through a server-side beautifier - firstly it'll put line breaks in where there aren't any, and secondly it should have a good go at indenting.

There's still the problem of comments... the beautifiers have no way of knowing where a comment ends and code begins (in the case where line breaks have not survived). It may be worth running through the script a few times with some string matching to catch obvious cases - such as comments containing the characters 'function(', etc.
staff0rd Posted February 25, 2016
I see. pid 1016 in that same thread also supports your findings. Perhaps @rich updated multiple times during the forum's lifetime and earlier posts were stripped prior to the backup we received. My script does not fix these earlier ones (it has no effect on them), only the later ones where the backup retains the new lines.
staff0rd Posted February 25, 2016
@GaryS actually, the following query suggests that the posts without newlines in the old data number only 55 of 14181. That includes the pids above.

select pid from orig_posts where post like '%<pre%</pre>%' and post not like '%<pre%\n%</pre>%';
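One caveat on the query above: the LIKE patterns look at the post as a whole, so any post with a newline somewhere between its first `<pre` and its last `</pre>` is excluded, even if one of its blocks has none. A per-block check could look roughly like this in Python; the function name and regular expression are assumptions for illustration, not part of the posted script.

```python
import re

# Captures the body of each <pre ...>...</pre> block separately.
PRE_BODY = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def blocks_without_newlines(post: str) -> list[str]:
    """Return the bodies of the <pre> blocks in post that contain no newline."""
    return [body for body in PRE_BODY.findall(post) if "\n" not in body]


# Two blocks: the first has lost its line breaks, the second has not.
# The SQL query would skip this post because a newline does occur
# between the first <pre and the last </pre>.
post = "<pre>var a = 1;var b = 2;</pre>\n<p>then</p>\n<pre>a();\nb();</pre>"
print(blocks_without_newlines(post))  # ['var a = 1;var b = 2;']
```

A genuine one-line snippet is also reported by this check, so the result is a list of candidates to inspect, not a definitive list of damaged blocks.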
AzraelTycka Posted March 2, 2016
Hello, I don't know if it's a known issue, so just to be sure: I noticed that when a code block is a bit wider, the background color gets cut off when you scroll to the right - at least in the latest Firefox on Windows 10, which I'm currently using. Illustration below taken from this post. Not that it's serious though :-).
moue Posted March 4, 2016 (edited)
Are code blocks supposed to look so weird? This is what every code block that I've encountered on this site looks like, and it's really hard to read. I'm on Chrome Version 48.0.2564.116 (64-bit) on OSX El Capitan.
Edited March 4, 2016 by moue: Added browser version.
mattstyles Posted March 5, 2016
On 3/4/2016 at 8:51 PM, moue said:
"Are code blocks supposed to look so weird? This is what every code block that I've encountered on this site looks like, and it's really hard to read. I'm on Chrome Version 48.0.2564.116 (64-bit) on OSX El Capitan."

I'm on the same platform. Most code blocks look fine; there are a few like that, but I just assumed that was user error copying and pasting from some system that mucked with line-endings and/or tabs/spaces yada yada yada.
staff0rd Posted March 6, 2016
@moue @mattstyles a large percentage of the code blocks have had newlines stripped from them due to the upgrade (per previous posts in this thread). That's why the syntax highlighting looks broken.
staff0rd Posted March 6, 2016
@rich Any thoughts on whether that script will resolve it?
rich (Author) Posted April 5, 2016
Like what exactly? I'm not wasting any time messing around with the default template, and that is exactly what the forum uses. If they update it, I'll update it.
joemi Posted May 20, 2016
I don't want to sound annoying, but is the newlines-being-stripped issue getting fixed? It seems like every thread I'm trying to read has this issue, making all the code examples so hard to read. This site seems like it'd be such a great resource if the code examples weren't like that.
BdR Posted May 31, 2016
When you make a new post and insert a code snippet, the default language is set to <not set>. Is it possible to make the default selection "JavaScript"? This forum is mostly aimed at HTML5 development, and Phaser in particular, so that would make more sense.
andrii.barvynko Posted August 1, 2016
So, a long time has passed but the code in some topics is still broken. Is it possible to fix that? It's very uncomfortable...