javascript remove words less than 3 characters
I am trying to remove all the short words (3 characters or fewer), like in, on, the...
My code does not work for me: Uncaught TypeError: Object ... has no method 'replace'
Any help would be appreciated.
var str = 'Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.';
var newstr = str.split(" ").replace(/(\b(\w{1,3})\b(\s|$))/g,'');
alert(newstr);
You need to change the order of split and replace:
var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");
Otherwise, you end up calling replace on an array, which does not have this method.
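The type mismatch can be checked directly. This is a minimal sketch using a shortened sample string (an assumption for illustration, not the string from the question):

```javascript
// Shortened sample string, chosen for illustration only.
const sample = "Nulla in tortor nibh";

// split() returns an array, and arrays have no replace() method.
const parts = sample.split(" ");
const partsIsArray = Array.isArray(parts);
const arrayHasReplace = typeof parts.replace === "function";

// replace() is a string method, so it has to run before split().
const stringHasReplace = typeof sample.replace === "function";

console.log(partsIsArray, arrayHasReplace, stringHasReplace);
```

Calling parts.replace(...) here would throw the same kind of TypeError as in the question.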
See it in action.
Note: Your current regex does not correctly handle the case where a "short" word is immediately followed by a punctuation character. You can change it slightly to do that:
/(\b(\w{1,3})\b(\W|$))/g
                ^^
Apart from that, you also have to take care of the fact that the resulting array may contain empty strings (because deleting short words together with the punctuation that follows them can leave consecutive spaces in the string before it's split). So you might also want to change how you split. All of this gives us:
var newstr = str.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/);
See it in action.
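To make the empty-string issue concrete, here is a small sketch on a shortened ASCII-only sample (an assumption, not the original string). It also shows that splitting on /\s+/ can still leave an empty string at the end when the input ends with a removed word, so the last step filters those out; that extra filter is an addition for illustration, not part of the answer above.

```javascript
// Shortened ASCII-only sample, chosen for illustration only.
const sample = "porta in, faucibus sed magna in.";

// Same regex as above: a word of 1 to 3 word characters plus the character after it.
const stripped = sample.replace(/(\b(\w{1,3})\b(\W|$))/g, "");

// Splitting on a single space keeps an empty string for each doubled or trailing space.
const naive = stripped.split(" ");

// Splitting on runs of whitespace removes the inner empty strings...
const better = stripped.split(/\s+/);

// ...and filtering drops an empty string left at either end.
const clean = better.filter(Boolean);

console.log(JSON.stringify(naive), JSON.stringify(better), JSON.stringify(clean));
```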
Update: As Ray Toal correctly points out in a comment, in JavaScript regexes \w does not match non-ASCII characters (e.g. characters with accents). This means that the above regexes will not work correctly (they will work correctly on certain other flavors of regex). Unfortunately, there is no convenient way around that and you will have to replace \w with a character group such as [a-zA-Zéǔí], and do the converse for \W.
Update:
Ugh, doing this in JavaScript regex is not easy. I came up with this regex:
([^ǔa-z\u00C0-\u017E]([ǔa-z\u00C0-\u017E]{1,3})(?=[^ǔa-z\u00C0-\u017E]|$))
...which I still don't like because I had to manually include the ǔ in there.
See it in action.
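In engines that support the ES2018 regex additions (Unicode property escapes with the u flag, and lookbehind), a different technique from the regexes above may be simpler: \p{L} matches any letter, so no manual character ranges are needed. The following is a sketch under that assumption, on a shortened sample; the NFC normalization step is also an assumption, added so that accented letters are single code points.

```javascript
// Shortened sample; normalize so each accented letter is one precomposed code point.
const sample = "Proin néc turpis eget dolor dictǔm lacínia. Nulla in tortòr nibh."
  .normalize("NFC");

// A run of 1 to 3 letters that has no letter directly before or after it.
const shortWord = /(?<!\p{L})\p{L}{1,3}(?!\p{L})/gu;

const cleaned = sample
  .replace(shortWord, "")      // drop the short words
  .split(/[^\p{L}]+/u)         // split on anything that is not a letter
  .filter(Boolean);            // drop empty strings at the edges

console.log(cleaned.join(" "));
```

Unlike the regexes above, this removes only the word itself and leaves spaces and punctuation for the split step to discard.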
Try this:
str = str.split( ' ' ).filter(function ( str ) {
    var word = str.match(/(\w+)/);
    return word && word[0].length > 3;
}).join( ' ' );
Live demo: http://jsfiddle.net/sTfEs/1/
var words = str.split(" "); // Turn the string into an array of words
var longWords = []; // Initialize the result array
for (var i = 0; i < words.length; i++) {
    if (words[i].length > 3) {
        longWords.push(words[i]);
    }
}
var newString = longWords.join(" "); // Create a new string of the words separated by spaces
str.split(" ") returns an array, which does not have a replace method.
Secondly, you probably shouldn't use regexes for this. JavaScript does not have good support for non-ASCII letters in regexes. See Regular expression to match non-English characters?. If you need to use a regex, there are hints in there.
And BTW, in JavaScript, \w{1,3} DOES NOT match "néc". As you probably know, \w is [A-Za-z0-9_]. See http://jsfiddle.net/3YWSC/ for an example.
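The ASCII-only behaviour of \w is easy to confirm. A minimal sketch, with test strings chosen for illustration (the accented letter is written as an escape so the encoding of this page does not matter):

```javascript
// "néc" with é written as the escape \u00e9.
const accented = "n\u00e9c";

// \w covers ASCII letters, digits and underscore.
const asciiOk = /^\w+$/.test("nec");        // letters only
const digitsOk = /^\w+$/.test("a1_");       // digit and underscore also count
const accentedOk = /^\w+$/.test(accented);  // é is not a word character

// Unanchored, \w{1,3} stops at the first non-ASCII letter.
const firstMatch = accented.match(/\w{1,3}/)[0];

console.log(asciiOk, digitsOk, accentedOk, firstMatch);
```

Because é counts as a non-word character, \b also sees a word boundary on both sides of it, which is why the \b-based regexes split "néc" into pieces.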
Are you only trying to match runs of non-space characters? Or are you looking for words of three or fewer letters only? On the one hand you split on spaces, but on the other you used \w. I would go with something like Dennis's answer.
Try
var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");
Using lodash with less than 20 chars:
let a = ['la','rivière','et','le','lapin','sont','dans','le','près'];
a = _.remove(_.uniq(a),n=>_.size(n)>3); // ['rivière','lapin','sont','dans','près']
Using the filter method:
let sentence = "Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo .";
let sent = sentence.split(" ").filter((ele) => ele.length > 3).join(" ");
console.log(sent);
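One caveat with measuring ele.length is that attached punctuation is counted, so a token like "sèd," has length 4 and is kept. A possible variation, sketched here on a shortened sample, counts only letters before comparing. The helper name letterCount is hypothetical, and the sketch assumes an engine with Unicode property escapes (\p{L} with the u flag).

```javascript
// Shortened sample; normalize so each accented letter is one precomposed code point.
const sample = "tincidunt eǔ porta in, faucibus sèd, magna.".normalize("NFC");

// Hypothetical helper: count the letters in a token, ignoring punctuation.
const letterCount = (token) => token.replace(/[^\p{L}]/gu, "").length;

// Original approach: punctuation is counted, so "sèd," survives.
const byLength = sample.split(" ").filter((ele) => ele.length > 3).join(" ");

// Variation: only letters are counted, so "sèd," is dropped.
const byLetters = sample
  .split(/\s+/)
  .filter((token) => letterCount(token) > 3)
  .join(" ");

console.log(byLength);
console.log(byLetters);
```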