Postgres regular expressions and regexp_split_to_array
In postgresql, I need to extract the first two words in the value for a given column. So if the value is "hello world moon and stars" or "hello world moon" or even just "hello world", I need "hello world".
I was hoping to use regexp_split_to_array
but it doesn't seem that I can use this and access the elements returned in the same query?
Do I need to create a funct开发者_如何学JAVAion for what I'm trying to do?
I can't believe that 5 years ago and no one noticed that you can access elements from regexp_split_to_array function if you surround them with parenthesis.
I saw many people tried to access the elements of the table like this:
select regexp_split_to_array(my_field, E'my_pattern')[1] from my_table
The previous will return an error, but the following will not :
select (regexp_split_to_array(my_field, E'my_pattern'))[1] from my_table
You can use POSIX regular expressions with PostgreSQL's substring()
:
select substring('hello world moon' from E'^\\w+\\s+\\w+');
Or with a very liberal interpretation of what a word is:
select substring('it''s a nice day' from E'^\\S+\\s+\\S+');
Note the \S
(non-whitespace) instead of \w
("word" character, essentially alphanumeric plus underscore).
Don't forget all the extra quoting nonsense though:
- The
E''
to tell PostgreSQL that you're using extending escaping. - And then double backslashes to get single backslashes past the string parser and in to the regular expression parser.
If you really want to use regexp_split_to_array
, then you can but the above quoting issues apply and I think you'd want to slice off just the first two elements of the array:
select (regexp_split_to_array('hello world moon', E'\\s+'))[1:2];
I'd guess that the escaping was causing some confusion; I usually end up adding backslashes until it works and then I pick it apart until I understand why I needed the number of backslashes that I ended up using. Or maybe the extra parentheses and array slicing syntax was an issue (it was for me but a bit of experimentation sorted it out).
found one answer:
select split_part('hello world moon', ' ', 1) || ' ' || split_part('hello world moon', ' ', 2);
select substring(my_text from $$^\S+\s+\S+$$) from v;
substring
-------------
hello world
hello world
hello world
(3 rows)
where for the purpose of demonstration, v is:
create view v as select 'hello world moon and stars' as my_text union all
select 'hello world mood' union all
select 'hello world';
if you want to ignore whitespace at the beginning:
select substring(my_text from $$^\s*\S+\s+\S+$$) from v;
精彩评论