Query data from within a string or a serialized array

I need to query a column that contains strings that look like this: Language1, Language2, Language3.

I can't really create a new column for each language, as there is an undefined number of them.

So I am wondering what would be the most performant / elegant way to query data from within that string.

Should I use serialized data? What would a query look like that retrieves all rows that match, let's say, "Language2"?

EDIT: This table holds an index of criteria that I use to search for users on my site. Language is one of the criteria, as are city, country and a bunch of others. The problem is that each user can have an undefined number of languages.

Second edit: Here is a view of my table as it is right now:

(Image: screenshot of the table)


I was looking for a solution to the same scenario, querying data within a serialised array, and found that the SUBSTRING_INDEX() function does the job. An example:

| language             |
------------------------
| a:1:{i:1;s:3:"183";} |

To get the value 183 (without the quotes), the query would be something along the lines of:

SELECT TRIM(BOTH '"' FROM SUBSTRING_INDEX(SUBSTRING_INDEX(language,';',2),':',-1)) AS language_id
FROM language_table;

It should give you the following result, without the quotes:

| language_id          |
------------------------
| 183                  |
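
To see why the nested calls work, here is a rough Python sketch that walks through the same two steps. The helper `substring_index` is a hypothetical stand-in written for this illustration (it is not a library function) and only approximates MySQL's behaviour for a non-empty delimiter.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Approximate MySQL's SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Everything after the count-th delimiter, counting from the right.
        return delim.join(parts[count:])
    return ""  # MySQL returns an empty string when count is 0

serialized = 'a:1:{i:1;s:3:"183";}'

# Inner call: keep everything before the second ';'
before_second_semicolon = substring_index(serialized, ";", 2)

# Outer call: keep everything after the last ':'
after_last_colon = substring_index(before_second_semicolon, ":", -1)

# TRIM(BOTH '"' FROM ...): drop the surrounding double quotes
language_id = after_last_colon.strip('"')
print(language_id)
```

Note that this only pulls out the first element of the serialized array; an array with several entries would need a different approach.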


No, you should normalize your table schema. Yes, you can even cope with the undefined number of values.

Make a separate table that will hold the relationship references.

languages

user_id | language

Sample data would be:

user_id | language
-------------------
   1   | Language 1
   1   | Language 2
   1   | Language 3
   2   | Language 2

Sample query would be:

    SELECT *
      FROM users u
INNER JOIN languages l
        ON u.user_id = l.user_id
       AND l.language = 'Language 2'
     WHERE u.country = 'abc'
       AND u.city = 'def';
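
If you want to try this out quickly, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The `users` columns, the composite primary key and the sample rows (including user 3) are assumptions made up for the demo; only the `languages` layout and the shape of the query come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT, city TEXT);
    CREATE TABLE languages (
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        language TEXT    NOT NULL,
        PRIMARY KEY (user_id, language)  -- one row per (user, language) pair
    );
    INSERT INTO users VALUES (1, 'abc', 'def'), (2, 'abc', 'def'), (3, 'xyz', 'def');
    INSERT INTO languages VALUES
        (1, 'Language 1'), (1, 'Language 2'), (1, 'Language 3'),
        (2, 'Language 2'), (3, 'Language 2');
""")

# Users who speak 'Language 2' and also match the other criteria.
matching_user_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.user_id
          FROM users u
          JOIN languages l ON l.user_id = u.user_id
         WHERE l.language = ?
           AND u.country = ?
           AND u.city = ?
         ORDER BY u.user_id
        """,
        ("Language 2", "abc", "def"),
    )
]
print(matching_user_ids)
```

User 3 speaks the language but lives in a different country, so only users 1 and 2 come back.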


Although Shef and ajreal are correct that it would be better from a design point of view to normalize the tables, to answer your actual question: you can just use LIKE:

SELECT * FROM personnel WHERE languages LIKE '%English%';

You might run into problems, however, if the languages are free text and the column uses a case-sensitive collation, as someone might have entered 'english' or some other capitalization. In that case you need:

SELECT * FROM personnel WHERE UPPER(languages) LIKE '%ENGLISH%';

But as a general rule, these get messy when you have valid values that are substrings of other valid values. To get around this, you have to serialize the values with the delimiter at the beginning and end of each serialized string:

;value3;value4;value12;

This allows you to search for

LIKE '%;value1;%'

without having it match value11 or value12.
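
A small sketch of the difference, again using Python's sqlite3 module in place of MySQL. The column name `vals` and the two sample rows are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (id INTEGER PRIMARY KEY, vals TEXT)")
conn.executemany(
    "INSERT INTO personnel VALUES (?, ?)",
    [
        (1, ";value3;value4;value12;"),  # contains value12 but not value1
        (2, ";value1;value2;"),          # really contains value1
    ],
)

def ids_matching(pattern):
    """Return the ids of rows whose vals column matches the LIKE pattern."""
    rows = conn.execute(
        "SELECT id FROM personnel WHERE vals LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]

naive = ids_matching("%value1%")        # also matches value12 in row 1
delimited = ids_matching("%;value1;%")  # matches only the exact value
print(naive, delimited)
```

Keep in mind that a pattern with a leading % generally cannot use an ordinary index, so this still scans every row.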

Sometimes, you actually want the substring matching behavior, for example, if someone answered British English, you might want English to match. (although, if they enter British, you're S.O.L) ... but if you're getting into these issues, you likely want to define a hierarchy or use an encoding, as you can't depend on substrings working.


Don't do this.
Here is one example of normalization:

create table language
(
  id smallint(10) unsigned not null default 0,
  name varchar(255) not null,
  primary key(id),
  key (name)
);

-- assuming your table is named as data
create table data
(
  id int(10) not null default 0,
  -- plus any of your existing column definition
  key(id)
);

-- build a relation table to tie language and data together
create table data_language
(
  data_id int(10) not null default 0,
  language_id smallint(10) unsigned not null default 0,
  key (data_id, language_id)
);

-- select the data rows that are linked to a given language
select data.id, language.id, language.name
from data
inner join data_language on data.id = data_language.data_id
inner join language on language.id = data_language.language_id
where language.name = 'English';
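
If you already have rows with a comma-separated column, one possible way to move them into this layout is sketched below with Python's sqlite3 module standing in for MySQL. The `languages` text column on `data`, the unique constraint on `language.name`, the composite primary key and the sample rows are all assumptions for the demo, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE data (id INTEGER PRIMARY KEY, languages TEXT);  -- old comma-separated column
    CREATE TABLE data_language (
        data_id     INTEGER NOT NULL REFERENCES data(id),
        language_id INTEGER NOT NULL REFERENCES language(id),
        PRIMARY KEY (data_id, language_id)
    );
    INSERT INTO data VALUES (1, 'English, French, German'), (2, 'French'), (3, 'English');
""")

# Split each comma-separated string and store one relation row per language.
for data_id, languages in conn.execute("SELECT id, languages FROM data").fetchall():
    for name in (part.strip() for part in languages.split(",")):
        if not name:
            continue  # skip empty fragments such as a trailing comma
        conn.execute("INSERT OR IGNORE INTO language (name) VALUES (?)", (name,))
        (language_id,) = conn.execute(
            "SELECT id FROM language WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO data_language VALUES (?, ?)", (data_id, language_id)
        )

english_data_ids = [
    row[0]
    for row in conn.execute("""
        SELECT data.id
          FROM data
          JOIN data_language ON data_language.data_id = data.id
          JOIN language ON language.id = data_language.language_id
         WHERE language.name = 'English'
         ORDER BY data.id
    """)
]
print(english_data_ids)
```

Once the relation table is filled, the old text column can be dropped and the lookup becomes an indexed join instead of a string scan.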